Real Data, Live Data, Not just Ethnologue maps

Posted on May 15, 2014 by Hugh Paterson III

Over the last few years there have been several interesting projects which have created language use visualizations. The Ethnologue project produces a particular kind of visualization. In the past I have talked about the need to socialize the data on which the Ethnologue apps are based and to make it more accurate to WGS 84. I talk about that need in two places: on insite here: Geographical Data, and on my non-insite blog: https://hugh.thejourneyler.org/2012/some-current-challenges-in-using-gis-information-in-the-sil-international-corporate-knowledge-system/

There are several challenges with the basic assumptions put forward by the current Ethnologue visualizations.

They project a language homogeneity which is not necessarily accurate to real life.
They project a geographical display which is not indicative of real language use. That is, language use may actually take place in digital mediums, which cannot be heard at certain locations.
Ethnologue maps make no overt claims about digital communications devices and their use by minority language speakers. However, my general feeling is that SIL (especially in our training programs) does not assume that a minority language user is also a digital device user.

One of the tools which SIL could use to inform its business intelligence is the language of use in digital social mediums. For instance, Wikipedia allows any ISO 639-3 language community to form its own Wikipedia. All of the IP edits (edits made without logging in) are recorded and public. That would give us a language use location based on IP addresses. This can then be superimposed on additional data collected from geo-enabled tweets. With such information, the pre-survey data available about language use (in certain contexts) just got more interesting, provided of course that the survey is about questions of language use.
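A minimal sketch of that superimposition, in Python. It assumes the Wikipedia edits have already been geolocated from their IP addresses by some other means; the record layout, the one-degree grid, and every name and coordinate below are invented for illustration and do not come from any real dataset.

```python
import math
from collections import Counter, defaultdict


def grid_cell(lat, lon, size=1.0):
    """Snap a WGS 84 coordinate to the south-west corner of its grid cell."""
    return (math.floor(lat / size) * size, math.floor(lon / size) * size)


def overlay(sources, size=1.0):
    """Count records per grid cell, keyed by (ISO 639-3 code, source name).

    `sources` maps a source name (hypothetical: "wikipedia", "twitter")
    to a list of (iso_code, lat, lon) records.
    """
    cells = defaultdict(Counter)
    for source_name, records in sources.items():
        for iso_code, lat, lon in records:
            cells[grid_cell(lat, lon, size)][(iso_code, source_name)] += 1
    return cells


# Invented sample records: Irish (gle) activity from two sources.
sample = {
    "wikipedia": [("gle", 53.3, -6.2), ("gle", 53.4, -6.3)],
    "twitter": [("gle", 53.9, -6.9), ("gle", 40.7, -74.0)],
}
for cell, counts in sorted(overlay(sample).items()):
    print(cell, dict(counts))
```

Binning both sources onto the same grid is only one possible design; it keeps the two datasets comparable without claiming more positional accuracy than IP geolocation can give.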

Some people have taken to mapping Wikipedia edits. Such a map shows that there are a lot of people in a lot of places, speakers of minority languages included, who are able to edit centrally hosted content like that found on Wikipedia. Here is a map created from the English language Wikipedia, which is available from http://www.dailydot.com/society/wikipedia-conflict-map-flame-wars/.

As I stated previously, the homogeneity of language use within a given geographical region is difficult to map. There are questions of speaker population density, and questions of social environments. While the Ethnologue maps are very detailed in terms of their global scope, one of the challenges for this kind of visualization is expressing diversity. Below is a map of language diversity based on tweets in New York City. The power of using tweets to measure the linguistic diversity of a region is that tweets usually connect two or more people and reveal the social connection between those people. This is a powerful bit of information. SIL could leverage this data in several ways; one way would be to make this data available to its scripture use partners. Language may not always be a barrier to understanding the gospel, but I have yet to see it not be an inroad to relationships in and through which the gospel can be shown or presented.

Language Diversity as demonstrated on twitter

Image from http://ny.spatial.ly/

Image from FirstMonday Journal article. http://firstmonday.org/ojs/index.php/fm/article/view/4366/3654
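One way to put a number on the diversity that such tweet maps show is a diversity index computed per neighbourhood or grid cell. The sketch below uses the Shannon index, which is my choice for illustration and not necessarily what the map makers used; the language codes are invented sample data.

```python
import math
from collections import Counter


def shannon_diversity(language_codes):
    """Shannon index H = -sum(p * ln p) over the languages observed.

    `language_codes` is a list with one language code per tweet.
    H is 0 when only one language is present and grows as more
    languages are used in more even proportions.
    """
    counts = Counter(language_codes)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log(n / total) for n in counts.values())


# Invented tweets for two hypothetical grid cells.
mixed_cell = ["eng", "spa", "eng", "rus", "spa", "zho"]
uniform_cell = ["eng"] * 6
print(shannon_diversity(mixed_cell), shannon_diversity(uniform_cell))
```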

If our conceptualization of language and its geographical distribution is at all reflected in the way that we look at Ethnologue maps, then we can often miss the wide distribution that many language communities have. For instance, this language map shows the use of Irish as Twitter users are using it. Notice that the language is not bound to Ireland.

Irish language Twitter conversations, Kevin Scannell (CC-BY-SA) http://indigenoustweets.blogspot.com/2013/12/mapping-celtic-twittersphere.html

Something fantastic with Webonary data

The UK Data Explorer has a very interesting setup using a powerful (free and open) visualization software tool called D3.js. The tool allows you to type in a word and see how it is spelled in a variety of languages. It uses Google Translate. Check it out here: http://ukdataexplorer.com/european-translator/?word=man

WordPress is equally capable of serving up Webonary data if it is configured correctly.

Man Across Europe

Some other thoughts on linguistic cartography and the display of language vitality.

Back in 2011 Lars Huttar and I played around with a heat mapping JavaScript tool called gHeat. The idea was to plot the heavily populated towns with a higher gradient than less populated towns, based on speaker population densities I had from Mexican statistics data. The idea was to incorporate two important aspects of analysis: remoteness and vitality. I talk about remoteness on my blog here: https://hugh.thejourneyler.org/2012/remoteness-index/, and I talk about the visualization here: https://hugh.thejourneyler.org/2011/language-maps-like-heat-maps/. The data may not be perfect, but it was a start. The paper has not gone anywhere since that time. I still have the draft paper, and would like to pursue this with a co-author. If there is someone else who might be interested, please comment; I can give more details and the Paterson & Huttar paper draft.

If you just like looking at language maps you might enjoy this post: https://hugh.thejourneyler.org/2012/types-of-linguistic-maps-the-mapping-of-linguistic-features/

One final thought

Here is an interesting set of maps for language use. While the Ethnologue maps first language use, second language use remains a mystery. These efforts are trying to visualize the second most widely spoken language for a geographical region.

A second way to look at the earth is to ask: what are the places? This has been a recent hot topic in Language Documentation circles. On the single language level there may or may not be a lot of information which is interesting to a lot of people. However, looking at the earth in terms of which languages are talking about certain places is interesting. One point of large interaction for this conversation is Wikipedia.

http://tracemedia.co.uk/portfolio/mapping-wikipedia/

Learning to make Polygons in Google Earth

Posted on December 1, 2013 by Hugh Paterson III

Today I am messing around and making KML and GPX files from our trip to Nigeria.

Reading: https://developers.google.com/kml/documentation/kml_tut#polygons and http://projects.visualstudies.duke.edu/isismapping/sites/default/files/isisguides/earthguide.pdf.
Watching:

http://www.youtube.com/watch?v=OGGpTqkbCWo

I hope to take some of our photos, a polygon of the language area, and our GPX route traces and overlay them on an OpenStreetMap page in WordPress.

The one thing I don't think is possible with Google Earth is to move polygons. I created them and then they were about a mile off, so I just wanted to move them. To my knowledge, that is not possible.
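One possible workaround is to shift the coordinates in the exported KML file and re-import it. The following is a minimal Python sketch, assuming the file uses the KML 2.2 namespace and stores coordinates as whitespace-separated lon,lat[,alt] tuples; the function name, the sample polygon, and the offsets are all hypothetical.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"  # assumed namespace of the export


def shift_kml_coordinates(kml_text, d_lon, d_lat):
    """Return KML text with every coordinate moved by (d_lon, d_lat) degrees."""
    ET.register_namespace("", KML_NS)  # keep the output free of ns0: prefixes
    root = ET.fromstring(kml_text)
    for node in root.iter(f"{{{KML_NS}}}coordinates"):
        shifted = []
        for tuple_text in node.text.split():
            parts = tuple_text.split(",")  # lon, lat, optional altitude
            lon = float(parts[0]) + d_lon
            lat = float(parts[1]) + d_lat
            shifted.append(",".join([f"{lon:.6f}", f"{lat:.6f}"] + parts[2:]))
        node.text = " ".join(shifted)
    return ET.tostring(root, encoding="unicode")


# Invented triangle; not a real language area.
sample = (
    '<kml xmlns="http://www.opengis.net/kml/2.2"><Placemark><Polygon>'
    "<outerBoundaryIs><LinearRing><coordinates>"
    "8.5,9.0,0 8.6,9.0,0 8.6,9.1,0 8.5,9.0,0"
    "</coordinates></LinearRing></outerBoundaryIs>"
    "</Polygon></Placemark></kml>"
)
print(shift_kml_coordinates(sample, d_lon=0.01, d_lat=-0.02))
```

As a rough guide, one mile of latitude is about 0.0145 degrees; the equivalent offset in longitude grows with distance from the equator, so the offsets are best checked visually after re-importing.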

The Data Management Space for Linguists

Posted on October 10, 2012 by Hugh Paterson III

This week I have been outlining the types of data that linguists need to be able to use and relate to each other as they do Language Documentation and Linguistic Research. I try to express these things graphically and then also express where some of the leading tools which SIL International is offering sit in the problem space.

The Data Management Space for linguists with SIL software.

iPhone geo-data

Posted on April 28, 2012 by Hugh Paterson III

I have been playing around with data available from the iPhone (and also separately visualizing Map data).

I came across a project, iPhoneTracker, which was done to show iPhone users the kind of data that the iPhone collects about a user's travel and whereabouts. I downloaded the app and ran it. Looks like about a complete history since I activated the phone… The interesting thing for me was that this app did not collect the data from my phone directly but rather from my computer.

iPhone location history from my iPhone

Types of Linguistic Maps: The Mapping of linguistic Features and Researcher Interactivity

Posted on March 22, 2012 by Hugh Paterson III

A couple of years ago I had a chance meeting with a cartographer in North Dakota. It was interesting because he asked us (a group of linguists), "What is a language or linguistic map?" So, I grabbed a few examples and put them into a brief for him. This past January at the LSA meeting in Portland, Oregon, I had several interesting conversations with the folks at the LL-Map Project under the LINGUIST List. It occurred to me that such a presentation of various kinds of language maps might be useful to a larger audience. So this will be a bit unpolished but should show a wide selection of language and linguistic based maps, and in the last section I will also talk a bit about interactive maps.

World Map Navigation

Posted on March 11, 2012 by Hugh Paterson III

For one of the web projects I am working on we have been throwing around the idea of having a world map as a navigation element. Each country would then be clickable. This kind of navigation has been done with hyperlinked bitmaps, as in the LL-Map project.

LL-Map Bitmap

Or with Flash, as on the Joshua Project site.

Joshua Project Front page with World Map

I have not seen any implementations in HTML5 canvas or in SVG. It occurs to me that these technologies could be used. I am not deeply familiar with either technology. So I did some googling.
I found some interesting articles on the matter.

Performance of SVG vs. Canvas ^[1]
How to Choose Between Canvas and SVG ^[2]
SVG or Canvas? Choosing between the two ^[3]
CanVG: Using Canvas to render SVG files ^[4]

I am not sure that I have any answers but this is my thought towards the problem space.
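To make the SVG option a little more concrete, here is a rough Python sketch that writes out an SVG in which each country outline is wrapped in a link. The outlines, URLs, and function name are placeholders invented for illustration; real country shapes would have to come from a proper geographic dataset.

```python
from xml.sax.saxutils import escape, quoteattr


def svg_nav_map(countries, width=400, height=200):
    """Build an SVG string where every country path is a clickable link.

    `countries` is a list of dicts with hypothetical keys:
    "name" (tooltip text), "url" (link target), "path" (SVG path data).
    """
    parts = [
        '<svg xmlns="http://www.w3.org/2000/svg" '
        'xmlns:xlink="http://www.w3.org/1999/xlink" '
        f'viewBox="0 0 {width} {height}">'
    ]
    for country in countries:
        # xlink:href is the SVG 1.1 form; SVG 2 also accepts a plain href.
        parts.append(
            f'<a xlink:href={quoteattr(country["url"])}>'
            f'<path d={quoteattr(country["path"])} fill="#ccc" stroke="#333">'
            f'<title>{escape(country["name"])}</title></path></a>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Placeholder rectangles standing in for real country outlines.
demo = [
    {"name": "Nigeria", "url": "/country/ng", "path": "M10 10 L50 10 L50 40 Z"},
    {"name": "Mexico", "url": "/country/mx", "path": "M60 10 L90 10 L90 40 Z"},
]
print(svg_nav_map(demo))
```

Because each country stays a separate element in the page, links and hover titles come for free, which is the main attraction of SVG over a single bitmap or a canvas drawing for this kind of navigation.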

There is one map of languages I have found which deserves to be mentioned. I am not sure of the technology used but it seems it would be either of these methods. It is the map of the Languages of California hosted at Berkeley.

California Languages Map

References
↑1	Boris Smus. 19 January 2009. Performance of Canvas versus SVG. http://smus.com/canvas-vs-svg-performance [Accessed: 4 March 2012]
↑2	Patrick Dengler. 28 September 2011. How to Choose Between Canvas and SVG. http://www.sitepoint.com/how-to-choose-between-canvas-and-svg/#fbid=6CJz-eeIXxl [Accessed: 4 March 2012]
↑3	Mihai Sucan. 4 February 2010. SVG or Canvas? Choosing between the two. http://dev.opera.com/articles/view/svg-or-canvas-choosing-between-the-two/ [Accessed: 4 March 2012]
↑4	29 March 2010. CanVG: Using Canvas to render SVG files. http://ajaxian.com/archives/canvg-using-canvas-to-render-svg-files [Accessed: 4 March 2012]

Remoteness Index

Posted on February 27, 2012 by Hugh Paterson III

For the last few weeks I have been thinking about how one can measure the impact on a language of a language community's contact with other languages. I have been looking for ways that remoteness has been measured in the past. I recently ran across a note on my iPhone from when I was in Mexico dated March 8, 2011.

A metric for measuring the language shift, contact, and relatedness of indigenous languages of Mexico

The formation of areal features

Population density

Trade and social networks

Political affiliation

Geographic factors

Roads and travel opportunities

I remember writing this note: I was standing in front of a topographical map showing terrain regions. This map also had the language areas of Mexico outlined. It occurred to me (having also recently had a conversation with a local anthropologist on the matter of trade routes and mountain passes) that these sorts of factors should be accounted for as factors in language endangerment, and that if they can be accounted for then they should also be able to be graphed (on a map of course). The major issue is that if one just plots a language area without showing population/speaker density in that area, then the viewer of that map will get a warped view of the language situation. Population density alone also does not tell us where language attrition is unlikely to occur. And language contact does not automatically happen on the edges of a language area. That is to say, in a country with mountain passes, there will likely be more language contact in the passes, as various groups travel to market, than in higher elevated mountain villages. This leads to the issue of language diffusion and the representation of language diffusion. But the issue is not just one of language diffusion; it is also one of population diffusion, population mobility, and accessibility to various areas. So in terms of projecting, assessing and plotting language vitality, considering remoteness should be part of the equation. But remoteness is not just a factor on its own; it is more of an index which takes the issues mentioned above into account, specifically geographical remoteness and social remoteness (or contact, even with other villages and cities in the same language and ethnic communities).
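To illustrate what such an index could look like, here is a toy Python sketch that combines one geographical measure and one social measure into a single score per village. The choice of inputs (travel time to market, outside contacts per week), the equal weights, and the village data are all assumptions made up for this example; they are not an established metric.

```python
from dataclasses import dataclass


@dataclass
class Village:
    name: str
    travel_hours_to_market: float     # stand-in for geographical remoteness
    outside_contacts_per_week: float  # stand-in for social contact


def min_max(values):
    """Scale values to the range 0..1; all-equal inputs map to 0.0."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def remoteness_index(villages, w_geo=0.5, w_social=0.5):
    """Return {village name: score in 0..1}, higher meaning more remote."""
    geo = min_max([v.travel_hours_to_market for v in villages])
    # More outside contact means less social remoteness, so invert it.
    social = [1.0 - c for c in min_max([v.outside_contacts_per_week for v in villages])]
    return {
        v.name: w_geo * g + w_social * s
        for v, g, s in zip(villages, geo, social)
    }


# Invented villages: one in a mountain pass, one mid-slope, one high up.
villages = [
    Village("pass", 1.0, 20.0),
    Village("slope", 5.0, 10.0),
    Village("summit", 9.0, 0.0),
]
print(remoteness_index(villages))
```

Scores like these could then be attached to village coordinates and plotted, which is the mapping step the note was reaching for.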

I am not currently aware of any index, much less a project which plots this index to a geographical area. However, I have found some previous work worth mentioning which might be related and relevant.

Modeling Language Diffusion With ArcGIS

There is an interesting paper and project on modeling language diffusion with ArcGIS. It was prepared for Worldmap.org by Christopher Deckert in 2004 and presented at the 24th Esri International User Conference. ^[1]

Remote Areas of the World

The magazine NewScientist has an article from April 2009 ^[2] about the remotest places in the world. It has several maps and abstractions showing how remote (with reference to travel time) places in the world are. The following maps come from the NewScientist article.

Map showing the accessibility from one point to another.

Detail of roads in west Africa

Map showing the remoteness of the Tibetan Plateau

The ASGC Remoteness Structure

Another promising resource I found is the ASGC Remoteness Structure, which Australia has developed to show how remote parts of Australia are. There is a series of papers explaining the methods behind the algorithms used and the purpose of the study. One of the outputs was the map below. ^[3]

Australia Remoteness Map

The Territoriality of Public Health Governance in Mexico

The last resource I am going to mention here is The Territoriality of Public Health Governance in Mexico, a study which plots the remoteness of health care in Mexico. ^[4]

References
↑1	Christopher Deckert. 2004. Modeling Language Diffusion With ArcGIS. Paper published in the proceedings of the 24th Annual Esri International User Conference, August 9–13, 2004. http://proceedings.esri.com/library/userconf/proc04/docs/pap1071.pdf [Accessed: 27 February 2011]
↑2	Caroline Williams. 20 April 2009. NewScientist. Where's the remotest place on Earth? http://www.newscientist.com/article/mg20227041.500-wheres-the-remotest-place-on-earth.html [Accessed: 27 February 2011]
↑3	Commonwealth Department of Health and Aged Care. 2001. Measuring Remoteness: Accessibility/Remoteness Index of Australia (ARIA), Revised Edition, Occasional Papers: New Series No. 14 [Accessed: 2 February 2012]
↑4	Alberto Díaz-Cayeros and Justin Levitt. August 30, 2011. The Territoriality of Public Health Governance in Mexico. http://irps.ucsd.edu/assets/001/502971.pdf [Accessed: 12 February 2012]

Some current challenges in using GIS Information in the SIL International Corporate Knowledge System

Posted on February 23, 2012 by Hugh Paterson III

Preface

This paper is motivated by an experience in collecting, analyzing, and then redeploying (sharing while making relevant to other corporate SIL functions) corporate intellectual assets. These assets are relevant to both SIL products and services and corporate processes. This paper attempts to document some of the current challenges presented to the SIL staff person as well as present some items for consideration in overcoming these challenges.

Ethnologue: the linguistic straw-man

Posted on February 21, 2012 by Hugh Paterson III

The Ethnologue ^[1], as an academic book, is somewhat of a straw man in linguistics. Many people who write grants for language documentation projects (generally on under-described or endangered languages) will cite the Ethnologue and some other resources or lack of resources ^[2] ^[3] ^[4]. These efforts to seek funding are usually an effort to get more language data. The rationale for this is twofold:

Because so little is known, we do not know whether the Ethnologue is correct.
Because there is a conflict between other published sources and the Ethnologue ^[5].

References
↑1	M. Paul Lewis. (ed.), 2009. Ethnologue: Languages of the World, 16th Edn. Dallas, Tex.: SIL International.
↑2	Steven A. Marlett. 2011. Documenting the Me’phaa genus. DEH-NEH fellowship proposal. http://www.neh.gov/grants/guidelines/pdf/DEL_NEH_Marlett.pdf [Accessed: 15 February 2011]
↑3	Sadaf Munshi. 2011. Archive of Annotated Burushaski Texts. NSF grant proposal. http://www.neh.gov/grants/guidelines/pdf/DEL_NSF_Munshi.pdf [Accessed: 15 February 2011]
↑4	Monica A. Macaulay. 2011. Potawatomi Documentation, Lexical Database, and Dictionary. NEH grant proposal. http://www.neh.gov/grants/guidelines/pdf/DEL_NEH_Macaulay.pdf [Accessed: 15 February 2011]
↑5	Roger Blench. n.d. Introduction to the Temein languages. http://www.rogerblench.info/Language/Nilo-Saharan/Eastern%20Sudanic/Temein%20cluster/Blench%20Temein%20language%20NM%20proceedings.pdf [Accessed: 15 February 2011]

Language maps like heat maps

Posted on September 18, 2011 by Hugh Paterson III

There is a myriad of difficulties in overlaying language data with geographical data. But it has been done and can be done. While I was working in México on a language documentation project, I learned that some of the language mixing (not quite diglossia, rather the living of two people groups with different languages in the same spaces) was due to geographical and economic factors pulling them into the same geographic locations. In the particular case I am thinking of there was a mountain pass and a valley on the way to the major center of trade. In this sort of context the interesting things are displayed not when a polygon is drawn showing a territorial overlay of where various language speakers live, but when something is drawn showing what the density or dispersion of speakers is relative to the general population. Some of the most detailed (in terms of global perspective) language maps can be found in the Ethnologue ^[1].

Western Central Mexico from the Ethnologue

However, as I was working on the language documentation project I found out how much effort actually goes into that sort of map. ArcGIS, the software used to create the maps, cannot auto-generate a polygon a certain distance around a combined set of given points. A set of points can be selected and each point can get a 5 mile radius. What this means is that each polygon has to be hand drawn.

This sort of graphical overlay that is used in the Ethnologue does not show the density of speakers of a language in an area relative to the total population (in the Ethnologue’s defense I am not sure it is supposed to). For instance, if I wanted to know “What is the density of speakers in the Me’phaa area of México relative to speakers of other languages?” the answer would show me some dispersion, and by implication the peopling of the area. This sort of geographical overlay may be closer to displaying social networks, not really bilingualism or diglossia. There might be some bilinguals or some average level of bilingualism there, but the heat map method of plotting is still looking at the density of speakers in an area. A similar map might be created of New York City where certain languages are given a color based on their distribution density in the area. Additionally, these sorts of data overlays are probably more likely to lend insights on language attrition patterns or language speaker migration patterns.

Also, these hand drawn polygons change (a little) from edition to edition. Because the data used to create the polygons is not referenced (cited), it is hard to tell if the change is keeping pace with language attrition and/or population movement or if the changes are due to a better linguistic understanding of a particular area. When looking at the large area maps in the Ethnologue, it is hard to tell if the red dots represent the “traditional” language area (or the geographical center thereof) or if the points represent the current geographical center of the speaking area. Either way, the plotting functions as if it were a heat map showing the diversity of languages over a geographical area.
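The "density of speakers relative to the total population" idea can be sketched as a small calculation that turns per-town census figures into weighted points, which is the kind of input a heat map layer typically takes. This is a minimal Python sketch; the field names and the town figures are invented, not real census data.

```python
def heat_points(towns):
    """Return (lat, lon, weight) tuples, weight = speakers / total population.

    `towns` is a list of dicts with hypothetical keys
    "lat", "lon", "speakers", and "population".
    Towns with no recorded population are skipped.
    """
    points = []
    for town in towns:
        if town["population"] <= 0:
            continue
        share = town["speakers"] / town["population"]
        points.append((town["lat"], town["lon"], round(share, 3)))
    return points


# Invented figures: a small village where most people speak the language,
# and a market town where speakers are a small minority.
towns = [
    {"lat": 17.0, "lon": -98.7, "speakers": 900, "population": 1000},
    {"lat": 17.5, "lon": -99.5, "speakers": 500, "population": 50000},
]
print(heat_points(towns))
```

Weighting by share instead of raw speaker counts keeps a large town with few speakers from looking "hotter" than a small village where nearly everyone speaks the language.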

Americas Map from the Ethnologue

gHeat

I am generally on the lookout for web apps and APIs which can be used to overlay data to bring new insights to situations through graphical representations. I recently found a tool for overlaying data on Google Maps. This tool, called gHeat, creates heat maps given data from another source. It was brought to my attention by Ben O’Steen, who modified gHeat to display some prices for student properties ^[4] in the UK. My initial thought was: “Wow, how can we do language maps like this?”

Student Property Heat Map

Obviously I still think that language based heat maps could give language workers worldwide access to visualizations of data that could really add clarity to the language vitality situation.

References
↑1	Lewis, M. Paul (ed.). 2009. Ethnologue: Languages of the World, Sixteenth edition. Dallas, Tex.: SIL International.
↑2	Map of Languages in Western Mexico in the Ethnologue. http://www.ethnologue.com/show_map.asp?name=MX&seq=30 [Accessed: 9 September 2011]
↑3	Map of Languages in the Americas in the Ethnologue. http://www.ethnologue.com/show_map.asp?name=Americas&seq=10 [Accessed: 9 September 2011]
↑4	Ben O’Steen. 2011. Student Property Heatmap. Random Hacks: Hacks, code and other things. http://benosteen.wordpress.com/2011/07/26/student-property-heatmap [Accessed: 2 September 2011]

The Journeyler

A walk through: Life, Leadership, Linguistics, Language Documentation, WordPress, and OS X (and a bit of Marketing & Business Administration)

Category Archives: Cartography