Real Data, Live Data, Not just Ethnologue maps

Several interesting projects have created language use visualizations over the last few years. The Ethnologue project produces one particular kind of visualization. In the past I have talked about the need to socialize the data on which the Ethnologue apps are based and to make it more accurate with respect to WGS 84. I talk about that need in two places: on insite here: Geographical Data, and on my non-insite blog: https://hugh.thejourneyler.org/2012/some-current-challenges-in-using-gis-information-in-the-sil-international-corporate-knowledge-system/

There are several challenges with the basic assumptions behind the current Ethnologue visualizations.

  1. They project a language homogeneity which is not necessarily accurate to real life.
  2. They project a geographical display which is not indicative of real language use. That is, language use may actually take place in digital mediums, which cannot be heard at certain locations.
  3. Ethnologue maps make no overt claims about digital communication devices and their use by minority language speakers. However, my general feeling is that SIL (especially in our training programs) does not assume that minority language users are digital device users.

One of the tools which SIL could use to inform its business intelligence is the language of use in digital social mediums. For instance, Wikipedia allows any ISO 639-3 language community to form its own Wikipedia. This means that all of the edits made from IP addresses are recorded and public, which in turn would give us a language use location based on those IP addresses. This can then be superimposed on additional data collected from geo-enabled tweets. With such information, the pre-survey data available about language use (in certain contexts) just got more interesting, provided of course that the survey is about questions of language use.
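
To make the idea concrete, here is a minimal sketch in Python of how edits could be tallied by language and region. Everything in it is an assumption for illustration: the lookup table, the region labels, and the sample edits are invented, and the addresses come from ranges reserved for documentation. A real project would need a proper IP geolocation database, and IP-based locations are only ever approximate.

```python
import ipaddress
from collections import Counter

# Hypothetical lookup table: network block -> region label.
# A real project would use an IP geolocation database instead.
REGION_BY_NETWORK = {
    ipaddress.ip_network("192.0.2.0/24"): "Region A",
    ipaddress.ip_network("198.51.100.0/24"): "Region B",
}

def region_for_ip(ip_text):
    """Return the region label for an IP address, or None if unknown."""
    address = ipaddress.ip_address(ip_text)
    for network, region in REGION_BY_NETWORK.items():
        if address in network:
            return region
    return None

def edits_per_region(edits):
    """Count edits per (language code, region) pair.

    `edits` is an iterable of (language_code, ip_text) tuples.
    Edits whose address is not in the lookup table are skipped.
    """
    counts = Counter()
    for language_code, ip_text in edits:
        region = region_for_ip(ip_text)
        if region is not None:
            counts[(language_code, region)] += 1
    return counts

# Invented sample: "gle" is the ISO 639-3 code for Irish.
sample_edits = [
    ("gle", "192.0.2.10"),
    ("gle", "192.0.2.77"),
    ("gle", "198.51.100.5"),
    ("gle", "203.0.113.9"),  # not in the table, so it is skipped
]
print(edits_per_region(sample_edits))
```

The resulting counts could then be joined with counts from geo-enabled tweets before anything is drawn on a map.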

Some people have taken to mapping Wikipedia edits. Such a map shows that there are a lot of people in a lot of places, speakers of minority languages included, who are able to edit centrally hosted content like that found on Wikipedia. Here is a map created from the English-language Wikipedia, which is available from http://www.dailydot.com/society/wikipedia-conflict-map-flame-wars/.

As I stated previously, the homogeneity of language use within a given geographical region is difficult to map. There are questions of speaker population density and questions of social environments. While the Ethnologue maps are very detailed given their global scope, one of the challenges for this kind of visualization is expressing diversity. Below is a map of language diversity based on tweets in New York City. The power of using tweets to measure the linguistic diversity of a region is that tweets usually connect two or more people and so reveal the social connection between those people. This is a powerful bit of information. SIL could leverage this data in several ways; one would be to make it available to its scripture use partners. Language may not always be a barrier to understanding the gospel, but I have yet to see it fail to be an inroad to relationships in and through which the gospel can be shown or presented.

Language Diversity as demonstrated on twitter

Image from http://ny.spatial.ly/

If our conceptualization of language and its geographical distribution is at all reflected in the way that we look at Ethnologue maps, then we can often miss the wide distribution that many language communities have. For instance, this language map shows the use of Irish by Twitter users. Notice that the language is not bound to Ireland.

Irish language Twitter conversations, Kevin Scannell (CC-BY-SA) http://indigenoustweets.blogspot.com/2013/12/mapping-celtic-twittersphere.html

Something fantastic with Webonary data

The UK Data Explorer has a very interesting setup using a powerful (free and open) visualization tool called D3.js. The tool allows you to type in a word and see how it is spelled in a variety of languages. It uses Google Translate. Check it out here: http://ukdataexplorer.com/european-translator/?word=man

WordPress is equally capable of serving up Webonary data if it is configured correctly.

Man Across Europe

Some other thoughts on linguistic cartography and the display of language vitality.

Back in 2011 Lars Huttar and I played around with a heat mapping JavaScript tool called gheat. The idea was to plot heavily populated towns with a higher gradient than less populated towns, based on speaker population densities I had from Mexican statistics data. The aim was to incorporate two important aspects of analysis: remoteness and vitality. I talk about remoteness on my blog here: https://hugh.thejourneyler.org/2012/remoteness-index/, and I talk about the visualization here: https://hugh.thejourneyler.org/2011/language-maps-like-heat-maps/. The data may not be perfect, but it was a start. The paper has not gone anywhere since that time. I still have the draft and would like to pursue this with a co-author. If someone else might be interested, please comment; I can give more details and share the Paterson & Huttar paper draft.
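
As a rough illustration of the weighting idea, the sketch below (Python) groups towns into grid cells and computes speaker density per cell, which is the kind of value a heat map layer could then color. It is not the code we used with gheat. The tuple layout, the cell size, and the town figures are all invented for the example.

```python
from collections import defaultdict

def heat_cells(towns, cell_size_degrees=0.5):
    """Group towns into grid cells and return speaker density per cell.

    `towns` is an iterable of (lat, lon, speakers, population) tuples.
    Density is speakers divided by total population, so 1.0 means
    everyone counted in that cell speaks the language.
    """
    speakers_by_cell = defaultdict(int)
    population_by_cell = defaultdict(int)
    for lat, lon, speakers, population in towns:
        # Floor division assigns each town to the cell that contains it.
        cell = (int(lat // cell_size_degrees), int(lon // cell_size_degrees))
        speakers_by_cell[cell] += speakers
        population_by_cell[cell] += population
    return {
        cell: speakers_by_cell[cell] / population_by_cell[cell]
        for cell in population_by_cell
        if population_by_cell[cell] > 0
    }

# Invented numbers for illustration only; not real census data.
sample_towns = [
    (17.1, -98.9, 800, 1000),
    (17.2, -98.8, 100, 1000),
    (19.4, -99.1, 50, 5000),
]
for cell, density in sorted(heat_cells(sample_towns).items()):
    print(cell, round(density, 2))
```

Using density rather than raw speaker counts keeps a large city with few speakers from outshining a small town where nearly everyone speaks the language.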

If you just like looking at language maps you might enjoy this post: https://hugh.thejourneyler.org/2012/types-of-linguistic-maps-the-mapping-of-linguistic-features/

One final thought

Here is an interesting set of maps for language use. While the Ethnologue maps first-language use, second-language use remains a mystery. These efforts try to add visualizations of the second most popularly spoken language for a geographical region.

A second way to look at the earth is to ask: what are the places? This has been a recent hot topic in language documentation circles. On the single-language level there may or may not be a lot of information that is interesting to a lot of people. However, looking at the earth in terms of which languages are talking about certain places is interesting. One point of large interaction for this conversation is Wikipedia.

Learning to make Polygons in Google Earth

Today I am messing around and making KML and GPX files from our trip to Nigeria.

Reading: https://developers.google.com/kml/documentation/kml_tut#polygons and http://projects.visualstudies.duke.edu/isismapping/sites/default/files/isisguides/earthguide.pdf.
Watching:

http://www.youtube.com/watch?v=OGGpTqkbCWo

I hope to take some of our photos, a polygon of the language area, and our GPX route traces and overlay them on an Open Street Map page in WordPress.

The one thing I don't think is possible with Google Earth is moving polygons. I created them and then found they were about a mile off, so I just wanted to move them. To my knowledge, that is not possible.
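
One possible workaround is to shift the polygon outside of Google Earth by editing the coordinates in the saved KML file. The Python sketch below assumes the standard KML layout, where each point is written as lon,lat or lon,lat,alt and points are separated by whitespace. The function name, the conversion constant, and the sample square are my own for illustration, and the meters-to-degrees conversion is only a rough approximation that is good enough for small moves.

```python
import math

# Rough average length of one degree of latitude, in meters (approximation).
METERS_PER_DEGREE_LAT = 111_320

def shift_kml_coordinates(coordinates_text, east_m, north_m):
    """Shift every point in the text of a KML <coordinates> element.

    Offsets are in meters; positive values move the shape east or north.
    Any altitude value on a point is passed through unchanged.
    """
    shifted = []
    for point in coordinates_text.split():
        parts = point.split(",")
        lon, lat = float(parts[0]), float(parts[1])
        d_lat = north_m / METERS_PER_DEGREE_LAT
        # Degrees of longitude shrink toward the poles, hence the cosine.
        d_lon = east_m / (METERS_PER_DEGREE_LAT * math.cos(math.radians(lat)))
        new_parts = [f"{lon + d_lon:.6f}", f"{lat + d_lat:.6f}"] + parts[2:]
        shifted.append(",".join(new_parts))
    return " ".join(shifted)

# Invented square for illustration; move it about one mile (1609 m) north.
square = "8.0,9.0,0 8.1,9.0,0 8.1,9.1,0 8.0,9.1,0 8.0,9.0,0"
print(shift_kml_coordinates(square, east_m=0, north_m=1609))
```

The shifted string can be pasted back into the coordinates element of the KML file and the file reopened in Google Earth.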

Types of Linguistic Maps: The Mapping of linguistic Features and Researcher Interactivity

A couple of years ago I had a chance meeting with a cartographer in North Dakota. It was interesting because he asked us (a group of linguists), “What is a language or linguistic map?” So I grabbed a few examples and put them into a brief for him. This past January at the LSA meeting in Portland, Oregon, I had several interesting conversations with the folks at the LL-Map Project under the LINGUIST List. It occurred to me that such a presentation of various kinds of language maps might be useful to a larger audience. So this will be a bit unpolished but should show a wide selection of language and linguistic maps, and in the last section I will also talk a bit about interactive maps.

World Map Navigation

For one of the web projects I am working on, we have been throwing around the idea of having a world map as a navigation element. Each country would then be clickable. This kind of navigation has been done with hyperlinked bitmaps, as in the LL-Map project.

LL-Map Bitmap

Or with Flash, like the Joshua Project.

Joshua Project Front page with World Map

I have not seen any implementations in HTML5 canvas or in SVG. It occurs to me that these technologies could be used. I am not deeply familiar with either technology. So I did some googling.
I found some interesting articles on the matter.

  • Performance of SVG vs. Canvas [1]
  • How to Choose Between Canvas and SVG [2]
  • SVG or Canvas? Choosing between the two [3]
  • CanVG: Using Canvas to render SVG files [4]

I am not sure that I have any answers but this is my thought towards the problem space.
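
To give the SVG option some shape, here is a small sketch that builds an SVG map in which each country outline is wrapped in a link. It is written in Python simply to generate the markup. The rectangles standing in for country outlines and the URLs are placeholders I made up, and real outlines would come from geographic data. It uses a plain href attribute on the link element, which current browsers accept; older SVG viewers expected xlink:href instead.

```python
from xml.sax.saxutils import escape, quoteattr

def country_link(name, url, path_data):
    """Return one clickable SVG shape: an <a> element wrapping a <path>."""
    return (
        f"<a href={quoteattr(url)}>"
        f"<path d={quoteattr(path_data)}><title>{escape(name)}</title></path>"
        f"</a>"
    )

def build_svg_map(countries, width=400, height=200):
    """Build a complete SVG document from (name, url, path_data) tuples."""
    shapes = "\n".join(country_link(*country) for country in countries)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'viewBox="0 0 {width} {height}">\n{shapes}\n</svg>'
    )

# Placeholder rectangles and hypothetical URLs, for illustration only.
sample_countries = [
    ("Mexico", "/countries/mx/", "M10 10 H90 V90 H10 Z"),
    ("Nigeria", "/countries/ng/", "M110 10 H190 V90 H110 Z"),
]
svg_text = build_svg_map(sample_countries)
print(svg_text)
```

Because each shape is an ordinary element in the page, it can be styled on hover with CSS and it remains a normal hyperlink, which is the main attraction of SVG over a bitmap with an image map.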

There is one map of languages I have found which deserves to be mentioned. I am not sure of the technology used, but it seems likely to be one of these two methods. It is the map of the Languages of California hosted at Berkeley.

California Languages Map

References
1 Boris Smus. 19 January 2009. Performance of Canvas versus SVG. http://smus.com/canvas-vs-svg-performance [Accessed: 4 March 2012]
2 Patrick Dengler. 28 September 2011. How to Choose Between Canvas and SVG. http://www.sitepoint.com/how-to-choose-between-canvas-and-svg/#fbid=6CJz-eeIXxl [Accessed: 4 March 2012]
3 Mihai Sucan. 4 February 2010. SVG or Canvas? Choosing between the two. http://dev.opera.com/articles/view/svg-or-canvas-choosing-between-the-two/ [Accessed: 4 March 2012]
4 29 March 2010. CanVG: Using Canvas to render SVG files. http://ajaxian.com/archives/canvg-using-canvas-to-render-svg-files [Accessed: 4 March 2012]

Remoteness Index

For the last few weeks I have been thinking about how one can measure the impact on a language of its community's contact with other languages. I have been looking for ways that remoteness has been measured in the past. I recently ran across a note on my iPhone, dated March 8, 2011, from when I was in Mexico.

A metric for measuring the language shift, contact, and relatedness of indigenous languages of Mexico

  • The formation of areal features
  • Population density
  • Trade and social networks
  • Political affiliation
  • Geographic factors
  • Roads and travel opportunities

I remember writing this note: I was standing in front of a topographical map showing terrain regions. This map also had the language areas of Mexico outlined. It occurred to me (having also recently had a conversation with a local anthropologist on the matter of trade routes and mountain passes) that these sorts of factors should be accounted for as factors in language endangerment, and that if they can be accounted for then they should also be able to be graphed (on a map, of course).

The major issue is that if one just plots a language area without showing population or speaker density in that area, then the viewer of that map will get a warped view of the language situation. Population density alone also does not tell us where language attrition is unlikely to occur. And language contact does not automatically happen on the edges of a language area. That is to say, in a country with mountain passes, there will likely be more language contact in the passes, as various groups travel to market, than in villages at higher elevations.

This leads to the issue of language diffusion and its representation. But the issue is not just one of language diffusion; it is also one of population diffusion, population mobility, and accessibility to various areas. So in terms of projecting, assessing, and plotting language vitality, remoteness should be part of the equation. But remoteness is not just a factor on its own. It is more of an index which takes in the issues mentioned above, specifically both geographical remoteness and social remoteness (or contact, even with other villages and cities in the same language and ethnic communities).
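
To show what I mean by an index rather than a single factor, here is a minimal sketch in Python that combines one geographical measure with one social measure. Everything specific in it is a hypothetical choice for illustration: the two inputs (travel hours to the nearest market and days of outside contact per month), the equal weights, the ceilings used for scaling, and the sample villages. A real index would need to be grounded in field data.

```python
# Hypothetical weights; they are placeholders, not findings.
GEOGRAPHIC_WEIGHT = 0.5
SOCIAL_WEIGHT = 0.5

def normalize(value, lowest, highest):
    """Scale a raw measurement to the range 0.0 to 1.0, clamping outliers."""
    if highest <= lowest:
        raise ValueError("highest must be greater than lowest")
    scaled = (value - lowest) / (highest - lowest)
    return min(1.0, max(0.0, scaled))

def remoteness_index(travel_hours, contact_days_per_month,
                     max_travel_hours=24.0, max_contact_days=30.0):
    """Combine geographical and social remoteness into one score.

    0.0 means not remote at all and 1.0 means as remote as the scale allows.
    More travel time raises the score; more outside contact lowers it.
    """
    geographic = normalize(travel_hours, 0.0, max_travel_hours)
    social = 1.0 - normalize(contact_days_per_month, 0.0, max_contact_days)
    return GEOGRAPHIC_WEIGHT * geographic + SOCIAL_WEIGHT * social

# Invented villages: one high in the mountains, one in a pass on a trade route.
mountain_village = remoteness_index(travel_hours=12, contact_days_per_month=3)
pass_village = remoteness_index(travel_hours=2, contact_days_per_month=15)
print(round(mountain_village, 2), round(pass_village, 2))
```

A score like this, computed per village, is something that could be plotted on a map alongside speaker density.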

I am not currently aware of any index, much less a project which plots this index to a geographical area. However, I have found some previous work worth mentioning which might be related and relevant.

Modeling Language Diffusion With ArcGIS

There is an interesting paper and project on modeling language diffusion with ArcGIS. It was prepared for Worldmap.org by Christopher Deckert in 2004 and presented at the 24th Annual Esri International User Conference. [1]

Remote Areas of the World

The magazine NewScientist has an article from April 2009 [2] about the remotest places in the world. It has several maps and abstractions showing how remote (with reference to travel time) places in the world are. The following maps come from the NewScientist article.

Map showing the accessibility from one point to another.

Detail of roads in west Africa

Nowhere three weeks from anywhere

Map showing the remoteness of the Tibetan Plateau

The ASGC Remoteness Structure

Another promising resource I found is the ASGC Remoteness Structure which Australia has developed to show how remote parts of Australia are. There is a series of papers explaining the methods behind the algorithms used and the purpose of the study. One of the outputs was the map below. [3]

Australia Remoteness Map

The Territoriality of Public Health Governance in Mexico

The last resource I am going to mention here is The Territoriality of Public Health Governance in Mexico, a study which plots the remoteness of health care in Mexico. [4]

References
1 Christopher Deckert. 2004. Modeling Language Diffusion With ArcGIS. Paper published in the proceedings of the 24th Annual Esri International User Conference, August 9–13, 2004. http://proceedings.esri.com/library/userconf/proc04/docs/pap1071.pdf [PDF] [Accessed: 27 February 2011]
2 Caroline Williams. 20 April 2009. NewScientist. Where's the remotest place on Earth? http://www.newscientist.com/article/mg20227041.500-wheres-the-remotest-place-on-earth.html [Accessed: 27 February 2011]
3 Commonwealth Department of Health and Aged Care. 2001. Measuring Remoteness: Accessibility/Remoteness Index of Australia (ARIA), Revised Edition, Occasional Papers: New Series No. 14 [PDF] [Accessed: 2 February 2012]
4 Alberto Díaz-Cayeros and Justin Levitt. August 30, 2011. The Territoriality of Public Health Governance in Mexico. http://irps.ucsd.edu/assets/001/502971.pdf [PDF] [Accessed: 12 February 2012]

Some current challenges in using GIS Information in the SIL International Corporate Knowledge System

Preface

This paper is motivated by an experience in collecting, analyzing, and then redeploying (sharing while making relevant to other corporate SIL functions) corporate intellectual assets. These assets are relevant both to SIL products and services and to corporate processes. This paper attempts to document some of the current challenges presented to the SIL staff person, as well as to present some items for consideration in overcoming these challenges.

Language maps like heat maps

There are myriad difficulties in overlaying language data with geographical data. But it has been done and can be done. While I was working in México on a language documentation project, I learned that some of the language mixing (not quite diglossia, rather two people groups with different languages living in the same spaces) was due to geographical and economic factors pulling them into the same geographic locations. In the particular case I am thinking of, there was a mountain pass and a valley on the way to the major center of trade. In this sort of context the interesting things are displayed not when a polygon is drawn showing a territorial overlay of where speakers of various languages live, but when something is drawn showing the density or dispersion of speakers relative to the general population. Some of the most detailed (in terms of global perspective) language maps can be found in the Ethnologue [1].

Western Central Mexico from the Ethnologue

However, as I was working on the language documentation project I found out how much effort actually goes into that sort of map. ArcGIS, the software used to create the maps, cannot auto-generate a polygon a certain distance around a combined set of given points. A set of points can be selected and each point can get a 5 mile radius. What this means is that each polygon has to be hand drawn. The sort of graphical overlay that is used in the Ethnologue [2] does not show the density of speakers of a language in an area relative to the total population (in the Ethnologue’s defense, I am not sure it is supposed to). For instance, if I wanted to know “What is the density of speakers in the Me’phaa area of México relative to speakers of other languages?”, the answer would show me some dispersion and, by implication, the peopling of the area. This sort of geographical overlay may be closer to displaying social networks than to displaying bilingualism or diglossia. There might be some bilinguals or some average level of bilingualism there, but the heat map method of plotting still looks at the density of speakers in an area. A similar map might be created of New York City, where certain languages are given a color based on their distribution density in the area. Additionally, these sorts of data overlays are probably more likely to lend insights into language attrition patterns or speaker migration patterns.

Also, these hand-drawn polygons change (a little) from edition to edition. Because the data used to create the polygons is not referenced (cited), it is hard to tell if the change is keeping pace with language attrition and/or population movement or if the changes are due to a better linguistic understanding of a particular area.

When looking at the large area maps in the Ethnologue [3], it is hard to tell if the red dots represent the “traditional” language area (or the geographical center thereof) or if the points represent the current geographical center of the speaking area. Either way, the plotting functions as if it were a heat map showing the diversity of languages over a geographical area.
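
As an aside on the hand-drawn polygons mentioned above, an outline around a set of village points can be computed outside of ArcGIS. The Python sketch below uses a convex hull (Andrew's monotone chain algorithm), which is a different and simpler technique than a fixed-distance buffer around the points: it gives the tightest convex outline, not a 5 mile margin. It also treats longitude and latitude as flat x and y values, which is only reasonable for small areas. The village coordinates are invented.

```python
def cross(origin, a, b):
    """Z component of the cross product of the vectors origin->a and origin->b."""
    return ((a[0] - origin[0]) * (b[1] - origin[1])
            - (a[1] - origin[1]) * (b[0] - origin[0]))

def convex_hull(points):
    """Return the convex hull of (x, y) points in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last point while it would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]

# Invented village locations as (lon, lat); the middle one is interior.
villages = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
print(convex_hull(villages))
```

An outline generated this way would still need a margin added and a human check, but it would at least make the polygon reproducible from cited point data.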

Americas Map from the Ethnologue

gHeat

I am generally on the lookout for web apps and APIs which can be used to overlay data and bring new insights to situations through graphical representations. I recently found a tool for overlaying data on Google Maps. The tool, called gHeat, creates heat maps given data from another source. It was brought to my attention by Ben O’Steen, who modified gHeat to display some prices for student properties [4] in the UK. My initial thought was: “Wow, how can we do language maps like this?”

Student Property Heat Map

Obviously I still think that language-based heat maps could give language workers worldwide access to visualizations of data that could really add clarity to the language vitality situation.

References
1 Lewis, M. Paul (ed.). 2009. Ethnologue: Languages of the World, Sixteenth edition. Dallas, Tex.: SIL International.
2 Map of Languages in Western Mexico in the Ethnologue. http://www.ethnologue.com/show_map.asp?name=MX&seq=30 [Accessed: 9 September 2011]
3 Map of Languages in the Americas in the Ethnologue. http://www.ethnologue.com/show_map.asp?name=Americas&seq=10 [Accessed: 9 September 2011]
4 Ben O’Steen. 2011. Student Property Heatmap. Random Hacks: Hacks, code and other things. http://benosteen.wordpress.com/2011/07/26/student-property-heatmap [Accessed: 2 September 2011]