There is a myriad of difficulties in overlaying language data with geographical data, but it has been done and can be done. While I was working in México on a language documentation project, I learned that some of the language mixing (not quite diglossia, rather two people groups with different languages living in the same spaces) was due to geographical and economic factors pulling them into the same locations. In the particular case I am thinking of, there was a mountain pass and a valley on the way to the major center of trade. In this sort of context the interesting things show up not when a polygon is drawn showing a territorial overlay of where speakers of various languages live, but when something is drawn showing the density or dispersion of speakers relative to the general population. Some of the most detailed language maps (in terms of global perspective) can be found in the Ethnologue[ref 1].

Western Central Mexico from the Ethnologue

However, as I was working on the language documentation project I found out how much effort actually goes into that sort of map. ArcGIS, the software used to create the maps, cannot auto-generate a single polygon a certain distance around a combined set of given points. A set of points can be selected and each point can be given a 5 mile radius, but in practice this means that each polygon has to be hand drawn. The sort of graphical overlay used in the Ethnologue[ref 2] does not show the density of speakers of a language in an area relative to the total population (in the Ethnologue’s defense, I am not sure it is supposed to). For instance, I might want to know, “What is the density of speakers in the Me’phaa area of México relative to speakers of other languages?” A map answering that question would show me some dispersion and, by implication, the peopling of the area. This sort of geographical overlay may be closer to displaying social networks than bilingualism or diglossia. There might be some bilinguals, or some average level of bilingualism, in the area, but the heat map method of plotting still looks only at the density of speakers in an area. A similar map might be created of New York City, where each language is given a color based on its distribution density in the area. Additionally, these sorts of data overlays are probably more likely to lend insight into language attrition patterns or speaker migration patterns. The hand drawn polygons also change (a little) from edition to edition. Because the data used to create the polygons is not referenced (cited), it is hard to tell if the change is keeping pace with language attrition and/or population movement, or if the changes are due to a better linguistic understanding of a particular area.
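
The relative-density question raised above can be made concrete with a small sketch. It is only an illustration, assuming survey records of the form (latitude, longitude, language, speaker count); the records, the 0.1 degree grid size, and the function names are all hypothetical and do not come from the Ethnologue or from any real dataset. Each record is binned into a grid cell, and the share of a given language within each cell's counted population is reported.

```python
from collections import defaultdict
import math

# Hypothetical survey records: (latitude, longitude, language, speaker count).
# Coordinates and counts are invented purely for illustration.
RECORDS = [
    (17.12, -98.93, "Me'phaa", 400),
    (17.14, -98.91, "Spanish", 100),
    (17.31, -98.72, "Me'phaa", 50),
    (17.33, -98.74, "Mixtec", 150),
]


def cell_for(lat, lon, size=0.1):
    """Return the (row, col) index of the grid cell containing a point."""
    return (math.floor(lat / size), math.floor(lon / size))


def speaker_share(records, language, size=0.1):
    """Map each grid cell to the fraction of its counted population
    that speaks `language`."""
    totals = defaultdict(int)
    speakers = defaultdict(int)
    for lat, lon, lang, count in records:
        cell = cell_for(lat, lon, size)
        totals[cell] += count
        if lang == language:
            speakers[cell] += count
    # Skip empty cells so we never divide by zero.
    return {cell: speakers[cell] / totals[cell]
            for cell in totals if totals[cell] > 0}


shares = speaker_share(RECORDS, "Me'phaa")
for cell, share in sorted(shares.items()):
    print(cell, round(share, 2))
```

Coloring each cell by its share, rather than drawing one polygon per language, is what would turn a table like this into the kind of heat map described here.
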
When looking at the large area maps in the Ethnologue,[ref 3] it is hard to tell if the red dots represent the “traditional” language area (or the geographical center thereof) or the current geographical center of the speaking area. Either way, the plotting functions as if it were a heat map showing the diversity of languages over a geographical area.

Americas Map from the Ethnologue

I am generally on the lookout for web apps and APIs which can be used to overlay data and bring new insights to situations through graphical representations. I recently found a tool called gHeat, which creates heat maps on Google Maps given data from another source. It was brought to my attention by Ben O’Steen, who modified gHeat to display prices for student properties[ref 4] in the UK. My initial thought was: “Wow, how can we do language maps like this?”

Student Property Heat Map

I still think that language based heat maps could give language workers worldwide access to visualizations of data that could really add clarity to the language vitality situation.
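
To give a feel for what a point-based heat map computes, here is a minimal sketch. It is not gHeat's code or API, and it is not how the Ethnologue maps are made. It only assumes a list of hypothetical (latitude, longitude, speaker count) points and lets each one contribute "heat" that fades linearly to zero at a 5 mile radius. The sample points, the linear falloff, and the function names are my own assumptions.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, approximate


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating point overshoot.
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(min(1.0, a)))


def heat_at(lat, lon, points, radius_miles=5.0):
    """Sum the weights of nearby points, each fading linearly to zero
    at `radius_miles` from its own location."""
    total = 0.0
    for p_lat, p_lon, weight in points:
        distance = haversine_miles(lat, lon, p_lat, p_lon)
        if distance < radius_miles:
            total += weight * (1.0 - distance / radius_miles)
    return total


# Hypothetical speaker locations: (latitude, longitude, speaker count).
POINTS = [(17.0, -99.0, 10.0), (17.5, -99.0, 4.0)]

print(heat_at(17.0, -99.0, POINTS))   # directly on the first point
print(heat_at(17.02, -99.0, POINTS))  # a little over a mile away
print(heat_at(18.5, -99.0, POINTS))   # far from both points
```

Evaluating `heat_at` over a grid of locations and mapping the values to colors gives the raster that a heat map tool would lay over the base map. Dividing by the same calculation for the whole population would give a relative, rather than absolute, picture.
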


  1. Lewis, M. Paul (ed.). 2009. Ethnologue: Languages of the World, Sixteenth edition. Dallas, Tex.: SIL International.
  2. Map of Languages in Western Mexico in the Ethnologue. [Accessed: 9 September 2011] [Link]
  3. Map of Languages in the Americas in the Ethnologue. [Accessed: 9 September 2011] [Link]
  4. Ben O’Steen. 2011. Student Property Heatmap. Random Hacks: Hacks, code and other things. [Accessed: 2 September 2011] [Link]

  1. I’ve thought about this as well, particularly since I’m interested in cartography. The challenge, I think, would be to get the data. It would take a lot of time with questionnaires and a GPS to collect it. The Atlas of North American English is probably the most sophisticated dialect mapping project in recent years, but its sampling is still pretty coarse.

    Bilingualism and LWC would be especially helpful. As would geography, so that the physical barriers giving rise to the divisions could be appreciated.

    I sense a thesis topic! ;-) 

    • My problem is that I seem to be able to come up with enough thesis topics to outfit an army of graduate students. Perhaps I should focus on getting the ability to be a thesis advisor….

  2. In some recent reevaluating of the heat map and languages landscape, I have been looking at several different technologies. I have been reading about OpenHeatMap on ReadWriteWeb, though I have yet to actually play with the online service. There is evidently even a heatmap API in Google Maps, and it has been around since 2009.

    There are also some other projects which deserve note:

