This week I have been outlining the types of data that linguists need to be able to use and relate to each other as they do Language Documentation and Linguistic Research. I try to express these things graphically and then also express where some of the leading tools which SIL International is offering sit in the problem space.
The Data Management Space for linguists with SIL software.
I have been playing around with data available from the iPhone (and also separately visualizing Map data).
I came across a project, iPhoneTracker which was done to show iPhone users the kind of data that the iPhone collects about a users travel and whereabouts. I downloaded the app and ran it. Looks like about a complete history since I activated the phone… The interesting thing for me was that this app did not collect the data from my phone directly but rather from my computer.
A couple of years ago I had a chance meeting with a cartographer in North Dakota. It was interesting because he asked us (a group of linguists) What is a language or linguistic map? So, I grabbed a few examples and put them into a brief for him. This past January at the LSA meeting in Portland, Oregon, I had several interesting conversations with the folks at the LL-Map Project under Linguists’ List. It occurred to me that such a presentation of various kinds of language maps might be useful to a larger audience. So this will be a bit unpolished but should show a wide selection of language and linguistic based maps, and in the last section I will also talk a bit about interactive maps. Continue reading →
For one of the web projects I am working in we have been throwing around the idea of having a world map as a navigation element. Each country would then be clickable. This kind of navigation has been done with hyperlinked bitmaps like the LL-Map project.
I have not seen any implementations in HTML5 canvas or in SVG. It occurs to me that these technologies could be used. I am not deeply familiar with either technology. So I did some googling.
I found some interesting articles on the matter.
I am not sure that I have any answers but this is my thought towards the problem space.
There is one map of languages I have found which deserves to be mentioned. I am not sure of the technology used but it seems it would be either of these methods. It is the map of the Languages of California hosted at Berkeley.
For the last few weeks I have been thinking about how can one measure the impact on a language due to a language communities' contact with other languages. I have been looking for ways that remoteness has been measured in the past. I recently ran across a note on my iPhone from when I was in Mexico dated March 8, 2011.
A metric for measuring the language language shift, contact, and relatedness of indigenous languages of Mexico
The formation of aerial features
Trade and social networks
Roads travel opportunities
I remember writing this note: I was standing in front of a topographical map showing terrain regions. This map also had the language areas of Mexico outlined. It occurred to me (having also recently had a conversation with a local anthropologist on the matter of trade routes and mountain passes) that as a factor in language endangerment that these sorts of factors should be accounted for and if it can be accounted for then it should also be able to be graphed (on a map of course). The major issue being that if one just plots a language area without showing population/speaker density in that area then the viewer of that map will get a warped view of the language situation. Population density also does not solely infer where language attrition will likely not occur. And language contact does not automatically happen on the edges of a language area. That is to say, in a country with mountain passes, there will likely be more language contact in the passes as various groups travel to market than in higher elevated mountain villages. This leads to the issue of language diffusion and the representation of language diffusion. But the issue is not just one of language diffusion, it is also one of population diffusion, and population mobility and accessibility to various areas. So in terms of projecting, assessing and plotting language vitality, considering remoteness should be part of the equation. But remoteness is not just a factor on its own, it is more of an index considering the issues mentioned above but specifically considering the issues of geographical remoteness and considering the issues of social remoteness (or contact, even with other villages and cities in the same language and ethnic communities).
I am not currently aware of any index, much less a project which plots this index to a geographical area. However, I have found some previous work worth mentioning which might be related and relevant.
Modeling Language Diffusion With ArcGIS
There is an interesting paper and project on modeling language diffusion with ArcGIS. It was prepared for Worldmap.org by Christopher Deckert in 2004 and presented at the 24th ESRI users conference.
Remote Areas of the World
The magazine NewScientist has an article from April 2009 about the Remotes places in the world it has several maps and abstractions showing how remote (with reference to travel time) places in the world are. The following maps come from the NewScientist article.
Map showing the access ability from one point to another.
Detail of roads in west Africa
Map showing the remoteness of the Tibetan Plateau
The ASGC Remoteness Structure
Another promising resource I found is the ASGC Remoteness Structure which Australia has developed to show how remote parts of Australia are. There is a series of papers explaining the methods behind the algorithms used and the purpose of the study. One of the outputs was the map below.
Australia Remoteness Map
The Territoriality of Public Health Governance in Mexico
The last resource I am going to mention here is The Territoriality of Public Health Governance in Mexico. A study which plots the Remoteness of Health Care in Mexico.
This paper is motivated by an experience in collecting, analyzing, and then redeploying (sharing while making relevant to other corporate SIL functions) corporate intellectual assets. These assets are relevant to both products SIL products and services and corporate processes. This paper attempts to document some of the current challenges presented to the SIL staff person as well as present some items for consideration in overcoming these challenges. Continue reading →
The Ethnologue as an academic book, is somewhat of a straw man in linguistics. Many people who write grants for language documentation projects (generally on under described or endangered languages) will cite the Ethnologue and some other resources or lack of resources . These efforts seeking funding are usually an effort to get more language data. The rationale for this is two fold:
Because so little is known that we do not know if the Ethnologue is correct.
Because there is a conflict between other published sources and the Ethnologue .
There is a myriad of difficulties in overlaying language data with geographical data. But it has be done and can be done. While I was working in México on a language documentation project, I learned that some of the language mixing (not quite diglossia, rather the living of two people groups with different languages in the same spaces) was due geographical factors and economical factors pulling them into the same geographic locations. In the particular case I am thinking of there was a mountain pass and a valley on the way to the major center of trade. In this sort of context the interesting things are displayed not when a polygon is drawn showing a territorial overlay of where various language speakers living, but where something is drawn showing what the density or population dispersion per general population is. Some of the most detailed (in terms of global perspective) language maps can be found in the Ethnologue .
Western Central Mexico from the Ethnologue
However, as I was working on the language documentation project I found out how much effort actually goes into that sort of map. ArcGIS, the software used to create the maps can not auto-generate a polygon a certain distance around a combined set of given points. A set of points can be selected and each point can get a 5 mile radius. What this means is that each polygon has to be hand drawn. This sort of graphical overly that is used in the the Ethnologue does not show the density of speakers of a language in an area relative to the total population (in the Ethnologue’s defense I am not sure it is supposed to). For instance, if I wanted to know “What is the density of speakers in the Me’phaa area of México relative to speakers of other languages?” that would show me some dispersion, and by implication the peopling of the area. This sort of geographical overlay may be closer to displaying social networks, not really bilingualism or diglossia. There might be some bilinguals or some average level of bilingualism there, but the heat map method of plotting is looking still at the density of speakers to an area. A simular map might be created of New York City where certain languages are given a color based on their distribution density in the area. Additionally, these sorts of data overlays are probably more prone to lend insights on language attrition patterns or language speaker migration patterns. Also these hand drawn polygons change (a little) from edition to edition. Because the data used to create the polygons is not referenced (cited) it is hard to tell if the change is keeping pace with language attrition and/or population movement or if the changes are due to a better linguistic understanding in a particular area. When looking at the large area maps in the Ethnologue, it is hard to tell if the red dots represent “traditional” language area (or geographical center thereof) or if the points represent the current geographical center of the speaking area. Either way the plotting functions as if it were a heat map showing the diversity of languages over a geographical area.
Americas Map from the Ethnologue
I am generally on the look out for web apps and APIs which can be used to overlay data to bring new insights to situations through graphical representations. I recently found a tool for overlaying data on Google Maps. This tool creates heat maps given data from another source. This tool is called gHeat. This tool was brough to my attention by Been O’Steen as he modified gHeat to display some prices for student properties in the UK. My initial thought was: “Wow how can we do language maps like this?”
Student Property Heat Map
Obviously I still think that language based heat maps could prove to provide language workers world wide access to visualizations of data that could really add clarity to the language vitality situation.
The importance of knowing about the Datum recently came to my attention as I was working with GIS data on a Language Documentation project. We were collecting GPS coordinates with a handheld GPS unit and comparing these coordinates with data supplied by the national cartographic office. End goal was to compare data samples collected with conclusions proposed by the national cartographic office.
So, what am I talking about?
GIS data is used in a Geographical Information System. Basically, you can think of maps and what you might want to show with a map: rivers, towns, roads, language features, dialect markers, etc. Well, maps are shapes superimposed with a grid. And coordinates are a way of naming where on a particular grid a given point is located.