Real Data, Live Data, Not just Ethnologue maps

There have been several interesting projects which have created language use visualizations over the last few years. The Ethnologue project produces a particular kind of visualization. In the past I have talked about the need to socialize and make the data which the Ethnologue apps are based on more accurate to WGS 84. I talk about that need in two places, on insite here: Geographical Data and on my non-insite blog: https://hugh.thejourneyler.org/2012/some-current-challenges-in-using-gis-information-in-the-sil-international-corporate-knowledge-system/

There are several challenges with the basic assumptions put forward by the current Ethnologue visualizations.

  1. They project a language homogeneity which is not necessarily accurate to real life.
  2. They project a geographical display which is not indicative of real language use. That is, language use may actually occur in digital media, which cannot be heard at a given location.
  3. Ethnologue maps make no overt claims about digital communications devices and their use by minority language speakers. However, my feeling in general is that SIL (especially in our training programs) does not assume that minority language users use digital devices.

One of the tools which SIL could use to inform its business intelligence is the language of use in digital social media. For instance, Wikipedia allows any ISO 639-3 language community to form its own Wikipedia. This means that all of the IP edits are recorded and public, which in turn would give us a language use location based on IP addresses. This can then be superimposed on additional data collected from geo-enabled tweets. With such information, the pre-survey data available about language use (in certain contexts) becomes more interesting, provided, of course, that the survey is about questions of language use.
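As a rough illustration of the idea, here is a minimal Python sketch that counts IP-based edits per language edition and location. Everything in it is an assumption made for illustration: the edit records, the `ip_to_country` lookup table, and the function names are hypothetical, and a real project would need actual revision histories and a real geolocation database. It only relies on the premise stated above, that edits made without an account are recorded under an IP address.

```python
import ipaddress
from collections import Counter


def is_ip(user):
    """Return True if the recorded user name is an IP address."""
    try:
        ipaddress.ip_address(user)
        return True
    except ValueError:
        return False


def tally_ip_edits(edits, ip_to_country):
    """Count IP-based edits per (language code, country) pair.

    edits: iterable of (language_code, user) tuples (hypothetical format).
    ip_to_country: dict mapping an IP string to a country code; this stands
    in for a real geolocation lookup, which is not shown here.
    """
    counts = Counter()
    for lang, user in edits:
        # Skip registered accounts and IPs we cannot place.
        if is_ip(user) and user in ip_to_country:
            counts[(lang, ip_to_country[user])] += 1
    return counts


# Hypothetical sample data. The addresses come from ranges reserved for
# documentation, and "gle" is the ISO 639-3 code for Irish.
sample_edits = [
    ("gle", "192.0.2.10"),
    ("gle", "198.51.100.7"),
    ("gle", "SomeRegisteredUser"),
    ("gle", "192.0.2.10"),
]
sample_lookup = {"192.0.2.10": "IE", "198.51.100.7": "US"}

print(tally_ip_edits(sample_edits, sample_lookup))
```

A table like this could then be laid over other location data, such as geo-enabled tweets, keeping in mind that an IP address only approximates where an editor actually is.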

Some people have taken to mapping Wikipedia edits. Such a map shows that there are a lot of people in a lot of places, speakers of minority languages included, who are able to edit centrally hosted content like that found on Wikipedia. Here is a map created from the English language Wikipedia, which is available from http://www.dailydot.com/society/wikipedia-conflict-map-flame-wars/.

As I stated previously, the homogeneity of language use within a given geographical region is difficult to map. There are questions of speaker population density, and questions of social environments. While the Ethnologue maps are very detailed in terms of their global scope, one of the challenges for this kind of visualization is expressing diversity. Below is a map of language diversity based on tweets in New York City. The power of using tweets to measure the linguistic diversity of a region is that tweets usually connect two or more people and reveal the social connection between those people. This is a powerful bit of information. SIL could leverage this data in several ways; one way would be to make this data available to its scripture use partners. Language may not always be a barrier to understanding the gospel, but I have yet to see it not be an inroad to relationships in and through which the gospel can be shown or presented.

Language Diversity as demonstrated on twitter

Image from http://ny.spatial.ly/

If our conceptualization of language and its geographical distribution is at all reflected in the way that we look at Ethnologue maps, then we can often miss the wide distribution that many language communities have. For instance, this language map shows the use of Irish by Twitter users. Notice that the language is not bound to Ireland.

Irish language Twitter conversations, Kevin Scannell (CC-BY-SA) http://indigenoustweets.blogspot.com/2013/12/mapping-celtic-twittersphere.html

Something fantastic with Webonary data

The UK Data Explorer has a very interesting setup using a powerful (free and open) visualization software tool called D3.js. The tool allows you to type in a word and see how it is spelled in a variety of languages. It uses Google Translate. Check it out here: http://ukdataexplorer.com/european-translator/?word=man

WordPress is equally capable of serving up Webonary data if it is configured correctly.

Man Across Europe

Some other thoughts on linguistic cartography and the display of language vitality.

Back in 2011 Lars Huttar and I played around with a heat mapping JavaScript tool called gheat. The idea was to plot the more heavily populated towns with a higher gradient than less populated towns, based on speaker population densities I had from Mexican statistics data. The goal was to incorporate two important aspects of analysis, remoteness and vitality. I talk about remoteness on my blog here: https://hugh.thejourneyler.org/2012/remoteness-index/, and I talk about the visualization here: https://hugh.thejourneyler.org/2011/language-maps-like-heat-maps/. The data may not be perfect, but it was a start. The paper has not gone anywhere since that time. I still have the draft paper, and would like to pursue this with a co-author. If there is someone else who might be interested, please comment; I can give more details and the Paterson & Huttar paper draft.
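To make the weighting idea concrete, here is a minimal Python sketch of how towns might be assigned a heat value before plotting. It is not the code from that experiment and does not use gheat or any mapping library. The function name, the town names, the numbers, and the choice to define density as speakers divided by total population are all assumptions made for illustration.

```python
def heat_weights(towns):
    """Scale each town's speaker density to a 0-1 heat weight.

    towns: list of (name, speakers, total_population) tuples (hypothetical
    format). Density is assumed here to mean speakers / total population.
    """
    densities = {
        name: speakers / total
        for name, speakers, total in towns
        if total > 0  # towns with no recorded population are skipped
    }
    peak = max(densities.values(), default=0)
    if peak == 0:
        return {name: 0.0 for name in densities}
    # The densest town gets 1.0; every other town is scaled relative to it.
    return {name: density / peak for name, density in densities.items()}


# Hypothetical figures, not real census data.
sample_towns = [
    ("Town A", 500, 1000),
    ("Town B", 250, 1000),
    ("Town C", 0, 500),
]

print(heat_weights(sample_towns))
```

Scaling against the densest town keeps the gradient comparable across a region, though a different definition of density (for example speakers per square kilometer) would give a different picture.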

If you just like looking at language maps you might enjoy this post: https://hugh.thejourneyler.org/2012/types-of-linguistic-maps-the-mapping-of-linguistic-features/

One final thought

Here is an interesting set of maps for language use. While the Ethnologue maps first-language use, second-language use remains a mystery. These efforts are trying to add visualizations for the second most widely spoken language in a geographical region.

A second way to look at the earth is to ask: what are the places? This has been a recent hot topic in language documentation circles. On the single-language level there may or may not be a lot of information that is interesting to a lot of people. However, looking at the earth by which languages are talking about certain places is interesting. One point of large interaction for this conversation is Wikipedia.

Lexical Database Archiving Questionnaire

It's true!

I am asking around on different mailing lists to gain some insight into the archiving habits of linguists who use lexical databases. I am specifically interested in databases created by tools like FLEx, ToolBox, Lexus, TshwaneLex, etc.

Background Story

Lexical Data Management helps (with SIL software)

This is a quick note to record some of the things I have learned this week about working with lexical data within SIL's software options.

  1. There is information scattered all over the place:
  2. What should the purpose of the websites be? to distribute the product or to build community around the product's existence?

Software Needs for a Language Documentation Project

In this post I take a look at some of the software needs of a language documentation team. One of my ongoing concerns with linguistic software development teams (like SIL International's Palaso or LSDev, or MPI's archive software group, or a host of other niche software products adapted from mainstream open-source projects) is the approach they take in communicating how to use the various elements of their software together to create useful workflows for linguists participating in field research on minority languages. Many of these teams do not assume that potential users coming to their website want to be oriented to how these software solutions work together to solve specific problems in the language documentation problem space. Now, it is true that every language documentation program is different and will have different goals and outputs, but many of these goals are the same across projects. New users of software want to know the top-level organizational assumptions made by software developers. That is, they want to evaluate how software will work in a given scenario (problem space) and to make informed decisions based on the ecosystem that the software will lead them into. This is not unlike users asking which is better, Android or iPhone, and then considering not just what works with a given device but where they will buy their music and their digital books, and how they will get those digital assets to a new device when the phone they are about to buy no longer serves them. These digital consequences are not in the mind of every consumer... but they are nonetheless real consequences.

Audio Dominant Texts and Text Dominant Audio

As linguistics and language documentation interface with digital humanities, there has been a lot of effort to time-align texts and audio/video materials. At one level this is rather trivial to do and has the backing of commercial media processes like subtitles in movies. However, at another level this task is often done in XML slightly differently for every project (digital corpus curation). At the macro scale the argument is that if the annotation of the audio is in XML and someone wants to do something else with it, then they can just convert the XML to whatever schema they desire. This is true.

However, one anecdotal point that I have not heard in discussions of time-aligned texts is specifications for Audio Dominant Text vs. Text Dominant Audio. This may not initially seem very important, so let me explain what I mean.

Leaf in Meꞌphaa

When I was in México, working with a team doing language documentation, we visited a community workshop where the community organizer was promoting the language through a dictionary creation effort. It was interesting to see the various bilingual teachers come together and discuss a proposed entry and its definition.

Meꞌphaa group working on dictionary

There were several interesting aspects of the social interaction: there was the political unity in the perception that they were all there for the good of their language, and there was the social unity because they were mostly there because they were in state jobs as teachers or school administrators. But perhaps more socially significant was the perception that the workshop leader had skills in organizing a dictionary. (Nothing wrong with this perception and it is probably an accurate perception.) Yet, it was not the only perception which was at play in the social interactions. There was also the cultural, age-based and rank-based way of coming to a consensus about what a particular Meꞌphaa (or any given) word meant. There is a kind of unspoken tension between the eldest in the group, who would culturally have the authority or provide a stamp of approval, the workshop "dictionary expert", and the average participant, who has to decide whom they agree or disagree with and whether they are going to show it.

to Hospital

This interesting conversation took place on Facebook:

I wonder how it happened that in American English we say “…have to go to THE hospital”, but in British English they say, “…have to go to hospital”. – Trevor Lee Deck

When I say, “go to school”, it’s so general and it’s what everyone else is doing. But if I need to see my History teacher specifically I would say, “I’ll stop by the school to see him.” …but I can also choose to say, “I’ll stop by school to see him.”

Trevor Lee Deck Maybe British English speakers use hospital/the hospital the same way.

Lucy Baber: If we were writing it, it’s as if we would be saying, “I need go to go School” or “I’m going to Church”, like that’s the proper name of it. But then if we are stopping by the building for an informal purpose, it feels more like we are stopping by just the building and not the institution of it. Does that make sense? So in the case of the hospital, maybe we would say “I’m going to Hospital” if we were being admitted or having a procedure done, but I’m going to “the hospital” if we were visiting someone else or just picking up some results??

Elsen E. Portugal Yes, I always find that curious. . . . find us an answer, will you?

Trevor Lee Deck I like it, Lucy. But why do the British think of it differently than we Americans do? Because I don’t think an American English speaker would ever say “…to Hospital”.

Elsen E. Portugal Hmmmm, I’m wondering if perhaps the idea in the British mind is of an adjective with an understood noun, like: he is in hospital (care), in which case the article would be inappropriate. Plus, I think the establishment of ‘hospitals’ is younger than the colonization of the US. This divide probably split the meanings also, unlike school and church that have been terms used for much longer.

Trevor Lee Deck Elsen, I think you have the best answer yet. If they (even subconsciously) think of hospital as an adjective, then you’re right they’d never add a definite article. So this is a case of noun elision? Let’s think of another.

Josh Boyd or maybe a verb? like going to get schooled, going to do church, maybe going to the hospital is a phrase that implies the action being treated? or maybe I’m just really hung over and only think I make sense…

Jennifer Mann I say THE hospital… but then I am not pure British anymore, so who knows what is real and what is not!

Hugh Paterson III The Brits are more dative and americans are more Indirect object oriented.

Trevor Lee Deck Thanks, Hugh. Good observation. But do you have a suggestion as to why this could be? Why didn’t we bring that with us. It’s only been a few generations…?

Hugh Paterson III well, some say that the Brits have innovated since the U.S. Colonies were established, and that in some respects we (in the U.S.) hold the older forms or pronunciations. But in this case I think we have innovated (I think without proof) but German, another Germanic language like English, has dative prepositions, and they behave the same way as the British English. In German the gender and case is also shown on the preposition. These ideas were there in Old English, and some in Middle English and today show up in our pronominal system. But when English stopped using case, it became harder to tell a dative object from an indirect object. If we look at how languages move from overt marking to syntactic ordering then there might be some answers there. If I had to take a stab at it from a cognitive perspective; there is this idea of motion and in Indo-European it is expressed with the dative. And if I tell someone I am going somewhere, that Where should be defined as a place of mutual understanding between the two interlocutors. Maybe not in that particular sentence, but in their common experience. So, definiteness as it functions in English is not needed… and we get English phrases like “I am going to school” (even in the U.S.). There is only one school which is salient between the parties of the conversation. But if I am a detective looking for a fugitive, and I say to some of my team “you check the station” and others of my team “you go to school”. – That second part doesn’t work because the school is not common to the experience of the interlocutors. So, some of this is the difference between how as a culture we understand common experience, some how we express and use the idea of definiteness. If we as interlocutors want to express a more tight knit relational closeness with our interlocutor we might refer to things in a manner which implies more common experience than what is actually a fact.
– I reject the idea that Brits think of Hospital as an adjective.