A Story Breeds A Story

While I was in Malaysia, I had the honor to meet and talk to quite a bit with Professor Emeritus Howard McKaughan. We talked a about his linguistics based work in Mexico, the Philippines, and in Malaysia. He can tell stories, interesting stories.

Howard - Story Telling

Howard - Story Telling

There is something unique about his generation of Americans (currently in their 80s and 90s). It is their ability to craft and tell stories. I feel that this is a cultural point I don’t have. It could be because I am third culture, or because I talk to much of the macro-details, or it might simply be because I am long winded.
Continue reading

Language maps like heat maps

There is a myriad of difficulties in overlaying language data with geographical data. But it has be done and can be done. While I was working in México on a language documentation project, I learned that some of the language mixing (not quite diglossia, rather the living of two people groups with different languages in the same spaces) was due geographical factors and economical factors pulling them into the same geographic locations. In the particular case I am thinking of there was a mountain pass and a valley on the way to the major center of trade. In this sort of context the interesting things are displayed not when a polygon is drawn showing a territorial overlay of where various language speakers living, but where something is drawn showing what the density or population dispersion per general population is. Some of the most detailed (in terms of global perspective) language maps can be found in the Ethnologue [1] Lewis, M. Paul (ed.). 2009. Ethnologue: Languages of the World, Sixteenth edition. Dallas, Tex.: SIL International. .

Western Central Mexico from the Ethnologue

Western Central Mexico from the Ethnologue

However, as I was working on the language documentation project I found out how much effort actually goes into that sort of map. ArcGIS, the software used to create the maps can not auto-generate a polygon a certain distance around a combined set of given points. A set of points can be selected and each point can get a 5 mile radius. What this means is that each polygon has to be hand drawn. This sort of graphical overly that is used in the the Ethnologue [2] Map of Languages in Western Mexico in the Ethnologue. [Accessed: 9 September 2011] http://www.ethnologue.com/show_map.asp?name=MX&seq=30. [Link] does not show the density of speakers of a language in an area relative to the total population (in the Ethnologue’s defense I am not sure it is supposed to). For instance, if I wanted to know “What is the density of speakers in the Me’phaa area of México relative to speakers of other languages?” that would show me some dispersion, and by implication the peopling of the area. This sort of geographical overlay may be closer to displaying social networks, not really bilingualism or diglossia. There might be some bilinguals or some average level of bilingualism there, but the heat map method of plotting is looking still at the density of speakers to an area. A simular map might be created of New York City where certain languages are given a color based on their distribution density in the area. Additionally, these sorts of data overlays are probably more prone to lend insights on language attrition patterns or language speaker migration patterns. Also these hand drawn polygons change (a little) from edition to edition. Because the data used to create the polygons is not referenced (cited) it is hard to tell if the change is keeping pace with language attrition and/or population movement or if the changes are due to a better linguistic understanding in a particular area. When looking at the large area maps in the Ethnologue, [3] Map of Languages in the Americas in the Ethnologue. [Accessed: 9 September 2011] http://www.ethnologue.com/show_map.asp?name=Americas&seq=10. [Link] it is hard to tell if the red dots represent “traditional” language area (or geographical center thereof) or if the points represent the current geographical center of the speaking area. Either way the plotting functions as if it were a heat map showing the diversity of languages over a geographical area.

Americas Map from the Ethnologue

Americas Map from the Ethnologue

gHeat

I am generally on the look out for web apps and APIs which can be used to overlay data to bring new insights to situations through graphical representations. I recently found a tool for overlaying data on Google Maps. This tool creates heat maps given data from another source. This tool is called gHeat. This tool was brough to my attention by Been O’Steen as he modified gHeat to display some prices for student properties [4] Ben O’Steen. 2011. Student Property Heatmap. Random Hacks: Hacks, code and other things. [Accessed: 2 September 2011] http://benosteen.wordpress.com/2011/07/26/student-property-heatmap . [Link] in the UK. My initial thought was: “Wow how can we do language maps like this?”

Student Property Heat Map

Student Property Heat Map

Obviously I still think that language based heat maps could prove to provide language workers world wide access to visualizations of data that could really add clarity to the language vitality situation.

References

References
1 Lewis, M. Paul (ed.). 2009. Ethnologue: Languages of the World, Sixteenth edition. Dallas, Tex.: SIL International.
2 Map of Languages in Western Mexico in the Ethnologue. [Accessed: 9 September 2011] http://www.ethnologue.com/show_map.asp?name=MX&seq=30. [Link]
3 Map of Languages in the Americas in the Ethnologue. [Accessed: 9 September 2011] http://www.ethnologue.com/show_map.asp?name=Americas&seq=10. [Link]
4 Ben O’Steen. 2011. Student Property Heatmap. Random Hacks: Hacks, code and other things. [Accessed: 2 September 2011] http://benosteen.wordpress.com/2011/07/26/student-property-heatmap . [Link]

Impossible English Grammar

While I was in Mexico, I was walking to the store with a friend, who is also a fellow linguistics student. He was telling me a story. In the course of that story a naturally occurring sentence flowed “out of his mouth”. After he said that sentence I let him finish his thought and I asked him if the sentence was gramatical.

Here is the sentence:

Yesterday, I saw the latin version of one of my friend’s husbands in Sorinana.Sorinana chain of stores in Mexico.

My contention was that the “s” on “husbands” was ungrammatical.

Of course, if the sentence is read:

Yesterday, I saw the latin version of one of my friend’s husband in Sorinana.

The sentence sounds awkward. Perhaps it is not a well formed sentence. But is it ungrammatical? What is the violation which makes the sentence sound awkward? Is it the constrained unit [one of my friend’s] which is embedded in another gramatical unit, which is apparently unconstrained [the latin version of…]?

We tried to move the gramatical units around and did not find a satisfying solution.

Yesterday, I saw the latin version of the husband of one of my friends in Soriana.

Yesterday, I saw the latin version of one of my friend’s husband in Soriana.

Yesterday, I saw a man who looked like my friend’s husband in Soriana.

Yesterday, I saw a man who could have passed as the latin version of one of my friend’s husband in Soriana.

Yesterday, I saw a man who could have passed as the latin version of the husband of one of my friends in Soriana.

Yesterday, in Soriana, I saw a latino version of my friend’s husband.

Yesterday, I saw a the latin version of the husband of one of my friends in Soriana.

Yesterday, I saw in Soriana the latin version of the husband of one of my friends.

All this variation in options of for information ordering has led me to ask three questions of English:

  1. How is Time, Manner and place naturally ordered in English?
  2. What is the prominent element of information in each option and why?
  3. What are the Elements?

SSH, Unix commands & RegEx

This summer I am sitting in on a computational linguistics course. It is the first instruction I have had about UNIX. Pretty Awesome.
This has required me to do some googling looking from terminal commands.

This is kind of a sketch of where I have been.

UNIX:
http://www.osxfaq.com/Tutorials/LearningCenter/

SSH:
http://kimmo.suominen.com/docs/ssh/
http://ss64.com/osx/

TERMINAL:
http://homepage.mac.com/rgriff/files/TerminalBasics.pdf

grep:
http://www.thegeekstuff.com/2009/03/15-practical-unix-grep-command-examples/
http://en.wikipedia.org/wiki/Grep
http://www.computerhope.com/unix/ugrep.htm

Regular Expressions:
http://www.zytrax.com/tech/web/regex.htm
http://www.regular-expressions.info/tutorial.html
http://gnosis.cx/publish/programming/regular_expressions.html

RegEx and Unicode:
One of the issues that I have had with RegEx has been what is a natural class? i.e. [A-Z], [A-Za-z], [0-9], etc. As a linguist I deal a lot with IPA characters, subscripts, superscripts, unicode, and diacritics. How am I to define a natural class with these? Can I define a natural class based on the phonology of the language?

So I did some more searching:
http://unicode.org/reports/tr18/
http://unicode.org/reports/tr18/tr18-5.1.html
http://icu-project.org/docs/papers/iuc26_regexp.pdf
http://courses.ischool.berkeley.edu/i256/f06/papers/regexps_tutorial.pdf
http://wapedia.mobi/en/Regular_expression?t=5.

RegEx+PERL+Unicode:
http://perldoc.perl.org/perlretut.html

PERL:
http://www.enginsite.com/Library-Perl-Regular-Expressions-Tutorial.htm
http://www.cgi101.com/book/connect/mac.html
http://www.mactech.com/articles/mactech/Vol.18/18.09/PerlforMacOSX/index.html

Python:
http://www.amk.ca/python/howto/regex/