Lexical Data Management helps (with SIL software)

This is a quick note to record some of the things I have learned this week about working with lexical data within SIL's software options.

  1. Information is scattered all over the place.
  2. What should the purpose of the websites be: to distribute the product, or to build community around the product's existence?

Software Needs for a Language Documentation Project

In this post I take a look at some of the software needs of a language documentation team. One of my ongoing concerns about linguistic software development teams (like SIL International's Palaso or LSDev, MPI's archive software group, or a host of other niche software products adapted from mainstream open-source projects) is the approach they take in communicating how to use the various elements of their software together to create useful workflows for linguists doing field research on minority languages. Many of these teams do not assume that potential users coming to their website want to be oriented to how the software solutions work together to solve specific problems in the language documentation problem space. It is true that every language documentation program is different and will have different goals and outputs, but many of these goals are shared across projects. New users want to know the top-level organizational assumptions made by the software developers. That is, they want to evaluate how the software will work in a given scenario (problem space) and to make informed decisions based on the ecosystem that the software will lead them into. This is not unlike users asking which is better, Android or iPhone, and then deciding based not just on a given device but on where they will buy their music and digital books, and how they will get those digital assets to a new device when the phone they are about to buy no longer serves them. These digital consequences are not in the mind of every consumer, but they are nonetheless real.

Audio Dominant Texts and Text Dominant Audio

As linguistics and language documentation interface with digital humanities, there has been a lot of effort to time-align texts and audio/video materials. At one level this is rather trivial to do and has the backing of commercial media processes like subtitles in movies. However, at another level this task is often done in XML, and slightly differently for every project (digital corpus curation). At the macro scale the argument is that if the annotation of the audio is in XML and someone wants to do something else with it, then they can just convert the XML to whatever schema they desire. This is true.
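
To make the "just convert the XML" argument concrete, here is a minimal sketch in Python that turns time-aligned segments into SRT subtitle cues. The XML layout is an assumption made up for this example (the `segment` element and its `start` and `end` attributes, in seconds, are hypothetical); real projects each use their own schema, which is exactly the point.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: the element and attribute names are invented for this sketch.
SAMPLE = """
<text>
  <segment start="0.0" end="2.5">First utterance</segment>
  <segment start="2.5" end="6.04">Second utterance</segment>
</text>
"""


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def xml_to_srt(xml_text: str) -> str:
    """Convert every <segment> element into a numbered SRT cue."""
    root = ET.fromstring(xml_text)
    cues = []
    for number, seg in enumerate(root.iter("segment"), start=1):
        start = srt_timestamp(float(seg.get("start")))
        end = srt_timestamp(float(seg.get("end")))
        cues.append(f"{number}\n{start} --> {end}\n{seg.text.strip()}\n")
    return "\n".join(cues)


print(xml_to_srt(SAMPLE))
```

The conversion itself is short, but every project with a different schema needs its own version of `xml_to_srt`.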

However, one anecdotal point that I have not heard in discussions of time-aligned texts is the need for specifications for Audio Dominant Text vs. Text Dominant Audio. This may not initially seem very important, so let me explain what I mean.

The Look of Language Archive Websites

This is the start of a look across language archives at the current state of UX design for presenting content generated in language documentation.

http://www.rnld.org/archives
http://www.mpi.nl/DOBES/language_archives

http://paradisec.org.au/
http://repository.digiarch.sinica.edu.tw/index.jsp?lang=en

http://alma.matrix.msu.edu/

http://www.thlib.org/

http://www.ailla.utexas.org/site/welcome.html

Reflections on CRASSH

In July I presented a paper at CRASSH in Cambridge. It was a small conference, but since it was in Europe it was a good chance to see many of the various kinds of projects going on in Digital Humanities and Linguistics, and in Cloud Computing and Linguistics. One particular project, TypeCraft, presented by Dorothee Beermann Hellan, stands out as being rather well done and promising. I think the ideas presented in this project are well thought out and seem to be well implemented. It would be nice to see this product integrated with some other linguistics and language documentation cloud offerings, e.g. the LEGO project from the LINGUIST List or the Max Planck Institute's LEXUS project. While TypeCraft does allow for round-tripping of data with XML, what I am talking about is a consolidated user experience for both professional linguists and minority-language users.

A note on foundational technologies:

  • It appears that LEXUS is built on BaseX with Cocoon and XML.
  • The front page of TypeCraft has a very Wikipedia-like feel, but this might not be the true foundational technology.
  • The LINGUIST List often does its work in ColdFusion, and the LEGO project definitely has this feel about it.

Types of Linguistic Maps: The Mapping of Linguistic Features and Researcher Interactivity

A couple of years ago I had a chance meeting with a cartographer in North Dakota. It was interesting because he asked us (a group of linguists), "What is a language or linguistic map?" So I grabbed a few examples and put them into a brief for him. This past January at the LSA meeting in Portland, Oregon, I had several interesting conversations with the folks at the LL-Map Project under the LINGUIST List. It occurred to me that such a presentation of various kinds of language maps might be useful to a larger audience. So this will be a bit unpolished, but it should show a wide selection of language and linguistic maps, and in the last section I will also talk a bit about interactive maps.

Remoteness Index

For the last few weeks I have been thinking about how one can measure the impact on a language of a language community's contact with other languages. I have been looking for ways that remoteness has been measured in the past. I recently ran across a note on my iPhone, dated March 8, 2011, from when I was in Mexico.

A metric for measuring the language shift, contact, and relatedness of indigenous languages of Mexico

  • The formation of areal features
  • Population density
  • Trade and social networks
  • Political affiliation
  • Geographic factors
  • Roads and travel opportunities

I remember writing this note: I was standing in front of a topographical map showing terrain regions. This map also had the language areas of Mexico outlined. It occurred to me (having also recently had a conversation with a local anthropologist about trade routes and mountain passes) that these sorts of factors should be accounted for as factors in language endangerment, and that if they can be accounted for then they can also be graphed (on a map, of course). The major issue is that if one just plots a language area without showing population or speaker density in that area, the viewer of the map will get a warped view of the language situation. Population density alone does not indicate where language attrition is unlikely to occur, and language contact does not automatically happen at the edges of a language area. That is to say, in a country with mountain passes, there will likely be more language contact in the passes, as various groups travel to market, than in villages at higher elevations. This leads to the issue of language diffusion and how to represent it. But the issue is not just one of language diffusion; it is also one of population diffusion, population mobility, and accessibility to various areas. So in projecting, assessing, and plotting language vitality, remoteness should be part of the equation. Remoteness is not a single factor on its own, though. It is more of an index that takes in the issues mentioned above, with specific attention to geographical remoteness and to social remoteness (or contact, even with other villages and cities in the same language and ethnic communities).
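
To make the idea of an index (rather than a single factor) a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the factor names, the scaling of each input to a value between 0 and 1, the equal default weighting, and the example values are all hypothetical and are not drawn from any existing index or data.

```python
from dataclasses import dataclass


@dataclass
class Community:
    """Hypothetical inputs, each assumed to be already scaled to 0..1."""
    name: str
    travel_time: float          # 1.0 = hardest to reach by road or trail
    population_sparsity: float  # 1.0 = most sparsely populated
    trade_isolation: float      # 1.0 = no market or trade-route contact
    social_isolation: float     # 1.0 = no contact with other villages or cities


def remoteness_index(c: Community, geo_weight: float = 0.5) -> float:
    """Weighted mean of a geographic and a social component (illustrative only)."""
    if not 0.0 <= geo_weight <= 1.0:
        raise ValueError("geo_weight must be between 0 and 1")
    geographic = (c.travel_time + c.population_sparsity) / 2
    social = (c.trade_isolation + c.social_isolation) / 2
    return geo_weight * geographic + (1 - geo_weight) * social


# Made-up example values, not real data.
pass_village = Community("mountain pass market town", 0.5, 0.25, 0.0, 0.25)
high_village = Community("high-elevation village", 1.0, 0.75, 0.75, 0.5)

for community in (pass_village, high_village):
    print(community.name, remoteness_index(community))
```

With these made-up numbers the market town in the pass scores lower (less remote) than the high-elevation village. A score like this could then be attached to each point or polygon on a map instead of plotting the language area alone.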

I am not currently aware of any index, much less a project which plots this index to a geographical area. However, I have found some previous work worth mentioning which might be related and relevant.

Modeling Language Diffusion With ArcGIS

There is an interesting paper and project on modeling language diffusion with ArcGIS. It was prepared for Worldmap.org by Christopher Deckert in 2004 and presented at the 24th Annual Esri International User Conference. [1]

Remote Areas of the World

The magazine NewScientist has an article from April 2009 [2] about the remotest places in the world. It has several maps and abstractions showing how remote (with reference to travel time) places in the world are. The following maps come from the NewScientist article.

Map showing the accessibility from one point to another.

Detail of roads in West Africa

Nowhere three weeks from anywhere

Map showing the remoteness of the Tibetan Plateau

The ASGC Remoteness Structure

Another promising resource I found is the ASGC Remoteness Structure, which Australia has developed to show how remote parts of Australia are. There is a series of papers explaining the methods behind the algorithms used and the purpose of the study. One of the outputs was the map below. [3]

Australia Remoteness Map

The Territoriality of Public Health Governance in Mexico

The last resource I am going to mention here is The Territoriality of Public Health Governance in Mexico, a study which plots the remoteness of health care in Mexico. [4]

References
[1] Christopher Deckert. 2004. Modeling Language Diffusion With ArcGIS. Paper published in the proceedings of the 24th Annual Esri International User Conference, August 9–13, 2004. http://proceedings.esri.com/library/userconf/proc04/docs/pap1071.pdf [PDF] [Accessed: 27 February 2011]
[2] Caroline Williams. 20 April 2009. Where's the remotest place on Earth? NewScientist. http://www.newscientist.com/article/mg20227041.500-wheres-the-remotest-place-on-earth.html [Accessed: 27 February 2011]
[3] Commonwealth Department of Health and Aged Care. 2001. Measuring Remoteness: Accessibility/Remoteness Index of Australia (ARIA), Revised Edition. Occasional Papers: New Series No. 14. [Accessed: 2 February 2012]
[4] Alberto Díaz-Cayeros and Justin Levitt. August 30, 2011. The Territoriality of Public Health Governance in Mexico. http://irps.ucsd.edu/assets/001/502971.pdf [PDF] [Accessed: 12 February 2012]