SIL International has a survey service which operates across the globe in different administrative SIL units. I wonder if the future of survey is no longer looking at where indigenous people are living and what language variations they may have, but rather looking at where these people are going. Consider just the migrants from Nigeria: according to lucify.com, 89,032 Nigerians have migrated toward Europe in the last 4 years. That is a lot of people. Where do those people come from? What languages do they speak? What linguistic load is being put on European governmental services? What could SIL offer to these governmental agencies? How could various social organizations benefit from SIL's often long-standing work in the regions that these immigrants are coming from?
Just a quick thought.
Perception based loosely on facts:
A lot of language documentation money gets pushed towards endangered languages or languages with very few speakers. It is often awarded to an aspiring academic who promises to write a grammar of a previously unwritten or undescribed language.
Sometimes I have the opportunity to read grammars. I read them and have questions about how the described data sounds, both in context and as elicited. To that end, I wonder whether the money would be better spent, for language documentation and for the benefit of the academy, if the organizations funding language documentation research instead funded the collection of audio and video texts of data already described in grammars. In a way, this would provide the support that modern grammars should have.
That is, I find that grammars (often those of African languages) are frequently so fraught with errors, or so colored by theoretical disposition, that it would be immensely helpful if they were supported with audio texts. The focus on small, often dying languages, with funding that requires an "adequate" level of endangerment, seems to show a predisposition to collect specimens of some exotic language. While the collection of rare specimens is good in some sense, it is not always the most beneficial for the language speakers, nor is it really the most helpful for academic pursuits.
This is a quick note to record some of the things I have learned this week about working with lexical data within SIL's software options.
- There is information scattered all over the place:
- FLEx website: http://fieldworks.sil.org
- Google Group: https://groups.google.com/forum/#!forum/flex-list
- Toolbox website: http://www-01.sil.org/computIng/toolbox/
- Toolbox Google Group: https://groups.google.com/forum/#!forum/ShoeboxToolbox-Field-Linguists-Toolbox
- Webonary Website: http://webonary.org/
And then on Webonary about Data transfer: http://webonary.org/data-transfer/
- Wesay: http://wesay.palaso.org/
- A redundancy of the FLEx Google group: http://tiki.lingtransoft.info/tiki-view_forum_thread.php?comments_parentId=27&topics_offset=1
- Various introductions to FLEx: http://tiki.lingtransoft.info/Introduction+to+Flex?structure=Navmenu
- MDF documentation: http://www-01.sil.org/computing/shoebox/mdf.html including this PDF
- The LIFT format: https://code.google.com/p/lift-standard/
- LiftTools: http://downloads.palaso.org/LiftTools/
- XHTML expression of LIFT: http://pathway.sil.org/features/standards/dictionary-xhtml-proposed-standard/
- What should the purpose of the websites be? To distribute the product, or to build community around the product's existence?
My friend Ibrahim Tume Ushe and I had several conversations about gestures in NW Nigeria. In these two videos he shows me some of the more common gestures and explains their meanings.
In this post I take a look at some of the software needs of a language documentation team. One of my ongoing concerns about linguistic software development teams (like SIL International's Palaso or LSDev, MPI's archive software group, or a host of other niche software products adapted from mainstream open-source projects) is the approach they take in communicating how to use the various elements of their software together to create useful workflows for linguists participating in field research on minority languages. Many of these teams do not assume that potential users coming to their website want to be oriented to how these software solutions work together to solve specific problems in the language documentation problem space. Now, it is true that every language documentation program is different and will have different goals and outputs, but many of these goals are the same across projects. New users want to know the top-level organizational assumptions made by the software developers. That is, they want to evaluate how the software will work in a given scenario (problem space) and to make informed decisions about the ecosystem that the software will lead them into. This is not unlike users asking which is better, Android or iPhone, and then considering not just what works with a given device but where they will buy their music and their digital books, and how they will get those digital assets to a new device when the phone they are about to buy no longer serves them. These digital consequences are not in the mind of every consumer, but they are nonetheless real consequences.
As linguistics and language documentation interface with digital humanities, there has been a lot of effort to time-align texts with audio/video materials. At one level this is rather trivial to do and has the backing of commercial media processes like subtitles in movies. However, at another level this task is often done in XML, slightly differently for every project (digital corpus curation). At the macro scale the argument is that if the annotation of the audio is in XML and someone wants to do something else with it, then they can just convert the XML to whatever schema they desire. This is true.
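To make the conversion argument concrete, here is a minimal sketch in Python that turns a time-aligned XML annotation into SubRip (SRT) subtitle text. The `<transcript>`/`<segment>` schema with `start` and `end` attributes in seconds is an assumption made up for illustration; it is not the format of any particular tool or corpus, and a real project's schema would need its own mapping.

```python
import xml.etree.ElementTree as ET

# Hypothetical project-specific annotation: times are in seconds.
SAMPLE = """<transcript>
  <segment start="0.0" end="2.5">First utterance</segment>
  <segment start="2.5" end="5.25">Second utterance</segment>
</transcript>"""


def to_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(float(seconds) * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def xml_to_srt(xml_text):
    """Convert the assumed <segment> elements into numbered SRT cues."""
    root = ET.fromstring(xml_text)
    blocks = []
    for index, seg in enumerate(root.iter("segment"), start=1):
        start = to_timestamp(seg.get("start"))
        end = to_timestamp(seg.get("end"))
        text = (seg.text or "").strip()
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(xml_to_srt(SAMPLE))
```

The conversion itself is short; the part that differs from project to project is the mapping from each schema's elements and attributes to cues.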
However, one anecdotal point that I have not heard in discussions of time-aligned texts is specifications for Audio Dominant Text vs. Text Dominant Audio. This may not initially seem very important, so let me explain what I mean.
When I was in México, working with a team doing language documentation, we visited a community workshop where the community organizer was promoting the language through a dictionary creation effort. It was interesting to see the various bilingual teachers come together and discuss a proposed entry and its definition. There were several interesting aspects of the social interaction: there was political unity in the perception that they were all there for the good of their language, and there was social unity because they were mostly there as teachers or school administrators in state jobs. But perhaps more socially significant was the perception that the workshop leader had skills in organizing a dictionary. (There is nothing wrong with this perception, and it is probably accurate.) Yet it was not the only perception at play in the social interactions. There was also the culturally age-based and rank-based way of coming to a consensus about what a particular Meꞌphaa (or any given) word meant. There is a kind of unspoken tension between the eldest in the group, who would culturally have the authority to provide a stamp of approval, the workshop "dictionary expert", and the average participant, who has to decide whom they agree or disagree with and whether they are going to show it.
I feel that in the language and culture documentation community there is a tension between “documenting” and “globalizing”, in the sense that what we as digital natives and cultural technologists think of as “living” is in part “documenting”.
Now, in some sense “Language Documentation” is an academic pursuit in its own right, independent of linguistics, if it has a plan and tries to capture elements of the expression of the culture and language as it is spoken or acted out. I think there is a bit of confusion in the literature as linguists move from linguistics to language development and community development. This is particularly evident with the use of video in language documentation.