A lot of language documentation money gets pushed towards endangered languages or languages with very few speakers. It is often endowed upon the aspiring academic, who may be promising to create a grammar for a previously unwritten or undescribed language.
Sometimes I have the opportunity to read these grammars, and as I read them I have questions about how the described data actually sounds, both in context and as elicited. To that end, I wonder whether it wouldn't be money better spent, for language documentation and for the benefit of the academy, if organizations funding language documentation research would instead fund the collection of audio texts and video texts of data already described in grammars. In a way, this would provide the support that modern grammars should have.
That is, I find that grammars of languages (often African languages) are frequently so fraught with errors, or so colored by theoretical disposition, that it would be immensely helpful if these grammars were supported with audio texts. It seems that the focus on small, often dying, languages, requiring an impetus of "adequate" endangerment for funding, shows a predisposition to try to collect specimens of some exotic language. While the collection of rare specimens is good in some sense, it is not always the most dignifying for the language speakers, nor is it really the most helpful for academic pursuits.
In this post I take a look at some of the software needs of a language documentation team. One of my ongoing concerns with linguistic software development teams (like SIL International's Palaso or LSDev, MPI's archive software group, or a host of other niche software products adapted from mainstream open-source projects) is the approach they take in communicating how to use the various elements of their software together to create useful workflows for linguists participating in field research on minority languages. Many of these software development teams do not take the approach that potential users coming to their website want to be oriented to how these software solutions work together to solve specific problems in the language documentation problem space. Now, it is true that every language documentation program is different and will have different goals and outputs, but many of these goals are the same across projects. New users of software want to know the top-level organizational assumptions made by its developers. That is, they want to evaluate how the software will work in a given scenario (problem space) and to make informed decisions based on the ecosystem that the software will lead them into. This is not too unlike users asking which is better, Android or iPhone, and then deciding what works not just with a given device but where they will buy their music and their digital books, and how they will get those digital assets to a new device when the phone they are about to buy no longer serves them. These digital consequences are not in the mind of every consumer... but they are nonetheless real consequences.
As linguistics and language documentation interface with the digital humanities, there has been a lot of effort to time-align texts with audio/video materials. At one level this is rather trivial to do and has the backing of commercial media processes like subtitles in movies. However, at another level this task is done in XML slightly differently for every project (every digital corpus curation effort). At the macro scale the argument is that if the annotation of the audio is in XML and someone wants to do something else with it, then they can just convert the XML to whatever schema they desire. This is true.
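To make the conversion argument concrete, here is a minimal sketch of what "just convert the XML" looks like in practice. The `<annotation start=".." end="..">` schema below is hypothetical (not any real project's format; real schemas like ELAN's EAF are considerably more involved), and the target is the SRT subtitle format, using only the Python standard library:

```python
# Convert a (hypothetical) time-aligned XML annotation schema to SRT
# subtitles. Timestamps in the sample schema are in milliseconds.
import xml.etree.ElementTree as ET

SAMPLE = """\
<text lang="und">
  <annotation start="0" end="1500">First phrase.</annotation>
  <annotation start="1500" end="3200">Second phrase.</annotation>
</text>"""

def ms_to_srt_time(ms: int) -> str:
    # SRT timestamps have the form HH:MM:SS,mmm
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, msec = divmod(rem, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{msec:03d}"

def xml_to_srt(xml_text: str) -> str:
    root = ET.fromstring(xml_text)
    blocks = []
    for i, ann in enumerate(root.iter("annotation"), start=1):
        start, end = int(ann.get("start")), int(ann.get("end"))
        blocks.append(
            f"{i}\n{ms_to_srt_time(start)} --> {ms_to_srt_time(end)}\n"
            f"{(ann.text or '').strip()}"
        )
    # SRT cues are separated by a blank line
    return "\n\n".join(blocks)

print(xml_to_srt(SAMPLE))
```

The conversion itself is easy; the real cost, as noted above, is that every project's schema differs, so a script like this must be rewritten (or at least re-mapped) per corpus.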
However, one anecdotal point that I have not heard in discussions of time-aligned texts is specifications for Audio Dominant Text vs. Text Dominant Audio. This may not initially seem very important, so let me explain what I mean.
I have been working on describing the FLEx software ecosystem (for both a blog post and an info-graphic). In the process I googled "language documentation workflow" and was promptly directed to resources created for InField and aggregated via ctldc.org. An amazing set of resources. The ctldc.org website is well put together, and the content from InField 2010 and 2008 is amazing; I wish I could have been there. I am almost convinced that most SIL staff pursuing linguistic fieldwork should just go to InField... But it is true that InField seems to be targeted at someone who has had more than one semester of linguistics training.
I feel that in the language and culture documentation community there is a tension between “documenting” and “globalizing”, in the sense that what we as digital natives and cultural technologists think of as “living” is in part “documenting”.
Now, in some sense “Language Documentation” is an academic pursuit in its own right, independent of linguistics, if it has a plan and tries to capture elements of the expression of the culture and language as it is spoken or acted out. I think there is a bit of confusion in the literature as linguists move from linguistics to language development and community development. This is particularly evident with the use of video in language documentation.
In a recent paper Jeremy Nordmoe, a friend and colleague, states that:
Because most linguists archive documents infrequently, they will never be experts at doing so, nor will they be experts in the intricacies of metadata schemas.
My initial reply is:
You are d@#n right! And it is because archives are not sexy enough!
Jeremy Nordmoe. 2011. Introducing RAMP: an application for packaging metadata and resources offline for submission to an institutional repository. In Proceedings of Workshop on Language Documentation & Archiving 18 November 2011 at SOAS, London. Edited by: David Nathan. p. 27-32. [Preprint PDF]
In a team framework where there are several members of a research team, the job requirements call for the sharing of bibliographic data (about materials referenced) as well as the actual resources being referenced. In this environment there needs to be a central repository for sharing both kinds of data. This is true for small, geographically localized groups as well as for large distributed research teams. New researchers joining an existing team need to be able to “plug in” to existing foundational work on the project and to access the bibliographic data as well as the resources those bibliographic details point to. My point here is to outline some of the current challenges involved in trying to overcome the collaborative obstacle when working in the fields of Linguistics and Language Documentation (Nikolaus P. Himmelmann. 1998. Documentary and Descriptive Linguistics. Linguistics vol. 36:161-195. [PDF] [Accessed 24 Dec. 2010]). This sentiment is echoed by many in the world of science. Here is someone on Zotero’s forums [INSERT LINK]. (Though Zotero does claim to combat some of these issues.)
Some researchers in linguistics (of my acquaintance) have been less than excited about the notion of asking for socio-linguistic or socio-personal data from language informants. The objection has been that it is simply bad form. While I am a great advocate of personal privacy (especially in digital formats), I see that one of the most informative parts of the language documentation process is understanding who the speakers being recorded or worked with are. Language variation is fundamentally connected with identity. While crucial elements of how a community segments itself along identity lines may not be known for several years, having a robust socio-cultural or socio-personal questionnaire about the language informants will later help place the documentation data in perspective of the larger waves of variation in the community.
This is to say, I am thoroughly convinced that a socio-linguistic questionnaire is an important part of the language documentation process. It might not need to be done first, but it will help researchers and future users of archived material understand where to place these speech samples in the context of that speaker's society.
The outstanding question, and one with a variable answer, is how to appropriately approach the questions in the questionnaire. Should the questionnaire be approached formally? Or should it be asked in a conversational format? Should it be elicited digitally? One of the interesting things about eliciting things digitally is that doing so may appear less intrusive because it is less formal. While I have no empirical evidence based on years of cross-cultural work, I do have the Facebook phenomenon. That is, minority language users all over the world are using Facebook, and Facebook is collecting (and allowing users to volunteer), and then verifying, the users’ provided data.
Facebook User Base Graph from 2010
Below is a list of elements which Facebook is collecting (it is also collecting log-in locations and times). Some of these questions are certainly in scope of what language documenters would minimally like to know about their indigenous-language-speaking informants and collaborators. Others are certainly not in scope for the recommended socio-linguistic profile from language documenters or socio-linguists.
[table id=13 /]