This week I have been outlining the types of data that linguists need to be able to use and relate to each other as they do Language Documentation and Linguistic Research. I try to express these things graphically and also to show where some of the leading tools offered by SIL International sit in the problem space.
Category Archives: Language Documentation
The Look of Language Archive Websites
This is the start of a look across language archives at the current state of UX design for presenting content generated in Language Documentation.
http://www.rnld.org/archives
http://www.mpi.nl/DOBES/language_archives
http://paradisec.org.au/
http://repository.digiarch.sinica.edu.tw/index.jsp?lang=en
Leave Typology to the Typologists: I am a Linguist
A User Experience look at Linguistic Archiving
In a recent paper Jeremy Nordmoe, a friend and colleague, states that:
Because most linguists archive documents infrequently, they will never be experts at doing so, nor will they be experts in the intricacies of metadata schemas. [1]
My initial reply is:
You are d@#n right! and it is because archives are not sexy enough!
References
↑1 | Jeremy Nordmoe. 2011. Introducing RAMP: an application for packaging metadata and resources offline for submission to an institutional repository. In Proceedings of Workshop on Language Documentation & Archiving 18 November 2011 at SOAS, London. Edited by: David Nathan. p. 27-32. [Preprint PDF] |
Permanently accessible? to whom?

Bush house: the BBC World Service is leaving its home after 71 years
Photo: Paul Grover via The Telegraph
References
↑1 | Christopher Middleton. 7:30 am BST 10 Jul 2012. For sale: Bush House, a landmark of BBC World Service history. The Telegraph on-line. http://www.telegraph.co.uk/culture/tvandradio/bbc/9386848/For-sale-Bush-House-a-landmark-of-BBC-World-Service-history.html [Link] [Accessed: 19 July 2012] |
↑2 | Jonathan Prynn. 11 July 2012. Buy a bit of BBC radio history… or an entire studio. London Evening Standard on-line. http://www.standard.co.uk/news/uk/buy-a-bit-of-bbc-radio-history-or-an-entire-studio-7935734.html [Link] [Accessed: 19 July 2012] |
↑3 | Paul Ridden. 12:41 pm 12 July 2012. Updated: BBC World Service equipment and memorabilia to go under the auctioneer's hammer. gizmag online. http://www.gizmag.com/bbc-world-service-bush-house-auction/23292/ [Link] [Accessed: 19 July 2012] |
The Citation Problem
In a team framework there are several members of a research team, and the job requirements call for sharing bibliographic data (of materials referenced) as well as the actual resources being referenced. In this environment there needs to be a central repository for sharing both kinds of data. This is true for small, geographically localized groups as well as for large distributed research teams. New researchers joining an existing team need to be able to “plug in” to existing foundational work on the project and to access both the bibliographic data and the resources those bibliographic details point to. My aim here is to outline some of the current challenges involved in overcoming this collaborative obstacle when working in the fields of Linguistics and Language Documentation. [1] Nikolaus P. Himmelmann. 1998. Documentary and Descriptive Linguistics. Linguistics vol. 36:161-195. [PDF] [Accessed 24 Dec. 2010]. This sentiment is echoed by many in the world of science. Here is someone on Zotero’s forums [INSERT LINK]. (Though Zotero does claim to combat some of these issues.)
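To make the two kinds of shared data concrete, here is a minimal sketch in Python of a team library in which each bibliographic record can also point to the team's copy of the resource it describes. This is only an illustration under my own assumptions: the class names, the field names, and the `shared/himmelmann1998.pdf` location are hypothetical and do not describe Zotero or any other existing tool.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class BibRecord:
    """One shared bibliographic entry (all field names are hypothetical)."""
    key: str                             # team-wide citation key
    author: str
    year: int
    title: str
    resource_uri: Optional[str] = None   # where the team's copy lives, if any


class SharedLibrary:
    """In-memory stand-in for a team's central repository."""

    def __init__(self) -> None:
        self._records: Dict[str, BibRecord] = {}

    def add(self, record: BibRecord) -> None:
        self._records[record.key] = record

    def get(self, key: str) -> BibRecord:
        return self._records[key]

    def missing_resources(self) -> List[str]:
        """Keys whose bibliographic data is shared but whose resource is not."""
        return sorted(k for k, r in self._records.items() if r.resource_uri is None)


library = SharedLibrary()
library.add(BibRecord(
    "himmelmann1998", "Nikolaus P. Himmelmann", 1998,
    "Documentary and Descriptive Linguistics",
    resource_uri="shared/himmelmann1998.pdf",  # hypothetical location
))
library.add(BibRecord(
    "nordmoe2011", "Jeremy Nordmoe", 2011,
    "Introducing RAMP",  # shortened title; no resource attached yet
))

print(library.missing_resources())
```

A new team member could call `missing_resources()` to see which references the team cites but has not yet shared as actual files, which is the gap this post is concerned with.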
Bibliographic Data vs. Citation Data
Socio-linguistic Profiles for Language Documentation
Some researchers in linguistics (in my acquaintance) have been less than excited about the notion of asking language informants for socio-linguistic or socio-personal data. The objection has been that it is just bad form. While I am a great advocate of personal privacy (especially in digital formats), I see that one of the most informative parts of the language documentation process is understanding who the speakers being recorded or worked with are. Language variation is fundamentally connected with identity. While crucial elements of how a community segments itself along identity lines may not be known for several years, having a robust socio-cultural or socio-personal questionnaire about the language informants will later help place the documentation data in the perspective of the larger waves of variation in the community.
This is to say, I am thoroughly convinced that a socio-linguistic questionnaire is an important part of the language documentation process. It might not need to be done first, but it will help researchers and future users of archived material understand where to place these speech samples in the context of that speaker's society.
The outstanding question, and one with a variable answer, is how to appropriately approach the questions in the questionnaire. Should the questionnaire be approached formally? Or should it be asked in a conversational format? Should it be elicited digitally? One of the interesting things about eliciting things digitally is that the questions may appear less intrusive because they are less formal. While I have no empirical evidence based on years of cross-cultural work, I do have the Facebook phenomenon. That is, minority language users all over the world are using Facebook, and Facebook is collecting (and allowing users to volunteer) and then verifying the data its users provide.
Below is a list of elements which Facebook is collecting (it is also collecting log-in locations and times). Some of these questions are certainly in scope for what language documenters would minimally like to know about their indigenous-language-speaking informants and collaborators. Others are certainly not in scope for the socio-linguistic profile recommended by language documenters or socio-linguists.
[table id=13 /]
Reflections on CRASSH
In July I presented a paper at CRASSH in Cambridge. It was a small conference, but being in Europe it was good to see many of the various kinds of projects which are going on in Digital Humanities and Linguistics, and also in Cloud Computing and Linguistics. One particular project, TypeCraft, presented by Dorothee Beermann Hellan, stands out as being rather well done and promising. I think the ideas presented in this project are well thought out and seem to be well implemented. It would be nice to see this product integrated with some other linguistics and language documentation cloud offerings, e.g. Project LEGO from the Linguist’s List or the Max Planck Institute’s LEXUS project. While TypeCraft does allow for round-tripping of data with XML, what I am talking about is a consolidated User Experience for both professional linguists and minority language users.
A note on foundational technologies:
- It appears that Lexus is built on BaseX with Cocoon and XML.
- The front page of TypeCraft has a very Wikipedia like feel, but this might not be the true foundational technology.
- Linguist’s List often does their work in ColdFusion and the LEGO project definitely has this feel about it.
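As a side note on what round-tripping data with XML means in practice, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. The element and attribute names (`phrase`, `text`, `translation`, `words`, `word`, `gloss`) and the glosses are invented for illustration; this is not TypeCraft's actual export schema, which I have not reproduced here.

```python
import xml.etree.ElementTree as ET


def phrase_to_xml(phrase: dict) -> str:
    """Serialize a glossed phrase to XML (hypothetical element names)."""
    root = ET.Element("phrase")
    ET.SubElement(root, "text").text = phrase["text"]
    ET.SubElement(root, "translation").text = phrase["translation"]
    words = ET.SubElement(root, "words")
    for form, gloss in phrase["words"]:
        word = ET.SubElement(words, "word", gloss=gloss)
        word.text = form
    return ET.tostring(root, encoding="unicode")


def xml_to_phrase(xml_text: str) -> dict:
    """Parse the XML produced by phrase_to_xml back into a dict."""
    root = ET.fromstring(xml_text)
    return {
        "text": root.findtext("text"),
        "translation": root.findtext("translation"),
        "words": [(w.text, w.get("gloss")) for w in root.find("words")],
    }


# Illustrative data only; the glosses are my own rough labels.
original = {
    "text": "Hermanos míos",
    "translation": "My brothers",
    "words": [("Hermanos", "brother.PL"), ("míos", "1SG.POSS.PL")],
}

round_tripped = xml_to_phrase(phrase_to_xml(original))
print(round_tripped == original)
```

The data survives the trip out to XML and back unchanged, which is what makes it possible to move records between tools. A consolidated user experience, as argued above, is a separate matter from this kind of data exchange.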
Keyboard Design for Minority languages
This post is an open draft! It might be updated at any time.
A pre-print draft will not be available through this means, though there is a video of the presentation.
A. Meꞌphaa Text Sample
A̱ ngui̱nꞌ, tsáanꞌ ninimba̱ꞌlaꞌ ju̱ya̱á Jesús, ga̱ju̱ma̱ꞌlaꞌ rí phú gagi juwalaꞌ ído̱ rí nanújngalaꞌ awúun mbaꞌa inii gajmá. Numuu ndu̱ya̱á málaꞌ rí ído̱ rí na̱ꞌnga̱ꞌlaꞌ inuu gajmá, nasngájma ne̱ rí gakon rí jañii a̱kia̱nꞌlaꞌ ju̱ya̱á Ana̱ꞌlóꞌ, jamí naꞌne ne̱ rí ma̱wajún gúkuálaꞌ. I̱ndo̱ó máꞌ gíꞌmaa rí ma̱wajún gúkuálaꞌ xúgíí mbiꞌi, kajngó ma̱jráanꞌlaꞌ jamí ma̱ꞌne rí jañii a̱kia̱nꞌlaꞌ, asndo rí náxáꞌyóo nitháan rí jaꞌyoo ma̱nindxa̱ꞌlaꞌ. [I̱yi̱i̱ꞌ rí niꞌtháán Santiágo̱ 1:2-4]
B. Sochiapam Chinantec Text Sample
Hnoh² reh², ma³hiún¹³ hnoh² honh² lɨ³ua³ cáun² hi³ quiunh³² náh², quí¹ la³ cun³ hi³ má²ca³lɨ³ ñíh¹ hnoh² jáun² hi³ tɨ³ jlánh¹ bíh¹ re² lı̵́²tɨn² tsú² hi³ jmu³ juenh² tsı̵́³, nı̵́¹juáh³ zia³² hi³ cá² lau²³ ca³tɨ²¹ hi³ taunh³² tsú² jáun² ta²¹. Hi³ jáun² né³, chá¹ hnoh² cáun² honh², hi³ jáun² lı̵́¹³ lɨ³tɨn² hnoh² re² hi³ jmúh¹³ náh² juenh² honh², hi³ jáun² hnoh² lı̵́¹³ lı̵́n³ náh² tsá² má²hún¹ tsı̵́³, tsá² má²ca³hiá² ca³táunh³ ca³la³ tán¹ hián² cu³tí³, la³ cun³ tsá² tiá² hi³ lɨ³hniauh²³ hí¹ cáun² ñí¹con² yáh³. [Jacobo Jmu² Cáun² Sí² Hi³ Ca³tɨn¹ Tsá² *Judíos, Tsá² Má²tiáunh¹ Ñí¹ Hliáun³ 1:2-4]
C. Spanish Text Sample
Hermanos míos, gozaos profundamente cuando os halléis en diversas pruebas, sabiendo que la prueba de vuestra fe produce paciencia. Pero tenga la paciencia su obra completa, para que seáis perfectos y cabales, sin que os falte cosa alguna. [Santiago 1:2-4 Reina-Valera 1995 (RVR1995)]
D. English Text Sample
Dear brothers and sisters, when troubles come your way, consider it an opportunity for great joy. For you know that when your faith is tested, your endurance has a chance to grow. So let it grow, for when your endurance is fully developed, you will be perfect and complete, needing nothing. [James 1:2-4 New Living Translation (NLT 2007)]
Linking Minority Language Dictionaries to Open Data
What is the role of a dictionary?
Is the role of a dictionary to regulate or to standardize spelling? Is it to validate a speech variety as being real or a bona fide language? Or is it for documenting and establishing the relationships and connections between things (plants, animals, fish, spirits/gods, medicines, etc.) as they are emically viewed, for connecting people via collaboration, or connecting related concepts and their classes together into documented sets? Or even connecting these things and relationships as they are viewed in one culture to the same things and relationships as they are viewed in another culture or more broadly cross-culturally?
The Look of Language Development Websites
I have been thinking through some of the presentation issues for presenting SIL International’s work on the web. As part of this I have also been looking at other organizations which are part of the language documentation and minority language revitalization movement. I recently ran across several nicely done web sites.
National Geographic Genographic Project