Thoughts on file formats and file names in language documentation projects and archiving

I’ve written about some of these file issues before.

https://hugh.thejourneyler.org/2012/the-workflow-management-for-linguists/

https://hugh.thejourneyler.org/2012/the-data-management-space-for-linguists/

https://hugh.thejourneyler.org/2012/resources-for-digitizing-audio-as-part-of-archiving/

https://hugh.thejourneyler.org/2011/presentation-version-vs-archival-version-of-digital-audio-files/

Lexical Data Management helps (with SIL software)

This is a quick note to record some of the things I have learned this week about working with lexical data within SIL's software options.

  1. Information about these tools is scattered all over the place.
  2. What should the purpose of the websites be: to distribute the product, or to build community around the product's existence?

Software Needs for a Language Documentation Project

In this post I take a look at some of the software needs of a language documentation team. One of my ongoing concerns with linguistic software development teams (like SIL International's Palaso or LSDev, MPI's archive software group, or a host of other niche software products adapted from mainstream open-source projects) is the approach they take in communicating how to use the various elements of their software together to create useful workflows for linguists participating in field research on minority languages. Many of these development teams do not assume that potential users coming to their websites want to be oriented to how these software solutions work together to solve specific problems in the language documentation problem space. It is true that every language documentation program is different and will have different goals and outputs, but many of these goals are the same across projects.

New users want to know the top-level organizational assumptions made by software developers. That is, they want to evaluate how software will work in a given scenario (problem space) and to make informed decisions based on the ecosystem that the software will lead them into. This is not unlike users asking which is better, Android or iPhone, and then deciding what works not just with a given device but with where they will buy their music and their digital books, and how they will get those digital assets to a new device when the phone they are about to buy no longer serves them. These digital consequences are not in the mind of every consumer, but they are nonetheless real.

Audio Dominant Texts and Text Dominant Audio

As linguistics and language documentation interface with the digital humanities, there has been a lot of effort to time-align texts and audio/video materials. At one level this is rather trivial to do and has the backing of commercial media processes like subtitles in movies. However, at another level this task is often done slightly differently in XML for every project (digital corpus curation). At the macro-scale the argument is that if the annotation of the audio is in XML and someone wants to do something else with it, then they can just convert the XML to whatever schema they desire. This is true.
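The "just convert it" step really is usually a small mechanical transform. As an illustration only (the input schema here is invented, not any archive's actual format), a few lines of Python can re-map time-aligned XML segments into WebVTT-style subtitle cues:

```python
# Illustration of the "just convert the XML" claim: re-map a hypothetical
# time-aligned annotation schema into WebVTT-style subtitle cues.
import xml.etree.ElementTree as ET

SOURCE = """<text>
  <segment start="0.00" end="1.25">first annotated phrase</segment>
  <segment start="1.25" end="2.80">second annotated phrase</segment>
</text>"""

def _timestamp(seconds):
    """Format seconds as a WebVTT mm:ss.mmm timestamp."""
    minutes, rest = divmod(seconds, 60)
    return "%02d:%06.3f" % (minutes, rest)

def to_vtt(xml_string):
    """Convert <segment start="..." end="..."> elements to WebVTT cues."""
    root = ET.fromstring(xml_string)
    lines = ["WEBVTT", ""]
    for seg in root.iter("segment"):
        lines.append("%s --> %s" % (_timestamp(float(seg.get("start"))),
                                    _timestamp(float(seg.get("end")))))
        lines.append(seg.text)
        lines.append("")
    return "\n".join(lines)

print(to_vtt(SOURCE))
```

The conversion itself is trivial once the source schema is known; the catch is that every project's schema differs, so this small script gets rewritten for every corpus.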

However, one anecdotal point that I have not heard in discussions of time-aligned texts is a specification for Audio Dominant Text vs. Text Dominant Audio. This may not initially seem very important, so let me explain what I mean.

The Look of Language Archive Websites

This is the start of a cross-archive look at the current state of UX design in presenting content generated in language documentation.

http://www.rnld.org/archives
http://www.mpi.nl/DOBES/language_archives

http://paradisec.org.au/
http://repository.digiarch.sinica.edu.tw/index.jsp?lang=en

http://alma.matrix.msu.edu/

http://www.thlib.org/

http://www.ailla.utexas.org/site/welcome.html

Reflections on CRASSH

In July I presented a paper at CRASSH in Cambridge. It was a small conference, but being in Europe it was good to see many of the various kinds of projects going on in digital humanities and linguistics, and also in cloud computing and linguistics. One particular project, TypeCraft, presented by Dorothee Beermann Hellan, stands out as rather well done and promising. I think the ideas behind this project are well thought out and seem to be well implemented. It would be nice to see this product integrated with some other linguistics and language documentation cloud offerings, e.g. Project LEGO from the Linguist's List or the Max Planck Institute's LEXUS project. While TypeCraft does allow for round-tripping of data with XML, what I am talking about is a consolidated user experience for both professional linguists and minority language users.

A note on foundational technologies:

  • It appears that LEXUS is built on BaseX with Cocoon and XML.
  • The front page of TypeCraft has a very Wikipedia-like feel, but this might not reflect the true foundational technology.
  • Linguist's List often does their work in ColdFusion, and the LEGO project definitely has this feel about it.

Types of Linguistic Maps: The Mapping of Linguistic Features and Researcher Interactivity

A couple of years ago I had a chance meeting with a cartographer in North Dakota. It was interesting because he asked us (a group of linguists), "What is a language or linguistic map?" So I grabbed a few examples and put them into a brief for him. This past January at the LSA meeting in Portland, Oregon, I had several interesting conversations with the folks at the LL-Map project under the Linguist's List. It occurred to me that such a presentation of various kinds of language maps might be useful to a larger audience. So this will be a bit unpolished, but it should show a wide selection of language- and linguistics-based maps, and in the last section I will also talk a bit about interactive maps.