In this post I take a look at some of the software needs of a language documentation team. One of my ongoing concerns with linguistic software development teams (like SIL International's Palaso or LSDev, or MPI's archive software group, or a host of other niche software products adapted from mainstream open-source projects) is the approach they take in communicating how to use the various elements of their software together to create useful workflows for linguists participating in field research on minority languages. Many of these teams do not assume that potential users coming to their website want to be oriented to how these software solutions work together to solve specific problems in the language documentation problem space. Now, it is true that every language documentation program is different and will have different goals and outputs, but many of these goals are the same across projects. New users want to know the top-level organizational assumptions made by software developers. That is, they want to evaluate how software will work in a given scenario (problem space) and to make informed decisions based on the ecosystem that the software will lead them into. This is not unlike users asking which is better, Android or iPhone, and then deciding not just what works with a given device but where they will buy their music and their digital books, and how they will get those digital assets to a new device when the phone they are about to buy no longer serves them. These digital consequences are not in the mind of every consumer... but they are nonetheless real consequences.
Tag Archives: opendraft
The Look of Language Archive Websites
This is the start of a cross-archive look at the current state of UX design for presenting content generated in language documentation.
http://www.rnld.org/archives
http://www.mpi.nl/DOBES/language_archives
http://paradisec.org.au/
http://repository.digiarch.sinica.edu.tw/index.jsp?lang=en
Leave Typology to the Typologists: I am a Linguist
A User Experience look at Linguistic Archiving
In a recent paper Jeremy Nordmoe, a friend and colleague, states that:
Because most linguists archive documents infrequently, they will never be experts at doing so, nor will they be experts in the intricacies of metadata schemas.
My initial reply is:
You are d@#n right! and it is because archives are not sexy enough!
Permanently accessible? to whom?

Bush house: the BBC World Service is leaving its home after 71 years
Photo: Paul Grover via The Telegraph
Useful or Not?
This post is an open draft! It might be updated at any time.
The online version of the SIL Bibliography contains a subset of over 29,000 citations from the more than 40,000 publications representing 75 years of SIL International's language research in over 2,700 languages.
Finding resources through SIL.org's Bibliography (as of 2 August 2012) can be a challenge at times, maybe even a time-wasting endeavor: time-wasting because consulting the online Bibliography might not turn out to be very useful.
The challenge, which affects usefulness, is primarily threefold:
- Items known by SIL to have been created by SIL staff may or may not be listed. (The online Bibliography is a subset.)
- Items listed in the Bibliography may or may not have digitally accessible resources.
- Items created by SIL staff may or may not be in the bibliography because they have not been submitted to the Language and Culture Archive (managing division of the SIL Bibliography).
The Citation Problem
In a team framework there are several members of a research team, and the job requirements call for sharing both bibliographic data (of materials referenced) and the actual resources being referenced. In this environment there needs to be a central repository for sharing both kinds of data. This is true for small, geographically localized groups as well as for large distributed research teams. New researchers joining an existing team need to be able to “plug in” to existing foundational work on the project and to access bibliographic data as well as the resources those bibliographic details point to. It is my point here to outline some of the current challenges involved in trying to overcome the collaborative obstacle when working in the fields of Linguistics and Language Documentation. This sentiment is echoed by many in the world of science. Here is someone on Zotero’s forums [INSERT LINK]. (Though Zotero does claim to combat some of these issues.)
Bibliographic Data vs. Citation Data
Keyboard Design for Minority languages
This post is an open draft! It might be updated at any time.
Pre-Print Draft will not be available through this means, though there is a video of the presentation.
A. Meꞌphaa Text Sample
A̱ ngui̱nꞌ, tsáanꞌ ninimba̱ꞌlaꞌ ju̱ya̱á Jesús, ga̱ju̱ma̱ꞌlaꞌ rí phú gagi juwalaꞌ ído̱ rí nanújngalaꞌ awúun mbaꞌa inii gajmá. Numuu ndu̱ya̱á málaꞌ rí ído̱ rí na̱ꞌnga̱ꞌlaꞌ inuu gajmá, nasngájma ne̱ rí gakon rí jañii a̱kia̱nꞌlaꞌ ju̱ya̱á Ana̱ꞌlóꞌ, jamí naꞌne ne̱ rí ma̱wajún gúkuálaꞌ. I̱ndo̱ó máꞌ gíꞌmaa rí ma̱wajún gúkuálaꞌ xúgíí mbiꞌi, kajngó ma̱jráanꞌlaꞌ jamí ma̱ꞌne rí jañii a̱kia̱nꞌlaꞌ, asndo rí náxáꞌyóo nitháan rí jaꞌyoo ma̱nindxa̱ꞌlaꞌ. [I̱yi̱i̱ꞌ rí niꞌtháán Santiágo̱ 1:2-4]
B. Sochiapam Chinantec Text Sample
Hnoh² reh², ma³hiún¹³ hnoh² honh² lɨ³ua³ cáun² hi³ quiunh³² náh², quí¹ la³ cun³ hi³ má²ca³lɨ³ ñíh¹ hnoh² jáun² hi³ tɨ³ jlánh¹ bíh¹ re² lı̵́²tɨn² tsú² hi³ jmu³ juenh² tsı̵́³, nı̵́¹juáh³ zia³² hi³ cá² lau²³ ca³tɨ²¹ hi³ taunh³² tsú² jáun² ta²¹. Hi³ jáun² né³, chá¹ hnoh² cáun² honh², hi³ jáun² lı̵́¹³ lɨ³tɨn² hnoh² re² hi³ jmúh¹³ náh² juenh² honh², hi³ jáun² hnoh² lı̵́¹³ lı̵́n³ náh² tsá² má²hún¹ tsı̵́³, tsá² má²ca³hiá² ca³táunh³ ca³la³ tán¹ hián² cu³tí³, la³ cun³ tsá² tiá² hi³ lɨ³hniauh²³ hí¹ cáun² ñí¹con² yáh³. [Jacobo Jmu² Cáun² Sí² Hi³ Ca³tɨn¹ Tsá² *Judíos, Tsá² Má²tiáunh¹ Ñí¹ Hliáun³ 1:2-4]
C. Spanish Text Sample
Hermanos míos, gozaos profundamente cuando os halléis en diversas pruebas, sabiendo que la prueba de vuestra fe produce paciencia. Pero tenga la paciencia su obra completa, para que seáis perfectos y cabales, sin que os falte cosa alguna. [Santiago 1:2-4 Reina-Valera 1995 (RVR1995)]
D. English Text Sample
Dear brothers and sisters, when troubles come your way, consider it an opportunity for great joy. For you know that when your faith is tested, your endurance has a chance to grow. So let it grow, for when your endurance is fully developed, you will be perfect and complete, needing nothing. [James 1:2-4 New Living Translation (NLT 2007)]
Metadata and the Target Audience
I have been reviewing applications for library, research and citation metadata. Things like RDF, METS, Dublin Core, .ris and BibTeX. In some ways these things are related – they are metadata. But in other ways they are different animals.
In my search I have found two very different classes of metadata schemes based on two different kinds of end users.
- End users who are machines (Metadata for interoperability or resource discovery).
- End users who are human.
End Users who are machines are usually concerned with the interoperability of metadata for search, storage, and advertisement. These kinds of systems usually are engineered to use metadata schemes like Dublin Core, MODS and METS. Often these systems are able to communicate high level metadata in generic categories.
However, end users who are human are usually concerned with purposing the metadata in creative processes, and in general they want to use and appropriate more specific elements of metadata. This is especially true of citation metadata. Students and researchers want to be able to build bibliographies with the data. Additionally, many of the more detailed metadata elements, that is, overly detailed from a Dublin Core perspective (i.e.
Users looking to construct bibliographies and citations often look for that metadata in one of the interchange formats BibTeX, EndNote XML, or .ris. Users interested in finding things based on technical metadata, such as audio technicians, linguists, ethnographers, and ethnomusicologists, want to use the metadata and the object it describes in a workflow. In order to purpose that media object as they need to, those users must make sure that the digital object fits their workflow criteria.
This discrepancy between metadata for system-to-system transmission and metadata for end users creates a bit of a complex situation, in that delivery systems need to consider both sets of users.
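To make the two audiences concrete, here is a minimal sketch of one detailed record rendered twice: once as generic Dublin Core elements for system-to-system exchange, and once as a .ris entry for a person building a bibliography. The record, its field names, and both function names are hypothetical, invented for illustration. Folding journal and volume into `dc:source` is only one possible choice, not a rule from any specification.

```python
# Hypothetical record: the field names are illustrative, not from any schema.
record = {
    "authors": ["Doe, Jane"],
    "title": "A Sketch of Example Phonology",
    "year": "2012",
    "journal": "Journal of Examples",
    "volume": "4",
    "start_page": "10",
    "end_page": "25",
}


def to_simple_dublin_core(rec):
    """Collapse the record into a handful of generic Dublin Core elements."""
    return {
        "dc:creator": list(rec["authors"]),
        "dc:title": rec["title"],
        "dc:date": rec["year"],
        "dc:type": "Text",
        # Assumption: journal and volume are folded into one generic element,
        # and the page range is dropped, because simple DC has no slot for them.
        "dc:source": f'{rec["journal"]} {rec["volume"]}',
    }


def to_ris(rec):
    """Render the same record as a .ris journal-article entry."""
    lines = ["TY  - JOUR"]
    lines += [f"AU  - {author}" for author in rec["authors"]]
    lines += [
        f'TI  - {rec["title"]}',
        f'PY  - {rec["year"]}',
        f'JO  - {rec["journal"]}',
        f'VL  - {rec["volume"]}',
        f'SP  - {rec["start_page"]}',
        f'EP  - {rec["end_page"]}',
        "ER  - ",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_simple_dublin_core(record))
    print(to_ris(record))
```

The .ris rendering keeps volume and page numbers as separate fields, which is exactly the detail a researcher needs for a citation and exactly the detail the generic rendering flattens or loses.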
Which information to record?
http://www.jiscdigitalmedia.ac.uk/audio/advice/metadata-and-audio-resources
Structured metadata is divided into four main categories that contain information which is defined by the schemas or extension schemas being used:
- Structural metadata. This is information about the structural relationship with other parent or family files and how the metadata relates to the file.
- Descriptive metadata. This is information about the content of the digital file. The information recorded here is more curatorial than technical, and is the primary portal for users to access your resource. Data including File name, creator, associated dates, description, summary, locations etc should be standardised using a interoperable schema such as Simple DC or MODS.
- Administrative metadata. This contains information about the analogue source material, the rights of the content and any preservation information. Information here provides support to the managerial team of the collection and researchers in organising and providing access to the resource. Information about rights, ownership and usage restrictions is also kept within the administrative metadata.
- Technical metadata. To make good use of the digital object data is required which describes the technical qualities of the physical and/or digital object. This includes information such as channel number, bit-depth, sampling rate, and the unique file identifier. AudioMD, is an XML based schema that has been designed primarily for this purpose. It is soon to be superseded by AES-X098, developed by the Audio Engineering Society, upon its formal release.
It is possible, though, to separate out some finer-grained metadata categories. Compare the categories above with those below, which were part of my post about Metadata for Socio-linguistic Corpora:
- Descriptive meta-data: supports discovery, attribution and identification of resources created.
- Administrative meta-data: supports management, preservation, and appropriate usage of resources created.
- Technical: About the machinery used to create the resource and the technical aspects of the resource.
- Use (meaning how one may use the objects) and Rights: Copyright, license and moral ownership of the items.
- Structural meta-data: maintains relationships between the parts of complex, multi-part resources (Spanne 2008).
- Situational: this is metadata which describes the events around the creation of the work. Asking questions about the social setting, or the precursory events. It follows ideas put forward by Bergqvist (2007).
- Use metadata: metadata collected from or about the users themselves (e.g. user annotations, number of people accessing a particular resource)
In that post I also said:
I think it is only fair to point out to archivists and to librarians that linguists and language documenters do not see a difference between descriptive and non-descriptive metadata in their workflows. That is, sometimes we want to search all the corpora by license or by a technical attribute. This elevates these attributes to the function of discovery metadata. It does not remove descriptive metadata from its role in finding things, but it does functionally mean that the other metadata is also viable as discovery metadata.
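As a rough illustration of that point, the sketch below treats every metadata category as searchable, so a license or a sampling rate can be used for discovery in the same way as a title. The corpus entries, identifiers, category names, and the `find` function are all hypothetical, made up for this example.

```python
# Hypothetical corpus: three records, each with several metadata categories.
corpus = [
    {
        "id": "rec-001",
        "descriptive": {"title": "Wordlist session"},
        "rights": {"license": "CC-BY-4.0"},
        "technical": {"sample_rate_hz": 48000, "bit_depth": 24},
    },
    {
        "id": "rec-002",
        "descriptive": {"title": "Narrative text"},
        "rights": {"license": "All rights reserved"},
        "technical": {"sample_rate_hz": 44100, "bit_depth": 16},
    },
    {
        "id": "rec-003",
        "descriptive": {"title": "Elicitation session"},
        "rights": {"license": "CC-BY-4.0"},
        "technical": {"sample_rate_hz": 44100, "bit_depth": 16},
    },
]


def find(items, category, field, value):
    """Return the ids of items whose metadata has category.field == value.

    No category is privileged: descriptive, rights, and technical metadata
    are all equally usable for discovery.
    """
    return [
        item["id"]
        for item in items
        if item.get(category, {}).get(field) == value
    ]


if __name__ == "__main__":
    print(find(corpus, "rights", "license", "CC-BY-4.0"))
    print(find(corpus, "technical", "sample_rate_hz", 44100))
    print(find(corpus, "descriptive", "title", "Narrative text"))
```

The same function serves all three queries, which is the practical meaning of saying that non-descriptive metadata is also viable as discovery metadata.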
Compare and match three
My goal here is to compare Dublin Core [http://www.feedforall.com/dublin-core.htm] with BibTeX and with .ris. There is a nice crosswalk technology for BibTeX resources on SourceForge: http://bibtexml.sourceforge.net/details.html
“RIS” Format Documentation Adding a “Direct Export” Button to Your Web Page or Web Application
List of Mappings: not .ris or BibTeX to DC, but many other crosswalks.
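A crosswalk between the three can be sketched as a small table of equivalent field names. The rows below are my own simplified selection, not an official mapping: the Dublin Core element names, BibTeX field names, and .ris tags are real, but which ones count as equivalent is an assumption made for illustration, and the `translate` function is hypothetical.

```python
# Each row lists equivalent fields as (Dublin Core, BibTeX, .ris).
# None means this simplified crosswalk has no counterpart in that scheme.
CROSSWALK = [
    ("creator", "author", "AU"),
    ("title", "title", "TI"),
    ("date", "year", "PY"),
    ("publisher", "publisher", "PB"),
    (None, "journal", "JO"),
    (None, "volume", "VL"),
]

SCHEMES = ("dc", "bibtex", "ris")


def translate(fields, source, target):
    """Rename fields from one scheme to another.

    Returns (converted, dropped): the renamed fields, plus the names of the
    source fields that have no counterpart in the target scheme.
    """
    src, dst = SCHEMES.index(source), SCHEMES.index(target)
    lookup = {row[src]: row[dst] for row in CROSSWALK if row[src] is not None}
    converted, dropped = {}, []
    for name, value in fields.items():
        mapped = lookup.get(name)
        if mapped is None:
            dropped.append(name)
        else:
            converted[mapped] = value
    return converted, dropped


if __name__ == "__main__":
    entry = {"author": "Doe, Jane", "title": "Example", "year": "2012", "volume": "4"}
    print(translate(entry, "bibtex", "ris"))
    print(translate(entry, "bibtex", "dc"))
```

Going from BibTeX to .ris nothing is dropped, while going from BibTeX to Dublin Core the volume has nowhere to go. That asymmetry is the comparison I am interested in.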
From Folksonomies to Taxonomies with Linguistic Metadata
This post is an open draft! It might be updated at any time.
Metadata is very important; everyone agrees. However, there is some discussion about how to develop metadata and how to ensure that it is accurate. Taxonomies are limited vocabularies (a set number of items) where each term has a predefined definition. A folksonomy is a vocabulary where people, usually users of the data, assign their own useful words or metadata to an item. Folksonomies are like taxonomies in that both are sets of terms, but a folksonomy is an open set whereas a taxonomy is a closed set.
An example of a taxonomy might be the colors of a traffic light: Red, Yellow, and Green. If this were a folksonomy, people might also suggest the colors Amber, Orange, Blue-Green and Blue. These additional terms may be accurate to some viewers of traffic lights, or in some cases, but they do not fit the stereotypical model of what the colors of traffic lights are.
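The closed-set versus open-set difference can be sketched in a few lines. This is only an illustration built on the traffic light example: the names `TRAFFIC_LIGHT_TAXONOMY`, `tag_with_taxonomy`, and `Folksonomy` are hypothetical.

```python
from collections import Counter

# A taxonomy is a closed set: the permitted terms are fixed in advance.
TRAFFIC_LIGHT_TAXONOMY = frozenset({"Red", "Yellow", "Green"})


def tag_with_taxonomy(term):
    """Accept a term only if it belongs to the closed vocabulary."""
    if term not in TRAFFIC_LIGHT_TAXONOMY:
        raise ValueError(f"{term!r} is not a term in the taxonomy")
    return term


class Folksonomy:
    """An open set: any term a user supplies is accepted and counted."""

    def __init__(self):
        self.counts = Counter()

    def tag(self, term):
        self.counts[term] += 1
        return term


if __name__ == "__main__":
    folk = Folksonomy()
    for term in ["Red", "Amber", "Green", "Amber", "Blue-Green"]:
        folk.tag(term)
    print(folk.counts.most_common())

    try:
        tag_with_taxonomy("Amber")
    except ValueError as err:
        print(err)
```

Counting how often users supply each term is one possible way a folksonomy could later inform a taxonomy, since frequently used terms become candidates for the controlled list.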
Types of Linguistic Maps: The Mapping of linguistic Features and Researcher Interactivity
A couple of years ago I had a chance meeting with a cartographer in North Dakota. It was interesting because he asked us (a group of linguists), "What is a language or linguistic map?" So I grabbed a few examples and put them into a brief for him. This past January at the LSA meeting in Portland, Oregon, I had several interesting conversations with the folks at the LL-Map Project under the LINGUIST List. It occurred to me that such a presentation of various kinds of language maps might be useful to a larger audience. So this will be a bit unpolished, but it should show a wide selection of language and linguistic maps, and in the last section I will also talk a bit about interactive maps.