This week I have been outlining the types of data that linguists need to be able to use and relate to each other as they do Language Documentation and Linguistic Research. I try to express these things graphically and then also express where some of the leading tools which SIL International is offering sit in the problem space.
Category Archives: Language Documentation
The Look of Language Archive Websites
This is the start of a cross-archive look at the current state of UX design for presenting content generated in Language Documentation.
http://www.rnld.org/archives
http://www.mpi.nl/DOBES/language_archives
http://paradisec.org.au/
http://repository.digiarch.sinica.edu.tw/index.jsp?lang=en
Leave Typology to the Typologists: I am a Linguist
A User Experience look at Linguistic Archiving
In a recent paper Jeremy Nordmoe, a friend and colleague, states that:
Because most linguists archive documents infrequently, they will never be experts at doing so, nor will they be experts in the intricacies of metadata schemas.
My initial reply is:
You are d@#n right! and it is because archives are not sexy enough!
Permanently accessible? to whom?

Bush house: the BBC World Service is leaving its home after 71 years
Photo: Paul Grover via The Telegraph
The Citation Problem
Consider a team framework in which a research team has several members and the job requirements call for sharing bibliographic data (of the materials referenced) as well as the actual resources being referenced. In this environment there needs to be a central repository for sharing both kinds of data. This is true for small, geographically localized groups as well as large distributed research teams. New researchers joining an existing team need to be able to “plug in” to existing foundational work on the project and to access the bibliographic data as well as the resources those bibliographic details point to. My point here is to outline some of the current challenges involved in trying to overcome this collaborative obstacle when working in the fields of Linguistics and Language Documentation. This sentiment is echoed by many in the world of science. Here is someone on Zotero’s forums [INSERT LINK]. (Though Zotero does claim to combat some of these issues.)
Bibliographic Data vs. Citation Data
Socio-linguistic Profiles for Language Documentation
Some researchers in linguistics (in my acquaintance) have been less than excited about the notion of asking language informants for socio-linguistic or socio-personal data. The objection has been that it is just bad form. While I am a great advocate of personal privacy (especially in digital formats), I see that one of the most informative parts of the language documentation process is understanding who the speakers being recorded or worked with are. Language variation is fundamentally connected with identity. While crucial elements of how a community segments itself along identity lines may not be known for several years, having a robust socio-cultural or socio-personal questionnaire about the language informants will later help place the documentation data in the perspective of the larger waves of variation in the community.
This is to say, I am thoroughly convinced that a socio-linguistic questionnaire is an important part of the language documentation process. It might not need to be done first, but it will help researchers and future users of archived material understand where to place these speech samples in the context of that speaker’s society.
The outstanding question, and one with a variable answer, is how to appropriately approach the questions in the questionnaire. Should the questionnaire be approached formally? Or should it be asked in a conversational format? Should it be elicited digitally? One of the interesting things about eliciting things digitally is that they may appear less intrusive because they are less formal. While I have no empirical evidence based on years of cross-cultural work, I do have the Facebook phenomenon. That is, minority language users all over the world are using Facebook, and Facebook is collecting (and allowing the users to volunteer) and then verifying the users’ provided data.
Below is a list of elements which Facebook is collecting (it is also collecting log-in locations and times). So, some of these questions are certainly in-scope of what language documenters would minimally like to know about their indigenous language speaking informants and collaborators. Others of these questions are certainly not in-scope for the recommended socio-linguistic profile from language documenters or socio-linguists.
Facebook data categories on user profiles.
Data Facebook collects about users through their profile and activities. From: https://www.facebook.com/help/326826564067688 on 23 August 2012.
What info is available? | What is it? | Where can I find it? |
---|---|---|
About Me | Information you added to the About section of your timeline like relationships, work, education, where you live and more. It includes any updates or changes you made in the past and what’s currently in the About section of your timeline. | Activity Log |
Account Status History | The dates when your account was reactivated, deactivated, disabled or deleted. | Expanded Archive |
Address | Your current address or any past addresses you had on your account. | Expanded Archive |
Alternate Name | Any alternate names you have on your account (ex: a maiden name or a nickname). | Expanded Archive |
Apps | All of the apps you subscribe to. | Expanded Archive |
Birthday Visibility | How your birthday appears on your timeline. | Expanded Archive |
Chat | A history of the conversations you’ve had on Facebook Chat. | Downloaded Info |
Check-ins | All of the places you’ve checked into. | Downloaded Info, Activity Log |
Connections | The people who have liked your Page or Place, RSVPed to your event, installed your app or checked in to your advertised place within 24 hours of viewing or clicking on an ad or Sponsored Story. | Activity Log |
Currency | Your preferred currency on Facebook. If you use Facebook Payments, this will be used to display prices and charge your credit cards. | Expanded Archive |
Current City | The city you added to the About section of your timeline. | Downloaded Info |
Date of Birth | The date you added to Birthday in the About section of your timeline. | Downloaded Info |
Deleted Friends | The people you’ve unfriended. | Expanded Archive |
Education | Any information you added to Education in the About section of your timeline. | Downloaded Info |
Emails | Email addresses added to your account (even those you may have removed). | Expanded Archive |
Events | Events you’ve joined or been invited to. | Activity Log |
Family | Friends you’ve indicated are family members. | Expanded Archive |
Favorite Quotes | Information you’ve added to the Favorite Quotes section of the About section of your timeline. | Downloaded Info |
Friend Requests | Pending sent and received friend requests. | Expanded Archive |
Friends | A list of your friends. | Downloaded Info |
Gender | The gender you added to the About section of your timeline. | Downloaded Info |
Groups | A list of groups you belong to on Facebook. | Downloaded Info |
Hidden from News Feed | Any friends, apps or pages you’ve hidden from your News Feed. | Expanded Archive |
Hometown | The place you added to hometown in the About section of your timeline (profile). | Downloaded Info |
IP Addresses | A list of addresses where you’ve logged into your Facebook account. | Expanded Archive |
Last Location | The last location associated with an update. | Activity Log |
Likes on Other’s Posts | Posts, photos or other content you’ve liked. | Activity Log |
Likes on Your Posts from others | Likes on your own posts, photos or other content. | Activity Log |
Likes on Other Sites | Likes you’ve made on other sites off of Facebook. | Activity Log |
Locale | The language you see on Facebook is based on where you’re located. | Expanded Archive |
Logins | IP address, date and time associated with logins to your Facebook account. | Expanded Archive |
Logouts | IP address, date and time associated with logouts from your Facebook account. | Expanded Archive |
Messages | Archive of messages you’ve sent and received on Facebook. | Downloaded Info |
Name | The name on your Facebook account. | Downloaded Info |
Name Changes | Any changes you’ve made to the original name you used when you signed up for Facebook. | Expanded Archive |
Networks | Networks (affiliations with schools or workplaces) that you belong to on Facebook. | Expanded Archive |
Notes | Any notes you’ve written and published to your account. | Activity Log |
Notification Settings | A list of all your notifications and whether you have email and text enabled or disabled for each. | Expanded Archive |
Pages You Admin | A list of pages you admin. | Expanded Archive |
Phone Numbers | Mobile phone numbers you’ve added to your account. | Expanded Archive |
Photos | Any photos you’ve uploaded to your account. | Downloaded Info |
Physical Tokens | Badges you’ve added to your account. | Expanded Archive |
Pokes | A list of who’s poked you and who you’ve poked. | Expanded Archive |
Political Views | Any information you added to Political Views in the About section of timeline. | Downloaded Info |
Your Posts | Anything you posted to your own timeline, like photos, videos and status updates. | Activity Log |
Posts by Others | Anything you posted to someone else’s timeline (profile), like photos, videos and status updates. | Activity Log |
Recent Activities | Actions you’ve taken and interactions you’ve recently had. | Activity Log |
Registration Date | The date you joined Facebook. | Activity Log |
Religious Views | The information you added to Religious Views in the About section of your timeline. | Downloaded Info |
Screen Names | The screen names you’ve added to your account, and the service they’re associated with. You can also see if they’re hidden or visible on your account. | Expanded Archive |
Searches | Searches you’ve made on Facebook. | Activity Log |
Spoken Languages | The languages you added to Spoken Languages in the About section of your timeline. | Expanded Archive |
Status Updates | Any status updates you’ve posted. | Activity Log |
Subscribers | A list of people who are subscribed to you. | Expanded Archive |
Subscriptions | A list of people you subscribe to. | Activity Log |
Tag Suggestions Template | A unique number based on a comparison of the photos you're tagged in. We use this template to help your friends tag you in the photos they upload. | Expanded Archive |
Work | Any information you’ve added to Work in the About section of your timeline. | Downloaded Info |
Videos | Videos you’ve posted. | Activity Log |
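The in-scope versus out-of-scope distinction drawn above the table can be sketched in code. What follows is only a minimal illustration: the particular categories placed in `IN_SCOPE`, the `SpeakerProfile` shape, and the helper names are all assumptions made for the example, not a recommended socio-linguistic profile and not anything Facebook provides.

```python
from dataclasses import dataclass, field

# Hypothetical selection of table categories a documenter might treat as
# in scope. This split is an assumption for illustration only.
IN_SCOPE = {
    "Spoken Languages",
    "Hometown",
    "Current City",
    "Date of Birth",
    "Gender",
    "Education",
}


@dataclass
class SpeakerProfile:
    """Minimal socio-linguistic profile for one speaker (hypothetical shape)."""
    speaker_id: str
    answers: dict = field(default_factory=dict)


def build_profile(speaker_id, raw):
    """Keep only the in-scope categories from a category -> value mapping."""
    kept = {key: value for key, value in raw.items() if key in IN_SCOPE}
    return SpeakerProfile(speaker_id=speaker_id, answers=kept)


def missing_categories(profile):
    """List the in-scope categories this speaker has not answered yet."""
    return sorted(IN_SCOPE - set(profile.answers))
```

A documenter could call `build_profile` on whatever a speaker volunteers and then use `missing_categories` to see what is still worth asking about, whether formally or in conversation.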
Reflections on CRASSH
In July I presented a paper at CRASSH in Cambridge. It was a small conference, but being in Europe it was good to see many of the various kinds of projects which are going on in Digital Humanities and Linguistics, and also in Cloud Computing and Linguistics. One particular project, TypeCraft, presented by Dorothee Beermann Hellan, stands out as being rather well done and promising. I think the ideas presented in this project are well thought out and seem to be well implemented. It would be nice to see this product integrated with some other linguistics and language documentation cloud offerings, e.g. Project Lego from the Linguist’s List or the Max Planck Institute’s LEXUS project. While TypeCraft does allow for round-tripping of data with XML, what I am talking about is a consolidated User Experience for both professional linguists and minority language users.
A note on foundational technologies:
- It appears that Lexus is built on BaseX with Cocoon and XML.
- The front page of TypeCraft has a very Wikipedia-like feel, but this might not be the true foundational technology.
- Linguist’s List often does their work in ColdFusion and the LEGO project definitely has this feel about it.
Keyboard Design for Minority languages
This post is an open draft! It might be updated at any time… but it was last updated on December 19, 2014 at 1:10 am.
A pre-print draft will not be available through this means, though there is a video of the presentation.
A. Meꞌphaa Text Sample
A̱ ngui̱nꞌ, tsáanꞌ ninimba̱ꞌlaꞌ ju̱ya̱á Jesús, ga̱ju̱ma̱ꞌlaꞌ rí phú gagi juwalaꞌ ído̱ rí nanújngalaꞌ awúun mbaꞌa inii gajmá. Numuu ndu̱ya̱á málaꞌ rí ído̱ rí na̱ꞌnga̱ꞌlaꞌ inuu gajmá, nasngájma ne̱ rí gakon rí jañii a̱kia̱nꞌlaꞌ ju̱ya̱á Ana̱ꞌlóꞌ, jamí naꞌne ne̱ rí ma̱wajún gúkuálaꞌ. I̱ndo̱ó máꞌ gíꞌmaa rí ma̱wajún gúkuálaꞌ xúgíí mbiꞌi, kajngó ma̱jráanꞌlaꞌ jamí ma̱ꞌne rí jañii a̱kia̱nꞌlaꞌ, asndo rí náxáꞌyóo nitháan rí jaꞌyoo ma̱nindxa̱ꞌlaꞌ. [I̱yi̱i̱ꞌ rí niꞌtháán Santiágo̱ 1:2-4]
B. Sochiapam Chinantec Text Sample
Hnoh² reh², ma³hiún¹³ hnoh² honh² lɨ³ua³ cáun² hi³ quiunh³² náh², quí¹ la³ cun³ hi³ má²ca³lɨ³ ñíh¹ hnoh² jáun² hi³ tɨ³ jlánh¹ bíh¹ re² lı̵́²tɨn² tsú² hi³ jmu³ juenh² tsı̵́³, nı̵́¹juáh³ zia³² hi³ cá² lau²³ ca³tɨ²¹ hi³ taunh³² tsú² jáun² ta²¹. Hi³ jáun² né³, chá¹ hnoh² cáun² honh², hi³ jáun² lı̵́¹³ lɨ³tɨn² hnoh² re² hi³ jmúh¹³ náh² juenh² honh², hi³ jáun² hnoh² lı̵́¹³ lı̵́n³ náh² tsá² má²hún¹ tsı̵́³, tsá² má²ca³hiá² ca³táunh³ ca³la³ tán¹ hián² cu³tí³, la³ cun³ tsá² tiá² hi³ lɨ³hniauh²³ hí¹ cáun² ñí¹con² yáh³. [Jacobo Jmu² Cáun² Sí² Hi³ Ca³tɨn¹ Tsá² *Judíos, Tsá² Má²tiáunh¹ Ñí¹ Hliáun³ 1:2-4]
C. Spanish Text Sample
Hermanos míos, gozaos profundamente cuando os halléis en diversas pruebas, sabiendo que la prueba de vuestra fe produce paciencia. Pero tenga la paciencia su obra completa, para que seáis perfectos y cabales, sin que os falte cosa alguna. [Santiago 1:2-4 Reina-Valera 1995 (RVR1995)]
D. English Text Sample
Dear brothers and sisters, when troubles come your way, consider it an opportunity for great joy. For you know that when your faith is tested, your endurance has a chance to grow. So let it grow, for when your endurance is fully developed, you will be perfect and complete, needing nothing. [James 1:2-4 New Living Translation (NLT 2007)]
Linking Minority Language Dictionaries to Open Data
What is the role of a dictionary?
Is the role of a dictionary to regulate or to standardize spelling? Is it to validate a speech variety as being real or a bona fide language? Or is it for documenting and establishing the relationships and connections between things (plants, animals, fish, spirits/gods, medicines, etc.) as they are emically viewed, for connecting people via collaboration, or connecting related concepts and their classes together into documented sets? Or even connecting these things and relationships as they are viewed in one culture to the same things and relationships as they are viewed in another culture or more broadly cross-culturally?
The Look of Language Development Websites
I have been thinking through some of the presentation issues for presenting SIL International’s work on the web. As part of this I have also been looking at other organizations which are part of the language documentation and minority language revitalization movement. I recently ran across several nicely done web sites.
National Geographic Genographic Project