Lexical Data Management helps (with SIL software)

Posted on November 19, 2013 by Hugh Paterson III

This is a quick note to record some of the things I have learned this week about working with lexical data within SIL's software options.

There is information scattered all over the place:
- FLEx website: http://fieldworks.sil.org
  - Google Group:https://groups.google.com/forum/#!forum/flex-list
- Toolbox website: http://www-01.sil.org/computIng/toolbox/
  - Toolbox Google Grouphttps://groups.google.com/forum/#!forum/ShoeboxToolbox-Field-Linguists-Toolbox
- Webonary Website: http://webonary.org/
  And then on Webonary about Data transfer: http://webonary.org/data-transfer/
- Solid:http://solid.palaso.org/
- Wesay: http://wesay.palaso.org/
- LingTranSoft:
  - A redundancy of the FLEx Google group: http://tiki.lingtransoft.info/tiki-view_forum_thread.php?comments_parentId=27&topics_offset=1
  - Various introductions to FLEx: http://tiki.lingtransoft.info/Introduction+to+Flex?structure=Navmenu
- MDF documentation:http://www-01.sil.org/computing/shoebox/mdf.html including this PDF
- The LIFt format: https://code.google.com/p/lift-standard/
- LiftTweaker:http://projects.palaso.org/projects/show/lifttweaker
- LiftTools: http://downloads.palaso.org/LiftTools/
- xHtml expression of lift: http://pathway.sil.org/features/standards/dictionary-xhtml-proposed-standard/
What should the purpose of the websites be? to distribute the product or to build community around the product's existence?

Audio Dominant Texts and Text Dominant Audio

Posted on March 18, 2013 by Hugh Paterson III

As linguistics and language documentation interface with digital humanities there has been a lot of effort to time-align texts and audio/video materials. At one level this is rather trivial to do and has the backing of comercial media processes like subtitles in movies. However, at another level this task is often done in XML for every project (digital corpus curation) slightly differently. At the macro-scale the argument is that if the annotation of the audio is in XML and someone wants to do something else with it, then they can just convert the XML to whatever schema they desire. This is true.

However, one antidotal point that I have not heard in discussion of time aligned texts is specifications for Audio Dominant Text vs. Text Dominant Audio. This may not initially seem very important, so let me explain what I mean.
Continue reading →

to Hospital

Posted on December 9, 2012 by Hugh Paterson III

This interesting conversation took place on Facebook:

I wonder how it happened that in American English we say “…have to go to THE hospital”, but in British English they say, “…have to go to hospital”. – Trevor Lee Deck

When I say, “go to school”, it’s so general and it’s what everyone else is doing. But if I need to see my History teacher specifically I would say, “I’ll stop by the school to see him.” …but I can also choose to say, “I’ll stop by school to see him.”

Trevor Lee Deck Maybe British English speakers use hospital/the hospital the same way.

Lucy Baber: If we were writing it, it’s as if we would be saying, “I need go to go School” or “I’m going to Church”, like that’s the proper name of it. But then if we are stopping by the building for an informal purpose, it feels more like we are stopping by just the building and not the institution of it. Does that make sense? So in the case of the hospital, maybe we would say “I’m going to Hospital” if we were being admitted or having a procedure done, but I’m going to “the hospital” if we were visiting someone else or just picking up some results??

Elsen E. Portugal Yes, I always find that curious. . . . find us an answer, will you?

Trevor Lee Deck I like it, Lucy. But why do the British think of it differently than we Americans do? Because I don’t think an American English speaker would ever say “…to Hospital”.

Elsen E. Portugal Hmmmm, I’m wondering if perhaps the idea in the British mind is of an adjective with an understood noun, like: he is in hospital (care), in which case the article would be inappropriate. Plus, I think the establishment of ‘hospitals’ is younger than the colonization of the US. This divide probably split the meanings also, unlike school and church that have been terms used for much longer.

Trevor Lee Deck Elsen, I think you have the best answer yet. If they (even subconsciously) think of hospital as an adjective, then you’re right they’d never add a definite article. So this is a case of noun elision? Let’s think of another.

Josh Boyd or maybe a verb? like going to get schooled, going to do church, maybe going to the hospital is a phrase that implies the action being treated? or maybe I’m just really hung over and only think I make sense…

Jennifer Mann I say THE hospital… but then I am not pure British anymore, so who knows what is real and what is not!

Hugh Paterson III The Brits are more dative and americans are more Indirect object oriented.

Trevor Lee Deck Thanks, Hugh. Good observation. But do you have a suggestion as to why this could be? Why didn’t we bring that with us. It’s only been a few generations…?

Hugh Paterson IIIwell, some say that the Brits have innovated since the U.S. Colonies were established, and that in some respects we (in the U.S.) hold the older forms or pronunciations. But in this case I think we have innovated (I think without proof) but German, another germanic language like English, has dative prepositions, and they behave the same way as the British English. In German the gender and case is also shown on the preposition. Theses ideas were there in Old English, and some in middle English and today show up in our pronominal system. But when English stopped using case, it became harder to tell a dative object from an indirect object. If we look at how languages move from overt marking to syntactic ordering then there might be some answers there. If I had to take a stab at it from a cognitive perspective; there is this idea of motion and in indo-eurpean it is expressed with the dative. And if I tell someone am going somewhere, that Where should be defined as a place of mutual understanding between the two interlocutors. Maybe not in that particular sentence, but in their common experience. So, definiteness as it functions in English is not needed… and we get English phrases like “I am going to school” (even in the U.S.). There is only one school which is salient between the parties of the conversation. But if I am a detective looking for a fugitive, and I say to some of my team “you check the station” and others of my team “you go to school”. – That second part doesn’t work because the school is not common to the experience of the interlocutors. So, some of this is the difference between how as a culture we understand common experience, some how we express and use the idea of definiteness. If we as interlocutors want to express a more tight knit relational closeness with our interlocutor we might refer to things in a manner which infers more common experience than what is actually a fact. – I reject the idea that Brits think of Hospital as an adjective.

The Workflow Management for Linguists

Image

Example of Linguistic Fieldwork workflow

Workflow Management for Linguistic session from some of Becky’s previous Field Methods materials

The Data Management Space for Linguists

Posted on October 10, 2012 by Hugh Paterson III

This week I have been outlining the types of data that linguists need to be able to use and relate to each other as they do Language Documentation and Linguistic Research. I try to express these things graphically and then also express where some of the leading tools which SIL International is offering sit in the problem space.

The Data Management Space for linguists with SIL software.

Leave Typology to the Typologists: I am a Linguist

Posted on September 13, 2012 by Hugh Paterson III

A User Experience look at Linguistic Archiving

In a recent paper Jeremy Nordmoe, a friend and colleague, states that:

Because most linguists archive documents infrequently, they will never be experts at doing so, nor will they be experts in the intricacies of metadata schemas. ^[1]Jeremy Nordmoe. 2011. Introducing RAMP: an application for packaging metadata and resources offline for submission to an institutional repository. In Proceedings of Workshop on Language Documentation … Continue reading

My initial reply is:

You are d@#n right! and it is because archives are not sexy enough!

Continue reading →

References[+]

References
↑1	Jeremy Nordmoe. 2011. Introducing RAMP: an application for packaging metadata and resources offline for submission to an institutional repository. In Proceedings of Workshop on Language Documentation & Archiving 18 November 2011 at SOAS, London. Edited by: David Nathan. p. 27-32. [Preprint PDF]

The Citation Problem

Posted on August 28, 2012 by Hugh Paterson III

In a team framework where there are several members of a research team and the job requirements call for the sharing of bibliographic data (of materials referenced) as well as the actual resources being referenced. In this environment there needs to be a central repository for sharing both kinds of data. This is true for small localized (geographically) groups as well as large distributed research teams. New researchers joining a existing team need to be able to “plug-in” to existing foundational work on the project and be able to access bibliographic data as well as the resources those bibliographic details point to. It is my point here to outline some of the current challenges involved in trying to overcoming the collaborative obstacle when working in the fields of Linguistics and Language Documentation ^[1]Nikolaus P. Himmelmann. 1998. Documentary and Descriptive Linguistics. Linguistics vol. 36:161-195. [PDF] [Accessed 24 Dec. 2010].This sentiment is echoed by many in the world of science. Here is someone on Zetero’s forums [INSERT LINK]. (Though Zetero does claim to combat some of these issues.)

Bibliographic Data v.s Citation Data

Continue reading →

References[+]

References
↑1	Nikolaus P. Himmelmann. 1998. Documentary and Descriptive Linguistics. Linguistics vol. 36:161-195. [PDF] [Accessed 24 Dec. 2010]

Keyboard Design for Minority languages

Posted on June 22, 2012 by Hugh Paterson III

This post is a open draft! It might be updated at any time… But was last updated on at .

Keyboards Virtual and Physical

Pre-Print Draft will not be available through this means, though there is a video of the presentation.

A. Meꞌphaa Text Sample

A̱ ngui̱nꞌ, tsáanꞌ ninimba̱ꞌlaꞌ ju̱ya̱á Jesús, ga̱ju̱ma̱ꞌlaꞌ rí phú gagi juwalaꞌ ído̱ rí nanújngalaꞌ awúun mbaꞌa inii gajmá. Numuu ndu̱ya̱á málaꞌ rí ído̱ rí na̱ꞌnga̱ꞌlaꞌ inuu gajmá, nasngájma ne̱ rí gakon rí jañii a̱kia̱nꞌlaꞌ ju̱ya̱á Ana̱ꞌlóꞌ, jamí naꞌne ne̱ rí ma̱wajún gúkuálaꞌ. I̱ndo̱ó máꞌ gíꞌmaa rí ma̱wajún gúkuálaꞌ xúgíí mbiꞌi, kajngó ma̱jráanꞌlaꞌ jamí ma̱ꞌne rí jañii a̱kia̱nꞌlaꞌ, asndo rí náxáꞌyóo nitháan rí jaꞌyoo ma̱nindxa̱ꞌlaꞌ. [I̱yi̱i̱ꞌ rí niꞌtháán Santiágo̱ 1:2-4]

B. Sochiapam Chinantec Text Sample

Hnoh² reh², ma³hiún¹³ hnoh² honh² lɨ³ua³ cáun² hi³ quiunh³² náh², quí¹ la³ cun³ hi³ má²ca³lɨ³ ñíh¹ hnoh² jáun² hi³ tɨ³ jlánh¹ bíh¹ re² lı̵́²tɨn² tsú² hi³ jmu³ juenh² tsı̵́³, nı̵́¹juáh³ zia³² hi³ cá² lau²³ ca³tɨ²¹ hi³ taunh³² tsú² jáun² ta²¹. Hi³ jáun² né³, chá¹ hnoh² cáun² honh², hi³ jáun² lı̵́¹³ lɨ³tɨn² hnoh² re² hi³ jmúh¹³ náh² juenh² honh², hi³ jáun² hnoh² lı̵́¹³ lı̵́n³ náh² tsá² má²hún¹ tsı̵́³, tsá² má²ca³hiá² ca³táunh³ ca³la³ tán¹ hián² cu³tí³, la³ cun³ tsá² tiá² hi³ lɨ³hniauh²³ hí¹ cáun² ñí¹con² yáh³. [Jacobo Jmu² Cáun² Sí² Hi³ Ca³tɨn¹ Tsá² *Judíos, Tsá² Má²tiáunh¹ Ñí¹ Hliáun³ 1:2-4]

C. Spanish Text Sample

Hermanos míos, gozaos profundamente cuando os halléis en diversas pruebas, sabiendo que la prueba de vuestra fe produce paciencia. Pero tenga la paciencia su obra completa, para que seáis perfectos y cabales, sin que os falte cosa alguna. [Santiago 1:2-4 Reina-Valera 1995 (RVR1995)]

D. English Text Sample

Dear brothers and sisters, when troubles come your way, consider it an opportunity for great joy. For you know that when your faith is tested, your endurance has a chance to grow. So let it grow, for when your endurance is fully developed, you will be perfect and complete, needing nothing. [James 1:2-4 New Living Translation (NLT 2007)]

Types of Linguistic Maps: The Mapping of linguistic Features and Researcher Interactivity

Posted on March 22, 2012 by Hugh Paterson III

A couple of years ago I had a chance meeting with a cartographer in North Dakota. It was interesting because he asked us (a group of linguists) What is a language or linguistic map? So, I grabbed a few examples and put them into a brief for him. This past January at the LSA meeting in Portland, Oregon, I had several interesting conversations with the folks at the LL-Map Project under Linguists’ List. It occurred to me that such a presentation of various kinds of language maps might be useful to a larger audience. So this will be a bit unpolished but should show a wide selection of language and linguistic based maps, and in the last section I will also talk a bit about interactive maps. Continue reading →

Ethnologue: the linguistic straw-man

Posted on February 21, 2012 by Hugh Paterson III

The Ethnologue ^[1] M. Paul Lewis. (ed.), 2009. Ethnologue: Languages of the World, 16th Edn. Dallas, Tex.: SIL International. as an academic book, is somewhat of a straw man in linguistics. Many people who write grants for language documentation projects (generally on under described or endangered languages) will cite the Ethnologue and some other resources or lack of resources ^[2] Steven A. Marlett. 2011. Documenting the Me’phaa genus. DEH-NEH fellowship proposal. http://www.neh.gov/grants/guidelines/pdf/DEL_NEH_Marlett.pdf. [PDF] [DEL Awards] [Accessed: 15 February 2011] ^[3] Sadaf Munshi. 2011. Archive of Annotated Burushaski Texts. NSF grant proposal. http://www.neh.gov/grants/guidelines/pdf/DEL_NSF_Munshi.pdf. [PDF] [DEL Awards] [Accessed: 15 February 2011] ^[4]Monica A. Macaulay. 2011. Potawatomi Documentation, Lexical Database, and Dictionary. NEH grant proposal. http://www.neh.gov/grants/guidelines/pdf/DEL_NEH_Macaulay.pdf. [PDF] [DEL Awards] [Accessed: … Continue reading . These efforts seeking funding are usually an effort to get more language data. The rationale for this is two fold:

Because so little is known that we do not know if the Ethnologue is correct.
Because there is a conflict between other published sources and the Ethnologue ^[5]Roger Blench. n.d. Introduction to the Temein languages http://www.rogerblench.info/Language/Nilo-Saharan/Eastern%20Sudanic/Temein%20cluster/Blench%20Temein%20language%20NM%20proceedings.pdf [PDF] … Continue reading .

Continue reading →

References[+]

References
↑1	M. Paul Lewis. (ed.), 2009. Ethnologue: Languages of the World, 16th Edn. Dallas, Tex.: SIL International.
↑2	Steven A. Marlett. 2011. Documenting the Me’phaa genus. DEH-NEH fellowship proposal. http://www.neh.gov/grants/guidelines/pdf/DEL_NEH_Marlett.pdf. [PDF] [DEL Awards] [Accessed: 15 February 2011]
↑3	Sadaf Munshi. 2011. Archive of Annotated Burushaski Texts. NSF grant proposal. http://www.neh.gov/grants/guidelines/pdf/DEL_NSF_Munshi.pdf. [PDF] [DEL Awards] [Accessed: 15 February 2011]
↑4	Monica A. Macaulay. 2011. Potawatomi Documentation, Lexical Database, and Dictionary. NEH grant proposal. http://www.neh.gov/grants/guidelines/pdf/DEL_NEH_Macaulay.pdf. [PDF] [DEL Awards] [Accessed: 15 February 2011]
↑5	Roger Blench. n.d. Introduction to the Temein languages http://www.rogerblench.info/Language/Nilo-Saharan/Eastern%20Sudanic/Temein%20cluster/Blench%20Temein%20language%20NM%20proceedings.pdf [PDF] [Accessed: 15 February 2011]

The Journeyler

A walk through: Life, Leadership, Linguistics, Language Documentation, WordPress, and OS X (and a bit of Marketing & Business Administration)

Tag Archives: Linguistics

Lexical Data Management helps (with SIL software)

Audio Dominant Texts and Text Dominant Audio

to Hospital

The Workflow Management for Linguists

Image

The Data Management Space for Linguists

Leave Typology to the Typologists: I am a Linguist

A User Experience look at Linguistic Archiving

The Citation Problem

Bibliographic Data v.s Citation Data

Keyboard Design for Minority languages

Types of Linguistic Maps: The Mapping of linguistic Features and Researcher Interactivity

Ethnologue: the linguistic straw-man