Funding language documentation 

Just a quick thought.

Perception based loosely on facts:

A lot of language documentation money gets pushed toward endangered languages or languages with very few speakers. It is often awarded to an aspiring academic who promises to produce a grammar of a previously unwritten or undescribed language.

Sometimes I have the opportunity to read grammars. As I read them, I have questions about how the described data sounds, both in context and as elicited. To that end, I wonder if the money would be better spent, both for language documentation and for the benefit of the academy, if organizations funding language documentation research would instead fund the collection of audio and video texts of data already described in grammars. In a way, this would provide the support that modern grammars should have.

That is, I find that the state of grammars (often grammars of African languages) is so fraught with errors, or so colored by theoretical disposition, that it would be immensely helpful if these grammars were supported with audio texts. The focus on small, often dying, languages, with its requirement of "adequate" endangerment for funding, suggests a predisposition to collect specimens of some exotic language. While the collection of rare specimens is good in some sense, it is not always the most dignifying for the language speakers, nor is it really the most helpful for academic pursuits.

An Awesome list for Open Source Software

The world is full of problems. Some of these can be solved through the use of technology. Others can't be solved directly through technology, but the deployment of technologies can shape social environments and social interactions so that even the problems not directly solvable through technology can be addressed.

Low-resource languages suffer from a problem of this second type. That is, there is a sociological problem, which runs briefly as follows:

Feel-good, do-good linguists often want to help a "low-resource" language community have digital tools in its language. This scenario is mirrored by a contrasting one: people from within the low-resource language community want to create tools for using their language, often in written form, in digital contexts. The result is that there is often a set of persons who are project managers, or who hold the business strings (access to grant funding, and responsibility for contracting with technologists to implement the project's ideas and goals). The problem is that the more project managers there are (which might be more than one per language, with over 7,000 languages), the more divergent the technological solutions become: expensive, and often neither compatible nor extensible, even when they are "open sourced".

The problem we are trying to solve is the communication gap between the plethora of coders, who vary from cowboy soloists to dedicated shops working on language software targeted at low-resource communities, and the growing number of visionary project managers, who might have a background in linguistics but often have none in information technology or in IT project management.

The benefits of collaboration and code reuse are obvious. However, a large gap still stands between the project manager and the coding technologist. We find that this gap can be characterized by two critical problems:

  • What are the things which have been coded, and for what purpose were they coded?
  • Assuming that this data can be gathered, how can it quickly be made usable, so that project managers can use the information intelligently in their evolving relationships with their technical teams?

In summary, we must create a pile of data, and then we need to make it usable.
In the spirit of taking baby steps, we have started to amass a pile of data (asking the question: what do we know has been coded?). We have started with a solution which is more native to coders than to project managers: we have used an element of GitHub culture, the "Awesome list".

While this list is browsable, it does not address the myriad points of view from which project managers come. Synthesizing the data to match those various points of view, making it relevant and usable, is still an open task.

“Biblical terms”

There are phrases in some Bible translations which are sometimes referred to by American Christians as "biblical terms". I "wonder" whether our perspective should be to hold these terms as "biblical terms", or whether it should be "in another culture they have an idiom…" (or in an older stage of our own culture, or in another culture that also used English). My point is that we intentionally or unintentionally elevate the language of the Bible without focusing on the culture in which the events happened and the letters were sent. By taking this approach we decontextualize the original message. One inadvertent result of removing the cultural context is that it allows us to recontextualize the text in our own mental framework, instead of looking at the message as it was conveyed from party "A" to party "B", along with the cultural abnormalities of the methodology used to convey that message.

My example comes from sitting in church and hearing the preacher reference the following verse while explaining the phrase "he fell asleep".

And falling to his knees he cried out with a loud voice, "Lord, do not hold this sin against them." And when he had said this, he fell asleep. - Acts 7:60

Similarly, by not understanding the context of the common culture in which the stories were generated, we allow an errant contextual vacuum to form in our understanding of the original text. In the following verse, what does "Son of Man" mean?

And he said, "Behold, I see the heavens opened, and the Son of Man standing at the right hand of God." - Acts 7:56

Images in the Free Culture Movement

I have been really encouraged by the availability of images which have been released under Creative Commons licenses.

While there are a lot of icon sets out there, here are some of my "go to" places.

  • The first place I usually go for free icons. There is a growing community behind the endeavor, and their management operations are being taken seriously.
  • A second place which I have found helpful is:
  • I have also found these SVG images for maps:

As an archivist, I wonder: where will these icons go if they are just privately hosted? Is there an archive for these things?

Excel, XML, and CSV

I never thought the day would come when I would say that I wished I had a Windows version of MS Excel. I am simply aghast. But nevertheless, I have been looking for an XML parsing solution for OS X and cannot find one which is graphically oriented.

I want to move certain XML encoded content to my blog and the best way (that I can figure) to do this is to import CSV files (although there is a WordPress plugin for importing XML).

I want to be able to do this, but the Mac version of Excel does not do this:

I really want to drag and drop, but this tutorial makes it look easy-ish to do at the command line.

What am I using this for? Well, I would like to use it with iTunes XML, EndNote XML, Bookpedia XML, BibTeX XML, SIL-OLAC data as XML, WorldCat data as XML, and Glottolog data as XML.
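For flat, record-style XML of the kind these exports tend to contain, the conversion can also be scripted without Excel. Below is a minimal sketch in Python's standard library; the sample data, the `xml_to_csv` helper, and the field names are all illustrative, not taken from any of the formats above.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: a tiny XML export of bibliographic-style records.
XML_DATA = """<records>
  <record><title>Grammar of X</title><year>1999</year></record>
  <record><title>Phonology of Y</title><year>2004</year></record>
</records>"""

def xml_to_csv(xml_text, row_tag, fields):
    """Flatten each <row_tag> element into one CSV row of the given fields."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(fields)  # header row for the CSV importer
    for rec in root.iter(row_tag):
        # Missing child elements become empty cells rather than errors.
        writer.writerow([rec.findtext(f, default="") for f in fields])
    return out.getvalue()

print(xml_to_csv(XML_DATA, "record", ["title", "year"]))
```

Real exports (iTunes plists especially) nest their data more deeply than this, so the record tag and field list would need adjusting per format, but the same parse-then-write pattern applies.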

Lexical Database Archiving Questionnaire


It's true!

I am asking around on different mailing lists to gain some insight into the archiving habits of linguists who use lexical databases. I am specifically interested in databases created by tools like FLEx, ToolBox, Lexus, TshwaneLex, etc.

Background Story

OER Links

A few weeks ago I put together a resource ("paper") outlining an economic strategy related to Open Educational Resources (OER) and mobile-compatible resources. The purpose was to kickstart and provide ideas for the organization I work for to consider alternative models of information maintenance and dissemination. The following links are more or less my list of references which did not make it into that paper.

Economically (in terms of the information economy), the problem I see with Common Core as it is implemented in the USA across grades 1-12 is that law and policy affect the kinds of resources being produced, and subsequently also shared, in these curriculum-development co-op endeavors (OER). I think the impact is greater than originally anticipated (or perhaps not; perhaps this is a foreign-policy move affecting exports of knowledge). The indirect impact of Common Core on the consumers of these OER materials is that when people from other countries consume Open Educational Resources, they are consuming Common Core. Thankfully, there is a lot of OER work going on at the university level and outside of the scope of Common Core.