Lexical Database Archiving Questionnaire


It's true!

I am asking around on different mailing lists to gain some insight into the archiving habits of linguists who use lexical databases. I am specifically interested in databases created by tools like FLEx, ToolBox, Lexus, TshwaneLex, etc.

Background Story

OER Links

A few weeks ago I put together a resource ("paper") outlining an economic strategy related to Open Educational Resources (OER) and mobile-compatible resources. The purpose was to kickstart discussion and give the organization I work for ideas for considering alternative models of information maintenance and dissemination. The following links are more or less my list of references which did not make it into that paper.

Economically (in terms of the information economy), the problem I see with CommonCore, as it is implemented in the USA across grades 1-12, is that law and policy affect the kinds of resources being produced and subsequently shared in these curriculum development co-op endeavors (OER). I think the impact is greater than originally anticipated (or perhaps not; perhaps this is a foreign policy move affecting exports of knowledge). The indirect impact of CommonCore on the consumers of these OER materials is that when people from other countries consume Open Educational Resources, they are consuming CommonCore. Thankfully, a lot of OER work is going on at the university level, outside the scope of CommonCore.

Adding white space to PDF after cropping

I have a PDF that I would like to crop to the text and then pad with a consistent white space margin. The PDF was generated by a Bookeye 4 scanner, which exported the content straight to PDF. I am trying to do this with Adobe Acrobat 9.2. SIL Americas Area Publishing suggested that I use ScanTailor, an excellent program, but one that I find crashes on OS X.


Combination of Tips

Some days I am more clever than others. Today, I was working on digitizing about 50 older (30-year-old) cassettes for a linguist. To organize the data I need to create a folder for each tape, and each folder needs to be sequentially numbered. It is a lot of tedious work, and not something I enjoy.

So I looked up a few things about Terminal to see if I could speed up the process. I needed to create a number of folders, so I searched the hints on MacWorld:

So I looked at the mkdir command, which creates new folders or directories. It uses the following syntax: mkdir folder1 folder2 folder3

Now I needed a list of the folder names, something like 50 of them.

So I created a formula in a Google spreadsheet using the Concatenate function. In one column I added the alphabetic characters I needed, and in the next column I added the sequential numbers.

Now I had a list of 50 folder names, but I still needed to remove the return characters that separated them so that the mkdir command would work. So I opened up TextEdit, searched for the return characters in the document, and deleted them.

Now I could just paste the 50 folder names into Terminal and hit enter, and it created 50 folders... But I wonder if there is a way to add sequential numbers to a base folder name in Terminal without using Google spreadsheets...
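
One possible answer to that question, as a minimal sketch rather than a tested recipe for this project: a small shell loop can build the numbered names and hand each one to mkdir. The function name make_numbered_folders, the base name "Tape", and the underscore separator are all made up for illustration; the zero-padding to two digits is my assumption so that the folders sort in order.

```shell
#!/bin/sh
# make_numbered_folders BASE COUNT
# Hypothetical helper: creates BASE_01, BASE_02, ... up to COUNT
# in the current directory.
make_numbered_folders() {
    base="$1"
    count="$2"
    i=1
    while [ "$i" -le "$count" ]; do
        # %02d pads the number to at least two digits (01, 02, ... 50).
        # mkdir -p does not complain if the folder already exists.
        mkdir -p "$(printf '%s_%02d' "$base" "$i")"
        i=$((i + 1))
    done
}

# Example: one folder per cassette, using an assumed base name.
make_numbered_folders "Tape" 50
```

Run from inside the project folder, this should leave 50 folders named Tape_01 through Tape_50. Newer versions of bash can also do this with brace expansion on a single mkdir line, but the loop above sticks to plain sh so it does not depend on which shell version is installed.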

Client-Side Content Restrictions for Archives and Content Providers

Twice since the launch of the new SIL.org website, colleagues of mine have contacted me about the new requirement on SIL.org to log in before downloading content from the SIL Language and Culture Archive. Both know that I relate to the website implementation team. I feel as if they expect me to be able to speak into this situation (as if I even had this sort of power). I only work with the team in a loose affiliation (from a different sub-group within SIL); I don't make design decisions or social impact decisions, and I don't negotiate the politics of content distribution.

However, I think web users have some real concerns about being required to log in prior to downloading, and there are also some real considerations that web users do not realize.

I want to reply to these concerns.


Audio Dominant Texts and Text Dominant Audio

As linguistics and language documentation interface with digital humanities, there has been a lot of effort to time-align texts and audio/video materials. At one level this is rather trivial to do and has the backing of commercial media processes like subtitles in movies. However, at another level this task is often done in XML, and done slightly differently for every project (digital corpus curation). At the macro scale the argument is that if the annotation of the audio is in XML and someone wants to do something else with it, then they can just convert the XML to whatever schema they desire. This is true.

However, one anecdotal point that I have not heard in discussions of time-aligned texts is a specification for Audio Dominant Text vs. Text Dominant Audio. This may not initially seem very important, so let me explain what I mean.


I have been working on describing the FLEx software ecosystem (for both a blog post and an infographic). In the process I googled "language documentation" workflow and was promptly directed to resources created for InField and aggregated via ctldc.org. It is an amazing set of resources. The ctldc.org website is well put together and the content from InField 2010 and 2008 is amazing; I wish I could have been there. I am almost convinced that most SIL staff pursuing linguistic fieldwork should just go to InField... But it is true that InField seems to be targeted at someone who has had more than one semester of linguistics training.