I am asking around on different mailing lists to gain some insight into the archiving habits of linguists who use lexical databases. I am specifically interested in databases created by tools like FLEx, ToolBox, Lexus, TshwaneLex, etc.
In this post I take a look at some of the software needs of a language documentation team. One of my ongoing concerns about linguistic software development teams (like SIL International's Palaso or LSDev, or MPI's archive software group, or a host of other niche software products adapted from mainstream open-source projects) is the approach they take in communicating how to use the various elements of their software together to create useful workflows for linguists participating in field research on minority languages. Many of these teams do not assume that potential users coming to their website want to be oriented to how these software solutions work together to solve specific problems in the language documentation problem space. Now, it is true that every language documentation program is different and will have different goals and outputs, but many of these goals are the same across projects. New users want to know the top-level organizational assumptions made by software developers. That is, they want to evaluate how software will work in a given scenario (problem space) and to make informed decisions based on the ecosystem that the software will lead them into. This is not unlike users asking which is better, Android or iPhone, and then deciding not just what works with a given device but where they will buy their music and their digital books, and how they will get those digital assets to a new device when the phone they are about to buy no longer serves them. These digital consequences are not in the mind of every consumer, but they are nonetheless real consequences.
This week I have been outlining the types of data that linguists need to be able to use and relate to each other as they do Language Documentation and Linguistic Research. I try to express these things graphically and then also express where some of the leading tools which SIL International is offering sit in the problem space.
The Data Management Space for linguists with SIL software.
This post is an open draft! It might be updated at any time.
In this review I will be looking at the WordPress plugin Webonary and several associated issues. Regardless of the views expressed here, it should be stated that I have high hopes for Webonary’s future. Some of the people working on Webonary are my colleagues, so I attempt to hedge my review with the understanding that this is not the final state of Webonary. I am excited that easy-to-use technology like WordPress is being used, and that minority language groups around the world have the opportunity to use free software like Webonary.
In October, Becky and I were invited to present FLEx at Universiti Malaysia Sabah as part of a workshop for compiling native dictionaries and managing cultural data. I learned a lot about dictionaries, about using FLEx to organize dictionary data, about Webonary and about Malaysia.
One of the things this workshop helped me to clearly articulate was that there are four knowledge content areas which dictionary creators need:
Knowledge about Theoretical Linguistics to understand the language being described and the categories possible in the dictionary.
Knowledge about the language being analyzed and described so that they can apply the appropriate options available to this situation.
Knowledge about how to manage the editorial process for the dictionary (including entry submission).
Knowledge about how to use the software to implement the editorial process.
This workshop’s focus was only on the software used to implement the editorial process (mostly the data collection part of the editorial process). So in some ways it felt like we weren’t giving the participants all the tools they will need (or even showing them all the tools they will need). But we had to realize that it is not our responsibility to give them all the tools they need or to expose them to these issues. They need local contacts for that. Regardless of these issues, we were still ecstatic that there were about 80 people in attendance.
About 80 people
Opening Ceremonies at UMS
Becky took most of the sessions on FLEx. She presented on using FLEx as a tool for collecting words and various things about words. We covered several input methods and features in the application.
Becky talking about FLEx as a tool
Becky helping people doing exercises
I presented a session explaining how to get data out of FLEx. We talked about putting dictionary data on the web and turning it into .epub files.
Hugh presenting on getting things out of FLEx
I think one of the more interesting things that I learned was about expectations, culture and photographs.
Many people wanted photographs with us (or of us). This is not totally unexpected. What was unexpected was that rather than taking one photo and sharing it (passing it around), everyone wanted their own picture. Not just their own picture with us, but a picture with us made with their own camera! It was in that moment that I had an epiphany. Having training in Language Documentation, I am aware of and concerned with rules and laws concerning privacy. In the U.S., when dealing with issues of informed consent and intellectual property, it cannot be assumed that if I want to take a picture of you then I, the owner of the camera, own the picture. Furthermore, it cannot be assumed that I have the right to do with that picture as I please (e.g., post it to the internet). This may be in part because our laws are based on our semantics. It may be in part our culture. But there I realized that, in this setting, if the photo is taken with your camera, you own the photo and can do with it as you please. The only permission asked for is the permission to take the photo.
Taking our picture
Taking their picture, while they were taking a picture of us. Since he who owns the camera, owns the picture...
I took this last picture at about the same time I had the epiphany.
The diagram above roughly illustrates our network setup. This set-up might be typologically rare in terms of language documentation field stations for several reasons. But we had reasonable power (both in quality and quantity), though there were some power outages. And we had high-speed internet.
In terms of network setup, we needed a direct connection out to the internet so that we could have a team network, plus a separate network for language consultants, who would bring their own computers to have a “drop box” with us. To fill this need we could either open our network to each of the consultants or use an outside service like Dropbox. I am not sure why we did not use Dropbox. Eventually we did use Google Spreadsheets for collecting word frames. Our consultants might have been atypical in that they had their own computers and some familiarity with computer use.
Single FLEx Datastore for all languages
Microsoft SQL Server was used for running FLEx on the network. This was achieved by running XP in a virtual machine via VirtualBox on the OS X Server. We had multiple entry points of data to the “FLEx System”. We also did not completely solve network access to the databases: only one person at a time could access the database with write access. Since this project, FLEx has moved from a Microsoft SQL Server backend to an XML backend. But perhaps what would have been better was to use FLExBridge or LiftBridge.
Server and data store Backup
Best practice for backup calls for a three-way backup plan.
An onsite backup.
An “across town” backup, where a backup (at least weekly) is held by a friend or colleague across town.
And an out-of-country backup.
This three-way backup is to:
Protect from mistakes or equipment failure.
Protect from theft.
Protect from catastrophic events.
Our onsite backup was handled by Time Machine.
We would switch out our backup drive every week and give it to a colleague across town.
We attempted to use KKoncepts for our offsite backup. KKoncepts did not work out because it was based on a simple rsync script, and every time we tried to reorganize folders in our corpus it would try to re-sync all of the gigabytes of data which lived under those folders. The Dropbox service is much more efficient: it looks at the block level (inside the file) and only updates things that have changed. It then looks at the tree structure and mirrors what is currently on the client's computer, rather than re-uploading the content.
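To make the difference concrete, here is a minimal sketch in Python of the two behaviours described above. It is not Dropbox's actual protocol or the KKoncepts script. It is a simplified model I am assuming for illustration: the path-based version re-sends any file whose path is new to the remote side, while the block-based version only sends blocks whose content hash the remote side has never seen. The function names, the file paths and the tiny block size are all hypothetical.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, chosen only so the example is easy to follow


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Split the data into fixed-size blocks and return one SHA-256 digest per block."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def paths_to_resend(local_files: dict, remote_paths: set) -> set:
    """Simplified path-based model: any path the remote has not seen is sent in full."""
    return {path for path in local_files if path not in remote_paths}


def blocks_to_upload(local_files: dict, remote_blocks: set) -> set:
    """Block-based model: only blocks whose hash is unknown to the remote are sent."""
    needed = set()
    for data in local_files.values():
        for digest in block_hashes(data):
            if digest not in remote_blocks:
                needed.add(digest)
    return needed


# State of the corpus at the time of the first full backup (hypothetical paths).
before = {"corpus/session1.wav": b"abcdefgh"}
remote_paths = set(before)
remote_blocks = set(block_hashes(before["corpus/session1.wav"]))

# The same file after reorganizing folders: new path, identical content.
moved = {"archive/2012/session1.wav": b"abcdefgh"}

# The same file after a small edit inside its second block.
edited = {"archive/2012/session1.wav": b"abcdXXgh"}

moved_paths = paths_to_resend(moved, remote_paths)      # whole file is sent again
moved_blocks = blocks_to_upload(moved, remote_blocks)   # nothing needs to be sent
edited_blocks = blocks_to_upload(edited, remote_blocks) # only the changed block
```

In this model a folder reorganization costs the path-based approach a full re-upload of the file, while the block-based approach has nothing to send and only needs to record the new location in the tree.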
Not yet well defined are the network settings needed to run Windows XP in the virtual machine alongside OS X and Windows 7, and to establish a DNS server with AirPort Extreme.
Note: Although the title/URL says “Multi-lingual”, this is to be understood to mean that multiple languages are being documented. The term poly-lingual also fits this particular project because the language of communication and authorship was Spanish, yet many of the network issues were resolved in English.