Lexical Database Archiving Questionnaire

It's true!

I am asking around on different mailing lists to gain some insight into the archiving habits of linguists who use lexical databases. I am specifically interested in databases created by tools like FLEx, ToolBox, Lexus, TshwaneLex, etc.

Background Story

Long Bike Rides

I'd like to do a bike ride from Eugene to Bend, OR via Willamette Pass on 58 and then return to Eugene via McKenzie Pass on 126. 289 miles in total. Here is a map: https://goo.gl/maps/g6xWQCTHhrJ2 . The thing is, I will need water. When I did my trip from Sisters to Eugene a few years ago, water management was my main issue during the ride (as well as fruit acquisition).

Here are some links for hydration systems I am gathering information on:

Funding language documentation 

Just a quick thought.

Perception based loosely on facts:

A lot of language documentation money gets pushed towards endangered languages or languages with very few speakers. It is often awarded to an aspiring academic who promises to create a grammar for a previously unwritten or undescribed language.

Sometimes I have the opportunity to read grammars. When I do, I have questions about how the described data sounds, both in context and as elicited. To that end, I wonder whether the money would be better spent, for language documentation and for the benefit of the academy, if the organizations funding language documentation research funded the collection of audio and video texts for data already described in grammars. In a way, this would provide the support that modern grammars should have.

That is, I find that grammars (often grammars of African languages) are frequently so fraught with errors, or so colored by theoretical disposition, that it would be immensely helpful if they were supported with audio texts. The focus on small, often dying, languages, which requires a showing of "adequate" endangerment for funding, seems to show a predisposition to collect specimens of some exotic language. While the collection of rare specimens is good in some sense, it is not always the most beneficial for the language speakers, nor is it really the most helpful for academic pursuits.

Awesome Hat…

Well, I like hats... they keep my head warm and sunburn-free. A month or two ago I got a hat for riding my bike in the winter. I got the hat from REI, but of course there are other places where one can get similar hats. My wife likes the hat (on me), Katja likes the hat (on her), and I like the hat even when I am not on the bike. Evidently I am not the only one who likes these hats either. Some call the hat style a swrve Belgian Wool Cap, but all I knew was that it was highly functional and stylish.

brimbini from REI

About two weeks ago a friend, who is also a biker (of the human-powered kind), asked where I got the hat... that got me thinking: How hard would it be to make one of these hats? I should try to sew one sometime; it only took this lady five tries.

Me holding my newborn nephew while wearing my swrve hat.

Creative Commons in the U.S. Government

I am a big advocate of Creative Commons. I think it makes a lot of sense for a lot of reasons. One arena where I have been watching the growing use of Creative Commons licenses is the U.S. Government. I am particularly interested in the issue of over-licensing. That is, my understanding is that the Federal government cannot be a copyright holder unless someone else created the work and then gave it to the U.S. Government, and that items (creative works and intellectual property) created by the government cannot be copyrighted; such content is by law supposed to be in the public domain. Therefore, when a government (in this case the U.S. Government) produces content and licenses it under Creative Commons, doesn't that mean that it must first copyright the material and then release it under license? The following website talks about government data and how it is legally supposed to be open: https://theunitedstates.io/licensing/. (And Ben Balter gives some really clear suggestions here: http://ben.balter.com/2014/10/08/open-source-licensing-for-government-attorneys/.) There are certain rights reserved, like the use of logos. In short, I am a bit confused by moves in the Department of Labor and the Department of Education where the CC-BY license is adopted:

Is this just saying that if I create something with money from the Federal Government then that work needs to also be CC-BY?

The Creative Commons wiki currently says about the US Government:

Federal

Works by the US federal government are automatically part of the public domain in the US as stipulated by http://www.copyright.gov/title17/92chap1.html#105
Third-party content (such as the text of speeches by the first lady) on the White House web site are licensed with CC BY 3.0 US by default.
President-Elect Transition Team, Barack Obama and Joseph Biden. CC BY 3.0 Unported. (Not an official federal government site, but an election team site, hence not required to be public domain.)
The U.S. Department of Education has made OER an invitational priority in their Ready to Learn (PDF) and Ready to Teach (PDF) grants.
The U.S. Department of Education has included open educational resources in their Notice of Proposed Priorities for discretionary grant funding. Essentially, if the priorities are adopted, it could mean that grant seekers who include open educational resources as a component of an application for funding from the Department of Education could receive priority.
The U.S. Department of Labor and Department of Education commit $2 billion to community colleges and career training; CC BY required for grant outputs.
The U.S. Department of Labor Career Pathways Innovation Fund Grants Program; CC BY required for grant outputs.
U.S. Open Data Action plan is under CC0 + some federal datasets: report (pdf); blog post

State

New York State Senate, Senate Content, CC-BY-NC-ND with CC+ allowing non-political fundraising use of content.
State of Virginia, legislation that indicates a preference for state-funded materials to be released with a CC (or equivalent open) license.
Washington State open policy and requirement of CC BY
New Hampshire adopts Open Source and Open Data requirements (policy friendly to CC use, but not a specific CC tool adoption)
OER K-12 bill passed in WA state. The focus of the bill is to help school districts identify existing high-quality, free, openly licensed, common core state standards aligned resources available for local adoption; in addition, any content built with public funds, must be licensed under “an attribution license” (CC BY)
The city of Washington, D.C. has made available an unofficial copy of the DC Code under the CC0 Public Domain Dedication.

So, consider a business person looking at the limitations of CC-BY and the DMCA. If I were a grant recipient from the Department of Labor and I wanted to profit from the output of the grant, I could make all the output CC-BY and then release that content via an app that I sell. I would make the app with funds not from the grant and make the content available only via the app. Hacking the app would constitute copyright infringement and would be enforceable via the DMCA.

Creative Commons does not solve the open access and permanent access guarantee problems.

The big chair

When I was little (like three years old) my parents got a lazy boy rocker. I have many fond memories in that chair. First sitting next to my mom or my dad, and then on their lap because I had gotten bigger. Then on the arm because a sibling had taken the lap position. That chair left many lasting impressions. It was the place where I was read the books The Cross and the Switchblade, Brother Andrew, and The Silver Chair. I would also read many of my own books in that chair.

Today I went to a furniture store to look at table designs. They happened to have some lazy boy recliners. None felt the same as the one I grew up with. But it left me wondering… whether I am going to have a chair like that in my house with my kids.

oversize wide lazyboy chair

lazy boy chair with room for a little one

An Awesome list for Open Source Software

The world is full of problems. Some of these can be solved through the use of technology. Other problems can't be solved directly through the use of technology, but the deployment of technologies can change social environments and social interactions in a way that allows those problems to be addressed.

Low-resource languages suffer from one of the problems of the second type. That is, there is a sociological problem, which can be sketched briefly as follows:

Feel-good and do-good linguists often want to help "low-resourced" language communities have digital tools in their language. This scenario is mirrored by a different one, which may be contrasted with people "helping" from the outside: people from within the low-resource language community want to create tools for using their language, often in written form, in digital contexts. The result is that there is often a set of people who act as project managers, or who hold the purse strings (they have access to grant funding and are responsible for contracting with technologists to implement the project ideas and goals). The problem is that the more project managers there are (which might be more than one per language, with over 7,000 languages), the more divergent the technological solutions become, and these solutions are expensive and often not compatible or extensible, even if they are "open sourced".

The problem we are trying to solve is the communication problem between the plethora of coders, who vary from cowboy soloists to dedicated shops working on language software targeted for use in low-resourced communities, and the growing number of visionary project managers, who might have a background in linguistics but often do not have one in information technology or in information technology project management.

The benefits of collaboration and reusability of code are obvious. However, there still stands a large gap between the project manager and the coding technologist. We find that this gap can be characterized by two critical problems:

What are the things which have been coded - and for what purpose are they coded?
Assuming that this data can be gathered, how can this data be quickly made usable so that project managers can use the information intelligently in their evolving relationships with their technical teams? In summary, we must create a pile of data, and then we need to make it usable.

In the spirit of taking baby steps, we have started to amass a pile of data (asking the question: what do we know has been coded?). We have started with a solution which is more native to coders than to project managers: we have used an element of GitHub culture, the 'Awesome list'.

While this list does make a browsable list, it does not address the myriad points of view which project managers come from. Synthesizing the data to match the various points of view, and making the data relevant and usable, is still an open task.
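
As a rough illustration of what "making the pile usable" might look like, here is a minimal Python sketch. It assumes the Awesome list follows the common layout of `##` category headings with `- [Name](url) - description` entries under them. The sample text, project names, URLs, and the function names `parse_awesome_list` and `group_by` are all hypothetical placeholders and are not taken from our actual list.

```python
import re
from collections import defaultdict

# Hypothetical sample in a common Awesome-list layout.
# The project names and URLs are placeholders, not real entries.
SAMPLE = """\
# Awesome Language Software

## Keyboards
- [ExampleKeyboard](https://example.org/keyboard) - Keyboard layouts for under-served scripts.

## Dictionaries
- [ExampleLexicon](https://example.org/lexicon) - Dictionary app builder.
- [ExampleParser](https://example.org/parser) - Converts lexicon files to JSON.
"""

# A category heading: two or more '#' characters followed by the title.
HEADING = re.compile(r"^#{2,}\s+(.*\S)\s*$")
# A list entry: bullet, [name](url), optional '-' or ':' separator, description.
ENTRY = re.compile(r"^\s*[-*]\s+\[([^\]]+)\]\(([^)]+)\)\s*(?:[-:]\s*)?(.*)$")


def parse_awesome_list(text):
    """Turn Awesome-list text into a list of dict records."""
    records = []
    category = "Uncategorized"
    for line in text.splitlines():
        heading = HEADING.match(line)
        if heading:
            category = heading.group(1)
            continue
        entry = ENTRY.match(line)
        if entry:
            name, url, description = entry.groups()
            records.append({
                "category": category,
                "name": name,
                "url": url,
                "description": description.strip(),
            })
    return records


def group_by(records, key):
    """Group project names by any record field, e.g. 'category'."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["name"])
    return dict(groups)


if __name__ == "__main__":
    for category, names in group_by(parse_awesome_list(SAMPLE), "category").items():
        print(category, names)
```

Once the entries are plain records, each point of view a project manager brings (by purpose, by platform, by license) could in principle become just another field to group on, though deciding which fields matter is the open task described above.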