Thoughts on file formats and file names in language documentation projects and archiving

I’ve written about some of these file issues before.

https://hugh.thejourneyler.org/2012/the-workflow-management-for-linguists/

https://hugh.thejourneyler.org/2012/the-data-management-space-for-linguists/

https://hugh.thejourneyler.org/2012/resources-for-digitizing-audio-as-part-of-archiving/

https://hugh.thejourneyler.org/2011/presentation-version-vs-archival-version-of-digital-audio-files/

Lexical Database Archiving Questionnaire

It's true!

I am asking around on different mailing lists to gain some insight into the archiving habits of linguists who use lexical databases. I am specifically interested in databases created by tools like FLEx, ToolBox, Lexus, TshwaneLex, etc.

Background Story

Navigating Organizational Structure in SIL for the purpose of Archiving

Is what you say what you want really what you want?

I am involved in an operation which is tasked with digitizing content created by SIL staff in the Americas: all 80 or so years of history. The end goal is to make the items accessible and usable as widely as possible (there are a lot of factors which dictate how wide "wide" truly is). Today I came across an item which was created at the end of 2008. It was "born digital"; that is, it was created on a computer. As such, it should not need to be scanned if the digital production file can be located. Unfortunately, this is not the only item in its class. There are quite a few items in the lineup to be scanned which were born digital in the last few years. To fully appreciate this scenario, it helps to understand a little about the item in question.

Here was the process for creating the item in Dec. 2008:

  1. Item was created in a .txt / xml environment.
  2. The text was flowed through a page layout process and put into a PDF.
  3. The PDF was taken to a printer and printed.
  4. A copy of the printout was presented to the Language and Culture Archive

So there should be a .txt/xml type file (valid archival format) for this item, and there should be a PDF for this item (also possibly an archival format). Neither of these files has been submitted to the archive at SIL International nor does the SIL Area archiving staff have a definitive recourse to acquire the file.

To understand some of the impact of this statement it is important to understand some of the corporate history and the corporate structure (with a hint of corporate culture).

SIL's history is as one organization, which started in Mexico. Through time the founders also started what might be best classified as sister organizations with the same name in various countries. Again with the passage of time an organization was conceived which needed to support and in some ways "unify" the various sister organizations. This cover organization is known as SIL International. These management structures, or their vestiges, still exist today. In recent times, though, expatriate staff have been returning from working within host countries and overall staff counts have been in decline (particularly in the Americas). So as branches (these former sister companies) have folded, they have folded into a larger management structure called an Area. These branches retain a rather autonomous position (in management practice and in goal setting and policy), while being connected and dedicated at some level to the larger overarching stated goals of SIL International. Yet an individual might be under the administration of any of these administrative structures. (This is not a universally understood concept. That is, the alternative perspectives, "Is an SIL staff person there for the needs of the company, or is the company there to serve the individual?", are still a disputed issue in the minds of many people serving with SIL.)

Administrative Structures of SIL

The levels of autonomy in the above diagram are illustrated by the solid line and the dotted line with more autonomous units further up the chart and separated by more solid lines. Aside from these basic structures there is the autonomy factor for the Areas. These areas operate on a semi-autonomous basis, from each other and from the organization known as SIL International.

This history has left the archiving practice in an interesting managerial arrangement. Former branches which have folded into the area are often called regions and are administered by a regional director. This might be illustrated by the following diagram.

Archive organization in Americas Area as of 2011-2012

An alternative organization method would be to organize around the content of the task. That is illustrated in the lower right of the above diagram by grouping all of the archivists together administratively and marketing their operations as a service. However, discussion of that sort of organizational change is beyond the scope of this post.

Current dilemma

As things stand currently, though, the operational goal of this project is to make content accessible and usable to end users. More use cases can be solved if archivable formats are used and the objects collected are actually those same digitally created objects. However, managerial success on the project is measured by how many scans are made of products in the Americas Area's reach, rather than by the quality of the items that the archive is able to put into the hands of end users. So for these items which were born digital, because we do not have a recourse to pursue the file, we will scan the item. We will also then "clean up" the item and make it into .tiff files and a PDF (a sum of about 5 hours of work for every 100 pages). Now, is the original digital item out of reach of our pursuit? Well, there is one more structure which needs to be understood so that this can be fully realized.

Organizational structure of Manpower in SIL Americas Area

In this diagram the area director has the mandate to secure all property belonging to the SIL organizational/business unit, including intellectual property. This part of his responsibility has been delegated to one of his subordinates, the Support Services Director. The Support Services Director manages the staff providing services to the Language Program people. But in the Americas Area, Language Program personnel are trained not to respond to persons who are not in their direct chain of supervisors. This means that the area Archive Coordinator has to coordinate with the Language Programs Director to get a request to the appropriate field person. It also means that the person working in the field is not responsible for archiving their work (because this part of the mandate is viewed as being fulfilled by the archive coordinator).

This leads to some interesting problems in terms of managing intellectual property. Intellectual property accountability and human resource accountability are not as highly ranked as financial accountability. These can be inherently difficult aspects of any business to manage, let alone a not-for-profit organization. It would be interesting if IP and HR resources could be evaluated the way finances are by the ECFA. It would seem that in the SIL family of organizations there is a corporate value/culture of not valuing intellectual property. In terms of market economy, intellectual property is generally not viewed as being monetizable. Therefore, the products containing the IP are also not seen as worth more than the moment's task. This is possibly in part because the organization is a relationally motivated organization and not a data-driven organization.

There are several ways that this disjunct can be viewed. One of them is that there should be a data plan as part of the project plan before funding for the plan is provided. (This data plan would include archiving, backup, and distribution.) Additionally, a separate but related plan should be implemented to cover IP issues, copyright issues, and the licensing and use of data and products. Pushing this to the project planning level puts the burden on the project doers to meet the requirements for funding. This model is often used in European Union financed research projects. In 2011 the National Science Foundation in the U.S. also required a data management plan to be submitted with grants being applied for. It is interesting that SIL International's funders do not require this to be part of the project planning.

However, having a data management plan does not cover the above use case completely. The project did submit a physical object to the archive at one point. The problem here is the continued access to an ongoing project by services being performed in one part of the company to individuals in another part of the company. This is a management and service integration issue. Because there is a perception that management is too busy, or that this is not a high enough priority for them to act on in a timely manner, it costs the archiving department 5-6 man hours when all that might be needed is 10-20 minutes of email time. But being efficient, or providing a higher quality product which is more usable and has a smaller digital footprint, does not come into the matrix for evaluating results. Seems to me to be a process design FAIL.

Smart Lists and UI

Working in an archive, I deal with a lot of metadata. Some of this metadata is from controlled vocabularies. Sometimes they show up in lists. Sometimes these controlled vocabularies can be very large, as with the names of languages, where the number of languages is limited but is still just over 7,000. I like to keep an eye out for how websites optimize the options for users. FaceBook has a pretty cool feature for narrowing down the list of possible family relationships someone has to you; e.g., a sibling could be a brother/sister, step-brother/step-sister, or a half-brother/half-sister. But if the sibling is male, it can only be a brother, step-brother, or a half-brother.

FaceBook narrows the logical selection down based on attributes of the person mentioned in the relationship.

Menu with all the relationship options

All the relationship options.

That is, if I select Becky, my wife, as a person to be in a relationship with me, then FaceBook determines, based on her gender attribute, that she can only be referenced by the female relationships.

Menu showing just some relationships

Menu showing just some relationships based on an attribute of the person referenced.
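
The same narrowing idea could be applied to any controlled vocabulary in an archive's metadata forms. Below is a minimal sketch in Python of how such a filter might work. It is not how FaceBook implements the feature; the `RELATIONSHIPS` list, the `Person` class, and the `relationship_options` function are all hypothetical names made up for illustration, and the vocabulary is a small invented sample.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical controlled vocabulary: each term is tagged with the gender it
# applies to. None means the term applies regardless of gender.
RELATIONSHIPS = [
    ("Brother", "male"),
    ("Sister", "female"),
    ("Step-brother", "male"),
    ("Step-sister", "female"),
    ("Half-brother", "male"),
    ("Half-sister", "female"),
    ("Cousin", None),
]


@dataclass
class Person:
    """Hypothetical record for the person being referenced."""
    name: str
    gender: Optional[str] = None  # "male", "female", or None if unknown


def relationship_options(person: Person) -> List[str]:
    """Return only the terms that are logically possible for this person.

    If the person's gender is unknown, the full list is offered.
    """
    return [
        term
        for term, gender in RELATIONSHIPS
        if gender is None or person.gender is None or gender == person.gender
    ]


if __name__ == "__main__":
    print(relationship_options(Person("Becky", "female")))
    print(relationship_options(Person("Someone")))
```

With a known attribute the menu shrinks to the terms that can apply; with an unknown attribute the full list is still offered, so the user is never blocked. For a very large vocabulary such as language names, the attribute used for narrowing would have to be something else already recorded about the item (country, for instance), which is an assumption about the metadata and not something described above.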