A document’s DOI (see http://www.doi.org/ or the Wikipedia article "Digital Object Identifier") is an important part of a document’s citation. Many style sheets allow the DOI alone to serve as the citation of a paper. Because DOIs are unique, they can act as resolvable URIs, and they look like URLs. However, a DOI is different from the URL where a digital object happens to be located. It could well be argued that archives which collect language and linguistic data should track DOIs in their metadata schemes.
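The difference between a DOI and a URL is easy to show in code: the archive stores the bare DOI, and a link is only built at display time by handing the DOI to a resolver. Below is a minimal Python sketch, assuming the public doi.org proxy as the resolver. The prefix list and the pattern check are simplifications for illustration, not the full DOI syntax rules.

```python
import re
from urllib.parse import quote

# Loose check: "10." directory indicator, numeric registrant code
# (optionally with sub-registrant parts like 10.1000.10), a slash, and a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}(\.\d+)*/\S+$")

# Prefixes people commonly paste in front of a bare DOI.
_PREFIXES = ("https://doi.org/", "http://doi.org/",
             "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(raw: str) -> str:
    """Return the bare DOI (what an archive would store in its metadata)."""
    doi = raw.strip()
    for prefix in _PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {raw!r}")
    return doi


def doi_to_url(raw: str) -> str:
    """Build a resolvable link by passing the DOI through the doi.org proxy."""
    # quote() percent-encodes characters that are unsafe in a URL path; "/" is kept.
    return "https://doi.org/" + quote(normalize_doi(raw), safe="/")


if __name__ == "__main__":
    print(doi_to_url("doi:10.1000/182"))  # https://doi.org/10.1000/182
```

Storing only the normalized DOI means that if the resolver or the object's location changes, the metadata record itself does not.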
Category Archives: Access
Presenting Audio and Video on the Web
I have been trying to find out the best way to present audio on the web. This led me to look at how to present video too. I do not have any conclusions on the matter, but I have been looking at HTML5 rather than JavaScript or Flash. Because my platform (CMS) is WordPress, …
Linking Data and SIL’s goal of Sharing what they know…
I have recently been introduced to Linked Data and RDF. In my investigation, I have noticed that some people describe Linked Data and RDF as a solution without a problem (Defense against the claim).
However, the relationships between datasets, and the data created by those datasets, have been growing over the past few years.

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/ CC-BY-SA
I am becoming convinced that at some point there will be enough open data out there to reach a tipping point: if your data is not shared this way, app producers will not process it, either without significant extra charge (for home-grown apps) or at all (for externally produced, data-consuming apps). This means that open Linked Data in RDF will become socially more significant than more labor-intensive proprietary datasets.
I was watching this video, in which several web apps and mobile apps were developed and competed for a prize. What one can do with this data is incredible.
https://vimeo.com/25163082
I particularly like the app which tells you how long it takes someone in London to travel from point A to point B.
So where does this come into play with SIL International? Well, SIL is an NGO, and NGOs need engagement strategies. That is, non-profits and NGOs operate to effect change: they have a compelling story, they tell the story, and the hearers of the story are motivated to take some sort of action.
An engaged employee population is a strategic asset that enables organizations to inspire and mobilize their people to achieve specific business objectives. – http://engagementstrategies.com/
This is the very nature of the Kony 2012 video and story. Their web presence is not about marketing, messaging, branding, or color palettes. It is about engaging people to commit to a certain set of activities. The Kony campaign’s entire web presence, from the script of the YouTube film to the design of their website, is about getting people to commit to and carry out those suggested activities.
But how does this relate back to RDF and Linked Data? Well, if web apps and mobile apps are going to present data to users, and work through the presentation challenges of User Experience and User Interface in multiple locations and contexts, then it is in the interest of NGOs, as data providers, to provide data which will engage users with their cause. Some NGOs, like SIL, are very involved in content production. Consider the 40,000-plus items in the SIL bibliography of academic and vernacular works produced over its 75+ year history. These pieces of content, or resources, can be described in RDF for data consumers. The obvious question is "Why?" The answer is simple: so that when others use Linked Data, your resources are found, and awareness of your cause is thereby promoted.
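To make "describable in RDF" concrete, here is a minimal sketch that writes one bibliography record as N-Triples, one of the simplest RDF serializations. The item URI and field values are hypothetical; the property URIs come from the Dublin Core Terms vocabulary. A real catalogue would more likely use an RDF library and richer modelling (for example, pointing dcterms:creator at a person resource rather than a plain string).

```python
# Dublin Core Terms, a widely used vocabulary for bibliographic metadata.
DCTERMS = "http://purl.org/dc/terms/"


def literal(value: str) -> str:
    """Quote a string as an N-Triples literal, escaping the characters that need it."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return f'"{escaped}"'


def describe_item(item_uri: str, fields: dict[str, str]) -> list[str]:
    """One N-Triples statement per Dublin Core term describing a bibliography item."""
    return [
        f"<{item_uri}> <{DCTERMS}{term}> {literal(value)} ."
        for term, value in fields.items()
    ]


if __name__ == "__main__":
    # Hypothetical item URI and values, for illustration only.
    for line in describe_item("http://example.org/bibliography/12345",
                              {"title": "A sample vernacular reader",
                               "date": "1975",
                               "language": "chr"}):
        print(line)
```

Because each statement names its subject with a URI, a data consumer can merge these lines with statements from other publishers about the same resource.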
Let’s say that the organization Invisible Children released 100,000 images of children who carried AK-47s, shot their parents, and were maimed or raped. Let’s also say that these images were geotagged with the locations where they were taken, and that this metadata and these images were made available as Linked Data. Then, when global leaders in internet mapping technologies like Google, Wikipedia, and Yahoo! create web-based applications that display geospatial content from Linked Data sources, whose content do you think is going to be displayed when someone is looking for pictures of Africa?
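The scenario depends on two pieces: the publisher attaching coordinates to each image in a vocabulary consumers recognize, and a consumer selecting resources by location. Here is a minimal sketch of both, assuming the W3C Basic Geo (WGS84) vocabulary for the coordinates and a simple latitude/longitude box as the consumer's query. The image URIs and coordinates are hypothetical, and real mapping services use far more sophisticated spatial queries.

```python
from dataclasses import dataclass

# W3C Basic Geo (WGS84) vocabulary, often used for geotags in Linked Data.
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


@dataclass
class GeoTaggedImage:
    uri: str     # hypothetical image URI
    lat: float   # decimal degrees, north positive
    long: float  # decimal degrees, east positive


def to_triples(img: GeoTaggedImage) -> list[str]:
    """Publish an image's coordinates as geo:lat / geo:long statements in N-Triples."""
    return [
        f'<{img.uri}> <{GEO}lat> "{img.lat:.6f}"^^<{XSD_DECIMAL}> .',
        f'<{img.uri}> <{GEO}long> "{img.long:.6f}"^^<{XSD_DECIMAL}> .',
    ]


def in_bounding_box(img: GeoTaggedImage, south: float, west: float,
                    north: float, east: float) -> bool:
    """Roughly what a map application does when choosing what to show for a region.

    Simplification: boxes that cross the 180th meridian are not handled.
    """
    return south <= img.lat <= north and west <= img.long <= east


if __name__ == "__main__":
    # Hypothetical images; the box below is only a rough outline of Africa.
    images = [GeoTaggedImage("http://example.org/images/1", 2.5, 32.25),
              GeoTaggedImage("http://example.org/images/2", 48.85, 2.35)]
    africa = dict(south=-35.0, west=-18.0, north=38.0, east=52.0)
    for img in images:
        if in_bounding_box(img, **africa):
            print("\n".join(to_triples(img)))
```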

Image from BBC article about Kony. The BBC caption reads: "Some South Sudanese have already taken up arms against Kony and the LRA"
Reviewing Webonary
This post is an open draft! It might be updated at any time.
In this review I will be looking at the WordPress plugin Webonary and several associated issues. (Regardless of the views expressed here, I should state that I have high hopes for Webonary’s future. Some of the people working on Webonary are my colleagues, so I attempt to hedge my review with the understanding that this is not the final state of Webonary. I am excited that easy-to-use technology like WordPress is being used, and that minority language groups around the world have the opportunity to use free software like Webonary.)
GIAL Web structure
I was looking at the Wikipedia article for Language Documentation. The only reference cited was a thesis by Debbie Chang. I happen to know Debbie, so I thought I would take a look at her thesis and see what she said. I clicked the link and was delivered to a 404 error page on GIAL’s website; GIAL had recently renovated its website. By digging through the GIAL website, I was able to locate the thesis and fix the URL on Wikipedia. The new URL is: http://www.gial.edu/images/theses/Chang_Debbie-thesis.pdf
But then I looked at the URL and asked: why are PDFs in the images folder? What is the long-term infrastructure plan for this school? When PDFs (theses) are put into the images folder rather than into a digital repository, it seems that something is not quite right with the school’s long-term planning. Ironically, this is not far from the main thrust of Debbie’s thesis.
It would seem that the long-term solution for this kind of problem would be for a small school like GIAL either (A) to have its library develop an infrastructure for permanently housing these kinds of materials, or (B) to contract with another organization or archive which could take care of these issues, provide handles or stable URLs, and let GIAL link to the permanent location of these items from its website. It is interesting to note that SIL International’s Language and Culture Archive is on the same campus as GIAL, yet GIAL has not taken advantage of this opportunity.
World Map Navigation
For one of the web projects I am working on, we have been throwing around the idea of using a world map as a navigation element, with each country clickable. This kind of navigation has been done with hyperlinked bitmaps, as in the LL-Map project, or with Flash, as in the Joshua Project. I have not seen any implementations in HTML5 canvas or in SVG, though it occurs to me that these technologies could be used. I am not deeply familiar with either technology, so I did some googling and found some interesting articles on the matter:
- Performance of SVG vs. Canvas
- How to Choose Between Canvas and SVG
- SVG or Canvas? Choosing between the two
- CanVG: Using Canvas to render SVG files
I am not sure that I have any answers, but these are my thoughts on the problem space.
There is one map of languages I have found which deserves mention: the map of the Languages of California hosted at Berkeley. I am not sure what technology it uses, but it seems to be one of these two methods.
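The SVG route is the one that keeps each country as a separate element that can carry its own link. Here is a minimal sketch of it, assuming the country outlines are already available as SVG path data (the shapes, ids, and link targets below are placeholders). It uses Python's standard xml.etree.ElementTree to emit the markup; in a WordPress setting the same structure could just as well be written out by PHP or by hand.

```python
import xml.etree.ElementTree as ET

SVG = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG)  # serialize SVG as the default namespace


def clickable_map(countries: dict[str, tuple[str, str]],
                  width: int = 800, height: int = 400) -> str:
    """Build an SVG map in which every country shape is its own hyperlink.

    countries maps an id (e.g. a country code) to (SVG path data, link target).
    """
    svg = ET.Element(f"{{{SVG}}}svg", {"width": str(width), "height": str(height),
                                       "viewBox": f"0 0 {width} {height}"})
    for code, (path_data, href) in countries.items():
        # SVG 2 accepts a plain href on <a>; older SVG 1.1 viewers expect xlink:href.
        link = ET.SubElement(svg, f"{{{SVG}}}a", {"href": href})
        shape = ET.SubElement(link, f"{{{SVG}}}path", {"id": code, "d": path_data})
        # A <title> child gives the shape a hover tooltip in most browsers.
        ET.SubElement(shape, f"{{{SVG}}}title").text = code
    return ET.tostring(svg, encoding="unicode")


if __name__ == "__main__":
    # Placeholder rectangles standing in for real country outlines.
    print(clickable_map({
        "PE": ("M100 200 L160 200 L160 280 L100 280 Z", "/countries/peru"),
        "BR": ("M160 180 L260 180 L260 300 L160 300 Z", "/countries/brazil"),
    }))
```

With canvas, by contrast, the drawing is a single bitmap, so the page would have to do its own hit-testing in JavaScript to work out which country was clicked.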
OAI-PMH for WordPress
Umm, frankly, I am not sure anything out there right now is going to work to bring OAI-PMH services to WordPress. For more information on OAI, consider these three resources:
- Main Technical Ideas of OAI-PMH
- Specification for an OAI Static Repository and an OAI Static Repository Gateway
- OAI-PMH Metadata Exchange
If something does work, is it going to use WordPress to advertise things or to aggregate things? If the former, then nothing out there has ever let the admin user dynamically choose which fields are matched to which attributes. And even if it is the former, why would anyone actually want this functionality? What is the use case? If one is using WordPress as a bibliographic reference system, as some libraries do, then this makes a lot of sense. However, there is another use case I would like to present: the website about a single language or about several languages. There are potentially two ways to conceptualize this:
- If a WordPress-based website were a dictionary website, then the whole website might be considered a resource on a language. An example of this might be SIL’s Webonary plugin for WordPress and the Cherokee Language Project’s Dictionary.
- If a website presented materials on several languages and each page was a resource on a single language, then that would be a different use case. This would be more like what the Survey of California and Other Indian Languages does, or what the Central Institute of Indian Languages does.
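On the provider ("advertise") side, the missing piece is a field mapping the admin can edit. Here is a minimal sketch of that idea: a GetRecord response in the standard oai_dc format, where a plain dictionary decides which site fields become which Dublin Core elements. The field names, identifier, and base URL are hypothetical, and this is not how any existing WordPress plugin works. A real provider would also need the other OAI-PMH verbs (Identify, ListRecords, ListIdentifiers, ListMetadataFormats, ListSets) and error responses.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

OAI = "http://www.openarchives.org/OAI/2.0/"
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
for prefix, uri in (("", OAI), ("oai_dc", OAI_DC), ("dc", DC)):
    ET.register_namespace(prefix, uri)

# Hypothetical admin-editable mapping: site field name -> Dublin Core element.
FIELD_MAP = {
    "post_title": "title",
    "author_name": "creator",
    "post_date": "date",
    "language_code": "language",
}


def get_record_xml(base_url: str, identifier: str, datestamp: str,
                   post: dict[str, str], field_map: dict[str, str] = FIELD_MAP) -> str:
    """Answer an OAI-PMH GetRecord request for one post in the oai_dc format.

    xsi:schemaLocation attributes and error handling are left out for brevity.
    """
    root = ET.Element(f"{{{OAI}}}OAI-PMH")
    ET.SubElement(root, f"{{{OAI}}}responseDate").text = (
        datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"))
    request = ET.SubElement(root, f"{{{OAI}}}request", {
        "verb": "GetRecord", "identifier": identifier, "metadataPrefix": "oai_dc"})
    request.text = base_url
    record = ET.SubElement(ET.SubElement(root, f"{{{OAI}}}GetRecord"), f"{{{OAI}}}record")
    header = ET.SubElement(record, f"{{{OAI}}}header")
    ET.SubElement(header, f"{{{OAI}}}identifier").text = identifier
    ET.SubElement(header, f"{{{OAI}}}datestamp").text = datestamp
    dc = ET.SubElement(ET.SubElement(record, f"{{{OAI}}}metadata"), f"{{{OAI_DC}}}dc")
    # The mapping, not the code, decides which site fields become which DC elements.
    for site_field, dc_element in field_map.items():
        value = post.get(site_field)
        if value:
            ET.SubElement(dc, f"{{{DC}}}{dc_element}").text = value
    return ET.tostring(root, encoding="unicode")
```

Keeping the mapping as data means an admin screen only has to edit a dictionary, not code.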
Existing Foundation
- COinS-PMH (unAPI) WordPress Plugin (2005)
- Peter Binkley tagged blog posts for OAI.
- unAPI Server for WordPress.
- WordPress, now with added unAPI!
- New OAI-PMH metadata format (It was an update).
I think there is a second question here too: why does one need OAI-PMH for WordPress? Is it as a provider or as a consumer? If one needs a PHP app for OAI-PMH, maybe they can use: https://github.com/caseyamcl/phpoaipmh
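On the consumer side, the core of an OAI-PMH harvester is short: request ListRecords, collect the records, and keep following the resumptionToken until the repository stops sending one. The Python sketch below shows that loop under simple assumptions (oai_dc only, identifiers only, no error or retry handling). It illustrates the protocol and is not the phpoaipmh API.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, token: Optional[str] = None) -> str:
    """The first request names the metadata format; later ones send only the token."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}  # exclusive argument
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text: str) -> tuple[list[str], Optional[str]]:
    """Return the record identifiers on one response page and the next token, if any."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.findall(".//oai:record/oai:header/oai:identifier", NS)]
    token_el = root.find(".//oai:resumptionToken", NS)
    # A missing or empty resumptionToken means the list is complete.
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


def harvest(base_url: str, fetch: Callable[[str], str]) -> Iterator[str]:
    """Yield identifiers page by page until the repository stops sending tokens.

    fetch takes a URL and returns the response body as text, e.g. a small
    wrapper around urllib.request.urlopen(url).read().decode("utf-8").
    """
    token = None
    while True:
        ids, token = parse_page(fetch(list_records_url(base_url, token)))
        yield from ids
        if token is None:
            return
```

Passing the HTTP fetch in as a parameter keeps the paging logic testable without a live repository.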
Presenting Research on the Web
I have been looking at different ways to make SIL’s digital research content more interactive, findable, and usable. Today I found http://research.microsoft.com/en-us/. It is interesting how they present the facets of Location, Projects, Publications, and People in the upper right-hand corner. I think they did a good job; the site feels balanced.
Ethnologue: the linguistic straw-man
The Ethnologue, as an academic book, is something of a straw man in linguistics. Many people who write grants for language documentation projects (generally on under-described or endangered languages) will cite the Ethnologue and some other resources, or the lack of resources. These efforts to obtain funding are usually aimed at getting more language data. The rationale for this is twofold:
- Because so little is known, we do not know whether the Ethnologue is correct.
- Because there is a conflict between other published sources and the Ethnologue.
Navigating Organizational Structure in SIL for the purpose of Archiving
Is what you say what you want really what you want?
I am involved in an operation tasked with digitizing content created by SIL staff in the Americas, covering all 80 or so years of history. The end goal is to make the items accessible and usable as widely as possible (there are a lot of factors which dictate how wide "wide" truly is). Today I came across an item which was created at the end of 2008. It was "born digital"; that is, it was created on a computer. As such, it should not need to be scanned if the digital production file can be located. Unfortunately, this is not the only item in its class: quite a few items in the line-up to be scanned were born digital in the last few years. To fully understand this scenario, it helps to know a little about how the item in question was produced.
Here was the process for creating the item in Dec. 2008:
- Item was created in a .txt / xml environment.
- The text was flowed through a page layout process and put into a PDF.
- The PDF was taken to a printer and printed.
- A copy of the printout was presented to the Language and Culture Archive
So there should be a .txt/xml type file (a valid archival format) for this item, and there should be a PDF for this item (also possibly an archival format). Neither of these files has been submitted to the archive at SIL International, nor does the SIL Area archiving staff have a definitive recourse to acquire the files.
To understand some of the impact of this statement it is important to understand some of the corporate history and the corporate structure (with a hint of corporate culture).
SIL's history is as one organization, which started in Mexico. Over time, the founders also started what might best be classified as sister organizations with the same name in various countries. Again, with the passage of time, an organization was conceived to support and, in some ways, "unify" the various sister organizations. This cover organization is known as SIL International. These management structures, or their vestiges, still exist today. In recent times, though, expatriate staff have been returning from working within host countries, and overall staff counts have been in decline (particularly in the Americas). So as branches (these former sister companies) have folded, they have been folded into a larger management structure called an Area. These branches retain a rather autonomous position (in management practice, goal setting, and policy) while being connected and dedicated at some level to the larger overarching stated goals of SIL International. Yet an individual might be under the administration of any of these administrative structures. (This is not a universally understood concept. That is, the alternative perspectives, "Is an SIL staff person there for the needs of the company, or is the company there to serve the individual?", are still disputed in the minds of many people serving with SIL.)
This history has left the archiving practice in an interesting managerial arrangement. Former branches which have folded into the area are often called regions and are administered by a regional director. This might be illustrated by the following diagram.
An alternative organizational method would be to organize around the content of the task. That is illustrated in the lower right of the above diagram by grouping all of the archivists together administratively and marketing their operations as a service. However, discussion of that sort of organizational change is beyond the scope of this post.
Current dilemma
As things stand currently, though, the operational goal of this project is to make content accessible and usable to end users. More use cases can be served if archivable formats are used and the objects collected are the actual digitally created objects. However, managerial success on the project is measured by how many scans are made of products within the Americas Area's reach, rather than by the quality of the items that the archive is able to put into the hands of end users. So for these born-digital items, because we have no recourse to pursue the file, we will scan the item. We will then also "clean up" the item and make it into .tiff files and a PDF (in total about 5 hours of work for every 100 pages). Is the original digital item now out of reach of our pursuit? Well, there is one more structure which needs to be understood to see this fully.
In this diagram the area director has the mandate to secure all property belonging to the SIL organizational/business unit, including intellectual property. This part of his responsibility has been delegated to one of his subordinates, the Support Services Director, who manages the staff providing services to the Language Program people. But in the Americas Area, Language Program personnel are trained not to respond to persons who are not in their direct chain of supervisors. This means that the area Archive Coordinator has to coordinate with the Language Programs Director to get a request to the appropriate field person. It also means that the person working in the field is not responsible for archiving their work (because this part of the mandate is viewed as fulfilled by the Archive Coordinator). This leads to some interesting problems in managing intellectual property. Intellectual property accountability and human resource accountability are not ranked as highly as financial accountability. These can be inherently difficult aspects of any business to manage, let alone a not-for-profit organization. It would be interesting if IP and HR resources could be evaluated the way finances are evaluated by the ECFA. It would seem that in the SIL family of organizations there is a corporate value/culture of not valuing intellectual property. In market-economy terms, intellectual property is generally not viewed as monetizable; therefore, the products containing the IP are also not viewed as worth more than the moment's task. This is possibly in part because the organization is relationally motivated rather than data driven. There are several ways to view this disjunct. One of them is that there should be a data plan (which would include archiving, backup, and distribution) as part of the project plan before funding for the project is provided.
Additionally, a separate but related plan should be implemented to cover IP issues, copyright issues, and the licensing and use of data and products. Pushing this to the project-planning level puts the burden on the project doers to meet the requirements for funding. This model is often used in research projects financed by the European Union. In 2011 the National Science Foundation in the U.S. also began requiring a data management plan to be submitted with grant applications. It is interesting that SIL International's funders do not require this to be part of project planning.
However, having a data management plan does not cover the above use case completely. The project did submit a physical object to the archive at one point. The problem here is that services performed in one part of the company need continued access to an ongoing project run by individuals in another part of the company. This is a management and service-integration issue. Because there is a perception that management is too busy, or that this is not a high enough priority for them to act on in a timely manner, it costs the archiving department 5-6 man-hours when all that might be needed is 10-20 minutes of email time. But being efficient, or providing a higher-quality product which is more usable and has a smaller digital footprint, does not factor into the matrix for evaluating results. This seems to me to be a process design FAIL.
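Taking the post's own figures at face value (5-6 staff hours of scanning and clean-up against 10-20 minutes of email), the gap can be made explicit by converting hours to minutes and comparing the best and worst cases:

```latex
\frac{5\ \text{h}}{20\ \text{min}} = \frac{300\ \text{min}}{20\ \text{min}} = 15
\qquad\text{to}\qquad
\frac{6\ \text{h}}{10\ \text{min}} = \frac{360\ \text{min}}{10\ \text{min}} = 36
```

So the scanning route costs roughly 15 to 36 times the staff time of simply asking for the born-digital file.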