Category Archives: Meta-data
The Data Management Space for Linguists
This week I have been outlining the types of data that linguists need to be able to use and relate to each other as they do Language Documentation and Linguistic Research. I try to express these relationships graphically and also to show where some of the leading tools offered by SIL International sit in the problem space.
Useful or Not?
This post is an open draft! It might be updated at any time.
The online version of the SIL Bibliography contains a subset of over 29,000 citations from the more than 40,000 publications representing 75 years of SIL International's language research in over 2,700 languages. [1] SIL Bibliography Online. April 2012 version. SIL International on Ethnologue.com. http://www.ethnologue.com/bibliography.asp [Accessed: 21 August 2012] [Link]
Finding resources through SIL.org's Bibliography (as of 2 August 2012) can be a challenge at times, and may even waste time, because consulting the online Bibliography often does not turn up what one is looking for.
The challenges that affect its usefulness are primarily threefold:
- Items known by SIL to have been created by SIL staff may or may not be listed. (The on-line Bibliography is a sub-set.)
- Items listed in the Bibliography may or may not have digitally accessible resources.
- Items created by SIL staff may or may not be in the bibliography because they have not been submitted to the Language and Culture Archive (managing division of the SIL Bibliography).
The Citation Problem
In a team framework, several members of a research team need to share both the bibliographic data for the materials they reference and the actual resources being referenced. In this environment there needs to be a central repository for sharing both kinds of data. This is true for small, geographically localized groups as well as for large distributed research teams. New researchers joining an existing team need to be able to “plug in” to existing foundational work on the project and to access both the bibliographic data and the resources those bibliographic details point to. My point here is to outline some of the current challenges involved in overcoming this collaborative obstacle when working in the fields of Linguistics and Language Documentation [1] Nikolaus P. Himmelmann. 1998. Documentary and Descriptive Linguistics. Linguistics vol. 36:161-195. [PDF] [Accessed 24 Dec. 2010]. This sentiment is echoed by many in the world of science. Here is someone on Zotero’s forums [INSERT LINK]. (Zotero does, however, claim to address some of these issues.)
Bibliographic Data vs. Citation Data
The role of relationships in a data-centric industry
I once listened to a Creative Commons Salon titled What Does it Mean to Be Open in a Data-Driven World?, and it included a great discussion of what it means to have data that flows and is open. Minute 50 has a really interesting comment about sharing scientific data.
http://blip.tv/creative-commons/creative-commons-salon-mountain-view-what-does-it-mean-to-be-open-in-a-data-driven-world-4725230
iPhone geo-data
I have been playing around with data available from the iPhone (and also separately visualizing Map data).
I came across a project, iPhoneTracker, which was built to show iPhone users the kind of data the iPhone collects about a user's travel and whereabouts. I downloaded the app and ran it. It appears to hold a nearly complete history since I activated the phone. The interesting thing for me was that the app did not collect the data from my phone directly but rather from my computer.
DOIs and URLs: same or different?
A document’s DOI (http://www.doi.org/ or on Wikipedia under Digital Object Identifier) is an important part of the document's citation [1] Chelsea Lee. 21 September 2009. A DOI Primer. APA Style Blog. http://blog.apastyle.org/apastyle/2009/09/a-doi-primer.html [Accessed: 10 April 2011] [Link] . Many style sheets allow the DOI alone to serve as a paper's citation. Because DOIs are unique, they can act as resolvable URIs that look like URLs [2] Dion Almaer. 23 November 2007. URI vs. URL: What’s the difference?. Ajaxian. http://ajaxian.com/archives/uri-vs-url-whats-the-difference. [Accessed: 10 April 2012] [Link] . However, a DOI is different from the URL at which a digital object might be located. It could well be argued that a DOI should be tracked in the metadata schemes of archives which collect language and linguistic data.
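To make the DOI/URL distinction concrete, here is a minimal Python sketch. It assumes an archive wants to store the bare DOI as metadata and derive a resolvable link only when needed, using the doi.org proxy. The regular expression is a deliberately loose shape check, not the full DOI syntax, and the `ArchiveRecord` class and its field names are hypothetical, not part of any existing archive schema.

```python
import re
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote

# Loose shape check only: "10." + registrant digits + "/" + suffix.
# This is an assumption for illustration, not the full DOI syntax.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes people commonly paste in front of a bare DOI.
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(raw: str) -> str:
    """Return the bare DOI (the identifier itself), stripping resolver prefixes."""
    doi = raw.strip()
    for prefix in DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {raw!r}")
    return doi


def resolver_url(doi: str) -> str:
    """Build a URL that resolves the DOI through the doi.org proxy."""
    return "https://doi.org/" + quote(normalize_doi(doi), safe="/")


@dataclass
class ArchiveRecord:
    """Hypothetical archive record that keeps the two identifiers apart."""
    title: str
    doi: Optional[str]           # persistent identifier, if one was assigned
    location_url: Optional[str]  # where this archive's copy currently lives


if __name__ == "__main__":
    print(normalize_doi("https://doi.org/10.1000/182"))  # 10.1000/182
    print(resolver_url("doi:10.1000/182"))               # https://doi.org/10.1000/182
```

The design choice is to keep `doi` and `location_url` as separate fields: the DOI stays the same if the file moves, while the location URL can change whenever the archive reorganizes its storage.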
References
[1] Chelsea Lee. 21 September 2009. A DOI Primer. APA Style Blog. http://blog.apastyle.org/apastyle/2009/09/a-doi-primer.html [Accessed: 10 April 2011] [Link]
[2] Dion Almaer. 23 November 2007. URI vs. URL: What’s the difference? Ajaxian. http://ajaxian.com/archives/uri-vs-url-whats-the-difference [Accessed: 10 April 2012] [Link]
From Folksonomies to Taxonomies with Linguistic Metadata
This post is an open draft! It might be updated at any time.
Metadata is very important; everyone agrees. However, there is some debate about how to develop metadata and how to ensure that it is accurate. Taxonomies are limited vocabularies (a fixed number of items) where each term has a predefined definition. A folksonomy is a vocabulary where people, usually users of data, assign their own useful words or metadata to an item. Folksonomies are like taxonomies in that both are sets of terms, but unlike taxonomies in that they are open sets, whereas taxonomies are closed sets.
An example of a taxonomy might be the colors of a traffic light: Red, Yellow, and Green. If this were a folksonomy, people might also suggest Amber, Orange, Blue-Green, and Blue. These additional terms may be accurate for some viewers of traffic lights, or in some cases, but they do not fit the stereotypical model of traffic-light colors.
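The closed-set versus open-set distinction can be sketched in a few lines of Python using the traffic-light example. This is only an illustration under simple assumptions: terms are compared case-insensitively, and the `outside_taxonomy` helper (a hypothetical name) lists the folksonomy terms a curator might review when moving from a folksonomy toward a taxonomy.

```python
from collections import Counter

# Closed set: the taxonomy admits only these predefined terms.
TRAFFIC_LIGHT_TAXONOMY = frozenset({"red", "yellow", "green"})


def normalize(term: str) -> str:
    """Compare terms case-insensitively and ignore surrounding spaces."""
    return term.strip().lower()


def tag_with_taxonomy(tags: set, term: str) -> set:
    """Add a term only if it belongs to the closed vocabulary."""
    t = normalize(term)
    if t not in TRAFFIC_LIGHT_TAXONOMY:
        raise ValueError(f"{term!r} is not in the taxonomy")
    tags.add(t)
    return tags


def tag_with_folksonomy(tags: Counter, term: str) -> Counter:
    """Open set: any user-supplied term is accepted and its use is counted."""
    tags[normalize(term)] += 1
    return tags


def outside_taxonomy(tags: Counter) -> list:
    """Folksonomy terms that a curator might review for mapping or rejection."""
    return sorted(t for t in tags if t not in TRAFFIC_LIGHT_TAXONOMY)


if __name__ == "__main__":
    folk = Counter()
    for term in ["Red", "Amber", "green", "Blue-Green", "amber"]:
        tag_with_folksonomy(folk, term)
    print(outside_taxonomy(folk))  # ['amber', 'blue-green']
```

Counting how often each user term appears gives a curator a rough signal: a frequently used term like "amber" might be mapped onto "yellow", while rare ones might simply be left out of the taxonomy.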
Linking Data and SIL’s goal of Sharing what they know…
I have recently been introduced to Linked Data and to RDF. In my investigation, I have noticed that some have said that Linked Data and RDF are much like a solution without a problem (Defense against the claim).
However, the relationships between datasets, and the data derived from those datasets, have been growing over the past few years.
I am becoming convinced that at some point there will be enough open data out there to reach a tipping point: if your data is not shared this way, app producers will not process it (in home-grown apps, not without significant extra charge; in externally produced data-consuming apps, not at all). This means that open Linked Data in RDF will become socially more significant than more labor-intensive proprietary data sets.
I was watching this video, in which several web apps and mobile apps were developed and competed for a prize. What one can do with this data is incredible.
http://vimeo.com/25163082
I particularly like the app which tells you how long it takes someone in London to travel from point A to point B.
So where does this come into play with SIL International? Well, SIL is an NGO, and NGOs need engagement strategies. That is, non-profits and NGOs operate to effect change. They have a compelling story, they tell the story, and the hearers of the story are motivated to take some sort of action.
An engaged employee population is a strategic asset that enables organizations to inspire and mobilize their people to achieve specific business objectives. – http://engagementstrategies.com/
This has been the very nature of the Kony 2012 video and story. Their web presence is not about marketing, messaging, branding, or color palettes. It is about engaging people to commit to a certain set of activities. The Kony campaign’s entire web presence, from the script of the YouTube film to the design of their website, is about getting people to commit to and carry out those suggested activities.
But how does this relate back to RDF and Linked Data? Well, if web apps and mobile apps are going to present data to users, and work through the presentation challenges of User Experience and User Interface in multiple locations and contexts, then it is in the interest of NGOs, as data providers, to provide data that will move users toward their cause. Some NGOs, like SIL, are very involved in content production. Consider the 40,000-plus items in the SIL bibliography of academic and vernacular works produced over its 75+ year history. These bits of content, or resources, can be described in RDF for data consumers. The obvious question is “Why?” The answer is simple: so that when others use Linked Data, your resources are found, which in turn promotes awareness of your cause.
Let’s say that the organization Invisible Children released 100,000 images of children who were carrying AK-47s and shooting their parents, and who had been maimed or raped. Let’s also say that these images were geo-tagged with the locations where they were taken, and that this metadata and these images were made available as Linked Data. Then, when global leaders in internet mapping technologies like Google, Wikipedia, and Yahoo! create web-based applications that display geospatial content from Linked Data sources, whose content do you think is going to be displayed when someone is looking for pictures of Africa?
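As a rough illustration of what "describable in RDF" could look like, here is a minimal Python sketch that serializes one bibliography item as N-Triples using the Dublin Core Terms vocabulary, with only the standard library. The item URI, title, creator, and language code are placeholders, not a real SIL record, and `describe_item` is a hypothetical helper. Using a plain literal for `creator` is a simplification; stricter modelling would point to a URI identifying the person.

```python
# Dublin Core Terms namespace, a widely used vocabulary for bibliographic metadata.
DCTERMS = "http://purl.org/dc/terms/"


def literal(value: str) -> str:
    """Quote a plain string as an N-Triples literal, escaping special characters."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return f'"{escaped}"'


def describe_item(item_uri: str, title: str, creator: str,
                  year: int, language_code: str) -> str:
    """Serialize one bibliography item as N-Triples lines (one triple per line)."""
    triples = [
        (DCTERMS + "title", literal(title)),
        (DCTERMS + "creator", literal(creator)),
        (DCTERMS + "date", literal(str(year))),
        (DCTERMS + "language", literal(language_code)),
    ]
    return "\n".join(f"<{item_uri}> <{pred}> {obj} ." for pred, obj in triples)


if __name__ == "__main__":
    # Hypothetical item URI and placeholder values, not a real SIL record.
    print(describe_item("http://example.org/bibliography/item/1",
                        "A Sample Grammar Sketch", "Doe, Jane", 1998, "eng"))
```

N-Triples was chosen here because each line is a complete statement, so the output can be loaded by RDF tools without any extra library on the publishing side.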
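The scenario above has two sides: a publisher geo-tagging images as Linked Data, and a consumer (such as a mapping app) pulling out the images that fall inside a region. The sketch below shows both under simple assumptions, using the W3C Basic Geo (WGS84) `lat`/`long` properties. The image URIs are hypothetical, the parser only handles the simple lines this code itself produces (it is not a general N-Triples parser), and the "Africa" box is a rough, illustrative bounding box rather than a precise boundary.

```python
import re
from collections import defaultdict

# W3C Basic Geo (WGS84) vocabulary and the XML Schema decimal datatype.
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

# Matches only the simple typed-literal lines produced below, not all N-Triples.
TYPED_TRIPLE = re.compile(r'^<([^>]+)> <([^>]+)> "([^"]+)"\^\^<[^>]+> \.$')


def geo_triples(image_uri: str, lat: float, lon: float) -> list:
    """Publisher side: state where an image was taken."""
    # Fixed-point formatting avoids exponent notation, which xsd:decimal forbids.
    return [
        f'<{image_uri}> <{GEO}lat> "{lat:.6f}"^^<{XSD_DECIMAL}> .',
        f'<{image_uri}> <{GEO}long> "{lon:.6f}"^^<{XSD_DECIMAL}> .',
    ]


def coordinates(lines: list) -> dict:
    """Consumer side: collect (lat, long) pairs per subject from triple lines."""
    found = defaultdict(dict)
    for line in lines:
        m = TYPED_TRIPLE.match(line)
        if m is None:
            continue
        subject, predicate, value = m.groups()
        if predicate == GEO + "lat":
            found[subject]["lat"] = float(value)
        elif predicate == GEO + "long":
            found[subject]["long"] = float(value)
    return {s: (c["lat"], c["long"]) for s, c in found.items()
            if "lat" in c and "long" in c}


def in_bounding_box(points: dict, south: float, west: float,
                    north: float, east: float) -> list:
    """Keep subjects whose coordinates fall inside a lat/long box."""
    return [s for s, (lat, lon) in points.items()
            if south <= lat <= north and west <= lon <= east]


if __name__ == "__main__":
    lines = (geo_triples("http://example.org/img/1", 0.35, 32.58)     # near Kampala
             + geo_triples("http://example.org/img/2", 51.51, -0.13))  # near London
    print(in_bounding_box(coordinates(lines), -35, -18, 38, 52))
    # ['http://example.org/img/1']
```

The point of the example is the post's argument in miniature: the consumer never needs to know who published the images, only that the coordinates were published in a shared vocabulary it can query.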
Read the BBC article here.
RDF Ontologies for the Bible
I have been looking for RDF ontologies for describing Bible portions, particularly so that I can reference sections of scripture like chapters and verses of the Bible (in addition to groupings of books of the Bible like The Prophets or The New Testament). Does such an ontology already exist? I have found http://bibleontology.com but this does not seem to be deep enough. I have also found http://www.semanticbible.com/ but the ontologies offered there do not seem to fit the desired coverage.
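To pin down the coverage being asked for, here is a hypothetical Python sketch of the data such an ontology would need to express: a book, a chapter, an optional verse range, and membership in a named grouping of books. It is not an existing ontology. The `http://example.org/bible/` namespace, the URI layout, and the `GROUPS` table (which lists only the four Gospels, for illustration) are all assumptions.

```python
import re
from dataclasses import dataclass
from typing import Optional

BASE = "http://example.org/bible/"  # hypothetical namespace, not a real ontology

# "Book Chapter" or "Book Chapter:Verse" or "Book Chapter:Verse-Verse";
# an optional leading 1-3 handles books like "1 John".
REF = re.compile(r"^(?P<book>(?:[1-3] )?[A-Za-z]+) (?P<chapter>\d+)"
                 r"(?::(?P<start>\d+)(?:-(?P<end>\d+))?)?$")

# Illustrative, incomplete grouping; a real ontology would cover every book.
GROUPS = {"Gospels": {"Matthew", "Mark", "Luke", "John"}}


@dataclass(frozen=True)
class Passage:
    book: str
    chapter: int
    start_verse: Optional[int] = None
    end_verse: Optional[int] = None

    def uri(self) -> str:
        """Mint a hypothetical URI such as .../John/3/16-18."""
        path = f"{self.book.replace(' ', '_')}/{self.chapter}"
        if self.start_verse is not None:
            path += f"/{self.start_verse}"
            if self.end_verse is not None:
                path += f"-{self.end_verse}"
        return BASE + path


def parse(ref: str) -> Passage:
    """Turn a reference string like 'John 3:16-18' into a Passage."""
    m = REF.match(ref.strip())
    if m is None:
        raise ValueError(f"unrecognized reference: {ref!r}")
    start, end = m.group("start"), m.group("end")
    return Passage(m.group("book"), int(m.group("chapter")),
                   int(start) if start else None,
                   int(end) if end else None)


def in_group(p: Passage, group: str) -> bool:
    """True if the passage's book belongs to the named grouping."""
    return p.book in GROUPS.get(group, set())


if __name__ == "__main__":
    p = parse("John 3:16-18")
    print(p.uri(), in_group(p, "Gospels"))
```

A real ontology would also need to handle book-name variants across languages and versification differences between Bible traditions, which this sketch ignores.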
Know of any other Bible Ontology projects?