The Journeyler

A walk through: Life, Leadership, Linguistics, Language Documentation, WordPress, and OS X (and a bit of Marketing & Business Administration)

Category Archives: Meta-data

Metadata and the Target Audience

Posted on March 31, 2012 by Hugh Paterson III

I have been reviewing applications for library, research and citation metadata: things like RDF, METS, Dublin Core, .ris and BibTeX. In some ways these things are related, in that they are all metadata. But in other ways they are different animals.

In my search I have found two very different classes of metadata schemes based on two different kinds of end users.

  1. End users who are machines (Metadata for interoperability or resource discovery).
  2. End users who are human.

End users who are machines are usually concerned with the interoperability of metadata for search, storage, and advertisement. These kinds of systems are usually engineered to use metadata schemes like Dublin Core, MODS and METS. Often these systems are able to communicate high-level metadata in generic categories.

However, end users who are human are usually concerned with purposing the metadata in creative processes, and in general they want to use and appropriate more specific elements of metadata. This is especially true with citation metadata: students and researchers want to be able to build bibliographies with the data. Additionally, many of the more detailed metadata elements (that is, overly detailed from a Dublin Core perspective, e.g. a geo-location name, a latitude value or an altitude value) could be classified as technical metadata according to the first listing below. Technical metadata is especially relevant for users of audio objects and graphic objects (photos and moving picture objects).

Users looking to construct bibliographies and citations are often looking for that metadata in one of the interchange formats: BibTeX, EndNote XML or .ris. Users interested in finding things based on technical metadata, such as audio technicians, linguists, ethnographers, and ethnomusicologists, are looking to use the metadata and the object it describes in a workflow. In order to purpose that media object as they need to, those users need to make sure that the digital object fits their workflow criteria.

This discrepancy between metadata for system-to-system transmission and metadata for end users creates a bit of a complex situation, in that delivery systems need to consider both sets of users.

Which information to record?

http://www.jiscdigitalmedia.ac.uk/audio/advice/metadata-and-audio-resources

Structured metadata is divided into four main categories that contain information which is defined by the schemas or extension schemas being used:

  1. Structural metadata. This is information about the structural relationship with other parent or family files and how the metadata relates to the file.
  2. Descriptive metadata. This is information about the content of the digital file. The information recorded here is more curatorial than technical, and is the primary portal for users to access your resource. Data including File name, creator, associated dates, description, summary, locations etc should be standardised using a interoperable schema such as Simple DC or MODS.
  3. Administrative metadata. This contains information about the analogue source material, the rights of the content and any preservation information. Information here provides support to the managerial team of the collection and researchers in organising and providing access to the resource. Information about rights, ownership and usage restrictions is also kept within the administrative metadata.
  4. Technical metadata. To make good use of the digital object data is required which describes the technical qualities of the physical and/or digital object. This includes information such as channel number, bit-depth, sampling rate, and the unique file identifier. AudioMD, is an XML based schema that has been designed primarily for this purpose. It is soon to be superseded by AES-X098, developed by the Audio Engineering Society, upon its formal release.

It is possible, though, to separate out some finer-grained metadata categories. Compare the categories above with those below, which were part of my post about Metadata for Socio-linguistic Corpora:

  • Descriptive meta-data: supports discovery, attribution and identification of resources created.
  • Administrative meta-data: supports management, preservation, and appropriate usage of resources created.
    • Technical: About the machinery used to create the resource and the technical aspects of the resource.
    • Use (meaning how one may use the objects) and Rights: Copyright, license and moral ownership of the items.
  • Structural meta-data: maintains relationships between the parts of complex, multi-part resources (Spanne 2008).
  • Situational: this is metadata which describes the events around the creation of the work. Asking questions about the social setting, or the precursory events. It follows ideas put forward by Bergqvist (2007).
  • Use metadata: metadata collected from or about the users themselves (e.g. user annotations, number of people accessing a particular resource)

In that post I also said:

I think it is only fair to point out to archivists and to librarians that linguists and language documenters do not see a difference between descriptive and non-descriptive metadata in their workflows. That is, sometimes we want to search all the corpora by license or by a technical attribute. This elevates these attributes to the function of discovery metadata. It does not remove descriptive metadata from its role in finding things, but it does functionally mean that the other metadata is also viable as discovery metadata.
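To make that point concrete, here is a minimal sketch of a search that treats rights and technical attributes exactly like descriptive ones. The records, field names and values are all made up for illustration; no particular archive's schema is implied.

```python
# Made-up records: each mixes descriptive, rights and technical metadata.
records = [
    {"title": "Wordlist recording A", "license": "CC-BY-SA", "sampling_rate": 44100},
    {"title": "Narrative recording B", "license": "All rights reserved", "sampling_rate": 48000},
    {"title": "Wordlist recording C", "license": "CC-BY-SA", "sampling_rate": 48000},
]


def find(records, **criteria):
    """Return the records whose metadata matches every given attribute.

    Any attribute can be a search key, so license and sampling rate act as
    discovery metadata in the same way that title does.
    """
    return [
        record
        for record in records
        if all(record.get(key) == value for key, value in criteria.items())
    ]


by_license = find(records, license="CC-BY-SA")
by_both = find(records, license="CC-BY-SA", sampling_rate=48000)
print([record["title"] for record in by_license])
print([record["title"] for record in by_both])
```

A delivery system that only indexes the descriptive fields could not answer either of these queries.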

Compare and match three

My goal here is to compare Dublin Core [http://www.feedforall.com/dublin-core.htm] with BibTeX and with .ris. There is a nice crosswalk technology for BibTeX resources on SourceForge: http://bibtexml.sourceforge.net/details.html. For .ris, see the “RIS” Format Documentation and Adding a “Direct Export” Button to Your Web Page or Web Application.

List of Mappings: not .ris or BibTeX to DC, but many other crosswalks.
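As a rough illustration of what such a comparison involves, here is a minimal sketch of a crosswalk from a few Simple Dublin Core elements to .ris tags and BibTeX fields. The mapping table is an assumption made for illustration (it is not taken from any of the crosswalks linked above), and the sample record is made up.

```python
# Hypothetical crosswalk: Dublin Core element -> (RIS tag, BibTeX field).
# The pairings are illustrative assumptions, not an official mapping.
CROSSWALK = {
    "title": ("TI", "title"),
    "creator": ("AU", "author"),
    "date": ("PY", "year"),
    "publisher": ("PB", "publisher"),
}


def dc_to_ris(record, ris_type="GEN"):
    """Render a dict of Dublin Core elements as a .ris record."""
    lines = ["TY  - " + ris_type]
    for element, values in record.items():
        if element not in CROSSWALK:
            continue  # elements with no mapping are dropped: the lossy part
        tag = CROSSWALK[element][0]
        for value in values:
            lines.append(tag + "  - " + value)
    lines.append("ER  - ")
    return "\n".join(lines)


def dc_to_bibtex(record, key, entry_type="misc"):
    """Render the same dict as a BibTeX entry."""
    fields = []
    for element, values in record.items():
        if element not in CROSSWALK:
            continue
        name = CROSSWALK[element][1]
        # BibTeX joins multiple authors with " and ".
        fields.append("  %s = {%s}" % (name, " and ".join(values)))
    return "@%s{%s,\n%s\n}" % (entry_type, key, ",\n".join(fields))


# Made-up sample record; "coverage" has no home in either citation format.
sample = {
    "title": ["An Example Word List"],
    "creator": ["Doe, Jane", "Roe, Richard"],
    "date": ["2012"],
    "coverage": ["Example Region"],
}

print(dc_to_ris(sample))
print(dc_to_bibtex(sample, key="doe2012"))
```

The dropped "coverage" value shows the tension described earlier: the citation formats serve bibliography builders, but they have no obvious slot for the more specific elements that other human users care about.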

Posted in Access, Blogging, Citations, Digital Archival, Meta-data | Tagged .ris, BibTex, citations, Dublin core, metadata, opendraft | Leave a reply

From Folksonomies to Taxonomies with Linguistic Metadata

Posted on March 22, 2012 by Hugh Paterson III

This post is an open draft! It might be updated at any time.

Metadata is very important; everyone agrees. However, there is some discussion when it comes to how to develop metadata and also how to ensure that the metadata is accurate. Taxonomies are limited vocabularies (a set number of items) where each term has a predefined definition. A folksonomy is a vocabulary where people, usually users of data, assign their own useful words or metadata to an item. Folksonomies are like taxonomies in that both are sets of terms, but they differ in that a folksonomy is an open set while a taxonomy is a closed set.

An example of a taxonomy might be the colors of a traffic light: Red, Yellow, and Green. If this were a folksonomy, people might also suggest the colors Amber, Orange, Blue-Green and Blue. These additional terms may be accurate to some viewers of traffic lights, or in some cases, but they do not fit the stereotypical model of what the colors of traffic lights are.
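The closed-set versus open-set distinction can be sketched in a few lines. This is only an illustration built on the traffic light example; the class and function names are hypothetical.

```python
from collections import Counter

# Closed set: the taxonomy from the traffic light example.
TRAFFIC_LIGHT_TAXONOMY = frozenset({"Red", "Yellow", "Green"})


def tag_with_taxonomy(term):
    """Accept a term only if the taxonomy already defines it."""
    if term not in TRAFFIC_LIGHT_TAXONOMY:
        raise ValueError("%r is not in the taxonomy" % term)
    return term


class Folksonomy:
    """Open set: any term a user supplies is accepted and counted."""

    def __init__(self):
        self.terms = Counter()

    def tag(self, term):
        self.terms[term] += 1
        return term


folk = Folksonomy()
for term in ["Red", "Amber", "Green", "Blue-Green", "Amber"]:
    folk.tag(term)

# Terms users supplied that the taxonomy does not know about.
print(sorted(set(folk.terms) - TRAFFIC_LIGHT_TAXONOMY))
```

Counting how often users supply a term outside the taxonomy is one simple signal for deciding whether that term deserves a place in it.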
Continue reading →

Posted in Digital Archival, Library, Linguistics, Meta-data, UI/UX | Tagged Folksonomy, Gold, metadata, opendraft, RDF, sil.org, Taxonomy | Leave a reply

Linking Data and SIL’s goal of Sharing what they know…

Posted on March 19, 2012 by Hugh Paterson III

I have recently been introduced to Linked Data and to RDF. In my investigation, I have noticed that some have said that Linked Data and RDF are much like a solution without a problem (Defense against the claim).

However, the relationships between datasets and the data created by those data sets have been growing over the past few years.

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/ CC-BY-SA

I am becoming convinced that at some point there will be enough open data out there to reach a tipping point: if your data is not shared in this way, app producers will not process it (not without significant extra charge in home-grown apps, and not at all in externally produced data-consuming apps). This means that open and Linked Data in RDF will have more social significance than more labor-intensive proprietary data sets.

I was watching this video, where several web apps and several mobile apps were developed and competed for a prize. What one can do with this data is incredible.

http://vimeo.com/25163082

I particularly like the app which tells you how long it takes someone in London to travel from point A to point B.

So where does this come into play with SIL International? Well, SIL is an NGO. NGOs need engagement strategies. That is, non-profits and NGOs operate to effect change. They have a compelling story, they tell the story, and the hearers of the story are motivated to do some sort of action.

An engaged employee population is a strategic asset that enables organizations to inspire and mobilize their people to achieve specific business objectives. – http://engagementstrategies.com/

This has been the very nature of the Kony 2012 video and story. Their web presence is not about marketing, it is not about messaging, it is not about branding or color palettes. It is about engaging people to commit to a certain set of activities. The Kony campaign’s entire web presence, from the scripting of the YouTube film to the design of their website, is about getting people to commit to and carry out those suggested activities.

But how does this relate back to RDF and Linked Data? Well, if web apps and mobile apps are going to present data to users, and work through the presentation challenges of User Experience and User Interface in multiple locations and contexts, then it is in the interest of NGOs as data providers to provide data which will affect users for their cause. Some NGOs, like SIL, are very involved in content production. Consider the 40,000 plus items in the SIL bibliography of academic and vernacular works produced over their 75+ year history. These bits of content, or resources, are describable in RDF for data consumers. The obvious question is “Why?” The answer is simple: so that when others use Linked Data your resources are found, and thereby promote awareness of your cause.

Let’s say that the organization Invisible Children released 100,000 images of children who were carrying AK-47s and shooting their parents, or who were maimed or raped. Let’s also say that these images were geo-tagged for the locations they were taken in, and that this metadata and these images were made available as Linked Data. Then, when global leaders in internet mapping technologies like Google, Wikipedia, and Yahoo! create web-based applications which display geospatial content from Linked Data sources, whose content do you think is going to be displayed when someone is looking for pictures of Africa?
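To give a feel for what "describable in RDF" means, here is a minimal sketch that writes one bibliography item as N-Triples using the Dublin Core element namespace. The item URI and its values are hypothetical placeholders, not real SIL bibliography data, and a real deployment would more likely use an RDF library than string formatting.

```python
# Dublin Core Metadata Element Set 1.1 namespace.
DC = "http://purl.org/dc/elements/1.1/"


def escape(text):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(subject_uri, properties):
    """Emit one triple per (element, value) pair describing the subject."""
    lines = []
    for element, value in properties.items():
        lines.append('<%s> <%s%s> "%s" .' % (subject_uri, DC, element, escape(value)))
    return "\n".join(lines)


# Hypothetical item; example.org stands in for a real, stable item URL.
item = {
    "title": "An Example Grammar Sketch",
    "creator": "Doe, Jane",
    "date": "1975",
    "language": "eng",
}

print(to_ntriples("http://example.org/bibliography/item/1", item))
```

Each printed line is a statement that any Linked Data consumer can merge with statements from other sources, which is what lets a third-party app surface the resource.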

Image from BBC article about Kony. The BBC caption reads: "Some South Sudanese have already taken up arms against Kony and the LRA"

Read the BBC article here.

Posted in Access, Business, Digital Archival, Marketing, Meta-data, mobile web | Tagged Linked Data, OpenData, RDF, SIL International, sil.org | Leave a reply

RDF Ontologies for the Bible

Posted on March 15, 2012 by Hugh Paterson III

I have been looking for RDF ontologies for describing Bible portions, particularly so that I can reference sections of scripture like chapters and verses of the Bible (in addition to sections of books of the Bible like The Prophets or The New Testament). Does such an ontology already exist? I have found http://bibleontology.com but this does not seem to be deep enough. I have also found http://www.semanticbible.com/ but the ontologies offered there do not seem to fit the desired coverage.

Know of any other Bible Ontology projects?

Posted in Digital Archival, Faith, Meta-data | Tagged Bible, Ontologies, RDF | 2 Replies

Reviewing Webonary

Posted on March 15, 2012 by Hugh Paterson III

This post is an open draft! It might be updated at any time.

Dictionary Wordle
In this review I will be looking at the WordPress plugin Webonary and several associated issues. (Regardless of the views expressed here in this review, it should be stated that I have high hopes for Webonary’s future. Some of the people working on Webonary are my colleagues, so I attempt to hedge my review with the understanding that this is not the final state of Webonary. I am excited that easy-to-use technology like WordPress is being used, and that minority language groups around the world have the opportunity to use free software like Webonary.)
Continue reading →

Posted in Access, Blogging, CMS, Digital Archival, Language Documentation, Linguistics, Meta-data, Opensource, WordPress | Tagged Bi-lingual Dictionary, Corpus linguistics, Dictionary, FLEx, Language Documentation, lexicography, opendraft, Plugin, UI, wordpress | Leave a reply

GIAL Web structure

Posted on March 14, 2012 by Hugh Paterson III

I was looking at the Wikipedia article for Language Documentation. The only reference cited was a thesis by Debbie Chang. I happen to know Debbie. So I thought I would take a look at her thesis and see what she said. So I clicked the link and was delivered to a 404 error page on GIAL’s website. GIAL had recently renovated their website. I was able to locate the thesis and fix the URL on Wikipedia by digging through the GIAL website. The new URL is: http://www.gial.edu/images/theses/Chang_Debbie-thesis.pdf

But then I looked at the URL and asked: Why are PDFs in the images folder? What is the long-term infrastructure for this school? It seems that when PDFs (theses) are put into the images folder rather than into a digital repository, something is not quite right with the long-term planning for the school. Ironically, this is not too far from the main thrust of Debbie’s thesis.

It would seem that the long-term solution for this kind of problem would be for a small school like GIAL to either (A) have its library develop an infrastructure for permanently housing these kinds of materials, or (B) contract with another organization or archive which could take care of these sorts of issues for them and provide handles or stable URLs, and then link to the permanent location of these items from GIAL’s website. It is interesting to note that on the same campus as GIAL is SIL International’s Language and Culture Archive, yet GIAL has not taken advantage of this opportunity.

Posted in Access, Citations, Digital Archival | Tagged Digital Archival, IA, UI, University Library, UX | Leave a reply

OAI-PMH for WordPress

Posted on March 6, 2012 by Hugh Paterson III

Umm, frankly, I am not sure anything out there right now is going to work to bring OAI-PMH services to WordPress. Consider these three resources for more info on OAI:

  • Main Technical Ideas of OAI-PMH
  • Specification for an OAI Static Repository and an OAI Static Repository Gateway
  • OAI-PMH Metadata Exchange

If something does work, is it going to use WordPress to advertise things or is it going to use WordPress to aggregate things? If the former, then nothing out there has ever let the admin user dynamically choose which fields are matched to which attributes. And if it is the former, why would anyone actually want this functionality? What is the use case? If one is using WordPress as a bibliography reference system, like some libraries do, then this makes a lot of sense. However, there is another use case I would like to present: the website which is about a single language or about several languages. There are potentially two ways to conceptualize this:

  1. If there were a website based on WordPress which was a dictionary website then the whole website might be considered a resource on a language. An example of this might be the use of SIL’s Webonary Plugin for WordPress and the Cherokee Language Project’s Dictionary.
  2. If there were a website presenting materials on several languages and each page was a resource on a single language then that would be a different use case. This would be more like what the Survey of California and Other Indian Languages does or what the Central Institute of Indian Languages does.
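The missing piece mentioned above, letting the admin choose which fields map to which attributes, can be sketched as a configurable mapping from post fields to an oai_dc record. This is only a sketch under assumptions: the post is a simplified, hypothetical dict rather than a real WordPress database row, and the field map is one an admin might pick, not something any existing plugin provides. The two namespace URIs are the standard oai_dc and Dublin Core ones.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)

# Hypothetical admin-editable mapping: post field -> Dublin Core element.
FIELD_MAP = {
    "post_title": "title",
    "author_name": "creator",
    "post_date": "date",
    "permalink": "identifier",
    "post_excerpt": "description",
}


def post_to_oai_dc(post, field_map=FIELD_MAP):
    """Build an oai_dc metadata block from a post using the given mapping."""
    root = ET.Element("{%s}dc" % OAI_DC)
    for field, element in field_map.items():
        value = post.get(field)
        if value:  # skip fields the post does not have
            child = ET.SubElement(root, "{%s}%s" % (DC, element))
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Made-up post standing in for a page that is a resource on one language.
post = {
    "post_title": "Example Dictionary Entry",
    "author_name": "Doe, Jane",
    "post_date": "2012-03-06",
    "permalink": "http://example.org/?p=1",
}

print(post_to_oai_dc(post))
```

Under the first conceptualization the whole site would be one such record; under the second, each language page would produce its own.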

OAI WordPress models

Existing Foundation

  • COinS-PMH (unAPI) WordPress Plugin (2005)
  • Peter Binkley tagged blog posts for OAI.
  • unAPI Server for WordPress.
  • WordPress, now with added unAPI!
  • New OAI-PMH metadata format (It was an update).
Posted in Access, Citations, Digital Archival, Library, Meta-data, WordPress | Tagged metadata, OAI, OAI-PMH, wordpress | Leave a reply

Timeline of Communication

Posted on March 1, 2012 by Hugh Paterson III

In recent times there has been a lively discussion over several issues in the translation of the Bible between various denominational and church leaders and those conducting the translation. I am not aware of all the issues, nor all the details. However, my financial supporters and friends are very interested in this discussion. Many of them are coming to the conversation late. They do not get to observe it from the beginning; they usually get introduced in the middle, and they do not know enough of the context to make heads or tails of it.

The Bystanders

In the end I lose credibility with my supporters if they are confused and their confusion goes unaddressed. So, I have a vested interest in explaining this conversation to my supporters and friends. Here is an example from 15 February 2012 (14:21 CST) of the kind of question I have received and the type of response I have given:

Hugh, I recognize you are not a spokesperson for Wycliffe but there is a lot of “buzz” right now of WBT and SIL creating Bible versions that are less offensive to Muslims by taking out references to Jesus being the Son of God and to God as the Father. Do you know of this and what is your understanding of it?

My Reply:

Yes. I know a little bit about it. The issue has been brewing for the last 6-7 months. But I don’t know very much about the issue because I do not deal with that part of the world. I do work in External Communications. So my boss works with the people who are crafting the responses. There are several issues going on at the same time.

  1. Wycliffe as a corporation, and as a partner of the evangelical church has not been proactive in communicating the challenges in translation to the churches.
  2. The church has had an attitude of “support and forget”: until someone gets offended and then doesn’t know all the facts and comes at the issue with a particular theological (denominational) view.

To complicate the matter, SIL has been dragged into this media firestorm, but has traditionally been silent on translation around the world and left that discussion to Wycliffe. But now SIL has had to respond. So this is new and virgin territory. SIL has said more on Bible translation in the last 6 months than it has in the last 15 years.

Neither Wycliffe nor SIL has taken the lead on explaining to onlookers to the discussion, what the whole discussion on a time line looks like or what the facts are. There are two sides in this discussion and both NGOs would do well to present the objections and the replies in a manner where onlookers could get all the facts. I do not even have a good grasp on this. But there is a lot of fear on the part of the NGOs that if they do this that they will reveal too much, because this is not an area of the world that either company publicizes that it works in. I think there are only like 9 translations in question. The only thing I have read about the issue was here: http://www.wycliffe.net/stories/tabid/67/Default.aspx?id=2408

My question has been if you use the analogy that Jesus is socially the “son” of God, rather than being sired through sexual intercourse with (the virgin) Mary, then how is the zygote formed? I have always believed in a virgin birth (No intercourse), but I also believe that the sperm must have been from God and the egg from Mary.

At any rate the controversy has pitted the churches against the Mission and churches are pulling their support for missionaries.

However, I need to do it understanding the issues they can see and read about. I am not a spokesman for any company. But as this discussion has turned into a media war, it has become increasingly hard to tell what WycliffeUSA has or has not said, and when. Content at the same URL can change through time. WycliffeUSA, Wycliffe Global Alliance and SIL International do not consistently use two things in their communications strategy which would make communications clearer to viewers:

  • Post dates
  • Update notices with date/time stamps

Examples:

  • WycliffeUSA: page without a date published on it.
  • SIL International: uses month and year but no specific day.
  • Translations with the same dates but posted later.
  • Wycliffe Global Alliance: has no date posted; has a date someone else posted on an item which is republished with permission.
  • Wycliffe Canada: does have the date something was published.

It is common practice when issuing a statement online to provide the date on which the content was posted. It is also common practice to show when content has been updated or altered and to tell what has been altered. Often the change is in response to something left in a comment (in the blogging and columnist worlds).
(I do not necessarily espouse the views of the following posts, but I use them to present visually what is socially a common practice.)

  • An article on Ron Paul (update example).
  • An article on the iPad2 (Techland article update example).
  • An article about one of Google’s services (Google Blog post update example).

It has been claimed that WycliffeUSA has altered their FAQ in a manner which would lead current viewers to think this has always been the way the data has been presented, and therefore always the way the story has been told. If there has been some change then this change should be clearly expressed. (And there are functional, well-designed, and tactful ways to express this change without spending lots of page space or reader focus in the process.) However, it is this lack of date giving which makes a time-oriented anthology of communication so valuable.

[Update: 5 March 2012: As the following image shows, it would appear that Wycliffe does have an update notice for each item on their FAQ sheet, but it still remains unclear what the content was updated from, or alternatively if the FAQ element was added at this later date as the FAQ page itself has no date published.]

WycliffeUSA Update Notices as of 5 March 2012

[mf_timeline]

If you know of another publicly available and verifiable resource, event or discussion with a date relevant to the Son of God discussion, leave a note in the comments and I will consider adding it to the timeline. After I add it to the timeline I will delete the comment. The timeline created is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. Another timeline format is also in the works and is appearing here.

If you do not want your comment shared under this license then please do not submit it. – Comments may be edited before appearing.

Posted in Business, Citations, Faith, Marketing, Meta-data, SIL International, UI/UX | Tagged David Abernathy, faith, Hussein Wario, John Piper, Rick Brown, sil.org, Son of God, Translation, Wycliffe, WycliffeUSA | 1 Reply

Ethnologue: the linguistic straw-man

Posted on February 21, 2012 by Hugh Paterson III

The Ethnologue, as an academic book, is somewhat of a straw man in linguistics. Many people who write grants for language documentation projects (generally on under-described or endangered languages) will cite the Ethnologue and some other resources, or the lack of resources. These funding efforts are usually an effort to get more language data. The rationale for this is twofold:

  1. Because so little is known that we do not know if the Ethnologue is correct.
  2. Because there is a conflict between other published sources and the Ethnologue.

Continue reading →

Posted in Access, Cartography, Citations, Language Documentation, Linguistics, Marketing | Tagged Data Services, Ethnologue, Langauge Documentation, Linguistic Data, Linguistics, Publishing | Leave a reply

SEO Considerations for Linguistic Resources

Posted on February 19, 2012 by Hugh Paterson III

SEO for standard websites is pretty straightforward. I happen to be working on a website redesign (in Drupal) which presents linguistic resources, both published and unpublished. I recently came across two specialized SEO options which are useful:

  1. Integration with Google Scholar
  2. Aggregation with OLAC

Google Scholar

Google Scholar’s page on getting data into Google Scholar: http://scholar.google.com/intl/en/scholar/inclusion.html

Biblio, a module for Drupal which handles bibliographic data, had something here.

This blog also has an interesting write up: http://blog.reallywow.com/archives/123
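As I understand the inclusion guidelines, Google Scholar reads bibliographic meta tags (Highwire Press style names such as citation_title and citation_author) from the head of each item page. Here is a minimal sketch of generating them; the function name and the sample values are hypothetical, and the guidelines themselves should be checked for the current list of supported tags and date formats.

```python
from html import escape


def scholar_meta_tags(title, authors, publication_date, pdf_url):
    """Return <meta> tags for one item page, one author per tag."""
    tags = [("citation_title", title)]
    tags += [("citation_author", author) for author in authors]
    tags.append(("citation_publication_date", publication_date))
    tags.append(("citation_pdf_url", pdf_url))
    return "\n".join(
        '<meta name="%s" content="%s">' % (name, escape(value, quote=True))
        for name, value in tags
    )


# Made-up item; example.org stands in for the real site.
print(
    scholar_meta_tags(
        title='An Example "Working Paper"',
        authors=["Doe, Jane", "Roe, Richard"],
        publication_date="2012/02/19",
        pdf_url="http://example.org/papers/example.pdf",
    )
)
```

In a Drupal or WordPress theme the same values would come from the item's bibliographic fields rather than from literals.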

OLAC Search

Getting into OLAC search means implementing the OAI-PMH protocol so that OLAC can harvest the site’s metadata.
I am not sure how this is done exactly… but here is the link: http://www.language-archives.org/.
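From the harvester's side, OAI-PMH is a set of HTTP requests distinguished by a verb parameter. As a minimal sketch, this is how the request URLs a harvester would send could be built; the base URL is a hypothetical placeholder for wherever the site exposes its OAI-PMH endpoint. Identify, ListRecords and the oai_dc metadata prefix come from the OAI-PMH specification; my understanding is that OLAC additionally expects its own "olac" metadata format, which should be confirmed against the OLAC documentation.

```python
from urllib.parse import urlencode


def oai_request(base_url, verb, **arguments):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb}
    query.update(arguments)
    return base_url + "?" + urlencode(query)


BASE_URL = "http://example.org/oai"  # hypothetical endpoint

# Ask the repository to describe itself.
print(oai_request(BASE_URL, "Identify"))
# Ask for all records as simple Dublin Core.
print(oai_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
```

The site's job is then to answer each of these URLs with the XML response the protocol defines.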

Posted in Access, Citations, Digital Archival, Language Documentation, Library, Linguistics, Marketing, Meta-data | Tagged citations, Drupal, SEO, sil.org | Leave a reply

One should not consider the content on this website to be an official opinion of any company associated with me. These posts are solely my opinion.