OAI-PMH for WordPress

Posted on March 6, 2012 by Hugh Paterson III

Umm, frankly, I am not sure anything out there right now is going to work to bring OAI-PMH services to WordPress. If something does, is it going to use WordPress to advertise things or to aggregate things? If the former, then nothing out there has ever let the admin user dynamically choose which fields are matched to which attributes. And if it is the former, why would anyone actually want this functionality? What is the use case? If one is using WordPress as a bibliography reference system, as some libraries do, then this makes a lot of sense. However, there is another use case I would like to present: the website that is about a single language or several languages. There are potentially two ways to conceptualize this:

If there were a website based on WordPress which was a dictionary website then the whole website might be considered a resource on a language. An example of this might be the use of SIL’s Webonary Plugin for WordPress and the Cherokee Language Project’s Dictionary.
If there were a website presenting materials on several languages and each page was a resource on a single language then that would be a different use case. This would be more like what the Survey of California and Other Indian Languages does or what the Central Institute of Indian Languages does.
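
The field-to-attribute matching mentioned above could be sketched as a small, editable table. This is only an illustration in Python, not an existing plugin: the mapping choices and the helper name `post_to_oai_dc` are my assumptions, while the WordPress column names and the oai_dc and dc namespaces are the standard ones. Note that in a real WordPress database `post_author` holds a user ID, so the name would have to be looked up first.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical, admin-editable mapping: WordPress post field -> Dublin Core element.
FIELD_MAP = {
    "post_title": "title",
    "post_author": "creator",  # assumes the user ID was already resolved to a name
    "post_date": "date",
    "post_excerpt": "description",
    "guid": "identifier",
}


def post_to_oai_dc(post, field_map=FIELD_MAP, language=None):
    """Build an oai_dc record (as an XML string) from a dict of post fields."""
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for wp_field, dc_element in field_map.items():
        value = post.get(wp_field)
        if value:  # skip empty or missing fields
            ET.SubElement(root, f"{{{DC_NS}}}{dc_element}").text = str(value)
    if language:
        # e.g. an ISO 639-3 code for a page that is a resource on one language
        ET.SubElement(root, f"{{{DC_NS}}}language").text = language
    return ET.tostring(root, encoding="unicode")


# Made-up example post for a dictionary page about Cherokee (ISO 639-3: chr).
example_post = {
    "post_title": "Cherokee Dictionary",
    "post_author": "Example Author",
    "post_date": "2012-03-06",
    "guid": "https://example.org/?p=1",
}
record_xml = post_to_oai_dc(example_post, language="chr")
```

Keeping the mapping in one dictionary is the point of the sketch: it is the part an admin screen would need to expose.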

OAI WordPress models

Existing Foundation

COinS-PMH (unAPI) WordPress Plugin (2005) ^[1]
Peter Binkley tagged blog posts for OAI.
unAPI Server for WordPress ^[2]
WordPress, now with added unAPI! ^[3]
New OAI-PMH metadata format (it was an update).

I think there is a second question here too: why does one need OAI-PMH for WordPress? Is it as a provider or as a consumer? If one needs a PHP app for OAI-PMH, maybe they can use: https://github.com/caseyamcl/phpoaipmh
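
To make the consumer side concrete, here is a minimal Python sketch of how a harvester forms its requests. The verbs (`Identify`, `ListRecords`) and the `metadataPrefix`, `set` and `from` arguments come from the OAI-PMH protocol; the base URL and the function name are placeholders I made up, and the sketch only builds the URL rather than fetching anything.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL for a repository's base URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec      # optional: harvest a single set
    if from_date:
        params["from"] = from_date    # optional: only records changed since this date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint, for illustration only.
harvest_url = list_records_url("https://example.org/oai", from_date="2012-03-01")
```

A provider is the other half: something on the WordPress side that answers such requests with XML records.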

References
↑1	Peter Binkley. 9 December 2005. COinS-PMH (unAPI) WordPress Plugin. http://www.wallandbinkley.com/quaedam/2005/12_09_coins-pmh-unapi-wordpress-plugin.html [Accessed: 5 March 2012]
↑2	Mike Giarlo. 19 May 2006. unAPI Server for WordPress. Technosophia. http://lackoftalent.org/michael/blog/unapi-wordpress-plug-in/ [Accessed: 5 March 2012]
↑3	Peter Binkley. 18 February 2006. WordPress, now with added unAPI!. http://www.wallandbinkley.com/quaedam/2006/02_18_wordpress-now-with-added-unapi.html [Accessed: 5 March 2012]

Metadata for Educational Materials

Posted on November 27, 2011 by Hugh Paterson III

I have been following the Learning Resource Metadata Initiative (LRMI), a collaborative effort between Creative Commons and the Association of Educational Publishers ^[1], with some interest as I start to look at SIL.org and at how potential services and resources offered through SIL.org are merged with the larger world of well-described data.

https://youtu.be/-1QEkA9qbwA

SIL has a long tradition of providing linguistic training. With the digital revolution, it only seems right that these training resources would be described appropriately in the educational arena. It will be interesting to look at LRMI as it develops over the next few months, and then to think about applying it in the context of Drupal.

References
↑1	Creative Commons. 7 June 2011. Creative Commons & the Association of Educational Publishers to establish a common learning resources framework. http://creativecommons.org/weblog/entry/27603 [Accessed: 27 November 2011]

Citations, Names and Language Documentation

Posted on September 30, 2011 by Hugh Paterson III

I have recently been reading the blog of Martin Fenner and came upon the article Personal names around the world ^[1]. His post is in fact a reflection on a W3C paper on Personal Names around the World (same title). Several other reflections are here: http://www.w3.org/International/wiki/Personal_names. This is apparently coming out of the i18n effort and is an effort to help authors and database designers make informed decisions about names on the web.
I read Martin’s post with some interest because in Language Documentation getting someone’s name as a source or for informed consent is very important (from a U.S. context). Working in an archive dealing with language materials, I see a lot of names. One of the interesting situations which came to me from an Ecuadorian context was different from what I have seen in the w3.org paper or in the w3.org discussion. The naming convention went like this:

The elder was known by the younger’s name plus a relationship.

My suspicion is that it is a taboo to name the dead. So to avoid possibly naming the dead, the younger was referenced and the relationship was invoked. This affected me in the archive as I am supposed to note who the speaker is on the recordings. In lieu of the speaker’s name, I have the first name of her son, who is well known in the community and is in his 30s or so, and I have the relationship. So in English this might sound like “John’s mother”. Now what am I supposed to put in the metadata record for the audio recordings I am cataloging? I do not have a name but I do have a relationship to a known (to the community) person.

I inquired with a literacy consultant who has worked in Ecuador with indigenous people for some years. She informed me that in one context she was working in, everyone knew what family line they were from and all the names were derived from that family line by position. So much so that to call someone by their name was an insult.

It sort of reminds me of this sketch by Fry and Laurie.

References
↑1	Martin Fenner. 14 August 2011. Personal names around the world. PLoS Blog Network. http://blogs.plos.org/mfenner/2011/08/14/personal-names-around-the-world [Accessed: 16 September 2011]

Letting Go

Posted on August 29, 2011 by Hugh Paterson III

Working in an archive, one can imagine that letting go of materials is a real challenge, both because policy makes it hard and because of the emotional “pack-rat” nature of archivists. This is no less the case at the archive where I work. We were recently working through a set of items and getting rid of the duplicates. (Physical space has its price, and the work should soon be available via JSTOR.) However, one of the items we were getting rid of was a journal issue on a people group/language. The journal issue has three articles; of these, only one was written by someone who worked for the same organization I am working for now. So the “employer” and owner-operator of the archive only has rights to one of the three works. (Rights by virtue of “work-for-hire” laws.) We have the off-print, which is what we have rights to share, so we keep and share that. It all makes sense.

However, what we keep is catalogued and inventoried. Our catalogue is shared with the world via OLAC. With this tool someone can search for a resource on a language, by language. It occurs to me that the other two articles on this people group/language will not show up in OLAC’s aggregated results. This is a shame, as it would be really helpful in many ways. I wish there were a groundswell, open source, grassroots, web-facilitated effort where various researchers could go and put metadata (citations) of articles, which would then be added to the OLAC search.

Metadata Magic

Posted on August 10, 2011 by Hugh Paterson III

The company I work for has an archive for many kinds of materials. In recent times this company has moved to start a digital repository using DSpace. To facilitate contributions to the repository the company has built an Adobe AIR app which allows for the uploading of metadata to the metadata elements of DSpace as well as the attachment of the digital item to the proper bitstream. Totally Awesome.

However, one of the challenges is that just because the metadata is curated, collected and properly filed, it does not mean that the metadata is embedded in the digital items uploaded to the repository. PDFs are still being uploaded with the PDF’s author attribute set to Microsoft-Word. (More about the metadata attributes of PDF/A can be read on pdfa.org.) Not only are the correct metadata and the wrong metadata in the same place at the same time (and being uploaded at the same time), but later, when a consumer downloads the digital file, only the wrong metadata will travel with the file. This is not just happening with PDFs but also with .mp3, .wav, .docx, .mov, .jpg and a slew of other file types. This saga of bad metadata in PDFs has been recognized since at least 2004: James Howison & Abby Goodrum. 2004. Why can’t I manage academic papers like MP3s? The evolution and intent of Metadata standards.

So, today I was looking around to see if Adobe AIR can indeed use some of the available tools to propagate the correct metadata into the files before upload, so that when the files arrive in DSpace they will have the correct metadata.

The first step is to retrieve metadata from files. It seems that Adobe AIR can do this with PDFs. (One would hope so, as they are both brain children of the geeks at Adobe.) However, what is needed in this particular setup is a two-way street with a check in between: we would need to overwrite what was there with the data we want there.
However, as of 2009, there were no tools in AIR which could manipulate Exif data (for photos).
But it does look like the situation is more hopeful for working with audio metadata.

One way around the limitations of JavaScript itself might be to use JavaScript to call a command-line tool or execute a Python, Perl, or shell script, or even use a library. There are some technical challenges which need to be bridged when using these kinds of tools in a cross-platform environment (anything from flavors of Linux to OS X 10.4-10.7 and Windows XP through current). This is mostly because of the various ways scripts are implemented on different platforms.

The technical challenge is that Adobe AIR is basically a JavaScript environment. As such, there are certain technical challenges around using command-line tools like Xpdf from Foo Labs, Coherent PDF Tools, Phil Harvey’s ExifTool, Exiv2, pdftk, or even TagLib. One of the things that Adobe AIR can do is call an executable via something called ActionScript. There are even examples of how to do this with PDF metadata. This method uses PurePDF, a complete ActionScript PDF library. ActionScript is powerful in and of itself: it can be used to read the XMP metadata of a PDF, though one could also use it to call on Java to do the same “work”.
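
As a rough idea of what "call a command-line tool from a script" could look like, here is a small Python sketch that hands curated metadata to ExifTool. It is not how the AIR app works; the function names and the sample values are made up, and it assumes `exiftool` is installed and on the PATH. The `-TAG=VALUE` syntax and the `-overwrite_original` option are ExifTool's own.

```python
import subprocess


def build_exiftool_command(path, metadata):
    """Return the ExifTool argument list that writes `metadata` into `path`."""
    args = ["exiftool", "-overwrite_original"]  # do not leave a *_original backup copy
    for tag, value in metadata.items():
        args.append(f"-{tag}={value}")          # ExifTool's -TAG=VALUE write syntax
    args.append(str(path))
    return args


def embed_metadata(path, metadata):
    """Run ExifTool on the file; raises CalledProcessError if the tool fails."""
    subprocess.run(build_exiftool_command(path, metadata), check=True)


# Illustration only: replace a wrong author attribute before upload.
command = build_exiftool_command(
    "paper.pdf", {"Author": "Jane Doe", "Title": "A Grammar Sketch"}
)
```

Passing the arguments as a list (rather than one shell string) sidesteps most of the quoting differences between Linux, OS X and Windows.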

Three Lingering Thoughts

Even if the Resource and Metadata Packager has the ability to embed the metadata in the files themselves, it does not mean that the submitters would know how to use it or why to use it. This is not, however, a valid reason to leave functionality out of a development project. All marketing aside, an archive does have a responsibility to consumers of the digital content that the content will be functional. Part of today’s “functional” is the interoperability of metadata. Consumers do appreciate, even expect, that the metadata will be interoperable. The extra effort taken on the submitting end of the process pays dividends as consumers use the files with programs like Picasa, iPhoto, PhotoShop, iTunes, Mendeley, Papers, etc.
Another thought that comes to mind is that when one is dealing with large files (over 1 GB), there is a reason for making a “preview” version of a couple of MB. That is, if I have a 2 GB audio file, why not make a 4 MB .mp3 file for rapid assessment, to see if it is worth downloading the .wav file? It seems that a metadata packager could also create a presentation file on the fly. This is no less true with photos or images. If a command-line tool like ImageMagick could be used, that would be awesome.
This problem has been addressed in the open source library science world. In fact, a nice piece of software does live out there. It is called the Metadata Extraction Tool. It is not an end-all for all of this archive’s needs, but it is a solution for some needs of this type.
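
The "preview" idea from the second thought could be sketched like this in Python. It only builds the command lines; it assumes `ffmpeg` and ImageMagick's `convert` are installed, and the function names, the 60-second length, the bitrate and the `_preview` naming are all arbitrary choices of mine.

```python
from pathlib import Path


def preview_name(source, extension):
    """Derive a preview filename next to the source, e.g. talk.wav -> talk_preview.mp3."""
    source = Path(source)
    return source.with_name(f"{source.stem}_preview{extension}")


def audio_preview_command(source, seconds=60, bitrate="64k"):
    """ffmpeg command for a short, low-bitrate .mp3 excerpt of a large audio file."""
    preview = preview_name(source, ".mp3")
    return ["ffmpeg", "-i", str(source), "-t", str(seconds), "-b:a", bitrate, str(preview)]


def image_preview_command(source, max_size=800):
    """ImageMagick command that scales an image to fit inside max_size x max_size."""
    preview = preview_name(source, ".jpg")
    return ["convert", str(source), "-resize", f"{max_size}x{max_size}", str(preview)]


audio_cmd = audio_preview_command("interview.wav")
image_cmd = image_preview_command("scan.tif")
```

Either list can be handed to `subprocess.run(..., check=True)` at packaging time, so the preview is made once, on the submitter's machine.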

Matching IPTC to Dublin Core

Posted on July 20, 2011 by Hugh Paterson III

Matching IPTC to Dublin Core:

http://metadatadeluxe.pbworks.com/w/page/25784393/W3C,-IPTC,-Dublin-Core,-and-Adobe
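
A few of the matches are easy to state, because the IPTC Core schema for XMP stores these particular fields in Dublin Core properties. The Python sketch below is only a partial illustration with made-up sample values and my own function name; the linked page has the fuller table.

```python
# Partial mapping: IPTC (IIM) field name -> Dublin Core element.
IPTC_TO_DC = {
    "Object Name": "title",            # shown as "Title" in many programs
    "By-line": "creator",              # the photographer
    "Caption/Abstract": "description",
    "Keywords": "subject",
    "Copyright Notice": "rights",
}


def iptc_to_dublin_core(iptc):
    """Split an IPTC dict into (Dublin Core dict, fields with no match in this table)."""
    dc, unmapped = {}, {}
    for name, value in iptc.items():
        if name in IPTC_TO_DC:
            dc[IPTC_TO_DC[name]] = value
        else:
            unmapped[name] = value     # keep these so nothing is silently dropped
    return dc, unmapped


sample = {"By-line": "A. Photographer", "Keywords": ["market", "weaving"], "Headline": "Market day"}
dc_record, leftovers = iptc_to_dublin_core(sample)
```

Returning the unmatched fields separately makes the gaps between the two sets visible instead of hiding them.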

How this metadata stuff is stored

Posted on July 20, 2011 by Hugh Paterson III

This is an introduction to how this metadata is stored.

http://wiki.gbif.org/gbif/wikka.php?wakka=MMMetaData

XMP Sidecar Files
A sidecar file is an alternative to storing the metadata directly in the image file itself by instead storing the data in a separate .xmp file with the same base name as the photo. Sidecars are typically used in cases where the file format of the photo doesn't directly support embedding metadata or in cases when the image file should not be edited directly. It should be noted that very few programs support the reading the xmp sidecar files, most will default to reading and writing to the photo directly.

Gracefully copied from http://www.earlyinnovations.com/photolinker/xmp-sidecar-files.html

Note that sidecar files are separate files from the photos, so if a photo were to be archived it would need to form a package of two files: the sidecar file and the main photo image.
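
The two-file package can be sketched in a few lines of Python. This assumes the "same base name" convention described in the quote above (IMG_0042.jpg goes with IMG_0042.xmp); the function names are mine.

```python
from pathlib import Path


def sidecar_path(photo):
    """Return the expected .xmp sidecar path for a photo (same base name)."""
    return Path(photo).with_suffix(".xmp")


def archive_package(photo):
    """List the files that must travel together: the photo plus its sidecar, if present."""
    photo = Path(photo)
    sidecar = sidecar_path(photo)
    return [photo, sidecar] if sidecar.exists() else [photo]


expected_sidecar = sidecar_path("photos/IMG_0042.jpg")
```

An archiving script would call `archive_package` for each photo so the sidecar is never left behind.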

Embedded Metadata
Exif
http://www.digicamhelp.com/glossary/exif-data/
http://www.opanda.com/en/iexif/
http://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/EXIF.html
http://www.stuffware.co.uk/cheese/

IPTC
http://www.iptc.org/std/photometadata/2008/specification/IPTC-PhotoMetadata-2008.pdf

The funky Dublin Core Metadata Schema
http://www.iptc.org/std/Iptc4xmpCore/1.0/specification/Iptc4xmpCore_1.0-spec-XMPSchema_8.pdf
http://www.prismstandard.org/about/relationships.asp

Ok, so the items which can be recorded in Exif data spots might not be able to be recorded in IPTC spots, and vice versa. That is, some elements, such as detailed copyright and photographer information, can be recorded more fully in the IPTC data than in the Exif data. This means that there is not 100% correspondence between the two sets. We cannot choose to use one and ignore the other. When we throw XMP into the mix, there are additional things which can be recorded in XMP but not in either of the other sets. Additionally, XMP can live in its own sidecar file rather than being embedded. Dublin Core (DC) is also a set of options for metadata. It is not embedded in the photo itself; rather, it is a way to organize a database of metadata about objects. REAP uses DC. DC is extensible; that is, we can move embedded metadata (or sidecars) from photos into the REAP database's metadata structure. But then what happens when the photo is removed from the REAP container? Does the metadata travel with it?

Here is a clip gracefully copied from http://www.earlyinnovations.com/photolinker/annotation-philosophy.html

Many popular websites and applications allow you to annotate your photos by adding keywords, a description, a title, a location, a list of the people in the photograph and many other tags. These websites and applications generally suffer from two major deficiencies:

Annotations are often exclusively added to a proprietary database, and not written back to the photo. This means that unless the software or website is still available in, say, 50 years, the annotations will be completely lost.
Programs that do write the annotations directly to the image file usually corrupt existing tags or write partial information.
PhotoLinker solves both of these issues.

PhotoLinker writes the annotations directly to the photo so that your annotations stay with the photo forever. After annotating with PhotoLinker you can use other programs or upload to popular websites with the knowledge that your annotations will stay with your copy of the photos.
PhotoLinker is one of the first applications to adhere to the Metadata Working Group Guidelines for Handling Image Metadata. These guidelines ensure that annotations are handled correctly. In addition, PhotoLinker maintains transparency about how it handles the metadata by using the open source tool ExifTool and showing exactly which tags are being read and written.

Photos and Metadata

Posted on July 19, 2011 by Hugh Paterson III

There are several things to think about with respect to photos and metadata.

1. The What: these are the elements of the metadata's data. The "Who", "What", "Where", "When", "Why", and "How" of the photo.

2. The How: the technical storage of the metadata. Where is all of this data stored?

These two issues are discussed in their own child pages. There is a lot to say on each one.

In brief, the What tries to answer the question: What kind of meta-data should or can be collected?

Whereas the How tries to answer the question: What should or can be done with this meta-data?

File Naming convention: http://www.controlledvocabulary.com/imagedatabases/filenaming.html

Keywords: http://www.controlledvocabulary.com/metalogging/ck_guidelines.htm

A Minimal Set of meta-data to strive to collect for each Photo

Posted on July 10, 2011 by Hugh Paterson III

A Minimal Set of meta-data to strive to collect:

Photo ID Number: ______________________________

Collection:____________________________________

Sub-Collection:_________________________________

Who (photographer): ____________________________

Who (subject): _________________________________

Who (subject): ________________________________

People group:_________________________________

When (was the photo taken): _______________________

Where (Country): _______________________________

Where (City): _________________________________

Where (Place): ________________________________

What is in the Photo: ____________________________

Why was it taken (Event):_________________________

Description:____________________________________

Who (provider): ________________________________

Who (provider): _______________________________
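
If the form above were kept as data rather than on paper, it might look something like this Python sketch. The class and field names are my own renderings of the labels above, and the repeated "Who (subject)" and "Who (provider)" lines are treated as lists; none of this is a fixed schema.

```python
from dataclasses import dataclass, field, fields


@dataclass
class PhotoRecord:
    """One record per photo, mirroring the minimal set above."""
    photo_id: str = ""
    collection: str = ""
    sub_collection: str = ""
    photographer: str = ""                         # Who (photographer)
    subjects: list = field(default_factory=list)   # Who (subject), one entry per person
    people_group: str = ""
    date_taken: str = ""                           # When
    country: str = ""
    city: str = ""
    place: str = ""
    content: str = ""                              # What is in the photo
    event: str = ""                                # Why was it taken
    description: str = ""
    providers: list = field(default_factory=list)  # Who (provider)

    def missing(self):
        """Names of the fields that are still blank, to chase up with the provider."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Made-up example values.
record = PhotoRecord(photo_id="P-0001", photographer="A. Photographer", subjects=["Person A", "Person B"])
still_blank = record.missing()
```

The `missing` helper is there because the set is something to strive for: most photos will arrive with some of these blanks unfilled.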

Reading List

Posted on February 1, 2010 by Hugh Paterson III

I want to read:
Focus on Metadata: Understanding the Semantic Web: Bibliographic Data and Metadata

I found it here.

I would like to add this too: Gesture: Visible Action as Utterance

The Journeyler

A walk through: Life, Leadership, Linguistics, Language Documentation, WordPress, and OS X (and a bit of Marketing & Business Administration)

Tag Archives: metadata
