This post is an open draft! It might be updated at any time.
Metadata is very important; everyone agrees on that. There is less agreement, however, about how metadata should be developed and how to ensure that it is accurate. A taxonomy is a limited vocabulary (a fixed number of terms) in which each term has a predefined definition. A folksonomy is a vocabulary in which people, usually users of the data, assign their own useful words as metadata to an item. Folksonomies are like taxonomies in that both are sets of terms, but they differ in that a folksonomy is an open set whereas a taxonomy is a closed set.
An example of a taxonomy might be the colors of a traffic light: Red, Yellow, and Green. If this were a folksonomy, people might also suggest Amber, Orange, Blue-Green, and Blue. These additional terms may be accurate for some viewers of traffic lights, or in some cases, but they do not fit the stereotypical model of what the colors of a traffic light are.
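The closed-versus-open distinction can be sketched in a few lines of Python. This is only an illustration of the idea: the names `TRAFFIC_LIGHT_TAXONOMY`, `tag_with_taxonomy`, and `tag_with_folksonomy` are hypothetical and are not taken from any cataloguing system.

```python
# Closed set: the only terms a cataloguer may assign (hypothetical example).
TRAFFIC_LIGHT_TAXONOMY = frozenset({"Red", "Yellow", "Green"})


def tag_with_taxonomy(term: str) -> str:
    """Accept a term only if it belongs to the closed vocabulary."""
    if term not in TRAFFIC_LIGHT_TAXONOMY:
        raise ValueError(f"{term!r} is not a term in the taxonomy")
    return term


def tag_with_folksonomy(vocabulary: set[str], term: str) -> str:
    """Accept any term; the open vocabulary grows as users add words."""
    vocabulary.add(term)
    return term


# The folksonomy starts from the same three words but is free to grow.
folksonomy: set[str] = {"Red", "Yellow", "Green"}
tag_with_folksonomy(folksonomy, "Amber")  # accepted, the vocabulary grows
tag_with_taxonomy("Green")                # accepted, the term is predefined
# tag_with_taxonomy("Amber") would raise ValueError
```

The taxonomy rejects "Amber" even though some viewers would call the light amber; the folksonomy accepts it without asking whether it is accurate.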
Another example of a taxonomy is the set of keywords on a book record in a library: a library might use only certain keywords. In contrast to curated library records, websites like Flickr and Delicious allow users to tag (or keyword) their photos and links with whatever keywords are useful to them. These are examples of folksonomies. However, the concept of user-generated metadata goes beyond the folksonomy to include any and all metadata that users generate. In this scope, projects like LibraryThing and Bibsonomy deserve mention as sites where user-generated metadata plays a powerful part in the organization and presentation of the content on the site.
So the question becomes: how are managers of data, like webmasters or librarians, to ensure the quality of metadata while balancing that quality against the usefulness of the metadata to the users of the data? If visitors to the library cannot find the book they are looking for because the terms they search with are not associated with the record for the book, then the cataloguing record is not as useful to them as it could be. But if the library opens up its records for everyone to edit, then how is the library to know that the records are accurate?
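The trade-off can be made concrete with a small sketch. It assumes a record that keeps cataloguer-assigned keywords apart from visitor-proposed tags, so that the library can widen a search without mixing the two sources. The `Record` class, the `find` function, and the sample record are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    title: str
    curated_keywords: set[str]                        # assigned by cataloguers
    user_tags: set[str] = field(default_factory=set)  # proposed by visitors, unverified


def find(records: list[Record], term: str, include_user_tags: bool = False) -> list[str]:
    """Return the titles of records whose terms match `term`, ignoring case."""
    wanted = term.casefold()
    hits = []
    for record in records:
        terms = set(record.curated_keywords)
        if include_user_tags:
            terms |= record.user_tags
        if wanted in {t.casefold() for t in terms}:
            hits.append(record.title)
    return hits


# A made-up record: the cataloguer wrote "Morphology", a visitor wrote "word structure".
catalog = [Record("Hypothetical Grammar", {"Morphology", "Syntax"}, {"word structure"})]

curated_only = find(catalog, "word structure")
with_user_tags = find(catalog, "word structure", include_user_tags=True)
```

Searching the curated keywords alone misses the book, while including the visitor's tag finds it. Keeping the two sets separate is one possible design choice: the record becomes more findable, and the library can still see which terms it has not yet vetted.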
In linguistics there are several important taxonomies:
- GOLD Ontology
- ISO 639-x code sets and language names
- In this context there is also a multilingual element: each term may have several variants across languages, e.g. Phonology in English is Phonologie in German.
In library science there are also several important taxonomies:
- OLAC extension to Dublin Core
- Resource type definitions
And every company or institution is going to have its own special taxonomies for various purposes:
- SIL International's own taxonomies
The challenge for "marketing" resources, that is, enabling their rapid and useful discovery and association, is for the institution to spend as little effort as possible describing resources while allowing users to provide accurate metadata that is helpful to them. After all, users' mental associations are very important to the use and discovery of relevant resources. So the question is: how can users add metadata value to objects in the archive? And how can the institution trust these proposed additions? SIL International, as the host institution of the Language and Culture Archive, is not alone in this problem space.
Basically what is needed is an algorithm for turning unstructured data into valuable, valued, authoritative, structured data.
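One very simple way to approach such an algorithm, offered as a sketch and not as the method of SIL International or of any study listed in this post, is to treat a user tag as a candidate term once enough distinct users have applied it to the same item independently. The function name `promote_tags`, the `(user, item, tag)` data shape, the threshold of three users, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def promote_tags(assignments, min_users=3):
    """Return tags that at least `min_users` distinct users applied to an item.

    `assignments` is an iterable of (user, item, tag) triples. The result maps
    each item to a sorted list of candidate terms, which a cataloguer would
    still review before treating them as authoritative.
    """
    users_per_tag = defaultdict(set)
    for user, item, tag in assignments:
        # Normalise trivially so "Phonology" and " phonology " count as one tag.
        users_per_tag[(item, tag.strip().casefold())].add(user)

    candidates = defaultdict(list)
    for (item, tag), users in users_per_tag.items():
        if len(users) >= min_users:
            candidates[item].append(tag)
    return {item: sorted(tags) for item, tags in candidates.items()}


# Made-up tagging activity on a single archive item.
sample = [
    ("ann", "item-1", "Phonology"),
    ("bob", "item-1", "phonology"),
    ("cy", "item-1", " phonology "),
    ("ann", "item-1", "phonology"),  # a repeat by the same user adds no weight
    ("bob", "item-1", "sounds"),
]
print(promote_tags(sample))  # {'item-1': ['phonology']}
```

Counting distinct users instead of raw tag counts keeps one enthusiastic tagger from promoting a term alone. Agreement among users only suggests usefulness, so a review step by a cataloguer is still assumed before a candidate becomes part of the structured vocabulary.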
As stated above, SIL International is not alone in this problem space. Several studies and use cases addressing this very kind of problem have been published:
Analysis of User Generated Metadata in the Library Thing Folksonomy. Vincent Sterken.
Using Social Discovery Systems to Leverage User-Generated Metadata
The use of social discovery systems is rapidly expanding, often building vibrant and interactive communities. Some public and academic libraries are trying out these systems, in which patrons can contribute ratings, reviews, and comments. While user-contributed metadata may not equal the quality of professional cataloging, it can enhance the catalog records with rich supplementary information and personal perspectives. The author's examination of use of social features in two public libraries led to the discouraging observation that addition of user-generated metadata in these contexts was limited, in sharp contrast to other social sites. The question of motivation is key. People's notions of library catalog records and their ownership by library staff may present an obstacle to contributing metadata. User-generated metadata has the potential to add value to records while conserving limited library resources. The challenge of promoting the active use of social discovery systems in libraries demands further research.
Repurposing User-Generated Metadata Pathfinder: Interim Report
The Continuum of Metadata Quality: Defining, Expressing, Exploiting
Like pornography, metadata quality is difficult to define. We know it when we see it, but conveying the full bundle of assumptions and experience that allow us to identify it is a different matter. For this reason, among others, few outside the library community have written about defining metadata quality. Still less has been said about enforcing quality in ways that do not require unacceptable levels of human effort.
Metadata creation system for mobile images
User-Generated Metadata for ETDs: Added Value for Libraries. Sharon Reeves.
Making Use of User-Generated Content and Contextual Metadata Collected during Ubiquitous Learning Activities
During the last years significant research efforts have been conducted looking at how to standardize digital educational content. Due to better connectivity and computational power of mobile devices, new opportunities have emerged for collecting user-generated data based on the context and the environment where the content has been generated. While metadata standards for learning objects such as IEEE LOM make it possible to annotate digital content with pre-defined metadata tags, the ability to store custom user-generated or contextual metadata is not yet fully supported. The need for developing a flexible solution to deal with these problems motivated the design of our activity controller system (ACS), a rapid prototyping system and a task manager, which interprets, reacts to and stores contextual metadata and content extracted during learning activities. This paper presents how ACS facilitates coordination and reusability of user generated data, which we believe is as a valuable feature compared with existing standards and initiatives.
Annotea and Semantic Web Supported Collaboration
Mapping Entry Vocabulary to Unfamiliar Metadata Vocabularies.
From Spectator to Annotator: Possibilities offered by User-Generated Metadata for Digital Cultural Heritage Collections
Author-generated Dublin Core Metadata for Web Resources: A Baseline Study in an Organization
This paper reports on a study that examined the ability of resource authors to create acceptable metadata in an organizational setting. The results indicate that authors can create good quality metadata when working with the Dublin Core, and in some cases they may be able to create metadata that is of better quality than a metadata professional can produce. This research suggests that authors think metadata is valuable for resource discovery, that it should be created for Web resources, and that they, as authors, should be involved in metadata production for their works. The study also indicates that a simple Web form, with textual guidance and selective use of features (e.g. pop-up windows, drop-down menus, etc.) can assist authors in generating good quality metadata.
1. Louise F. Spiteri. 2011. Using Social Discovery Systems to Leverage User-Generated Metadata. American Society for Information Science and Technology. Bulletin April/May 2011. http://www.asis.org/Bulletin/Apr-11/AprMay11_Spiteri.html [Accessed: 5 March 2011]
2. Sharon Reeves. 2007. User-Generated Metadata for ETDs: Added Value for Libraries. http://epc.ub.uu.se/etd2007/files/papers/paper-40.pdf [Accessed: 5 March 2011]
3. Marja-Riitta Koivunen. Annotea and Semantic Web Supported Collaboration. http://ceur-ws.org/Vol-137/01_koivunen_final.pdf [Accessed: 5 March 2011]
4. Seth van Hooland. 2006. From Spectator to Annotator: Possibilities offered by User-Generated Metadata for Digital Cultural Heritage Collections. http://homepages.ulb.ac.be/~svhoolan/Usergeneratedmetadata.pdf
5. Jane Greenberg, Maria Cristina Pattuelli, Bijan Parsia and W. Davenport Robertson. Author-generated Dublin Core Metadata for Web Resources: A Baseline Study in an Organization. http://journals.tdl.org/jodi/article/viewArticle/42/45