The Journeyler

A walk through: Life, Leadership, Linguistics, Language Documentation, WordPress, and OS X (and a bit of Marketing & Business Administration)

Category Archives: Meta-data


Metadata Interoperability at OLAC

Posted on April 12, 2023 by Hugh Paterson III

This week we had a lecture on metadata interoperability. Interoperability is a major theme of Gary Simons' work on OLAC. It was the key concept he used to push the social-behavior requirements related to the activities around, in, and at language archives.

I think that across the history of OLAC there have been various understandings of the kinds of metadata needed to describe language resources. Discovery is the architectural goal of OLAC, but other requirements also exist. In OLAC's early days, many participants looked to OLAC for a complete answer to the question of which metadata they should collect and use. Yet the other requirements placed on resource stewards have always meant additional fields in diverse institutional contexts. The freedom to explore these other requirements has not always been embraced by stewards; some have seen OLAC as an all-or-nothing involvement. Perhaps the fear has been that institutions will diverge from a communal norm.

However, my perspective is that it is quite normal for each institution to have its own metadata schema or application profile, some portion of which gets shared with OLAC.

With this as background, and with the assumption that different management practices will produce different metadata schemes, it seems reasonable that each institution should update its schema from time to time. This implies that metadata quality, in terms of coverage or "encoding", is a moving target. A further implication is that even fields which are shared with the OLAC aggregator and are defined in the OLAC metadata application profile may have different internal syntax at different providers, or at different time depths of a record's creation.

The ISO 639-3 field is one piece of evidence of this evolutionary change. The standard's codes split and merge from time to time. Associating a record's time of creation with a version of an institution's metadata schema is therefore a useful move when evaluating a record's quality.

The question is: how should a record and the version of its applicable metadata profile be associated in the OLAC context? And how should this information be communicated to record viewers?

The answer is rather straightforward, but it requires two parts. The first part is a modification to the archive profile so that it carries two bits of information:

  1. The name of the native application profile at the data provider
  2. A link to the native metadata application profile documentation

The documentation should be in a publicly accessible place so that the provided metadata makes sense. There are several ways this could be accomplished; one way is to create a manifestation record for each iteration of the application profile. These could be related into a collection, or they could have a single relation.

Either way, the iterations could then be surfaced in the OAI ListSets response.

The OLAC OAI record should carry in its source, from the first harvest, the name and version of the native metadata schema used to generate the record. The link to the documentation of the provider's native metadata schema should be given in the archive section of the OAI Identify description.
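To make the idea concrete, here is a minimal sketch in Python of how a harvester might look for such information in a provider's Identify response. The endpoint and the element names (nativeSchemaName, nativeSchemaDocumentation) are assumptions of mine, not existing OLAC or OAI fields.

    import urllib.request
    import xml.etree.ElementTree as ET

    # Hypothetical data-provider endpoint; substitute a real OAI-PMH base URL.
    IDENTIFY_URL = "https://archive.example.org/oai?verb=Identify"

    with urllib.request.urlopen(IDENTIFY_URL) as response:
        tree = ET.parse(response)

    OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
    for description in tree.iter(OAI_NS + "description"):
        # The element names below are assumptions, not existing OLAC fields.
        name = description.find(".//nativeSchemaName")
        link = description.find(".//nativeSchemaDocumentation")
        if name is not None and link is not None:
            print(f"Native profile: {name.text}, documented at {link.text}")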

Some utilities in OAI can modify data; some can be servers only, some harvesters only, and some both harvesters and servers.

Some OAI providers are

Using record sets:

OLAC could allow end-users to dynamically create sets of records for export using the setSpec mechanism of OAI. Experimenting with this, and with audience interest, might generate some social engagement.
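As a rough illustration of the mechanics, the sketch below requests one set from a data provider; the base URL and set name are placeholders, not real OLAC endpoints.

    import urllib.request

    # Placeholder endpoint; a real OLAC data provider's base URL would go here.
    BASE_URL = "https://archive.example.org/oai"

    # setSpec values come from the provider's ListSets response and are passed
    # to ListRecords through the standard "set" argument.
    query = "verb=ListRecords&metadataPrefix=oai_dc&set=placeholder_set"

    with urllib.request.urlopen(f"{BASE_URL}?{query}") as response:
        print(response.read(4000).decode("utf-8"))  # peek at the first few records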

Posted in Meta-data | Tagged in_Obsidian, OAI-PMH, OLAC, R-90, setSpec | Leave a reply

Dublin Core Acronyms

Posted on April 12, 2023 by Hugh Paterson III

DC = Dublin Core: This may refer to simple Dublin Core, which, depending on the time of writing, may mean the original 15 elements. See Phelps (2012).
DCMI = Dublin Core Metadata Initiative, as used by Cole (2002); later changed to Dublin Core Metadata Innovation, but the term "innovation" does not appear on the current (2022/2023) Dublin Core website or on the site of its parent organization, ASIS&T.

DCMI Name on ASIS&T website.

DCMI Name on Dublin Core website.

QDC = Qualified Dublin Core as used by Cole (2002).
DCMES = Dublin Core Metadata Element Set: Generally this means the 18 elements, 15 of which are in the DC 1.1 namespace and the other three in the DCTERMS namespace. In preferred parlance, elements are known as properties; however, because of the historical practice of using Dublin Core within an XML context and seeing these properties used as XML elements, the term "elements" was applied. In my opinion, choosing a term like "properties" from the parlance of RDF is just as jaded. Used, for example, by Ward (2004), Saadat Alijani & Jowkar (2009), Phelps (2012), Jackson et al. (2008), and Nevile & Lissonnet (2004).
DCMS = Dublin Core Metadata Standard. See Eckert et al (2009) and Quam (2001).
DCMES 1.1 = Dublin Core Metadata Element Set; Simple Dublin Core. See also this (DC Website) and this (OLAC).
DCTERMS = Dublin Core Terms or Qualified Dublin Core.
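As a rough illustration of the difference, the invented record below is expressed once with simple DC elements and once with DCTERMS refinements; the values are made up.

    # Illustrative sketch: the same record with simple DC elements and with
    # DCTERMS refinements (qualified Dublin Core). All values are invented.
    simple_dc = {
        "dc:title": "A Sketch Grammar of Example",
        "dc:date": "2012",
        "dc:description": "Field notes and a short grammar sketch.",
    }

    qualified_dc = {
        "dcterms:title": "A Sketch Grammar of Example",
        "dcterms:created": "2012-06-01",  # refines dc:date
        "dcterms:abstract": "Field notes and a short grammar sketch.",  # refines dc:description
    }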

Cole, Timothy W. 2002. "Qualified Dublin Core Metadata for Online Journal Articles." OCLC Systems & Services: International Digital Library Perspectives 18 (2): 79–87. doi:10.1108/10650750210430141.

Eckert, K., M. Pfeffer, and H. Stuckenschmidt. 2009. "A Unified Approach for Representing Metametadata." Proceedings of the International Conference on Dublin Core and Metadata Applications, 21–29. https://dcpapers.dublincore.org/pubs/article/view/973

Jackson, Amy S., Myung-Ja Han, Kurt Groetsch, Megan Mustafoff, and Timothy W. Cole. 2008. "Dublin Core Metadata Harvested Through OAI-PMH." Journal of Library Metadata 8 (1): 5–21. doi:10.1300/J517v08n01_02.

Nevile, L., and S. Lissonnet. 2004. "The Case for a Person/Agent Dublin Core Metadata Element Set." Proceedings of the International Conference on Dublin Core and Metadata Applications. https://dcpapers.dublincore.org/pubs/article/view/780

Phelps, Tyler Elisabeth. 2012. "An Evaluation of Metadata and Dublin Core Use in Web-Based Resources." Libri 62 (4). doi:10.1515/libri-2012-0025.

Quam, Eileen. 2001. "Informing and Evaluating a Metadata Initiative: Usability and Metadata Studies in Minnesota's Foundations Project." Government Information Quarterly 18 (3): 181–94. doi:10.1016/S0740-624X(01)00075-2.

Saadat Alijani, Alireza, and Abdolrasool Jowkar. 2009. "Dublin Core Metadata Element Set Usage in National Libraries' Web Sites." The Electronic Library 27 (3): 441–47. doi:10.1108/02640470910966880.

Ward, Jewel. 2004. "Unqualified Dublin Core Usage in OAI-PMH Data Providers." OCLC Systems & Services: International Digital Library Perspectives 20 (1): 40–47. doi:10.1108/10650750410527322.

Posted in Meta-data | Tagged Dublin core, in_Obsidian | Leave a reply

Useful Modeling

Posted on April 2, 2023 by Hugh Paterson III

I find the documentation here very useful for modeling Events and Physical objects.

Posted in Meta-data | Leave a reply

Spatial Coverage on the OLAC network

Posted on March 20, 2023 by Hugh Paterson III

The issue is that OLAC and these other uses of Dublin Core don't agree on the semantics of spatial coverage.

https://archive-intranet.ardc.edu.au/display/DOC/Spatial+coverage#:~:text=Spatial%20coverage%20refers%20to%20a,the%20focus%20of%20an%20activity.

The critical question here is the one where we ask: "what do English speakers think geography is for a language?"

Thinking deeply about:

https://twitter.com/elararchive/status/1637559068398157824?s=46&t=Zdt2jeAjeFQx6k372aS64A

Posted in Meta-data | Tagged Dublin core, in_Obsidian, OLAC, To move | Leave a reply

Quantitative Analysis of Metadata Errors

Posted on March 13, 2023 by Hugh Paterson III

Various approaches to metadata quality assessment divide the assessment criteria into sections, for example accuracy, consistency, and completeness. However, one should ask whether a quantitative approach to metadata quality assessment is better than a qualitative approach. Some may point out that the two are not mutually exclusive, and therefore not in direct competition with each other. However, I wonder if this is true. For example, if one has limited reading time, does one benefit more from reading the percentage of one error type relative to another, or does one learn more by reading about the assumed noncompliance or disharmony across metadata records?

The second point in suggesting that a qualitative description of metadata quality might be better than a quantitative description is related to root causes, which are presumably the purpose of the investigation in the first place.

It seems to me that a quantitative approach makes the data the discussion and ignores the methods by which the data got into the observed format. For example, what were the human factors under which the metadata was produced? What was the workflow? What was the target metadata scheme at the time the records were created? What was the management-implemented checking process, i.e., what was being checked for, and what were the metrics for success?

A qualitative analysis can show where the current process meets the management considerations. Essentially this is problem-solution fit analysis, where metadata quality is a trailing performance indicator for business processes. It gets interesting here, though, because the prevailing thought is that metadata is also the way that customers are serviced through the organization. That is, it is like a loss-leader product: a product to get a customer to the main product.

A purely quantitative analysis simply announces that issues exist, in a relative order. It does not seek to explain the shortcomings through a contextual analysis.

Posted in Meta-data | Tagged metadata, qualitative, quality assessment, quantitative | Leave a reply

Dynamic collections aren’t.

Posted on March 12, 2023 by Hugh Paterson III

Some years ago, scholars were debating the definition of collection. In an archival sense, the more traditional sense, a collection refers to a direct or accumulating set of resources. In a library sense, a collection may wax and wane depending on the curation of the collection. So what is a digital collection, especially in an aggregator of metadata?

To this question I have given some thought. The DCMIType "Collection" is ambiguous on this point. Aggregations seem not to be the same as a "collection" in that they are continuously updating and may be different for different viewers! Yet essentially this is the same definition that is used in libraries.

After about a year and a half of thinking about this point and how to handle it, I think I have a solution. Aggregations, such as those through OAI or RSS, are not collections at all. Rather, an aggregation is a view through a dynamic access point. RDA and IFLA-LRM are two models that use the concept of access points. Aggregations, in this sense of access point, are temporary applications of an access point to a resource. In RDA and IFLA-LRM these access points are hard-coded on the record. This need not be the case all the time in an information retrieval system. An information retrieval system can have its own coded access points independent of the data it is operating on. In this way the information retrieval system might mitigate possible limits in the structure of the information being retrieved. It validates the autonomy of the information retrieval system from the information.

This sort of solution preserves the definition of collection, bringing sanity to the concept.

Posted in Meta-data | Tagged information retrieval, metadata, OAI, To move | Leave a reply

OLAC spelling mistakes

Posted on February 28, 2023 by Hugh Paterson III

I wonder how many spelling mistakes we can find in various records in OLAC... This is a great reason for OLAC to retain information about which language a record is written in.

https://opensource.com/article/18/2/aspell

Spell Checking Your Programming from the Linux Command Line


https://github.com/uribench/spell-check/blob/master/docs/XML%20Spell%20Checking%20Workaround.md

Install with my blog workflow:
https://github.com/tbroadley/spellchecker-cli

other options: https://vi.stackexchange.com/questions/22220/how-to-make-spell-check-work-for-text-inside-a-xml-file
https://metacpan.org/dist/XML-Twig/view/tools/xml_spellcheck/xml_spellcheck
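A minimal sketch of the kind of workflow I have in mind, assuming GNU Aspell and an English dictionary are installed locally; the record file name is a placeholder.

    import subprocess
    import xml.etree.ElementTree as ET

    # Placeholder file name; any XML metadata record would do.
    tree = ET.parse("record.xml")
    text = " ".join(elem.text for elem in tree.iter() if elem.text)

    # "aspell list" reads text on stdin and prints suspected misspellings.
    result = subprocess.run(
        ["aspell", "list", "--lang=en"],
        input=text,
        capture_output=True,
        text=True,
    )
    print("Possible misspellings:", sorted(set(result.stdout.split())))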

Posted in Meta-data | Tagged Automation tools, in_Obsidian, OLAC, To move | Leave a reply

Interesting Public DC schema use

Posted on February 23, 2023 by Hugh Paterson III

This record is interesting in that it uses a dot notation for Dublin Core.

https://www.repository.cam.ac.uk/handle/1810/316390?show=full

They also have four schemas in total used in their application profile, in contrast to extending Dublin Core with custom serializations.
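For illustration, the dot-notation style looks roughly like the field names below; the values are invented and not taken from the linked record.

    # Hedged illustration of dot-notation qualified Dublin Core field names.
    # All values are invented placeholders.
    record = {
        "dc.title": "An Example Thesis",
        "dc.contributor.author": "Doe, Jane",
        "dc.date.issued": "2021-01-15",
        "dc.description.abstract": "Placeholder abstract text.",
    }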

Posted in Meta-data | Tagged dot-notation, Dublin core, in_Obsidian, OLAC | Leave a reply

OLAC data quality investigator

Posted on January 26, 2023 by Hugh Paterson III

On the flight back from Finland I found it challenging to use my laptop, so I pulled out my scratch pad to draw out some ideas I was having. One of those ideas was for a record quality investigator: a tool which lets one investigate the presence or absence of features, or sets of features, in a record or set of records. The goal is to look for any patterns in the records which might be interesting and notable.
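As a very rough sketch of the presence-or-absence idea, something like the following could tally which Dublin Core fields appear across a batch of harvested records; the input file name is a placeholder.

    import xml.etree.ElementTree as ET
    from collections import Counter

    DC_NS = "{http://purl.org/dc/elements/1.1/}"
    OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

    # Placeholder input: an OAI-PMH ListRecords response saved to disk.
    tree = ET.parse("harvested_records.xml")

    field_counts = Counter()
    record_total = 0
    for record in tree.iter(OAI_NS + "record"):
        record_total += 1
        present = {
            elem.tag.replace(DC_NS, "dc:")
            for elem in record.iter()
            if elem.tag.startswith(DC_NS)
        }
        field_counts.update(present)

    for field, count in field_counts.most_common():
        print(f"{field}: present in {count} of {record_total} records")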

What follows are my written notes.

Pages 1–8 of the scanned notes.
Posted in Meta-data | Tagged in_Obsidian, metadata, Metadata quality, OLAC | Leave a reply

Schema.org

Posted on October 18, 2022 by Hugh Paterson III

Some links and papers on schema.org.

Breadcrumb

https://schema.org/WebSite

https://neilpatel.com/blog/get-started-using-schema/

https://developers.google.com/search/docs/appearance/structured-data/image-license-metadata

https://search.google.com/test/rich-results/result/r%2Fevents?id=_s8HGEDUyCAtztd6qexRyA

https://github.com/wowchemy/wowchemy-hugo-themes/blob/main/modules/wowchemy-seo/layouts/partials/jsonld/event.html
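As a small illustration, emitting schema.org JSON-LD for the WebSite type linked above might look like the sketch below; the site name and URL are placeholders.

    import json

    # Toy schema.org WebSite description; name and url are placeholders.
    website = {
        "@context": "https://schema.org",
        "@type": "WebSite",
        "name": "Example Site",
        "url": "https://www.example.org/",
    }

    print('<script type="application/ld+json">')
    print(json.dumps(website, indent=2))
    print("</script>")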

Posted in Marketing, Meta-data | Tagged metadata, schema.org | Leave a reply


