Re-implementing the OLAC validator

The OLAC validator runs on a piece of software that has the Heartbleed security vulnerability. When I think about re-implementing a validator, the following software comes to mind. There was also an online OAI-PMH validator from a former engineer on the Europeana project. I think he is based in Greece. His solution is not open source, but he mentioned that he would consider adding the OLAC profile.

It would be good to see what other OAI-PMH validators look like and how submitters expect to interact with them.
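As a starting point for thinking about what a re-implemented validator has to do, here is a minimal sketch of the first check any OAI-PMH validator would make: that a response has the expected envelope. It assumes only the OAI-PMH 2.0 namespace and the six protocol verbs; the function name `check_oai_response` is my own, and nothing here covers the OLAC profile itself or schema validation.

```python
import xml.etree.ElementTree as ET

# Namespace and verb names defined by OAI-PMH 2.0.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def check_oai_response(xml_text):
    """Return a list of problems with the response envelope.

    An empty list means the basic structure looks fine. This is a
    hypothetical helper for illustration, not a full validator.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    if root.tag != f"{{{OAI_NS}}}OAI-PMH":
        return ["root element is not OAI-PMH in the OAI 2.0 namespace"]

    # Strip the namespace prefix so child names can be compared directly.
    children = [child.tag.replace(f"{{{OAI_NS}}}", "") for child in root]

    problems = []
    if "responseDate" not in children:
        problems.append("missing responseDate")
    if "request" not in children:
        problems.append("missing request")
    if not any(name in VERBS or name == "error" for name in children):
        problems.append("no verb element and no error element")
    return problems
```

A validator for submitters would presumably report these problems in plain language and then move on to checking the records against the OLAC metadata schema.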

NASKO 2023 Reflections

I attended several papers and presented my own paper at NASKO 2023.

I was impressed with the paper presented by Julia Bullard.

Thesaurus Construction for Community-Centered Metadata [long paper] by Julia Bullard, Nigel Town, Sarah Nocente, Aleha McCauley and Heather O'Brien.

There were several things that I appreciated about it. While my observations and impressions are not directly related to the paper's subject, the paper helped me think about other things as I work through my own thoughts and contexts.

  1. I really appreciated the role that the geographical relationship played in the library work completed. In this work, done in Canada, a thesaurus was created and then implemented as part of a portal. The whole purpose of the portal was to make accessible research, different kinds of research, conducted on and in a community next to the university. So while the research was not always about the community, which was explained to have some of the wealthiest and the poorest of the city in the same neighborhood, the results of the research were always about the community. In a way the research was "of" the community. For me this "of-ness" and its relationship to the geographical context was really important. This "of-ness" is different from "about-ness". This is really interesting as I contemplate the "of-ness" of OLAC metadata and the role of geographical information within the OLAC metadata schema, its use cases, and Dublin Core in general. I asked Julia if the metadata driving the portal that provides access to the research was based on Dublin Core. Her response was that "the base metadata is from the institutional repository, cIRcle, which is adapted Dublin Core. The 'cIRcle metadata manual' is findable on the web and you can see the dc mappings in it". This of course makes me curious what metadata professionals mean by "adapted Dublin Core" versus "extended Dublin Core", as Bird and Simons describe OLAC.

Another thing was that Julia used the term "extractive", as in the scholars of the university had an "extractive relationship" with the community. For me the term "extractive" with regard to "extractive research" has never been very clear. It has always seemed to be a highly charged term with lots of finger pointing and without a clear definition. Therefore it seemed to be one of those general accusations which could never be defended against nor proven false. My first exposure to the term was at an ICLDC plenary, where it came up in either the talk or the discussion that followed. Reflecting on the ICLDC conversation, I think the speaker was from Canada, so maybe the term has wider use in that geographical context than I am used to. However, in Julia's case I really appreciated the definition of "extractive relationship" that she provided. She defined it as the non-accessibility of research results, specifically with respect to the ways that researched peoples would think to access those results. This is an interesting dynamic to explore. For example, is it still extractive research if one collects information from individuals and does provide it back to those individuals, but then doesn't give the community access to the sum of the participants' information? What about a summary of the information rather than the raw information? Would that still be extractive? Does extractive only apply in academic contexts or does it also apply in corporate contexts? Can non-profits be extractive? What if the research information was collected but there was no permission to share, and the collecting organization can point back to that lack of permission to share? Would that be extractive? The information serves the purpose of the organization but not the diverse purposes possible in the community or for other actors within the community.

Finally, there was the topic of the creation of the thesaurus. There were a variety of terms that they sought to recontextualize, presumably subject terms. I assessed these in a four-part grid based on the kind of management practice needed. Top-left is severity, top-right is degree of offensiveness, bottom-right is null results or no change, and bottom-left holds the addressable terms, where they were able to bring in a subject matter expert to engage with the materials and provide alternative terminology.

  1. I found Carlin Soos's paper addressing issues in generative-AI-based author attribution very interesting. I need to follow up with Carlin on these issues. He addressed it in terms of attribution and plagiarism, arguing strongly that it is not plagiarism but that there are other trace stakeholders in the mix. This has certain links to linguistic applications in information annotation. There are other sorts of links to how universities craft policies. At UNT plagiarism includes the idea that an author can plagiarize their own work. This is crazy in my opinion. The administrative goal is to limit creative output to certain classes of creative efforts. Therefore anything outside the KO acknowledged by the administration is plagiarism. Since there is socially supported opposition to plagiarism, it is seen as evil. We see a similar approach in how governments define "terrorist organization". Different governments apply "security measures" for different reasons.

  2. In the context of my own paper, Thomas Dousa asked a very important, and not unanticipated, question regarding the types of bonds in the archival bond. Specifically, what types of bonds exist, and do these types of bonds imply that different series should be established within a collection of language resources? The clear answer is yes, there are different kinds of bonds between resources, but it is less clear whether there are any kinds of bonds which don't also occur in other kinds of archival collections. Establishing why something should be split remains an open area of research.

  3. Finally, there was an interesting comment which came out in a discussion and which I think deserves some research: the phrase "metadata is cataloging for men". Where did this phrase get its first use? Is that documented?

Web Archiving

I attended a totally fascinating presentation on Web Archiving by Matt Kelly of Drexel today.

Here are some resources I need to follow up on:

IIPC Memento Aggregator

Book chapter

Collection aboutness in OLAC

Defining the aboutness of a collection is a challenge. From a philosophical point of view, this is even harder for collections in anthropological linguistics. These kinds of collections are not assembled for the sake of their "about-ness" but rather for the sake of their "is-ness". A collection in a museum might be about 19th century trains, but such collections rarely contain the trains themselves. So, does this mean that linguistic collections are really about the people groups the speech represents, and that the of-ness is the speech? Then linguists come along and write about the grammar of the language, and that is about the language? Often original stories will have an aboutness meaning which is never recorded in metadata. This needs to change.

This thought needs to be explored with MARC 655 $x and $v subfields. See:

see email:

Building a discourse server

pfaffman/discourse-doi-resolver <-- content in OLAC

Position conversations within the OLAC search space.

This might be a way forward to an OAI-PMH repo. Another option is to use a query mechanism in the JSON API to get all threads and treat these threads as resources for description.
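To make the "threads as resources" idea concrete, here is a rough sketch of turning a Discourse topic listing into simple Dublin Core style records. It assumes the JSON has the shape I understand `/latest.json` to return (a `topic_list` object holding a `topics` array with `id`, `title`, `slug` and `created_at`), and that topic URLs follow the `/t/<slug>/<id>` pattern. The function name `topics_to_dc`, the choice of elements, and the fixed `dc:type` value are my own illustrative choices, not an OLAC mapping.

```python
def topics_to_dc(latest_json, base_url):
    """Map a parsed Discourse topic listing to Dublin Core style dicts.

    `latest_json` is assumed to be the already-parsed JSON body of a
    topic listing; no network request is made here.
    """
    base = base_url.rstrip("/")
    records = []
    for topic in latest_json.get("topic_list", {}).get("topics", []):
        records.append({
            # Assumed URL pattern for a topic page.
            "dc:identifier": f"{base}/t/{topic['slug']}/{topic['id']}",
            "dc:title": topic["title"],
            "dc:date": topic.get("created_at", ""),
            # "Text" is a DCMI Type term; using it for every thread
            # is a placeholder decision.
            "dc:type": "Text",
        })
    return records


# Hand-made sample data standing in for a real forum response.
sample = {
    "topic_list": {
        "topics": [
            {
                "id": 12,
                "title": "Collection aboutness",
                "slug": "collection-aboutness",
                "created_at": "2023-06-20T10:00:00.000Z",
            }
        ]
    }
}
records = topics_to_dc(sample, "https://forum.example.org/")
```

Records like these would still need subject language and other OLAC-specific elements before they could be served from an OAI-PMH repository.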

I wonder how many layers a tag-group can have...

Legal and privacy considerations:

Your Discourse forum and the GDPR

Import from other discourse instances:

Self-hosting, self-managed, hosted, serviced,

Discourse Server Maintenance


Discourse Hosting Plans and Pricing

Dedicated email: