Re-implementing the OLAC validator

The OLAC validator runs on a piece of software that has the Heartbleed security vulnerability. When thinking about re-implementing a validator, the following software comes to mind. There was also an online OAI-PMH validator written by a former engineer on the Europeana project, who I think is based in Greece. His solution is not open source, but he mentioned that he would consider adding the OLAC profile.

It would be good to see what other OAI-PMH validators look like and how submitters expect to interact with them.
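
As a starting point for thinking about what a validator checks, here is a minimal sketch in Python that tests one small rule: an OAI-PMH 2.0 Identify response must carry a fixed set of elements. This is not how the existing OLAC validator works, and it does not cover the OLAC profile. The function name, the sample response, and the example.org URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"

# Elements that the OAI-PMH 2.0 specification requires in an Identify response.
REQUIRED_IDENTIFY = [
    "repositoryName", "baseURL", "protocolVersion", "adminEmail",
    "earliestDatestamp", "deletedRecord", "granularity",
]


def missing_identify_elements(xml_text):
    """Return the required Identify elements that are absent from a response."""
    root = ET.fromstring(xml_text)
    identify = root.find(f"{{{OAI_NS}}}Identify")
    if identify is None:
        return list(REQUIRED_IDENTIFY)
    return [name for name in REQUIRED_IDENTIFY
            if identify.find(f"{{{OAI_NS}}}{name}") is None]


# Hypothetical response with adminEmail deliberately left out.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2020-01-01T00:00:00Z</responseDate>
  <request verb="Identify">https://example.org/oai</request>
  <Identify>
    <repositoryName>Example Archive</repositoryName>
    <baseURL>https://example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <earliestDatestamp>2000-01-01</earliestDatestamp>
    <deletedRecord>no</deletedRecord>
    <granularity>YYYY-MM-DD</granularity>
  </Identify>
</OAI-PMH>"""

print(missing_identify_elements(SAMPLE))  # ['adminEmail']
```

A real validator would also need schema validation, the other five verbs, and the OLAC-specific metadata checks, so this only shows the shape of a single rule.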

OAI-PMH in golang

I was looking at the maturity of Go for data science and at Go projects that enable interaction with OAI-PMH feeds. In my case, working with XML is fairly important. I don't see in this XML example how to extract attributes and put them in the struct.

Building a discourse server

pfaffman/discourse-doi-resolver <-- content in OLAC

Position conversations within the OLAC search space.

This might be a way forward to an OAI-PMH repo. Another option is to use a query mechanism in the JSON API to get all threads and treat these threads as resources for description.
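
A rough sketch of the second option, in Python. It assumes the topic listing has the shape Discourse returns from `/latest.json` (a `topic_list` object holding a `topics` array with `id`, `slug`, `title`, and `created_at`), and it assumes tags arrive as plain strings. The function name, the forum URL, and the sample payload are hypothetical, and the choice of Dublin Core style keys is mine.

```python
def topics_to_records(payload, base_url):
    """Map a Discourse topic listing to minimal Dublin Core style records."""
    records = []
    for topic in payload.get("topic_list", {}).get("topics", []):
        records.append({
            # Topic pages live at <base>/t/<slug>/<id>.
            "identifier": f"{base_url}/t/{topic['slug']}/{topic['id']}",
            "title": topic["title"],
            "date": topic.get("created_at", ""),
            "subject": topic.get("tags", []),
        })
    return records


# Trimmed, made-up payload in the shape of a topic listing.
SAMPLE_PAYLOAD = {
    "topic_list": {
        "topics": [
            {"id": 12, "slug": "metadata-for-a-grammar", "title": "Metadata for a grammar",
             "created_at": "2021-03-04T10:00:00.000Z", "tags": ["grammar"]},
        ]
    }
}

for record in topics_to_records(SAMPLE_PAYLOAD, "https://forum.example.org"):
    print(record["identifier"], record["title"])
```

Paging through every topic and fetching the listing over HTTP are left out here so that the mapping itself stays easy to see.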

I wonder how many layers a tag-group can have...

Legal and privacy considerations:

Your Discourse forum and the GDPR

Import from other Discourse instances:

Self-hosting, self-managed, hosted, serviced,

Discourse Server Maintenance


Discourse Hosting Plans and Pricing

Dedicated email:

Django modules and links

A Django application to collect submitted DOIs, acquire their API-provided metadata (bibliographic metadata and citation graph metadata), allow limited (specified) annotation, and then make those records harvestable via OAI-PMH. Language Resource tagger: adding a layer of language-related metadata to published resources.
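
To make the middle of that pipeline concrete, here is a small Python sketch of turning Crossref metadata for a DOI into a flat Dublin Core style record. It assumes the `message` object of a Crossref REST API `works` response (fields such as `DOI`, `title`, `author`, `issued`, and `type`). The function names, the selection of Dublin Core keys, and the sample DOI are illustrative only. Nothing here is Django-specific, and the HTTP request itself is left out.

```python
from urllib.parse import quote


def crossref_url(doi):
    """Build the Crossref REST API URL for one DOI."""
    return "https://api.crossref.org/works/" + quote(doi)


def crossref_to_dc(message):
    """Flatten the 'message' part of a Crossref works response."""
    creators = [
        " ".join(part for part in (a.get("given"), a.get("family")) if part)
        for a in message.get("author", [])
    ]
    parts = message.get("issued", {}).get("date-parts") or [[]]
    date = "-".join(f"{p:02d}" for p in parts[0] if p is not None)
    return {
        "identifier": "https://doi.org/" + message["DOI"],
        "title": (message.get("title") or [""])[0],
        "creator": creators,
        "date": date,
        "type": message.get("type", ""),
    }


# Made-up message in the shape Crossref returns; the DOI is not a real one.
SAMPLE_MESSAGE = {
    "DOI": "10.1234/example",
    "title": ["An example article"],
    "author": [{"given": "Ada", "family": "Lovelace"}],
    "issued": {"date-parts": [[2019, 5]]},
    "type": "journal-article",
}

print(crossref_url("10.1234/example"))
print(crossref_to_dc(SAMPLE_MESSAGE))
```

The annotation layer and the OAI-PMH serialization would sit on top of a record like this one.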

Some Django modules for OAI-PMH

User Authentication


Introducing Crossref, the basics

Database Versioning
This depends on how the DB is set up: whether we keep only one record per item or one record per state. This needs more definition.
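
To help with that definition, here is a sketch of the "one record per state" option in Python, using the standard library `sqlite3` module instead of Django models so it runs on its own. The table name, column names, and function names are all hypothetical. Every save appends a new row, so the full history of an item stays available.

```python
import json
import sqlite3


def open_store():
    """Create an in-memory store with one row per state of each item."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE record_state ("
        " item_id TEXT NOT NULL,"
        " version INTEGER NOT NULL,"
        " payload TEXT NOT NULL,"
        " PRIMARY KEY (item_id, version))"
    )
    return conn


def save_state(conn, item_id, payload):
    """Append a new state for the item and return its version number."""
    row = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM record_state WHERE item_id = ?",
        (item_id,),
    ).fetchone()
    version = row[0] + 1
    conn.execute(
        "INSERT INTO record_state (item_id, version, payload) VALUES (?, ?, ?)",
        (item_id, version, json.dumps(payload)),
    )
    return version


def history(conn, item_id):
    """Return every stored state of the item, oldest first."""
    rows = conn.execute(
        "SELECT payload FROM record_state WHERE item_id = ? ORDER BY version",
        (item_id,),
    ).fetchall()
    return [json.loads(row[0]) for row in rows]


conn = open_store()
save_state(conn, "doi:10.1234/example", {"title": "Draft title"})
save_state(conn, "doi:10.1234/example", {"title": "Corrected title"})
print(history(conn, "doi:10.1234/example")[-1])  # the current state
```

The "one record per item" option would be an UPDATE in place, which is simpler but loses the trail of changes.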

Form Builders

Some JavaScript tools for creating the specific forms needed:

Markdown for documentation

Bibtex <-- also check the network as "improvements" are all over the place.
Other names include:
* Babybib
* Pybtex
* Pybibliographer



API Tutorial: Searching the ORCID registry

Crossref API doc
Crossref types:

Others: mostly citations and references
Internet Archive Scholar
THOR project
Semantic Scholar API --> see:


For generating and ingesting MARC records


Overview see:

ISSN is supposed to have an API, but I am not sure whether they do.
Any request to the portal may be automated thanks to the use of REST protocol. The download of results is also automated. This service is restricted to subscribing users. Please contact sales [at] for more information.
We could also slurp the HTML for the sameAs links to other DBs if needed.
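
A sketch of that scraping idea in Python. I have not checked how the portal actually marks up these links, so it assumes they are `a` or `link` elements whose `rel`, `property`, or `itemprop` attribute mentions `sameAs`. It uses the standard library `html.parser` so that it runs without extra installs, though a parsing library would do the same job. The class name, function name, and sample HTML are made up.

```python
from html.parser import HTMLParser


class SameAsCollector(HTMLParser):
    """Collect href values from elements that are marked as sameAs links."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        href = attributes.get("href")
        # Assumption: the relation is named in one of these three attributes.
        markers = " ".join(
            attributes.get(name) or "" for name in ("rel", "property", "itemprop")
        )
        if tag in ("a", "link") and href and "sameas" in markers.lower():
            self.links.append(href)


def same_as_links(html_text):
    """Return the sameAs link targets found in a page, in document order."""
    collector = SameAsCollector()
    collector.feed(html_text)
    return collector.links


# Made-up page fragment; the targets are placeholders, not real records.
SAMPLE_HTML = """
<a href="https://www.wikidata.org/entity/Q0" property="schema:sameAs">Wikidata</a>
<a href="/help">Help</a>
<link rel="owl:sameAs" href="https://example.org/serial/1"/>
"""

print(same_as_links(SAMPLE_HTML))
```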



Beautiful Soup

There is the issue of how to record, in a Dublin Core OAI record, how that record has changed over time. I need to architect this out.

Record Provenance:

Exposing DOI metadata provenance

1. Log in with ORCID
2. Query APIs (DOIs, ISBNs, ISSNs, ORCID, WikiData, etc.)
3. Results display and annotation
4. Submission
5. List of past submissions
6. Update past submission screen (same as #3?)
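
Before step 2 can call the right API, the app has to work out what kind of identifier was submitted. Here is a minimal Python sketch that recognizes DOIs, ORCID iDs, and ISSNs. The two check digit calculations follow the published ORCID (ISO 7064 mod 11-2) and ISSN rules. The DOI pattern is only a loose heuristic, ISBNs and WikiData IDs are not handled, and the function names are my own.

```python
import re


def orcid_check_digit(base15):
    """ISO 7064 mod 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def issn_check_digit(base7):
    """Check character for the first 7 digits of an ISSN (weights 8 down to 2)."""
    total = sum(int(ch) * weight for ch, weight in zip(base7, range(8, 1, -1)))
    result = (11 - total % 11) % 11
    return "X" if result == 10 else str(result)


def classify_identifier(text):
    """Return 'doi', 'orcid', 'issn', or None for anything unrecognized or invalid."""
    value = text.strip()
    if re.fullmatch(r"10\.\d{4,9}/\S+", value):
        return "doi"
    compact = value.replace("-", "").upper()
    if re.fullmatch(r"\d{15}[\dX]", compact):
        return "orcid" if orcid_check_digit(compact[:15]) == compact[15] else None
    if re.fullmatch(r"\d{7}[\dX]", compact):
        return "issn" if issn_check_digit(compact[:7]) == compact[7] else None
    return None


print(classify_identifier("0000-0002-1825-0097"))  # orcid
print(classify_identifier("0378-5955"))            # issn
print(classify_identifier("10.1000/182"))          # doi
```

Rejecting identifiers with a bad check digit before querying keeps typos from turning into wasted API calls.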

If we ran a module like this:

Then we could take a reading on where the least spoken languages appear in the most highly ranked journals and determine if there was a bias or a loss to science.

Data Examples:

Have been moved to:

PDF Extraction:

PDF Creation:


Named Entity Recognition (NER) in Python with Spacy


Here is a Django app for controlling URIs for linked data vocabularies,
as seen here:

And here is one for source authority records,
as seen here:

Link Checker