Re-implementing the OLAC validator

The OLAC validator runs on a piece of software that has the Heartbleed security vulnerability. When thinking about implementing a replacement validator, the following software comes to mind: https://github.com/zimeon/oaipmh-validator. There is also an online OAI-PMH validator by a former engineer on the Europeana project (I think he is based in Greece). His solution is not open source, but he mentioned that he would consider adding the OLAC profile: https://validator.oaipmh.com/

It would be good to see what other OAI-PMH validators look like and how submitters expect to interact with them.

https://validador.rcaap.pt/validator2/?locale=en
http://oval.base-search.net/
https://doi.org/10.17700/jai.2016.7.1.277
https://rdamsc.bath.ac.uk/msc/t64
https://www.openaire.eu/validator-registration-guide
https://github.com/EuroCRIS/openaire-cris-validator
https://www.fosteropenscience.eu/content/openaire-compatibility-validator-presentation
http://oai.clarin-pl.eu/
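As a starting point for a re-implementation, the first layer of any OAI-PMH validator is a structural check of each response. The sketch below uses only the Python standard library and checks rules from the OAI-PMH 2.0 specification: the root element is OAI-PMH in the OAI namespace, responseDate and request come first, and they are followed either by error elements or by exactly one verb element. It does not do XML Schema validation, and OLAC-specific checks (for example, the OLAC archive description in Identify) would be separate, hypothetical additions on top of it.

```python
import xml.etree.ElementTree as ET

# OAI-PMH 2.0 namespace and verbs as defined by the protocol specification.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "ListIdentifiers", "ListRecords", "GetRecord"}


def check_oai_response(xml_text: str) -> list[str]:
    """Return a list of structural problems in one OAI-PMH response (empty if none found)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    if root.tag != f"{{{OAI_NS}}}OAI-PMH":
        return [f"root element is {root.tag!r}, expected OAI-PMH"]

    problems = []
    # Local names of the children, with the OAI namespace stripped.
    names = [child.tag.replace(f"{{{OAI_NS}}}", "") for child in root]
    if names[:2] != ["responseDate", "request"]:
        problems.append("responseDate and request must be the first two elements")

    # After those: one or more error elements, or exactly one verb element.
    rest = names[2:]
    if not rest:
        problems.append("missing verb or error element")
    elif "error" in rest:
        if any(name != "error" for name in rest):
            problems.append("error elements must not be mixed with a verb element")
    elif len(rest) != 1 or rest[0] not in VERBS:
        problems.append(f"unexpected content after request: {rest}")
    return problems
```

A web front end could run this over the responses to each verb for a submitted base URL and report the collected problems to the submitter.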

OAI-PMH in golang

I was looking at the maturity of Go (golang) for data science, and at Go projects that enable interaction with OAI-PMH feeds. In my case, working with XML is fairly important. I don't see in this XML example how to extract attributes and put them in the struct. (In Go's encoding/xml package, an attribute is mapped to a struct field with a tag such as xml:"name,attr".)

https://pkg.go.dev/github.com/delving/hub3/ikuzo/service/x/oaipmh
https://github.com/renevanderark/goharvest

Building a Discourse server

pfaffman/discourse-doi-resolver
https://meta.discourse.org/t/sign-in-to-discourse-using-orcid/105488/4
https://meta.discourse.org/t/discourse-category-experts/190814
https://meta.discourse.org/t/custom-category-boxes/144865
https://meta.discourse.org/t/mentionables/192948 <-- content in OLAC
https://meta.discourse.org/t/admin-guide-to-tags-in-discourse/121041

Position conversations within the OLAC search space.

https://blog.discourse.org/2021/11/discourse-forum-seo/
https://meta.discourse.org/t/does-discourse-support-google-structured-data-i-e-schema-org/58249
https://meta.discourse.org/t/beginners-guide-to-seo-with-discourse/146655/4
https://meta.discourse.org/t/beginners-guide-to-seo-with-discourse/146655/7
https://meta.discourse.org/t/discourse-sitemap/40348

This might be a way toward an OAI-PMH repository: https://github.com/discourse/discourse-sitemap. Another option is to use a query mechanism in the JSON API to get all threads and treat these threads as resources for description: https://meta.discourse.org/t/discourse-rest-api-documentation/22706
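The "threads as resources" idea could look roughly like this: take the topic list from the JSON API and map each topic to a simplified oai_dc record. This is a sketch under assumptions: the payload shape (topic_list.topics with id, slug, title, created_at, tags) is modeled on Discourse's /latest.json, tags may be strings or objects depending on the Discourse version, and the function names are hypothetical. The xsi:schemaLocation attribute a real oai_dc record carries is omitted.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def topics_from_latest(payload: dict) -> list:
    """Pull the topic list out of a /latest.json-style payload (assumed shape)."""
    return payload.get("topic_list", {}).get("topics", [])


def topic_to_oai_dc(topic: dict, base_url: str) -> ET.Element:
    """Map one Discourse topic (as a dict) to a simplified oai_dc:dc element."""
    dc = ET.Element(f"{{{OAI_DC_NS}}}dc")

    def add(name: str, value: str) -> None:
        ET.SubElement(dc, f"{{{DC_NS}}}{name}").text = value

    add("title", topic["title"])
    # Discourse topic URLs follow the /t/<slug>/<id> pattern.
    add("identifier", f"{base_url.rstrip('/')}/t/{topic['slug']}/{topic['id']}")
    if topic.get("created_at"):
        add("date", topic["created_at"][:10])  # keep only the YYYY-MM-DD part
    for tag in topic.get("tags", []):
        # Tags may be plain strings or objects with a "name" key.
        add("subject", tag["name"] if isinstance(tag, dict) else tag)
    add("type", "Text")  # DCMI Type vocabulary term
    return dc
```

Tags become dc:subject here, which is one way tag groups could feed OLAC subject or language metadata later.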

I wonder how many layers a tag-group can have... https://docs.discourse.org/#tag/Tags/operation/updateTagGroup

https://meta.discourse.org/t/locations-plugin/69742

Legal and privacy considerations:

https://meta.discourse.org/t/legal-tools-plugin/87966/26

Your Discourse forum and the GDPR

Import from other discourse instances:
https://meta.discourse.org/t/create-download-and-restore-a-backup-of-your-discourse-database/122710

Self-hosted, self-managed, hosted, or serviced:

https://meta.discourse.org/t/comparing-hosting-providers/100034/13

Discourse Server Maintenance


https://meta.discourse.org/t/recommended-hosting-providers-for-self-hosters/79562

Pricing:
https://discourse.org/pricing

https://github.com/discourse/discourse/blob/main/docs/INSTALL-cloud.md

Discourse Hosting Plans and Pricing

Dedicated email:
https://messagebird.com/pricing/email-sending

Django modules and links

A Django application to collect submitted DOIs, acquire their API-provided metadata (bibliographic metadata and citation-graph metadata), allow limited (specified) annotation, and then make those records harvestable via OAI-PMH. Language Resource tagger: adding a layer of language-related metadata to published resources.
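The "acquire metadata, then expose via OAI-PMH" step needs a mapping from Crossref's JSON to Dublin Core. A minimal sketch, assuming the input is the "message" object of a Crossref /works/{doi} response (fields such as title, author, publisher, issued.date-parts, type, language, DOI, container-title). The element choices (for example, container-title as dc:source) are my assumptions, not an agreed OLAC crosswalk, and the function name is hypothetical.

```python
def crossref_to_dc(message: dict) -> dict[str, list[str]]:
    """Map the "message" object of a Crossref /works/{doi} response to Dublin Core elements."""
    dc: dict[str, list[str]] = {}

    def add(element: str, value) -> None:
        if value:
            dc.setdefault(element, []).append(str(value))

    for title in message.get("title", []):
        add("title", title)
    for author in message.get("author", []):
        # Persons usually carry "family" and "given"; organisations may only have "name".
        name = ", ".join(p for p in (author.get("family"), author.get("given")) if p)
        add("creator", name or author.get("name"))
    add("publisher", message.get("publisher"))
    # "issued" holds date-parts such as [[2016, 5, 3]]; keep whatever precision is present.
    parts = message.get("issued", {}).get("date-parts", [[]])[0]
    if parts and parts[0] is not None:
        add("date", "-".join(f"{p:02d}" if i else str(p) for i, p in enumerate(parts)))
    add("type", message.get("type"))
    add("language", message.get("language"))
    if message.get("DOI"):
        add("identifier", f"https://doi.org/{message['DOI']}")
    for container in message.get("container-title", []):
        add("source", container)
    return dc
```

The annotation step (the Language Resource tagger) would then add language-related elements to this dictionary before it is serialized.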

Some Django modules for OAI-PMH
https://github.com/saw-leipzig/foaipmh
https://github.com/jnphilipp/django_oai_pmh

User Authentication
https://github.com/ubffm/django-orcid
https://django.fun/en/docs/social-docs/0.1/backends/orcid/

Crossref
https://github.com/fabiobatalha/crossrefapi

Introducing Crossref, the basics

Database Versioning
This depends on how the database is set up: whether we keep one record per item or one record per state (revision). This needs more definition.
https://djangopackages.org/grids/g/versioning/
https://www.wpbeginner.com/beginners-guide/complete-guide-to-wordpress-post-revisions/
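To make the "one record per state" option concrete: every save appends a new row with an incrementing revision number, and nothing is updated in place, so the history of each item is its list of rows. This is an in-memory stand-in rather than Django code; in Django it would presumably become a model with the item identifier and a revision number. The class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Revision:
    item_id: str      # e.g. the DOI the record describes
    number: int       # 1, 2, 3, ... per item
    metadata: dict    # full metadata snapshot for this state
    note: str         # who changed it and why, kept for provenance
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class RevisionStore:
    """In-memory stand-in for a table with one row per (item, revision)."""

    def __init__(self) -> None:
        self._rows: list[Revision] = []

    def save(self, item_id: str, metadata: dict, note: str) -> Revision:
        number = 1 + sum(1 for r in self._rows if r.item_id == item_id)
        rev = Revision(item_id, number, dict(metadata), note)
        self._rows.append(rev)  # append-only: earlier states stay untouched
        return rev

    def history(self, item_id: str) -> list[Revision]:
        return [r for r in self._rows if r.item_id == item_id]

    def current(self, item_id: str) -> Optional[Revision]:
        revs = self.history(item_id)
        return revs[-1] if revs else None
```

The "one record per item" alternative would overwrite a single row and lose this history unless changes were logged elsewhere.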

Form Builders
https://djangopackages.org/grids/g/form-builder/

Some JavaScript tools for creating the specific forms needed:
https://github.com/HughP/dublin-core-generator
https://nsteffel.github.io/dublin_core_generator/generator.html

Markdown for documentation
https://neutronx.github.io/django-markdownx/

BibTeX
https://bibtexparser.readthedocs.io/en/master/
https://github.com/sciunto-org/python-bibtexparser
https://github.com/jnphilipp/bibliothek
https://github.com/lucastheis/django-publications <-- also check the fork network, as "improvements" are scattered across forks.
Other BibTeX tools include:
* Babybib
* Pybtex
* Pybibliographer

APIs

ORCID
https://github.com/ORCID/python-orcid

API Tutorial: Searching the ORCID registry
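Before querying the ORCID registry, a submitted iD can be checked locally: the last character of an ORCID iD is an ISO 7064 MOD 11-2 check character computed over the first 15 digits, as described in ORCID's documentation. A short sketch (function names are my own); the example iD 0000-0002-1825-0097 is ORCID's fictitious sample identifier.

```python
import re


def orcid_check_char(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Accept a bare iD (0000-0002-1825-0097) or one prefixed with https://orcid.org/."""
    bare = orcid.strip().removeprefix("https://orcid.org/")
    if not re.fullmatch(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]", bare):
        return False
    digits = bare.replace("-", "")
    return orcid_check_char(digits[:15]) == digits[15]
```

This only catches typos; whether the iD exists still requires an API lookup.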

Crossref API doc
https://github.com/CrossRef/rest-api-doc/blob/master/demos/crossref-api-demo.ipynb
Crossref types: https://www.crossref.org/documentation/register-maintain-records/
https://api.crossref.org/swagger-ui/index.html#/Types/get_types__id__works
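Submitted DOIs arrive in many forms (bare, doi: prefixed, or as doi.org links), so it helps to normalize them before calling the Crossref REST API. The sketch below only builds request URLs for the /works/{doi} route and the /types/{id}/works route linked above; the query, rows and mailto parameters come from the public Crossref API, while the helper names are hypothetical. Lowercasing is safe because DOIs are case-insensitive.

```python
from urllib.parse import quote, urlencode

CROSSREF = "https://api.crossref.org"
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(raw: str) -> str:
    """Strip common prefixes from a submitted DOI and lowercase it."""
    doi = raw.strip()
    for prefix in DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.lower()


def work_url(doi: str) -> str:
    """URL of the /works/{doi} route for a single DOI."""
    return f"{CROSSREF}/works/{quote(normalize_doi(doi), safe='/')}"


def works_by_type_url(type_id: str, query: str, rows: int = 20, mailto: str = "") -> str:
    """URL of the /types/{id}/works route with a free-text query."""
    params = {"query": query, "rows": rows}
    if mailto:
        params["mailto"] = mailto  # identifies the client to Crossref
    return f"{CROSSREF}/types/{quote(type_id)}/works?{urlencode(params)}"
```

For example, work_url("https://doi.org/10.17700/jai.2016.7.1.277") points at the Crossref record for the paper linked at the top of these notes.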

Others — Mostly citation and references
http://www.scholix.org/
https://scholexplorer.openaire.eu/#/query/page=5/q=language
https://crossref.gitlab.io/knowledge_base/products/event-data/
FatCat https://fatcat.wiki/
InternetArchive Scholar https://scholar.archive.org/
Thor project https://project-thor.readme.io/docs/introduction-for-integrators
Crosscite.org
Semantic Scholar API https://api.semanticscholar.org/api-docs/graph
https://core.ac.uk/
https://opencitations.net/
https://unpaywall.org/ --> see: http://musingsaboutlibrarianship.blogspot.com/2017/11/using-oadoi-crossref-event-data-api-to.html
https://openalex.org/
https://arxiv.org/help/api/index
https://www.aminer.org/citation
https://www.aminer.org/download
https://open.aminer.cn/
https://analytics.hathitrust.org/datasets#top
https://pro.dp.la/developers/api-codex
https://pro.europeana.eu/page/apis

LCSH
https://github.com/edsu/id

MARC
For generating and ingesting MARC records
https://pymarc.readthedocs.io/en/latest/

Zotero
https://github.com/urschrei/pyzotero

Overview see: https://researchguides.smu.edu.sg/api-list/scholarly-metadata-api

ISSNs
ISSN.org is supposed to have an API, but I am not sure that it does.
https://portal.issn.org/resource/ISSN/1904-0008
Any request to the portal may be automated thanks to the use of REST protocol. The download of results is also automated. This service is restricted to subscribing users. Please contact sales [at] issn.org for more information.
https://portal.issn.org/node/170

https://portal.issn.org/resource/ISSN/2549-5089#
https://portal.issn.org/resource/ISSN/2549-5089?format=json
We could also slurp the HTML for the sameAs links to other DBs if needed.
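Whatever access we end up with, submitted ISSNs can be checked locally first: the eighth character is a check digit computed with weights 8 down to 2 on the first seven digits, modulo 11, with 10 written as X. The sketch below validates and normalizes an ISSN and then builds the ?format=json portal URL used in these notes; per the portal text quoted here, automated use may be restricted to subscribers, so that URL is an assumption about access, not a confirmed API. Function names are hypothetical.

```python
import re
from typing import Optional


def issn_check_digit(first_seven: str) -> str:
    """ISSN check digit: weights 8..2 on the first seven digits, modulo 11."""
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def normalize_issn(raw: str) -> Optional[str]:
    """Return the ISSN as NNNN-NNNC if its check digit is valid, else None."""
    compact = re.sub(r"[\s-]", "", raw).upper()
    if not re.fullmatch(r"\d{7}[\dX]", compact):
        return None
    if issn_check_digit(compact[:7]) != compact[7]:
        return None
    return f"{compact[:4]}-{compact[4:]}"


def portal_json_url(issn: str) -> str:
    """JSON view of an ISSN Portal record, following the URL pattern in these notes."""
    return f"https://portal.issn.org/resource/ISSN/{issn}?format=json"
```

Both ISSNs linked above (1904-0008 and 2549-5089) pass this check.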

JATS
https://pypi.org/project/jatsgenerator/
https://stackoverflow.com/questions/42084165/extracting-text-from-jats-xml-file-using-python
https://github.com/sibils/jats-parser
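For simple cases, the standard library may be enough to pull bibliographic fields out of JATS without a dedicated parser. This sketch reads the article title, DOI, abstract text and author names from front/article-meta using standard JATS element names; it assumes the file has no DTD-defined entities that ElementTree cannot resolve, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def jats_summary(xml_text: str) -> dict:
    """Pull title, DOI, abstract text and author names from a JATS article."""
    root = ET.fromstring(xml_text)
    meta = root.find("./front/article-meta")
    if meta is None:
        return {}

    def text_of(el) -> str:
        # itertext() flattens inline markup such as <italic>; split/join tidies whitespace.
        return " ".join("".join(el.itertext()).split()) if el is not None else ""

    authors = []
    for name in meta.findall("./contrib-group/contrib[@contrib-type='author']/name"):
        surname, given = name.findtext("surname", ""), name.findtext("given-names", "")
        authors.append(", ".join(p for p in (surname, given) if p))
    return {
        "title": text_of(meta.find("./title-group/article-title")),
        "doi": text_of(meta.find("./article-id[@pub-id-type='doi']")),
        "abstract": text_of(meta.find("abstract")),
        "authors": authors,
    }
```

The abstract text is a natural input for the NER and language-tagging steps listed further down.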

Pandas
https://pypi.org/project/django-pandas/

Beautiful Soup

There is the issue of how to record, in a Dublin Core OAI record, how that record changed over time. I need to architect this.

Record Provenance:
[ ] Explore:
https://www.w3.org/TR/prov-dc/
https://www.w3.org/2011/prov/track/issues/607?changelog
http://www.ukoln.ac.uk/metadata/dcmi/collection-provenance/
https://edoc.hu-berlin.de/bitstream/handle/18452/2727/332.pdf?sequence=1&isAllowed=y
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4177195/
https://www.loc.gov/standards/mods/userguide/recordinfo.html
https://tsl.access.preservica.com/tslac-digital-preservation-framework/qualified-dublin-core-schema/
https://dl.acm.org/doi/10.5555/2770897.2770924

Exposing DOI metadata provenance


https://dgarijo.com/papers/dc2011.pdf
https://ceur-ws.org/Vol-670/paper_3.pdf
https://ecommons.cornell.edu/bitstream/handle/1813/55327/Encoding%20Provenance%20for%20Social%20Science%20Data-final.pdf?sequence=3&isAllowed=y
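One existing hook for record-level provenance is the OAI-PMH "about" container, which the OAI-PMH implementation guidelines pair with a provenance schema: an originDescription carries a harvest date, an "altered" flag, and the source baseURL, identifier, datestamp and metadataNamespace, and earlier origins can be nested inside it. The sketch below builds that element; it records where a record came from and whether it was changed, not what changed (PROV-DC, linked above, would be needed for that). Element order should be checked against the published XSD, and the function name and example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

PROV_NS = "http://www.openarchives.org/OAI/2.0/provenance"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
ET.register_namespace("prov", PROV_NS)


def provenance_about(base_url: str, identifier: str, datestamp: str,
                     harvest_date: str, altered: bool,
                     metadata_ns: str = OAI_DC_NS,
                     previous: Optional[ET.Element] = None) -> ET.Element:
    """Build a <provenance> element for a record's <about> container.

    Passing an earlier provenance element as `previous` nests its
    originDescription, so the chain of sources is preserved.
    """
    prov = ET.Element(f"{{{PROV_NS}}}provenance")
    origin = ET.SubElement(prov, f"{{{PROV_NS}}}originDescription",
                           harvestDate=harvest_date, altered=str(altered).lower())
    for tag, value in (("baseURL", base_url), ("identifier", identifier),
                       ("datestamp", datestamp), ("metadataNamespace", metadata_ns)):
        ET.SubElement(origin, f"{{{PROV_NS}}}{tag}").text = value
    if previous is not None:
        # Nest the earlier originDescription(s) after metadataNamespace.
        for earlier in previous.findall(f"{{{PROV_NS}}}originDescription"):
            origin.append(earlier)
    return prov
```

Each annotation pass could wrap the previous provenance with altered=True, giving a coarse change history inside the OAI record itself.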

Views:
1. log in with ORCID
2. query APIs (DOIs, ISBNs, ISSNs, ORCID, Wikidata, etc.)
3. results display and annotation
4. submission
5. List of past submissions
6. update past submission screen (same as #3?)

If we ran a module like this:
https://pybliometrics.readthedocs.io/en/latest/classes/SerialTitle.html

Then we could measure where the least-spoken languages appear in the most highly ranked journals and determine whether there is a bias or a loss to science.

Data Examples:

Have been moved to:
https://github.com/HughP/CrossRef-to-OLAC-data-examples

PDF Extraction:
https://levelup.gitconnected.com/scrap-data-from-website-and-pdf-document-for-django-app-fa8f37010085
https://towardsdatascience.com/how-to-extract-pdf-data-in-python-876e3d0c288
https://stackoverflow.com/questions/71850349/download-a-pdf-from-url-edit-it-an-render-it-in-django
https://stackoverflow.com/questions/48882768/django-reading-pdf-files-content
https://www.geeksforgeeks.org/working-with-pdf-files-in-python/

PDF Creation:
https://docs.djangoproject.com/en/4.1/howto/outputting-pdf/
https://jeltef.github.io/PyLaTeX/current/examples/header.html

NER:

https://johnfraney.github.io/django-ner-trainer/settings/

Named Entity Recognition (NER) in Python with Spacy

Other:
https://prodi.gy/

https://realpython.com/testing-in-django-part-1-best-practices-and-examples/

Here is a Django app for managing controlled vocabularies of linked-data URIs:
https://github.com/unt-libraries/django-controlled-vocabularies
as seen here: https://digital2.library.unt.edu/vocabularies/agent-qualifiers/

And here is one for source authority records:
https://github.com/unt-libraries/django-name
as seen here: https://digital2.library.unt.edu/name/nm0000001/

Link Checker
https://github.com/Kaltsoon/dead-link-checker
https://pypi.org/project/django-linkcheck/
https://github.com/bartdag/pylinkvalidator
https://stackoverflow.com/questions/43264291/in-django-how-can-i-unit-test-all-links-recursively-every-view-check-for-200-o
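If the listed tools don't fit, a basic checker needs only the standard library: collect absolute http(s) links from a page, then request each one. This sketch extracts links from a and link elements (dropping fragments and duplicates) and checks a URL with a HEAD request; some servers reject HEAD, so a GET fallback may be needed in practice. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect absolute http(s) URLs from <a href> and <link href> in one page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag in ("a", "link") and href:
            url, _fragment = urldefrag(urljoin(self.base_url, href))
            if urlparse(url).scheme in ("http", "https") and url not in self.links:
                self.links.append(url)


def extract_links(html: str, base_url: str) -> list[str]:
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def check_status(url: str, timeout: float = 10.0) -> int:
    """HTTP status of a HEAD request (after redirects), or 0 if the host was unreachable."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        return err.code
    except OSError:  # URLError, timeouts and connection errors
        return 0
```

Running check_status over extract_links for each rendered page gives a list of broken links; in a Django test suite the pages could come from the test client instead of live URLs.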