Dublin Core Uses

Posted on December 26, 2022 by Hugh Paterson III

There are several Dublin Core Uses... but what are the models behind them? Are they the same? Are they different? If they are different then does this mean that Dublin Core is more like a brand name than a metadata standard?

RSS: https://web.resource.org/rss/1.0/modules/dcterms/
HTML: https://www.dublincore.org/specifications/dublin-core/dcq-html/; https://www.dublincore.org/specifications/dublin-core/dc-html/
HTML: https://www.hjp.at/doc/rfc/rfc2731.html ; https://www.ietf.org/rfc/rfc2731.txt

Note that in https://www.ietf.org/rfc/rfc2731.txt they use DC1.0

When another application profile "borrows" the DC property/term/element... how do they borrow it? https://exiv2.org/tags-xmp-dwc.html

Open Cataloging Rules

Posted on December 22, 2022 by Hugh Paterson III

This seems promising for OLAC.

http://d-scholarship.pitt.edu/42973/
https://www.ncserialsconference.org/slides/2022/2022-1B.pdf
https://opencatalogingrules.org/

Cataloging Serials

Posted on December 20, 2022 by Hugh Paterson III

https://archive.org/details/podcast_electronic-serials-cataloging_386018207
https://wiki.rice.edu/confluence/display/METACAT/Cataloging+Continuing+Resources
https://www.tandfonline.com/doi/full/10.1080/01639374.2017.1388324
https://www.tandfonline.com/doi/abs/10.1080/1941126X.2018.1494014
https://wiki.rice.edu/confluence/display/METACAT/Cataloging+Continuing+Resources
https://web.library.yale.edu/book/export/html/570
https://github.com/WeblateOrg/language-data

https://alcts.libguides.com/alcts_standards/continuing_resources

https://www.tandfonline.com/doi/abs/10.1080/01639374.2017.1388324?journalCode=wccq20
https://alcts.libguides.com/alcts_standards/continuing_resources
https://www.loc.gov/aba/pcc/conser/word/Module0.docx
https://www.loc.gov/marc/bibliographic/bd008s.html
https://www.ala.org/alcts/confevents/upcoming/webinar/031815
https://archive.org/details/podcast_advanced-serials-cataloging-_386018196
https://archive.org/details/podcast_electronic-serials-cataloging_386018207
https://www.ala.org/alcts/mgrps/crs/

RDA & WEMI
http://www.mlalibrary.org/resources/Documents/Quickand%20DirtyRDA_MLA2016_TracyPizzi.pdf

German Grammar books to look at for OLAC examples.

Posted on November 30, 2022 by Hugh Paterson III

Here are some links about German Grammar Books which are interesting for use in OLAC examples.

German Grammar for English Speakers

https://www.thoughtco.com/best-german-grammar-books-4150500

Fuzzy matching

Posted on November 19, 2022 by Hugh Paterson III

https://towardsdatascience.com/fuzzy-string-matching-in-python-68f240d910fe

https://www.activestate.com/blog/how-to-implement-fuzzy-matching-in-python/#:~:text=As%20mentioned%20above%2C%20fuzzy%20matching,strings%20are%20to%20one%20another.

Topic Modeling in Python

Posted on November 19, 2022 by Hugh Paterson III

What if my import pd data array was the OLAC metadata schema?

https://www.semanticscholar.org/paper/A-Gentle-Introduction-to-Topic-Modeling-Using-Saxton/38742c56eadfdf11fb7218f7702c8fccfc78bd95

https://gist.github.com/umbertogriffo/5041b9e4ec6c3478cef99b8653530032

https://towardsdatascience.com/contextualized-topic-modeling-with-python-eacl2021-eacf6dfa576

Beginners Guide to Topic Modeling in Python

How to Use Bertopic for Topic Modeling and Content Analysis?

https://melaniewalsh.github.io/Intro-Cultural-Analytics/05-Text-Analysis/08-Topic-Modeling-Text-Files.html

https://asandeepc.bitbucket.io/courses/inls613_summer2019/lectures/08-lda_topic_modeling.pdf

http://derekgreene.com/slides/topic-modelling-with-scikitlearn.pdf

https://ourcodingclub.github.io/tutorials/topic-modelling-python/

https://stackabuse.com/python-for-nlp-topic-modeling/

https://www.toptal.com/python/topic-modeling-python

https://towardsdatascience.com/end-to-end-topic-modeling-in-python-latent-dirichlet-allocation-lda-35ce4ed6b3e0

ELAR roles and OLAC

Posted on November 12, 2022 by Hugh Paterson III

While doing my Masters Thesis, I took a look at the contributor roles declared for various works. One thing I noticed is that even though Stuart McGill contributed two corpora to ELAR when these corpora get translated to OALC the translation mucks the metadata so that only one resource shows up with his name.

http://dla.library.upenn.edu/dla/olac/search.html?fq=archive_facet%3A%22Endangered%20Languages%20Archive%22%20AND%20contributor_facet%3A%22Stuart%20McGill%22

I asked the archive director about this and my understanding/recollection from that conversation is that metadata was piped through the TLA. https://lat1.lis.soas.ac.uk/ds/asv/?0&openpath=538104 I think the above record was also at the preveious link... but it doesn't resolve currently and there have been technology stack changes at ELAR since my Thesis was released. Here is the interface in the Internet Archive for a different record.
http://web.archive.org/web/20200616011131/https://lat1.lis.soas.ac.uk/ds/asv/;jsessionid=00127741134CA14440824DA736655134?0&openhandle=2196/00-0000-0000-0012-D580-4

Django modules and links

Posted on November 8, 2022 by Hugh Paterson III

Django application to collect submitted DOIs, acquire their API provided metadata (Bibliographic metadata and citation graph metadata), allow limited (specified) annotation, and then make those records harvestable via OAI-PMH. Language Resource tagger—Adding a layer of language related metadata to published resources.

Some Django modules for OAI-PMH
https://github.com/saw-leipzig/foaipmh
https://github.com/jnphilipp/django_oai_pmh

https://pypi.org/user/jnphilipp/ his topic extraction module looks interesting.

Also look at the xsd schema here https://github.com/saw-leipzig/foaipmh/blob/5b15d5cc4700a3cccf497c47218c2fba6b3421d5/entrypoint.prod.sh#L5

Metadata utility for OAI-PMH

https://combine.readthedocs.io/en/master/configuration.html

User Authentication
https://github.com/ubffm/django-orcid
https://django.fun/en/docs/social-docs/0.1/backends/orcid/

Crossref
https://github.com/fabiobatalha/crossrefapi

Introducing Crossref, the basics

Database Versioning
This depends on how the DB is set up. If we only have one record per item or one record per state... This needs more definition.
https://djangopackages.org/grids/g/versioning/
https://www.wpbeginner.com/beginners-guide/complete-guide-to-wordpress-post-revisions/

Form Builders
https://djangopackages.org/grids/g/form-builder/

Some Javascript tools for creating the specific forms needed:
https://github.com/HughP/dublin-core-generator
https://nsteffel.github.io/dublin_core_generator/generator.html

Markdown for documentation
https://neutronx.github.io/django-markdownx/

Bibtex
https://bibtexparser.readthedocs.io/en/master/
https://github.com/sciunto-org/python-bibtexparser
https://github.com/jnphilipp/bibliothek
https://github.com/lucastheis/django-publications <-- also check the network as "improvements" are all over the place.
Other names include:
* Babybib
* Pybtex
* Pybibliographer

APIs

ORCID
https://github.com/ORCID/python-orcid

API Tutorial: Searching the ORCID registry

Crossref API doc
https://github.com/CrossRef/rest-api-doc/blob/master/demos/crossref-api-demo.ipynb
Crossref types: https://www.crossref.org/documentation/register-maintain-records/
https://api.crossref.org/swagger-ui/index.html#/Types/get_types__id__works

Others — Mostly citation and references
http://www.scholix.org/
https://scholexplorer.openaire.eu/#/query/page=5/q=language
https://crossref.gitlab.io/knowledge_base/products/event-data/
FatCat https://fatcat.wiki/
InternetArchive Scholar https://scholar.archive.org/
Thor project https://project-thor.readme.io/docs/introduction-for-integrators
Corsscite.org
Semantic Scholar API https://api.semanticscholar.org/api-docs/graph
https://core.ac.uk/
https://opencitations.net/
https://unpaywall.org/ --> see: http://musingsaboutlibrarianship.blogspot.com/2017/11/using-oadoi-crossref-event-data-api-to.html
https://openalex.org/
https://arxiv.org/help/api/index
https://www.aminer.org/citation
https://www.aminer.org/download
https://open.aminer.cn/
https://analytics.hathitrust.org/datasets#top
https://pro.dp.la/developers/api-codex
https://pro.europeana.eu/page/apis

LCSH
https://github.com/edsu/id

MARC
For generating an ingesting MARC records
https://pymarc.readthedocs.io/en/latest/

Zotero
https://github.com/urschrei/pyzotero

Overview see: https://researchguides.smu.edu.sg/api-list/scholarly-metadata-api

ISSNs
ISSN.org is supposed to have an API.. but not sure if they do.
https://portal.issn.org/resource/ISSN/1904-0008
Any request to the portal may be automated thanks to the use of REST protocol. The download of results is also automated. This service is restricted to subscribing users. Please contact sales [at] issn.org for more information.
https://portal.issn.org/node/170

https://portal.issn.org/resource/ISSN/2549-5089#
https://portal.issn.org/resource/ISSN/2549-5089?format=json
We could also slurp the HTML for the sameAs links to other DBs if needed.

JATS
https://pypi.org/project/jatsgenerator/
https://stackoverflow.com/questions/42084165/extracting-text-from-jats-xml-file-using-python
https://github.com/sibils/jats-parser

Pandas
https://pypi.org/project/django-pandas/

Beautiful Soup

There is the issue of how do we add to a Dublin Core OAI record how it was changed over time.... I need to architect this out.

Record Provenance:
[]Explore
https://www.w3.org/TR/prov-dc/
https://www.w3.org/2011/prov/track/issues/607?changelog
http://www.ukoln.ac.uk/metadata/dcmi/collection-provenance/
https://edoc.hu-berlin.de/bitstream/handle/18452/2727/332.pdf?sequence=1&isAllowed=y
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4177195/
https://www.loc.gov/standards/mods/userguide/recordinfo.html
https://tsl.access.preservica.com/tslac-digital-preservation-framework/qualified-dublin-core-schema/
https://dl.acm.org/doi/10.5555/2770897.2770924
https://blog.datacite.org/exposing-doi-metadata-provenance/
https://dgarijo.com/papers/dc2011.pdf
https://ceur-ws.org/Vol-670/paper_3.pdf
https://ecommons.cornell.edu/bitstream/handle/1813/55327/Encoding%20Provenance%20for%20Social%20Science%20Data-final.pdf?sequence=3&isAllowed=y

Views:
1. login with ORCID
2. query APIs (DOIs, ISBNs, ISSNs, ORCID, WikiData, etc.)
3. results display and annotation
4. submission
5. List of past submissions
6. update past submission screen (same as #3?)

If we ran a module like this:
https://pybliometrics.readthedocs.io/en/latest/classes/SerialTitle.html

Then we could take a reading on where the least spoken languages appear in the most highly ranked journals and determine if there was a bias or a loss to science.

Data Examples:

Have been moved to:
https://github.com/HughP/CrossRef-to-OLAC-data-examples

PDF Extraction:
https://levelup.gitconnected.com/scrap-data-from-website-and-pdf-document-for-django-app-fa8f37010085
https://towardsdatascience.com/how-to-extract-pdf-data-in-python-876e3d0c288
https://stackoverflow.com/questions/71850349/download-a-pdf-from-url-edit-it-an-render-it-in-django
https://stackoverflow.com/questions/48882768/django-reading-pdf-files-content
https://www.geeksforgeeks.org/working-with-pdf-files-in-python/

PDF Creation:
https://docs.djangoproject.com/en/4.1/howto/outputting-pdf/
https://jeltef.github.io/PyLaTeX/current/examples/header.html

NER:

https://johnfraney.github.io/django-ner-trainer/settings/

Named Entity Recognition (NER) in Python with Spacy

Other:
https://prodi.gy/

https://realpython.com/testing-in-django-part-1-best-practices-and-examples/

here is a django app for controlling URIs for linked data vocabularies.
https://github.com/unt-libraries/django-controlled-vocabularies
as seen here https://digital2.library.unt.edu/vocabularies/agent-qualifiers/

And here is a one for source authority records.
https://github.com/unt-libraries/django-name
as seen here: https://digital2.library.unt.edu/name/nm0000001/

Link Checker
https://github.com/Kaltsoon/dead-link-checker
https://pypi.org/project/django-linkcheck/
https://github.com/bartdag/pylinkvalidator
https://stackoverflow.com/questions/43264291/in-django-how-can-i-unit-test-all-links-recursively-every-view-check-for-200-o

Abstract and Table of Contents

Posted on November 7, 2022 by Hugh Paterson III

If abstract is a sample of about-ness, then a table of contents is sample if is-ness. Some have said that journal articles should not have table of contents (instructional staff at the UNT program teaching the Metadata I course). I disagree, but so does Habing, et al (2001). Sometimes more than an abstract a table of contents can deliver a substantial understanding of what an article is and is about by displaying its structure. In fact many law review articles actually include a table of contents prior to the main part of the article. Law review articles can be over 70 pages in length. An outline offers useful information to the potential reader.

An example of an outline from a linguistics article.

Roberts, David. 2011. “A Tone Orthography Typology.” Written Language & Literacy 14 (1): 82–108. doi:10.1075/wll.14.1.05rob.

Introduction
The six parameters
2.1 First parameter: Domain
2.2 Second parameter: Target
2.2.1 Tones
2.2.2 Grammar
2.2.3 Lexicon
2.2.4 Dual strategies
2.3 Third parameter: Symbol
2.3.1 Phonographic representations
2.3.2 Semiographic representations
2.4 Fourth parameter: Position
2.5 Fifth parameter: Density
2.5.1 Introduction
2.5.2 Zero density
2.5.3 Partial density
2.5.4 Exhaustive density
2.6 Sixth parameter: Depth
2.6.1 Introduction
2.6.2 Surface representation
2.6.3 Deep representation
2.6.4 Shallow (transparent) representation
Conclusion
Abbreviations
Notes
Bibliographical references

References

Thomas G. Habing, Timothy W. Cole, and William H. Mischo. 2001. Qualified Dublin Core using RDF for Sci-Tech Journal Articles. https://dli.grainger.uiuc.edu/Publications/metadatacasestudy/HabingDC2001.pdf

Opinions on OCLC and metadata ownership

Posted on November 5, 2022 by Hugh Paterson III

https://librarytechnology.org/document/7266
https://librarytechnology.org/document/7266/ownership-of-machine-readable-records-a-neglected-consideration-in-retrospective-conversion

Landgrab For Ownership Of Library Catalog Data

https://www.oclc.org/en/worldcat/cooperative-quality/policy.html

https://repository.law.uic.edu/cgi/viewcontent.cgi?article=1557&context=jitpl

https://dltj.org/article/oclc-records-use-policy-1/

https://wiki.harvard.edu/confluence/display/LibraryStaffDoc/OCLC+Institution+records+discontinuation

Matching algorithms

https://www.oclc.org/en/news/announcements/2022/worldcat-quality-enhancements.html
https://www.ohiolink.edu/content/matching_bibliographic_records_central_site

The Journeyler

A walk through: Life, Leadership, Linguistics, Language Documentation, WordPress, and OS X (and a bit of Marketing & Business Administration)

Tag Archives: OLAC