The Journeyler

A walk through: Life, Leadership, Linguistics, Language Documentation, WordPress, and OS X (and a bit of Marketing & Business Administration)

Tag Archives: metadata

HTML Metadata tags and Dublin Core

Posted on November 22, 2022 by Hugh Paterson III

https://infosci.um.ac.ir/index.php/RRP/article_27183.html?lang=en
https://doi.org/10.1080/13614579709516904
https://archive.ifla.org/documents/libraries/cataloging/metadata/drusch.pdf
http://www.ariadne.ac.uk/issue/10/dublin/
https://www.sid.ir/paper/102563/en
https://crln.acrl.org/index.php/crlnews/article/view/18374/20723
https://mn.gov/bridges/user2study.pdf
http://eprints.rclis.org/7319/
http://eprints.rclis.org/7319/1/Search_Engines_and_Resource_Discovery.pdf

What Happened to Dublin Core as an SEO Factor?


https://doi.org/10.1177/0165551504045851
https://www.seroundtable.com/google-on-using-dublin-core-schema-29002.html

https://espace.library.uq.edu.au/data/UQ_7837/final.html
https://espace.library.uq.edu.au/view/UQ:7837 <-- What is it with repositories and asking for human verification? Isn't the point of these to be machine crawlable...? Same thing with SIL

Subjects around DC: https://muse.jhu.edu/article/520975/pdf

Posted in Other Journals | Tagged Dublin core, HTML, in_Obsidian, metadata | Leave a reply

Subjects for images

Posted on November 5, 2022 by Hugh Paterson III

Somebody once told me that pictures don't have subjects because of the separation between is-ness and about-ness.

I disagree. Here are some things from the literature.

https://drum.lib.umd.edu/bitstream/handle/1903/15063/Describing_Visual_Materials_in_the_Digital_Age_Hamburger.pdf

http://duspeccoll.github.io/local_authority

https://journals.ala.org/index.php/lrts/article/viewFile/7564/10462

https://listserv.loc.gov/cgi-bin/wa?A2=ind0501&L=MARC&P=4254

https://inevermetadataididntlike.wordpress.com/category/library-of-congress-genreform-terms/

http://netanelganin.com/projects/lcgft/lcgftType.html

https://cornerstone.lib.mnsu.edu/cgi/viewcontent.cgi?article=1000&context=olac-publications

https://www.isko.org/cyclo/subject

Posted in Other Journals | Tagged about-ness, Images, is-ness, metadata, subjects, UNT-notes | Leave a reply

MODS and element order

Posted on November 5, 2022 by Hugh Paterson III

Is element order a thing in XML? That is, is the order of appearance of sibling elements within an XML document significant?

https://stackoverflow.com/questions/28268696/is-the-order-of-two-siblings-implementation-dependent
https://xmltutorial.info/xml/node-relationships/

Here is the response from the XSD author:

I haven’t read the entire thread, but I take it the question is whether elements in a mods record need to be in a particular order (i.e. in the order that they are listed in the schema). They don’t.

In the MODS schema, look for:

*********************************************************************** ** Definition of a single MODS record ** ********************************************************************** 

And following that:

<xs:element name="mods" type="modsDefinition"/>
<!-- -->
<xs:complexType name="modsDefinition">
<xs:group ref="modsGroup" maxOccurs="unbounded"/>

……….

This says: a MODS record consists of one or more elements from the “modsGroup” (at least one, because that is the default if there is no minOccurs, and as many as you want because maxOccurs="unbounded") enclosed within a <mods> element.

Next, look for:

<!-- *********************************************************************** ** These are the "top level" MODS elements ** ********************************************************************** -->

prior to that:

<xs:group name="modsGroup">
<xs:choice>

…. and following it is the list of elements:


<xs:element ref="abstract"/>
<xs:element ref="accessCondition"/>
<xs:element ref="classification"/>
<xs:element ref="extension"/>
<xs:element ref="genre"/>
<xs:element ref="identifier"/>
<xs:element ref="language"/>
<xs:element ref="location"/>

……………. and so on.

“Choice” says “choose any one of these elements.”

So all together, it says choose an element from the list. Any element. And then repeat as desired.

So you could choose “genre”, and then choose “classification”, and so on. Chosen in no particular order.

And then enclose your list within a

<mods>

record, in the order in which you chose the elements.

Ray
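
One practical consequence of Ray's answer is that two MODS records which differ only in the order of their top-level elements carry the same content. Here is a minimal sketch of an order-insensitive comparison using only Python's standard library. The two sample records, the classification value, and the DOI are made up for illustration, and the comparison only looks at the tag, attributes, and text of each top-level child (nested children are ignored), so treat it as a starting point and not as a full MODS comparison.

```python
import xml.etree.ElementTree as ET
from collections import Counter

MODS_NS = "http://www.loc.gov/mods/v3"

# Two hypothetical records with the same top-level elements in a different order.
RECORD_A = f"""<mods xmlns="{MODS_NS}">
  <genre>article</genre>
  <classification>P1-1091</classification>
  <identifier type="doi">10.0000/example</identifier>
</mods>"""

RECORD_B = f"""<mods xmlns="{MODS_NS}">
  <identifier type="doi">10.0000/example</identifier>
  <genre>article</genre>
  <classification>P1-1091</classification>
</mods>"""


def top_level_signature(xml_text):
    """Return a multiset of (tag, attributes, text) for the top-level children."""
    root = ET.fromstring(xml_text)
    return Counter(
        (child.tag, tuple(sorted(child.attrib.items())), (child.text or "").strip())
        for child in root
    )


def same_top_level_content(a, b):
    """True when both records hold the same top-level elements, in any order."""
    return top_level_signature(a) == top_level_signature(b)


print(same_top_level_content(RECORD_A, RECORD_B))  # True
```

A Counter is used instead of a set so that repeated elements (for example, two genre elements) are still counted.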

Posted in Other Journals | Tagged metadata, MODS, XML | Leave a reply

Schema.org templates

Posted on November 5, 2022 by Hugh Paterson III

Some links to some schema.org templates and documentation.

Schema Markup for Colleges and Universities

Person JSON-LD Examples

Sometimes these are more useful than the official site.

Posted in Other Journals | Tagged metadata, schema.org | Leave a reply

Scraping archives for OLAC

Posted on November 2, 2022 by Hugh Paterson III

This post is a set of resources I am compiling in order to scrape a language archive and build a static OLAC feed from the results.

https://www.youtube.com/watch?v=RvCBzhhydNk

https://www.kdnuggets.com/2022/02/build-web-scraper-python-5-minutes.html

The archive: http://roa.rutgers.edu/article/browse

https://www.geeksforgeeks.org/how-to-build-web-scraping-bot-in-python/

https://www.edureka.co/blog/web-scraping-with-python/

https://www.webscrapingapi.com/python-web-scraping

Text Extraction
https://www.geeksforgeeks.org/extract-text-from-pdf-file-using-python/
https://towardsdatascience.com/how-to-extract-text-from-pdf-245482a96de7
https://betterprogramming.pub/how-to-convert-pdfs-into-searchable-key-words-with-python-85aab86c544f

OCR
https://www.javatpoint.com/how-to-read-contents-of-pdf-using-ocr-in-python
https://www.geeksforgeeks.org/python-reading-contents-of-pdf-using-ocr-optical-character-recognition/
https://pypi.org/project/ocrmypdf/
https://towardsdatascience.com/extracting-text-from-scanned-pdf-using-pytesseract-open-cv-cd670ee38052
https://stackabuse.com/applying-ocr-to-a-scanned-pdf-in-python-using-borb/

NER

Named Entity Recognition (NER) in Python with Spacy

File Type Detection
https://github.com/ahupp/python-magic
https://www.geeksforgeeks.org/determining-file-format-using-python/
https://stackoverflow.com/questions/10937350/how-to-check-type-of-files-without-extensions
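
The python-magic library linked above wraps libmagic. As a dependency-free illustration of the same idea, here is a rough sketch that looks at the first few bytes of a file. The signature table is deliberately tiny and the function name is my own; a real project would more likely rely on libmagic's much larger database.

```python
# Leading "magic" bytes for a few common formats found in archives.
MAGIC_SIGNATURES = {
    b"%PDF-": "application/pdf",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"PK\x03\x04": "application/zip",  # also the container for .docx, .epub, etc.
}


def sniff_type(data: bytes) -> str:
    """Guess a MIME type from the leading bytes, ignoring any file extension."""
    for signature, mime in MAGIC_SIGNATURES.items():
        if data.startswith(signature):
            return mime
    return "application/octet-stream"


print(sniff_type(b"%PDF-1.7 ..."))  # application/pdf
```

In use, one would read only the first handful of bytes of each downloaded file and pass them to sniff_type.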

XML parsing

https://pypi.org/project/defusedxml/
https://www.tutorialspoint.com/python/python_xml_processing.htm
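
To tie the resources in this post together, here is a minimal sketch of the scrape-to-record step, written with the standard library only so it runs without bs4. Everything about the input is an assumption: the sample HTML, the "item" class, the paths, and the example.org base URL are hypothetical and do not reflect the real markup of the archive. The output is only the olac:olac metadata block with two Dublin Core elements, not a complete static repository file.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

DC_NS = "http://purl.org/dc/elements/1.1/"
OLAC_NS = "http://www.language-archives.org/OLAC/1.1/"
ET.register_namespace("dc", DC_NS)
ET.register_namespace("olac", OLAC_NS)

# Hypothetical listing markup; the real archive page will differ.
SAMPLE_HTML = """
<ul>
  <li><a class="item" href="/files/1.pdf">A Sample Paper</a></li>
  <li><a class="item" href="/files/2.pdf">Another Sample Paper</a></li>
</ul>
"""


class ItemLinkParser(HTMLParser):
    """Collect (href, link text) pairs for anchors marked class="item"."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "item":
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.items.append((self._href, "".join(self._text).strip()))
            self._href = None


def to_olac_record(href, title, base_url):
    """Build a bare olac:olac element holding a title and an identifier."""
    record = ET.Element(f"{{{OLAC_NS}}}olac")
    ET.SubElement(record, f"{{{DC_NS}}}title").text = title
    ET.SubElement(record, f"{{{DC_NS}}}identifier").text = base_url + href
    return record


parser = ItemLinkParser()
parser.feed(SAMPLE_HTML)
records = [to_olac_record(h, t, "http://example.org") for h, t in parser.items]
print(ET.tostring(records[0], encoding="unicode"))
```

When parsing XML that comes from somewhere else (as opposed to building it, as here), the defusedxml package linked above is the safer choice.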

Posted in Other Journals | Tagged bs4, metadata, OLAC, Python, R-90 | Leave a reply

MARC profiles for Language Archives

Posted on October 31, 2022 by Hugh Paterson III

AACR2 and RDA both constitute application profiles using the same database structure, known as MARC. MARC defines the fields and the expected values within those fields (type control), while AACR2 and RDA define cognitive models and data fingerprinting (these are not the LIS terms for these concepts). By cognitive model I mean the mental representation of entities and their relationships. By fingerprinting of data I mean that an artifact counts as "well described" only when a particular set of fields is filled in. E.g., a book description needs a publisher, while a manuscript does not.
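
The fingerprinting idea can be sketched as a small check of required fields per resource type. This is only an illustration: the profile contents, the plain-English field names (used here in place of MARC tags), and the sample records are all hypothetical, and neither AACR2 nor RDA is being quoted.

```python
# Hypothetical profile: the fields each resource type needs to be "well described".
PROFILE = {
    "book": {"title", "creator", "publisher", "date"},
    "manuscript": {"title", "creator", "date"},
}


def missing_fields(record: dict) -> set:
    """Return the required fields that are absent or empty in the record."""
    required = PROFILE[record["type"]]
    present = {name for name, value in record.items() if value}
    return required - present


book = {"type": "book", "title": "A Grammar", "creator": "Doe, J.", "date": "1999"}
manuscript = {"type": "manuscript", "title": "Field notes", "creator": "Doe, J.", "date": "1999"}

print(missing_fields(book))        # {'publisher'}
print(missing_fields(manuscript))  # set()
```

The same record content passes as a manuscript but fails as a book, which is the sense in which the profile, and not MARC itself, decides what a complete description is.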

AACR2 and RDA are both application profiles for which the documentation is provided only on a subscription basis. This is a pay-to-play game, and this sort of game is not well received by the language documentation community. These facts do not mean that preservation organizations need to avoid MARC; rather, a MARC profile could be established and documented in the open.

When considering the future of OLAC and language resource archiving, an outstanding question emerges: is this sort of profile something that is of interest within the community?

Posted in Other Journals | Tagged in_Obsidian, MARC, metadata, OLAC, RDA | Leave a reply

Dublin Core Subject field

Posted on October 31, 2022 by Hugh Paterson III

Dublin Core has a subject element. But what constitutes a subject?

Two points on this:

  1. Subjecthood is a complex notion. As Birger Hjørland points out, the concept can include both is-ness and about-ness. LIS theory may say to separate these concepts, but if Dublin Core as a descriptive framework does not allow for this, then the notion of subjecthood should be assumed to include both.
  2. Pictures (still images, including paintings) are complex when evaluating their subjecthood. First, when a picture depicts something, it is reasonable to say that the picture is about that thing, as well as that the picture is something...

I am suggesting that Dublin Core as a standard does not distinguish between about-ness and is-ness with regard to subject. To complicate matters further, about-ness and is-ness merge more in visual media than in print-based media.
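
One way to picture the claim: in a simple Dublin Core record, every subject value goes into the same dc:subject element, with nothing in the element itself that marks a term as about-ness or is-ness. The sketch below builds such a record with Python's standard library. The painting, the two subject terms, and the bare "record" wrapper element are invented for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def describe_image(title, subjects):
    """Build a minimal record with one dc:title and repeated dc:subject elements."""
    record = ET.Element("record")  # hypothetical wrapper element
    ET.SubElement(record, f"{{{DC_NS}}}title").text = title
    for term in subjects:
        ET.SubElement(record, f"{{{DC_NS}}}subject").text = term
    return record


# "Horses" is what the picture depicts; "Oil painting" is what the picture is.
painting = describe_image("Horse in a field", ["Horses", "Oil painting"])
subjects = [el.text for el in painting.findall(f"{{{DC_NS}}}subject")]
print(subjects)  # ['Horses', 'Oil painting']
```

Both terms come back from the same element, so any distinction between them has to be supplied by the cataloger's conventions and not by the element.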

The following articles indirectly address the distinction of about-ness and is-ness or address about-ness in visual media.

Rushton, M. Public Funding of Controversial Art. Journal of Cultural Economics 24, 267–282 (2000). https://doi.org/10.1023/A:1007682121108

Wall, J. M. (2005). The Medium & the Message: Theology and Film. Theology Today, 62(1), 74–77. https://doi.org/10.1177/004057360506200109

Wanda Klenczon & Paweł Rygiel (2014) Librarian Cornered by Images, or How to Index Visual Resources, Cataloging & Classification Quarterly, 52:1, 42-61, DOI: 10.1080/01639374.2013.848123

in a book
Emerging Frameworks and Methods: CoLIS 4 : Proceedings of the Fourth

Andrea Witcomb (1997) On the Side of the Object: an Alternative Approach to Debates About Ideas, Objects and Museums, Museum Management and Curatorship, 16:4, 383-399, DOI: 10.1080/09647779700501604

Wang, X., Song, N., Liu, X. and Xu, L. (2021), "Data modeling and evaluation of deep semantic annotation for cultural heritage images", Journal of Documentation, Vol. 77 No. 4, pp. 906-925. https://doi.org/10.1108/JD-06-2020-0102

Posted in Other Journals | Tagged Dublin core, metadata, OLAC, UNT-notes | Leave a reply

OLAC and Library of Congress Demographic Group Terms

Posted on October 31, 2022 by Hugh Paterson III

Library of Congress Demographic Group Terms: https://id.loc.gov/authorities/demographicTerms.html

Posted in Other Journals | Tagged audience, metadata, OLAC | Leave a reply

OLAC and some genre terms

Posted on October 31, 2022 by Hugh Paterson III

I need to explore canonical equivalences between some genre terms and OLAC terms.

For example: https://www.loc.gov/standards/valuelist/marcgt.html and http://www.loc.gov/standards/sourcelist/genre-form.html

Posted in Other Journals | Tagged Genre terms, MARC, metadata, OLAC | Leave a reply

Raw questions when looking at the MODS documentation

Posted on October 30, 2022 by Hugh Paterson III

https://www.loc.gov/standards/mods/userguide/identifier.html

MODS documentation does not explain an expected syntax for the examples. This would be very helpful. What is the expected syntax for typeURI?

http://loc.gov/standards/mods/userguide/typeofresource.html
manuscript

Definition
A resource that is written in handwriting or typescript.
Application
This attribute is used as manuscript="yes" when a collection contains manuscripts and is considered generally to be manuscript in nature, and for individual manuscripts.
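
As a concrete reading of that application note, here is a minimal sketch that emits a typeOfResource element carrying manuscript="yes", using Python's standard library. The helper name and the choice of "text" as the element value are my own for illustration; the "mods" prefix is just a serialization choice.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def type_of_resource(value, manuscript=False):
    """Build a MODS typeOfResource element, flagging manuscripts when asked."""
    element = ET.Element(f"{{{MODS_NS}}}typeOfResource")
    if manuscript:
        element.set("manuscript", "yes")
    element.text = value
    return element


field_notes = type_of_resource("text", manuscript=True)
print(ET.tostring(field_notes, encoding="unicode"))
```

The attribute is simply left off for non-manuscript resources, which matches how the note describes it being used.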

A collection is not the same thing as a DC collection, so what does the XSD say?

Where do the OLAC genre terms map to this list?

https://www.loc.gov/standards/valuelist/marcgt.html

Posted in Other Journals | Tagged metadata, MODS | Leave a reply

One should not consider the content on this website to be an official opinion of any company associated with me. These posts are solely my opinion.

© 2005-2025 Hugh Paterson III All Rights Reserved.
By submitting a comment here you grant this site a perpetual license to reproduce your Words, Name & Website URL in attribution.
Details of your viewing experience may be retained and used. -- Copyright notice by Blog Copyright