MODS language code usage…

Posted on November 1, 2022 by Hugh Paterson III

With regard to the MODS 3.8 documentation, viewable here: https://www.loc.gov/standards/mods/userguide/attributes.html#lang

The current documentation states the following:

xml:lang: "@xml:lang serves the same purpose as @lang, but follows the W3C documentation that indicates using the IANA language subtag registry, which includes codes from the ISO language and script standards."

This is confusing, and I recommend a revision similar to the one I provide below.

Reason for confusion: the current documentation can be read to say that the values of the xml:lang attribute come from the IANA language subtag registry. This mischaracterizes the W3C XML specification, which specifies BCP-47 compliant language tags. BCP-47 permits a much wider range of valid tags than the entries found in the IANA language subtag registry. Rather than pointing MODS users directly to the IANA registry, it would be more useful to point them to the BCP-47 documentation. Only by reading BCP-47 does one learn that the IANA language subtag registry is a valid source, and, more importantly, BCP-47 explains how to use the registry correctly. BCP-47 also explains the other components of the recommendation, which may help in understanding some of the entries in the IANA language subtag registry.

Link to BCP-47: https://www.rfc-editor.org/info/bcp47

Suggested rewording: @xml:lang serves the same purpose as @lang, but follows the W3C documentation that requires using IETF BCP-47 compliant language tags. IETF BCP-47 provides instructions on how to construct valid tags for this attribute. BCP-47 draws upon ISO 639-1, ISO 639-2, ISO 639-3, ISO 639-5, ISO 3166-1, UN M.49, ISO 15924, and the IANA language subtag registry.
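To make "how to construct valid tags" a little more concrete, here is a rough sketch (in Python, my own choice) of the shape a BCP-47 tag takes under RFC 5646: a primary language subtag, then optional script (ISO 15924), region (ISO 3166-1 or UN M.49), variants, extensions, and private use. It is a simplification under stated assumptions: it only checks well-formedness, not whether each subtag actually appears in the IANA registry, and it deliberately leaves out the grandfathered tags (the category in which irregular tags such as i-navajo are listed). The names LANGTAG and is_well_formed are hypothetical.

```python
import re

# Simplified transcription of the RFC 5646 "langtag" and "privateuse"
# productions. Grandfathered tags are NOT covered, and subtags are not
# looked up in the IANA registry: this checks shape only.
LANGTAG = re.compile(
    r"""
    (?:
      (?P<language>[a-z]{2,3}(?:-[a-z]{3}){0,3}  # ISO 639 code + up to 3 extlangs
                  |[a-z]{4}                      # reserved for future use
                  |[a-z]{5,8})                   # registered language subtag
      (?:-(?P<script>[a-z]{4}))?                 # ISO 15924 script
      (?:-(?P<region>[a-z]{2}|[0-9]{3}))?        # ISO 3166-1 or UN M.49 region
      (?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)
      (?P<extensions>(?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*)  # singleton other than x
      (?:-(?P<privateuse>x(?:-[a-z0-9]{1,8})+))?
    |
      (?P<private_only>x(?:-[a-z0-9]{1,8})+)     # whole tag is private use
    )
    """,
    re.VERBOSE | re.IGNORECASE,  # BCP-47 tags are case-insensitive
)


def is_well_formed(tag: str) -> bool:
    """True if `tag` has the shape of a BCP-47 langtag or private-use tag."""
    return LANGTAG.fullmatch(tag) is not None


for tag in ["nv", "en-US", "zh-Hant-TW", "de-CH-1996", "x-whatever", "i-navajo"]:
    print(tag, is_well_formed(tag))
# every tag prints True except "i-navajo", which only the grandfathered list covers
```

A real validator would also consult the registry for each subtag and handle the grandfathered list; this sketch stops at syntax on purpose.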

Then in another section:

Page: https://loc.gov/standards/mods/userguide/language.html#languageterm
The example below, reproduced from the page linked above, contains a logical error that renders it invalid. Providing invalid examples (without marking them as such) in educational materials is problematic.
RFC 5646 is the current instantiation of BCP-47. BCP-47 is the stable identifier, while the underlying RFC documents can change. Currently BCP-47 and RFC 5646 are mostly the same thing, and for the purposes of this post I treat them as the same. So, in the example below the data provider is asserting that they are providing a valid RFC 5646 language tag. However, "i-navajo" is not a valid tag. It is not that it cannot be found in the IANA subtag registry; rather, the registry says that the value has been deprecated in favor of the ISO 639-1 value "nv". Therefore, the RFC 5646 valid code for Navajo is "nv". Supporting documentation is provided below.

I am also available for further consultation with the MODS editors: I sit on the IANA mailing list and am one of the US-appointed observers to the ISO 639 workgroup.

Supporting document 1: https://www.iana.org/assignments/lang-tag-apps/i-navajo
Supporting document 2: https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
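Supporting document 2 is a plain-text file in the "record-jar" layout that RFC 5646 describes: records are separated by lines containing only %%, each line is a Field-Name: value pair, and folded lines are indented. Below is a minimal Python sketch, under my own assumptions, of reading that layout and checking whether a whole tag or a subtag carries a Deprecated field and a Preferred-Value. The excerpt is abridged to the two records relevant here and its dates are placeholders, not real registry values; parse_registry and check_tag are hypothetical helper names. For real use, load the current registry from the URL above instead of the excerpt.

```python
# Abridged, illustrative excerpt in the registry's record-jar layout.
# Dates are placeholders, NOT real registry values.
REGISTRY_EXCERPT = """\
File-Date: (placeholder)
%%
Type: language
Subtag: nv
Description: Navajo
Added: (placeholder)
%%
Type: grandfathered
Tag: i-navajo
Description: Navajo
Added: (placeholder)
Deprecated: (placeholder)
Preferred-Value: nv
"""


def parse_registry(text):
    """Split record-jar text into a list of {field: [values]} dicts."""
    records, record, last_key = [], {}, None
    for line in text.splitlines():
        if line.strip() == "%%":          # record separator
            records.append(record)
            record, last_key = {}, None
        elif line[:1].isspace() and last_key:
            # Folded continuation line: append to the previous value.
            record[last_key][-1] += " " + line.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            last_key = key.strip()
            record.setdefault(last_key, []).append(value.strip())
    if record:
        records.append(record)
    return records


def check_tag(tag, records):
    """Return (status, preferred_value) for a whole tag or a single subtag."""
    wanted = tag.lower()                  # tags compare case-insensitively
    for rec in records:
        key = rec.get("Tag") or rec.get("Subtag")
        if key and key[0].lower() == wanted:
            if "Deprecated" in rec:
                return ("deprecated", rec.get("Preferred-Value", [None])[0])
            return ("ok", None)
    return ("unknown", None)


records = parse_registry(REGISTRY_EXCERPT)
print(check_tag("i-navajo", records))     # ('deprecated', 'nv')
print(check_tag("nv", records))           # ('ok', None)
```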

This resource contains text in Navajo:

<language>
  <languageTerm type="code" authority="rfc5646">i-navajo</languageTerm>
</language>
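If the example is corrected along the lines argued above, the languageTerm would carry "nv" instead. A small Python sketch using the standard-library xml.etree.ElementTree shows one way a data provider might emit that fragment. It assumes the MODS version 3 namespace (http://www.loc.gov/mods/v3), which the LoC example above omits, and the helper name language_element is hypothetical.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)  # serialize with a "mods:" prefix


def language_element(tag: str) -> ET.Element:
    """Build <language><languageTerm type="code" authority="rfc5646">."""
    language = ET.Element(f"{{{MODS_NS}}}language")
    term = ET.SubElement(
        language,
        f"{{{MODS_NS}}}languageTerm",
        {"type": "code", "authority": "rfc5646"},
    )
    term.text = tag  # e.g. "nv" rather than the deprecated "i-navajo"
    return language


print(ET.tostring(language_element("nv"), encoding="unicode"))
```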

MIT-Harvard Model Open Access Policy

Posted on October 31, 2022 by Hugh Paterson III

This afternoon I had a consultation with a UO librarian, and the issue of Open Access Policy and Ownership came up. This librarian was very helpful in pointing me to some of the discussion terms for Copyright Policy and Open Access Policy. They pointed me to https://openaccess.uoregon.edu/ as well as to something called the "MIT-Harvard Model Open Access Policy". In some regards the UO model is the inverse of the MIT policy. I take note of this with interest, as I was part of SIL's Copyright Policy Committee...

https://libraries.mit.edu/about/policies/copyright-permissions-policy/

MARC profiles for Language Archives

Posted on October 31, 2022 by Hugh Paterson III

AACR2 and RDA both constitute application profiles using the same database structure, known as MARC. MARC defines the fields and the expected values within those fields (type control), while AACR2 and RDA define cognitive models and data fingerprinting (these are not the LIS terms for these concepts). By cognitive model I mean the mental representation of entities and their relationships, and by data fingerprinting I mean that some artifacts are "well described" only when particular fields are used. E.g., a book description needs a publisher, while a manuscript does not.

AACR2 and RDA are both application profiles whose documentation is available only on a subscription basis. This is a pay-to-play game, and this sort of game is not well received by the language documentation community. These facts do not mean that preservation organizations need to avoid MARC; rather, a MARC profile could be established and documented in the open.

When considering the future of OLAC and language resource archiving, an outstanding question emerges: is this sort of profile something the community is interested in?

Dublin Core Subject field

Posted on October 31, 2022 by Hugh Paterson III

Dublin Core has a subject element. But what constitutes a subject?

Two points on this:

Subjecthood is a complex notion. As Birger Hjørland points out, the concept can include both is-ness and about-ness. LIS theory may say to separate these concepts, but if Dublin Core as a descriptive framework does not allow this, then subjecthood should be assumed to include both notions.
Pictures (still images, including paintings) are complicated when evaluating their subjecthood. First, when a picture depicts something, it is reasonable to say that the picture is about that thing, and also that the picture is something...

I am suggesting that Dublin Core as a standard does not distinguish between about-ness and is-ness with regard to subject. To complicate matters further, about-ness and is-ness merge more in visual media than in print-based media.

The following articles indirectly address the distinction of about-ness and is-ness or address about-ness in visual media.

Rushton, M. (2000). Public Funding of Controversial Art. Journal of Cultural Economics, 24, 267–282. https://doi.org/10.1023/A:1007682121108

Wall, J. M. (2005). The Medium & the Message: Theology and Film. Theology Today, 62(1), 74–77. https://doi.org/10.1177/004057360506200109

Klenczon, W., & Rygiel, P. (2014). Librarian Cornered by Images, or How to Index Visual Resources. Cataloging & Classification Quarterly, 52(1), 42–61. https://doi.org/10.1080/01639374.2013.848123

In a book: Emerging Frameworks and Methods: CoLIS 4: Proceedings of the Fourth

Witcomb, A. (1997). On the Side of the Object: an Alternative Approach to Debates About Ideas, Objects and Museums. Museum Management and Curatorship, 16(4), 383–399. https://doi.org/10.1080/09647779700501604

Wang, X., Song, N., Liu, X., & Xu, L. (2021). Data modeling and evaluation of deep semantic annotation for cultural heritage images. Journal of Documentation, 77(4), 906–925. https://doi.org/10.1108/JD-06-2020-0102

OLAC and Library of Congress Demographic Group Terms

Posted on October 31, 2022 by Hugh Paterson III

Library of Congress Demographic Group Terms: https://id.loc.gov/authorities/demographicTerms.html

OLAC and some genre terms

Posted on October 31, 2022 by Hugh Paterson III

I need to explore canonical equivalences between some genre terms and OLAC terms.

For example: https://www.loc.gov/standards/valuelist/marcgt.html and http://www.loc.gov/standards/sourcelist/genre-form.html

Tuition costs for undergraduates at WOU and UO

Posted on October 30, 2022 by Hugh Paterson III

This week I was looking at undergraduate tuition rates at Western Oregon University and the University of Oregon. Finding this information was a challenge: either UO hides it, or they have really bad SEO.

Residency	UO	WOU
Resident	$236.34	$194.00
Nonresident	$565.61	$638.00

Tuition costs for one credit hour. Sources: WOU | UO
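Per-credit rates are easier to compare as a term total. Here is a rough Python sketch, assuming a hypothetical 15-credit term and simply multiplying rate by credits; it ignores fees and any flat-rate tuition bands either school may use, so treat the totals as illustrative only.

```python
from decimal import Decimal

# Per-credit-hour rates from the table above (USD).
RATES = {
    ("UO", "Resident"): Decimal("236.34"),
    ("WOU", "Resident"): Decimal("194.00"),
    ("UO", "Nonresident"): Decimal("565.61"),
    ("WOU", "Nonresident"): Decimal("638.00"),
}

CREDITS = 15  # hypothetical full-time load; not taken from either school


def term_cost(school: str, residency: str, credits: int = CREDITS) -> Decimal:
    """Naive linear estimate: rate x credits, no fees or flat-rate bands."""
    return RATES[(school, residency)] * credits


for school, residency in RATES:
    print(f"{school:4} {residency:12} ${term_cost(school, residency)}")
```

Under these assumptions a resident pays $635.10 more per 15-credit term at UO, while a nonresident pays $1,085.85 more at WOU.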

Raw questions when looking at the MODS documentation

Posted on October 30, 2022 by Hugh Paterson III

https://www.loc.gov/standards/mods/userguide/identifier.html

The MODS documentation does not explain the expected syntax of the values in its examples; an explanation would be very helpful. For example, what is the expected syntax for typeURI?

http://loc.gov/standards/mods/userguide/typeofresource.html
manuscript

Definition
A resource that is written in handwriting or typescript.
Application
This attribute is used as manuscript="yes" when a collection contains manuscripts and is considered generally to be manuscript in nature, and for individual manuscripts.

A collection is not the same thing as a DC collection, so what does the XSD say?

Where do the OLAC genre terms map to this list?

https://www.loc.gov/standards/valuelist/marcgt.html

UNT IT help desk and Virtualization Service

Posted on October 21, 2022 by Hugh Paterson III

Initial Request on 16 OCT 2022

Reply the next day, with my follow-up response.

Second response, showing the same confusion.

Four rounds of this nonsense... please just read my message.

Then they sent me the questionnaire for the satisfaction survey.

Well, are you happy?

Umm, NO. Not happy. Please read my messages when I send them.

Then they marked it resolved... without any resolution!

Finally two days later, someone replies...

Two days later: "Oh, we have a different virtualization service through the business school!"

Can your service do what I need it to do?

With all the gusto of "Let's go down the rabbit hole again and contact a new IT department", I reached out to the Citrix service manager. Contrary to my initial expectations, I found myself corresponding with a responsive and well-informed person who could make things happen if all the boxes on his checklist were filled... only the boxes still are not filled.

Let's get all the stakeholders involved...

And so there it is... UNT, the school which is not in want of defined process. It is a well-managed school.

Dublin Core in HTML pages

Posted on October 19, 2022 by Hugh Paterson III

Dublin Core is sometimes inserted into the HTML header for search engine optimization purposes. I am very curious to know which search engines are being optimized for by including DC metadata in the HTML header. Google clearly states that it no longer uses the keywords meta tag. Some argue that Dublin Core tags are different from keywords and that Google might therefore still be using them. As far as I know, the specifics are a trade secret that Google hasn't made public. If anyone knows more about this, please let me know in the comments.

I do know that Google's scholarly search engine, scholar.google.com, runs via a different bot and crawl process and does use some DC tags for identification. It has its own sub-dialect of tags and has added some non-standard (not true Dublin Core) tags to what it expects. How rude and presumptuous of Google... But Google Scholar is the only search engine I know of that looks for Dublin Core metadata in HTML. If anyone knows of another one, I'm very keen to hear about it.
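For anyone who wants to check which Dublin Core elements a page actually exposes to a crawler, here is a minimal Python sketch using the standard-library html.parser. It assumes the DC-HTML convention of meta names prefixed "DC." or "DCTERMS." (matched case-insensitively) and does not resolve the schema.DC link that formally binds the prefix. The class name DCMetaExtractor is hypothetical, and the sample head is built from this post's own title, author, and date. It says nothing about how Google Scholar's crawler actually reads these tags.

```python
from html.parser import HTMLParser


class DCMetaExtractor(HTMLParser):
    """Collect <meta name="DC.*" content="..."> pairs from an HTML document."""

    def __init__(self, prefixes=("dc.", "dcterms.")):
        super().__init__()
        self.prefixes = prefixes
        self.found = []  # (name, content) pairs in document order

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, but not values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.lower().startswith(self.prefixes):
            self.found.append((name, attributes.get("content") or ""))


SAMPLE_HEAD = """
<head>
  <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
  <meta name="DC.title" content="Dublin Core in HTML pages">
  <meta name="DC.creator" content="Hugh Paterson III">
  <meta name="DC.date" content="2022-10-19">
  <meta name="keywords" content="metadata, SEO">
</head>
"""

parser = DCMetaExtractor()
parser.feed(SAMPLE_HEAD)
for name, content in parser.found:
    print(name, "=", content)  # the plain "keywords" meta tag is skipped
```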

Bing has sunset its academic/scholar service. My understanding is that, when it was running, a single bot crawled the data, and the results of that single crawl were then filtered to create the academic materials product. This is a different approach from the one Google takes.

Here are some interesting links on Dublin Core in the headers:

http://webposible.com/utilidades/dublincore-metadata-gen/index.php?lang=en
http://criticism.com/seo/dublin-core-metadata.php
https://www.problogbooster.com/2010/12/use-dublin-core-meta-tags-in-blog-to.html
https://www.problogbooster.com/2010/03/meta-tag-generator-online-free-url-keyword-seo-html-description-code-improve-pagerank-traffic.html
https://www.woorank.com/en/blog/dublin-core-metadata-for-seo-and-usability
https://www.dublincore.org/specifications/dublin-core/dc-html/

The Journeyler

A walk through: Life, Leadership, Linguistics, Language Documentation, WordPress, and OS X (and a bit of Marketing & Business Administration)
