OLAC should implement oEmbed

Posted on January 30, 2023 by Hugh Paterson III

A new OLAC website should implement oEmbed...

https://github.com/rafaelmartins/pyoembed/
https://github.com/coleifer/micawber
https://oembed.com/
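
As a rough illustration of what implementing oEmbed could involve, the sketch below builds the JSON body a provider endpoint might return for a catalog record. The keys `type`, `version`, `title`, `provider_name`, and `provider_url` come from the oEmbed specification; the function name, the example record title, and the choice of the `link` response type are my assumptions for illustration, not a description of any existing OLAC service.

```python
import json


def build_oembed_response(title, provider_name, provider_url):
    """Return an oEmbed 'link' response as a JSON string.

    The 'link' type needs only the generic 'type' and 'version' fields;
    the remaining keys are optional in the oEmbed specification.
    """
    payload = {
        "type": "link",
        "version": "1.0",
        "title": title,
        "provider_name": provider_name,
        "provider_url": provider_url,
    }
    return json.dumps(payload, sort_keys=True)


# Hypothetical record title, used only to exercise the function.
example = build_oembed_response(
    title="Example language resource",
    provider_name="OLAC",
    provider_url="http://www.language-archives.org/",
)
print(example)
```

A `rich` or `video` response would additionally need `html`, `width`, and `height`, which is where an embeddable record card would go.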

OLAC data quality investigator

Posted on January 26, 2023 by Hugh Paterson III

On the flight back from Finland I found it challenging to use my laptop, so I pulled out my scratch pad to draw out some ideas I was having. One of those ideas was a record quality investigator: a tool which lets one investigate the presence or absence of features, or sets of features, in a record or set of records. The goal is to look for any patterns in the records which might be interesting and notable.

What follows are my written notes.
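
Separately from the handwritten notes, the core idea can be sketched in a few lines of code. This is only a minimal sketch: it assumes records have already been parsed into dictionaries, and the field names and sample records are hypothetical. It counts how often each feature is present and which combinations of features occur together.

```python
from collections import Counter


def feature_presence(records, features):
    """Count how many records carry a non-empty value for each feature."""
    counts = Counter({name: 0 for name in features})
    for record in records:
        for name in features:
            if record.get(name):
                counts[name] += 1
    return counts


def presence_patterns(records, features):
    """Count each distinct combination of present features across records."""
    patterns = Counter()
    for record in records:
        present = tuple(name for name in features if record.get(name))
        patterns[present] += 1
    return patterns


# Hypothetical records; real OLAC records would first be parsed from XML.
sample_records = [
    {"title": "Word list", "language": "eng", "rights": ""},
    {"title": "Grammar sketch", "language": "fin"},
    {"title": "Recording", "rights": "CC BY 4.0"},
]
fields = ["title", "language", "rights"]

print(feature_presence(sample_records, fields))
print(presence_patterns(sample_records, fields))
```

An empty string is treated as absent here, which is a design choice; an investigator might instead want to report empty elements as their own category.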

Resetting OLAC Documentation

Posted on January 17, 2023 by Hugh Paterson III

OLAC Documentation is a set of independently evolving documents with their own version numbers. What if these documents evolved together in sync? Would it make the process more manageable? What if the documentation looked different? More like modern documentation?

I like the following document presentation layouts:

https://github.com/bep/docuapi
https://github.com/matcornic/hugo-theme-learn
https://github.com/alex-shpak/hugo-book
https://github.com/google/docsy
https://github.com/h-enk/doks

The next step is to take the most recent versions of the OLAC documents and convert them to markdown. This converter catches all the XML tags while others don't: https://codebeautify.org/html-to-markdown

On the list are:

http://www.language-archives.org/OLAC/repositories.html
http://www.language-archives.org/OLAC/metadata.html
http://www.language-archives.org/REC/bpr.html
http://www.language-archives.org/REC/olac-extensions.html
And each of the extensions.
These need to be presented one way for developers and implementers of technology, and another way for managers and archivists.

OLAC Validator Custom Messages

Posted on January 15, 2023 by Hugh Paterson III

OLAC Validator custom messages can be created following these steps:
https://xerces.apache.org/xerces2-j/faq-xs.html#faq-4

This is the software it uses for its validator: https://xerces.apache.org/xerces-p/samples/validator.html Ideally this would also be containerized with the other parts.
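
One lightweight alternative to customizing messages inside Xerces itself would be to post-process the validator's output. The sketch below is not the mechanism described in the Xerces FAQ; it assumes each raw message starts with an XML Schema rule identifier (such as `cvc-complex-type.2.4.a`) followed by a colon, and the replacement texts and function name are hypothetical.

```python
# Hypothetical friendly texts keyed by XML Schema validation rule identifiers.
CUSTOM_MESSAGES = {
    "cvc-complex-type.2.4.a": "An element appears where the schema does not allow it.",
    "cvc-enumeration-valid": "A value is not in the controlled vocabulary.",
}


def friendly_message(raw_message):
    """Swap a raw validator message for a custom one when its key is known."""
    key, separator, _detail = raw_message.partition(":")
    key = key.strip()
    if separator and key in CUSTOM_MESSAGES:
        return f"{CUSTOM_MESSAGES[key]} (rule {key})"
    # Unknown or unkeyed messages pass through unchanged.
    return raw_message


# Assumed shape of a raw message, for illustration only.
raw = "cvc-enumeration-valid: Value 'xyz' is not facet-valid."
print(friendly_message(raw))
print(friendly_message("Some other message without a known key"))
```

Keeping the rule identifier in the output means a developer can still look up the original schema constraint while an archivist sees the plainer wording.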

One approach to getting this containerized might be to use this script (which is older and Linux-oriented): https://github.com/dgricci/xmllint

Another option is to use: https://hub.docker.com/r/isaitb/xml-validator

If this service were implemented on a new server with a web interface, we might expect to use a newer HTML front end.

Here is what I found via GitHub:

https://github.com/ebruchez/darius-xml.js
https://github.com/fulvio999/jxmlutil

Darius looks more promising, but neither is an "out of the box" tool.

Example Application Profiles with Dublin Core

Posted on January 11, 2023 by Hugh Paterson III

This application profile is interesting because they use additive DC:Type values to refine each other... http://lib.psnc.pl/Content/153/CIMI%20-%20DC%20Guide%20to%20Best%20Practice.pdf

Text object metadata

Posted on January 4, 2023 by Hugh Paterson III

I find that this text object metadata scheme might be useful for describing corpora.

https://www.loc.gov/standards/textMD/

I should look at these auxiliary METS extensions and include them in OLAC discussions

Rights Metadata and Rights Vocabularies

Posted on January 4, 2023 by Hugh Paterson III

In the fall term of 2022 I took a course on Metadata at UNT. In that course I encountered an interesting Rights Metadata schema created by the California Digital Libraries Project called copyrightMD. This schema is interesting because it articulates where a resource was created.

This is currently on the web here:
https://cdlib.org/groups/rights-management-group-copyrightmd/
But that website does not seem to render fully, so I looked it up in the Internet Archive here:
http://web.archive.org/web/20220119153216mp_/https://cdlib.org/wp-content/uploads/2019/01/copyrightMD_user_guidelines.pdf

CopyrightMD has been mentioned in the following academic publications:

I find the list of rights metadata schemas in the library guide at UCF very helpful:

https://guides.ucf.edu/metadata/adminMetadata

For rights metadata, the common metadata standards such as Dublin Core include a “rights” field. Any known intellectual property rights held for the data, including access rights and rights holder, can be specified in that field. Some digital repositories provide an opportunity to assign a Creative Commons license to the materials or datasets deposited in the repository.

There are other Right Metadata standards including CopyrightMD, METSRights, ONIX For Publications Licenses, Open Digital Rights Language and XrML.
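
To make the Dublin Core case concrete, here is a minimal sketch of a record that carries a Creative Commons license in the `rights` element. The namespace URI is the standard Dublin Core elements namespace; the `record` wrapper element, the function name, and the example title are assumptions for illustration and do not follow the OLAC schema.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def make_record(title, rights_statement):
    """Build a tiny record with dc:title and dc:rights children."""
    record = ET.Element("record")  # hypothetical wrapper element
    ET.SubElement(record, f"{{{DC_NS}}}title").text = title
    ET.SubElement(record, f"{{{DC_NS}}}rights").text = rights_statement
    return record


# Hypothetical resource described with a CC BY 4.0 license URL.
record = make_record(
    "Example corpus",
    "https://creativecommons.org/licenses/by/4.0/",
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

A single free-text `rights` field like this is easy to fill in, but it cannot distinguish access rights from the rights holder, which is the gap the dedicated rights schemas try to close.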

https://wiki.creativecommons.org/wiki/CC_REL

However, I found that the METSRights schema was not linked appropriately:
https://www.loc.gov/standards/rights/
https://www.loc.gov/standards/rights/METSRights.xsd
https://www.loc.gov/standards/rights/2005version/METSRights.xsd

I personally find the statements at rightsstatements.org to be limiting:
https://rightsstatements.org/en/

The "educational use permitted" statement is very confusing: https://rightsstatements.org/page/InC-EDU/1.0/?language=en

Note that Creative Commons used to have a license like this, but they did away with the whole educational use series of licenses. I can't find them at the moment; I would have thought they might have been here: https://creativecommons.org/retiredlicenses/

http://web.archive.org/web/20100101121150/https://learn.creativecommons.org/
http://web.archive.org/web/20080714211609/http://creativecommons.org/weblog/entry/8235
https://wiki.creativecommons.org/wiki/I_want_to_make_sure_that_the_OER_I_create_are_used_only_for_truly_educational_purposes._That_means_I_should_limit_my_works_to_%E2%80%9Ceducational_use_only,%E2%80%9D_right%3F

Creative Commons Welcomes David Wiley as Educational Use License Project Lead

Real problems with academics using CC licenses:
https://smcclatchy.github.io/exp-design/LICENSE.html
Copyfraud: https://www.researchgate.net/publication/228219706_Copyfraud (August 2005, New York University Law Review (1950) 81(3))
see also: 10.5334/jcms.1021217
https://www.researchgate.net/publication/275440056_The_Public_Domain_vs_the_Museum_The_Limits_of_Copyright_and_Reproductions_of_Two-dimensional_Works_of_Art
see also: 10.1002/meet.14504701045
see also: 10.2139/ssrn.1806809
see also: https://www.researchgate.net/publication/308339459_Museums_Property_Rights_and_Photographs_of_Works_of_Art_Why_Reproduction_Through_Photograph_Should_Be_Free

see also: 10.1515/9783110732009-010 — 8 Rights Issues in the Digitization of Library Collections

~~~~
OER Notes:

U.S. Department of Education Open Licensing Rule Now in Effect

https://wiki.creativecommons.org/wiki/Creative_Commons_and_Open_Educational_Resources
https://wiki.creativecommons.org/wiki/OER_Project

Stack Exchange for Language Resource Archiving

Posted on December 31, 2022 by Hugh Paterson III

I wonder if it would be productive to have language archive discussions via stack exchange. OLAC implementation, cataloging, and indexing discussions.

https://area51.stackexchange.com/faq

1000 questions in 6 months with 70% answers... that means 6 people asking one question a day for 6 months. Do we have that many questions? Do we have that big of a community?
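
The back-of-the-envelope arithmetic can be checked quickly. This sketch uses the two figures quoted above and assumes six 30-day months and one question per person per day.

```python
import math

QUESTIONS_NEEDED = 1000  # figure quoted above
ANSWER_RATE = 0.70       # figure quoted above
DAYS = 6 * 30            # assumption: six 30-day months

questions_per_day = QUESTIONS_NEEDED / DAYS
# Assumption: each person asks exactly one question per day.
askers_needed = math.ceil(questions_per_day)
answered_questions = round(QUESTIONS_NEEDED * ANSWER_RATE)

print(f"{questions_per_day:.2f} questions per day")
print(f"{askers_needed} people asking one question a day")
print(f"{answered_questions} questions would need answers")
```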

OLAC language view pages

Posted on December 30, 2022 by Hugh Paterson III

There are several views in the OLAC website, each with a specific purpose.

The language page view is functionally defined here:
https://github.com/olac/olac/blob/89bbf36ddd6eb0863cf4f4e927277425f609db0f/wiki/LanguagePages.wiki
Implemented in code here:
https://github.com/olac/olac/blob/89bbf36ddd6eb0863cf4f4e927277425f609db0f/web/language.php#L92
And renders like this:

OLAC query needs

Posted on December 30, 2022 by Hugh Paterson III

The following example points to the need for users to be able to sort collections by license, relationships, and extent.

I am looking for large spoken corpora of spontaneous speech in any
language (ideally > 100 hours) with a time-aligned transcription. I am
not committed to a specific genre as long as it is spontaneous speech.
It should be available as a download (for research, no commercial use),
ideally free but I may be able to pay for it as well.
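
A request like this could be answered with a simple filter and sort if the catalog exposed the right fields. The following is a minimal sketch under that assumption: the `Corpus` class, its field names, and the sample entries are all hypothetical and do not reflect current OLAC metadata.

```python
from dataclasses import dataclass


@dataclass
class Corpus:
    name: str
    license: str
    hours: float                     # extent of the recordings
    has_aligned_transcription: bool  # relationship to a transcript
    downloadable: bool


def find_candidates(corpora, min_hours=100.0):
    """Keep downloadable, time-aligned corpora above min_hours,
    sorted by license and then by extent (largest first)."""
    matches = [
        c for c in corpora
        if c.has_aligned_transcription and c.downloadable and c.hours > min_hours
    ]
    return sorted(matches, key=lambda c: (c.license, -c.hours))


# Hypothetical catalog entries, invented for this example.
catalog = [
    Corpus("Corpus A", "CC BY-NC 4.0", 250.0, True, True),
    Corpus("Corpus B", "CC BY 4.0", 120.0, True, True),
    Corpus("Corpus C", "CC BY 4.0", 40.0, True, True),
    Corpus("Corpus D", "CC BY 4.0", 300.0, False, True),
]

for corpus in find_candidates(catalog):
    print(corpus.name, corpus.license, corpus.hours)
```

The point is less the code than the data: none of this works unless extent, license, and the audio-to-transcript relationship are recorded in comparable, machine-readable form.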

The Journeyler

A walk through: Life, Leadership, Linguistics, Language Documentation, WordPress, and OS X (and a bit of Marketing & Business Administration)

Tag Archives: OLAC