Python + MySQL

Font queries

What do I want to do?

In the context of a pipeline, I want to test a font against a known version of an orthography to find out whether the font supports that orthography.

To that end, here is a start:

Another interesting site is: It provides information drawn from a font's metadata, including a list of the languages the font supports. But I wonder: what do "language support" or "supported languages" mean in these contexts? Where does the list of languages come from? Where are these languages' requirements cataloged?
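One concrete reading of "supports the orthography" is code-point coverage: every grapheme in the orthography's character inventory can be spelled with characters the font maps in its cmap table. Below is a minimal sketch under that assumption. The inventory is assumed to live in a SQL table; sqlite3 stands in for MySQL here so the sketch runs without a server, and the table name orthography_chars, its columns, and the "demo" language are hypothetical. The font side uses fontTools' TTFont and getBestCmap.

```python
import sqlite3
import unicodedata


def load_orthography(conn, lang, version):
    """Return the set of graphemes stored for one version of an orthography."""
    # Hypothetical schema: orthography_chars(lang, version, grapheme).
    # sqlite3 uses "?" placeholders; MySQL drivers such as PyMySQL or
    # mysql-connector-python use "%s" instead.
    cur = conn.cursor()
    cur.execute(
        "SELECT grapheme FROM orthography_chars WHERE lang = ? AND version = ?",
        (lang, version),
    )
    return {row[0] for row in cur.fetchall()}


def font_codepoints(path):
    """Return the code points in a font's preferred Unicode cmap (requires fontTools)."""
    from fontTools.ttLib import TTFont  # third-party: pip install fonttools

    font = TTFont(path)
    try:
        # getBestCmap() returns {codepoint: glyph_name}, or None if no Unicode cmap exists.
        return set(font.getBestCmap() or {})
    finally:
        font.close()


def missing_graphemes(graphemes, codepoints):
    """Return, sorted, the graphemes whose code points the font does not map."""
    missing = []
    for g in graphemes:
        # Accept the precomposed (NFC) or decomposed (NFD) spelling:
        # "é" can be U+00E9 or "e" + U+0301 COMBINING ACUTE ACCENT.
        forms = {unicodedata.normalize("NFC", g), unicodedata.normalize("NFD", g)}
        if not any(all(ord(ch) in codepoints for ch in form) for form in forms):
            missing.append(g)
    return sorted(missing)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orthography_chars (lang TEXT, version TEXT, grapheme TEXT)")
    conn.executemany(
        "INSERT INTO orthography_chars VALUES (?, ?, ?)",
        [("demo", "2.0", g) for g in ["a", "ɓ", "ɗ", "ŋ", "ɛ\u0301"]],
    )
    inventory = load_orthography(conn, "demo", "2.0")
    # Stand-in for font_codepoints("SomeFont.ttf"): a-z plus a few extras.
    fake_cmap = {ord(c) for c in "abcdefghijklmnopqrstuvwxyzŋɛ"} | {0x0301}
    print(missing_graphemes(inventory, fake_cmap))  # → ['ɓ', 'ɗ']
```

Coverage of this kind is necessary but probably not sufficient: a font can map U+0301 and still place it badly over ɛ, which is a question of the font's mark-positioning (GPOS) rules rather than its cmap.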

Topic Modeling in Python

What if the data array I import with pandas (pd) held records in the OLAC metadata schema?

Beginners Guide to Topic Modeling in Python

How to Use Bertopic for Topic Modeling and Content Analysis?
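Before any topic model can run, each OLAC record has to become a text document. Here is a minimal, standard-library-only sketch of that step, assuming OLAC 1.1 XML records: free-text Dublin Core fields (title, description) are joined into one string, while coded values (olac:code with its xsi:type) are kept separately, since codes like "fra" are better treated as metadata than as words. The sample record and the stopword list are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

OLAC = "{http://www.language-archives.org/OLAC/1.1/}"
XSI = "{http://www.w3.org/2001/XMLSchema-instance}"

# Illustrative record in the shape of an OLAC 1.1 record; the content is made up.
SAMPLE = """<olac:olac xmlns:olac="http://www.language-archives.org/OLAC/1.1/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:title>Wordlist and texts recorded in 2019</dc:title>
  <dc:description>Elicited wordlist with tone transcription and narrative texts.</dc:description>
  <dc:subject xsi:type="olac:language" olac:code="fra"/>
  <dc:type xsi:type="olac:linguistic-type" olac:code="lexicon"/>
</olac:olac>"""


def record_to_document(xml_text):
    """Split one OLAC record into free text (for a topic model) and coded values."""
    root = ET.fromstring(xml_text)
    text_parts, codes = [], []
    for el in root:
        if el.text and el.text.strip():
            text_parts.append(el.text.strip())
        # Prefixed attributes such as olac:code are namespace-qualified in ElementTree.
        code = el.get(OLAC + "code")
        if code:
            codes.append((el.get(XSI + "type"), code))
    return " ".join(text_parts), codes


# Hypothetical, deliberately tiny stopword list.
STOPWORDS = frozenset({"a", "an", "and", "in", "of", "the", "with"})


def term_counts(document):
    """Bag of words: lowercase word tokens minus stopwords and bare numbers."""
    tokens = re.findall(r"\w+", document.lower())
    return Counter(t for t in tokens if t not in STOPWORDS and not t.isdigit())


if __name__ == "__main__":
    text, codes = record_to_document(SAMPLE)
    print(codes)  # → [('olac:language', 'fra'), ('olac:linguistic-type', 'lexicon')]
    print(term_counts(text).most_common(2))
```

One document string per record, collected into a list (or a pandas DataFrame column), is the kind of input topic-modeling tools take; BERTopic's fit_transform, for example, accepts a list of document strings.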

Platform tools for OAI harvesting

So, in a recent OLAC presentation I talked about enabling OAI harvesting in Omeka or Drupal via recipes. Here are some links to online discussion of these issues.


enable Items in KOHA OAI Harvesting


Day 13: Harvest data with OAI-PMH

WordPress and Drupal
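Whatever platform exposes the records, the harvester's side of OAI-PMH is small: request ListRecords, read the records, and keep sending the resumptionToken until the repository returns an empty one. A minimal standard-library sketch follows. The endpoint URL is a placeholder, and whether a particular Omeka or Drupal setup serves "olac" as a metadataPrefix is an assumption (oai_dc is the one prefix every OAI-PMH repository must support).

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="olac", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only verb + token may be sent.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_page(xml_bytes):
    """Return (records, next_token) from one ListRecords response page."""
    root = ET.fromstring(xml_bytes)
    records = root.findall(f"{OAI}ListRecords/{OAI}record")
    token_el = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
    # An empty resumptionToken element marks the last page.
    if token_el is not None and token_el.text and token_el.text.strip():
        return records, token_el.text.strip()
    return records, None


def fetch(url):
    """Download one response over HTTP."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def harvest(base_url, metadata_prefix="olac", fetcher=fetch):
    """Yield every <record> element, following resumption tokens to the end of the list."""
    token = None
    while True:
        page = fetcher(list_records_url(base_url, metadata_prefix, token))
        records, token = parse_page(page)
        yield from records
        if not token:
            break
```

Passing fetcher as a parameter keeps the paging logic testable without a network; for a real run, something like `for rec in harvest("https://example.org/oai"): ...` would use the default HTTP fetch.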

SSH, Unix commands & RegEx

This summer I am sitting in on a computational linguistics course. It is the first instruction I have had in UNIX. Pretty awesome.
This has required me to do some googling to look up terminal commands.

This is kind of a sketch of where I have been.

Regular Expressions:

RegEx and Unicode:
One of the issues I have had with RegEx is defining a natural class, i.e. a character class such as [A-Z], [A-Za-z], or [0-9]. As a linguist I deal a lot with IPA characters, subscripts, superscripts, Unicode, and diacritics. How am I to define a natural class with these? Can I define a natural class based on the phonology of the language?
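One way to get a phonologically defined class with Python's standard re module is to list the class's segments explicitly and then allow any trailing diacritics or modifier letters. The sketch below assumes that approach; the segment inventories are hypothetical, and the set of allowed modifiers (the Combining Diacritical Marks block plus a few IPA modifier letters) is a choice you would tune for a real language. Text is normalized to NFD first so that a precomposed "í" and "i" + U+0301 match the same way.

```python
import re
import unicodedata

# Hypothetical segment inventories for an illustrative language.
VOWELS = ["i", "e", "a", "o", "u", "ɨ", "ɛ", "ɔ"]
HIGH_VOWELS = ["i", "u", "ɨ"]
LABIAL_STOPS = ["p", "b", "kp", "gb"]  # digraphs for labial-velar stops

# Marks allowed after a base segment:
#   U+0300-U+036F  Combining Diacritical Marks (tone, nasalization, ...)
#   U+02B0 ʰ, U+02B2 ʲ, U+02B7 ʷ, U+02D0 ː  (IPA modifier letters)
MODIFIERS = r"[\u0300-\u036f\u02b0\u02b2\u02b7\u02d0]*"


def natural_class(segments):
    """Build a regex source string matching one segment from `segments` plus trailing marks."""
    alts = {unicodedata.normalize("NFD", s) for s in segments}
    # Longest first, so a digraph like "kp" wins over a shorter alternative.
    ordered = sorted(alts, key=lambda s: (-len(s), s))
    return "(?:" + "|".join(re.escape(a) for a in ordered) + ")" + MODIFIERS


def find_segments(text, segments):
    """Return every match of the class in `text`, after normalizing the text to NFD."""
    pattern = re.compile(natural_class(segments))
    return [m.group() for m in pattern.finditer(unicodedata.normalize("NFD", text))]


# find_segments("kpáːbʰu", LABIAL_STOPS) → ['kp', 'bʰ']
```

Listing segments keeps the class tied to the language's own inventory rather than to Unicode blocks; a feature table (for example, mapping each segment to [+high]) could generate these lists instead of writing them by hand.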

So I did some more searching: