Font queries

What do I want to do?

In the context of a pipeline I want to test a font against an known orthography version to know if it will support the orthography.

To that end here is a start:

https://stackoverflow.com/questions/4458696/finding-out-what-characters-a-given-font-supports

Another interesting site is: https://fontdrop.info/. It provides information from the metadata. Is list the languages it supports. But I wonder, what does "language support" or "supported languages" mean in these contexts? Where do the list of languages come from? Where are these languages' requirements cataloged?

Topic Modeling in Python

What if my import pd data array was the OLAC metadata schema?

https://www.semanticscholar.org/paper/A-Gentle-Introduction-to-Topic-Modeling-Using-Saxton/38742c56eadfdf11fb7218f7702c8fccfc78bd95

https://gist.github.com/umbertogriffo/5041b9e4ec6c3478cef99b8653530032

https://towardsdatascience.com/contextualized-topic-modeling-with-python-eacl2021-eacf6dfa576

Beginners Guide to Topic Modeling in Python

https://www.holisticseo.digital/python-seo/topic-modeling/

https://melaniewalsh.github.io/Intro-Cultural-Analytics/05-Text-Analysis/08-Topic-Modeling-Text-Files.html

https://asandeepc.bitbucket.io/courses/inls613_summer2019/lectures/08-lda_topic_modeling.pdf

http://derekgreene.com/slides/topic-modelling-with-scikitlearn.pdf

https://ourcodingclub.github.io/tutorials/topic-modelling-python/

https://stackabuse.com/python-for-nlp-topic-modeling/

https://www.toptal.com/python/topic-modeling-python

https://towardsdatascience.com/end-to-end-topic-modeling-in-python-latent-dirichlet-allocation-lda-35ce4ed6b3e0

SSH, Unix commands & RegEx

This summer I am sitting in on a computational linguistics course. It is the first instruction I have had about UNIX. Pretty Awesome.
This has required me to do some googling looking from terminal commands.

This is kind of a sketch of where I have been.

UNIX:
http://www.osxfaq.com/Tutorials/LearningCenter/

SSH:
http://kimmo.suominen.com/docs/ssh/
http://ss64.com/osx/

TERMINAL:
http://homepage.mac.com/rgriff/files/TerminalBasics.pdf

grep:
http://www.thegeekstuff.com/2009/03/15-practical-unix-grep-command-examples/
http://en.wikipedia.org/wiki/Grep
http://www.computerhope.com/unix/ugrep.htm

Regular Expressions:
http://www.zytrax.com/tech/web/regex.htm
http://www.regular-expressions.info/tutorial.html
http://gnosis.cx/publish/programming/regular_expressions.html

RegEx and Unicode:
One of the issues that I have had with RegEx has been what is a natural class? i.e. [A-Z], [A-Za-z], [0-9], etc. As a linguist I deal a lot with IPA characters, subscripts, superscripts, unicode, and diacritics. How am I to define a natural class with these? Can I define a natural class based on the phonology of the language?

So I did some more searching:
http://unicode.org/reports/tr18/
http://unicode.org/reports/tr18/tr18-5.1.html
http://icu-project.org/docs/papers/iuc26_regexp.pdf
http://courses.ischool.berkeley.edu/i256/f06/papers/regexps_tutorial.pdf
http://wapedia.mobi/en/Regular_expression?t=5.

RegEx+PERL+Unicode:
http://perldoc.perl.org/perlretut.html

PERL:
http://www.enginsite.com/Library-Perl-Regular-Expressions-Tutorial.htm
http://www.cgi101.com/book/connect/mac.html
http://www.mactech.com/articles/mactech/Vol.18/18.09/PerlforMacOSX/index.html

Python:
http://www.amk.ca/python/howto/regex/