Python + MySQL

https://realpython.com/python-mysql/

https://dev.mysql.com/doc/connector-python/en/preface.html
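A minimal sketch of the basic pattern those two references cover, using MySQL Connector/Python (mysql.connector). The table records and its title and language columns are hypothetical, made up for illustration. The driver is imported inside open_connection so the query helper can be tried without a running server.

```python
from typing import Any, List


def open_connection(host: str, user: str, password: str, database: str) -> Any:
    """Open a MySQL connection with MySQL Connector/Python."""
    # Imported lazily so this module loads even where the driver
    # (pip install mysql-connector-python) is not installed.
    import mysql.connector

    return mysql.connector.connect(
        host=host, user=user, password=password, database=database
    )


def titles_for_language(conn: Any, language_code: str) -> List[str]:
    """Return titles from a hypothetical `records` table for one language code."""
    # %s is Connector/Python's placeholder; the driver escapes the value,
    # so user input is never pasted into the SQL string itself.
    query = "SELECT title FROM records WHERE language = %s ORDER BY title"
    cursor = conn.cursor()
    try:
        cursor.execute(query, (language_code,))
        return [row[0] for row in cursor.fetchall()]
    finally:
        cursor.close()
```

Typical use would be conn = open_connection("localhost", "me", "secret", "archive"), then titles_for_language(conn, "nav"), then conn.close(); the host, credentials, and database name here are placeholders.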

PDF text and reference extraction

https://www.geeksforgeeks.org/working-with-pdf-files-in-python/
https://www.geeksforgeeks.org/extract-text-from-pdf-file-using-python/
https://qxf2.com/blog/extracting-data-from-pdfs-python/


https://www.metachris.com/pdfx/
https://softwarerecs.stackexchange.com/questions/76210/software-to-extract-the-list-of-references-and-title-from-a-pdf-of-a-research-pa
https://pypi.org/project/refextract/
https://stackoverflow.com/questions/62365767/extract-references-from-pdf-python
https://discuss.python.org/t/pdf-extraction-with-python-wrappers/40384
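One way to combine the two groups of links above: extract the page text with pypdf (a swapped-in library choice; the tutorials use several), then cut out the reference list heuristically. The heading names and the "[n]" / "n." entry markers are assumptions about typical papers, not a standard; refextract (linked above) is a dedicated tool for the second step.

```python
import re
from typing import List


def pdf_text(path: str) -> str:
    """Concatenate the extracted text of every page (needs pypdf)."""
    from pypdf import PdfReader  # pip install pypdf

    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


# A line that is only a bibliography heading (assumed heading names).
_HEADING = re.compile(
    r"^\s*(references|bibliography|works cited)\s*$", re.IGNORECASE | re.MULTILINE
)
# Entry markers such as "[12]" or "12." at the start of a line.
_ENTRY = re.compile(r"^\s*(?:\[\d+\]|\d+\.)\s+", re.MULTILINE)


def split_references(text: str) -> List[str]:
    """Return the reference entries found after the last bibliography heading."""
    headings = list(_HEADING.finditer(text))
    if not headings:
        return []
    tail = text[headings[-1].end():]
    starts = [m.start() for m in _ENTRY.finditer(tail)]
    entries = []
    for begin, end in zip(starts, starts[1:] + [len(tail)]):
        # Rejoin lines that wrapped inside one entry, then drop its marker.
        entry = " ".join(tail[begin:end].split())
        entries.append(_ENTRY.sub("", entry, count=1))
    return entries
```

For a real paper this would be split_references(pdf_text("paper.pdf")), where the file name is a placeholder. Extraction quality depends heavily on how the PDF was produced, so expect to inspect the output.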

Font queries

What do I want to do?

In the context of a pipeline, I want to test a font against a known version of an orthography to find out whether the font supports that orthography.

To that end, here is a start:

https://stackoverflow.com/questions/4458696/finding-out-what-characters-a-given-font-supports
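A minimal sketch of that test, assuming the orthography is given as a list of its letters (each letter may be several code points). fontTools' getBestCmap() gives the code points the font maps to glyphs; a letter counts as covered if either its precomposed (NFC) or its decomposed (NFD) form is fully present. This only checks the character map: it says nothing about whether combining marks are positioned well (that depends on the font's mark positioning), so passing is necessary, not sufficient.

```python
import unicodedata
from typing import Iterable, List, Set


def font_codepoints(path: str) -> Set[int]:
    """Return the Unicode code points mapped in the font's best cmap (needs fontTools)."""
    from fontTools.ttLib import TTFont  # pip install fonttools

    font = TTFont(path)
    try:
        # getBestCmap() returns {code point: glyph name}, or None without a Unicode cmap.
        return set((font.getBestCmap() or {}).keys())
    finally:
        font.close()


def missing_characters(letters: Iterable[str], codepoints: Set[int]) -> List[str]:
    """Return the letters of an orthography that the cmap cannot cover."""
    missing = []
    for letter in letters:
        composed = unicodedata.normalize("NFC", letter)
        decomposed = unicodedata.normalize("NFD", letter)
        # Covered if every code point of either form has a glyph.
        if not (all(ord(c) in codepoints for c in composed)
                or all(ord(c) in codepoints for c in decomposed)):
            missing.append(letter)
    return missing
```

For example, missing_characters(["a", "ą", "ł", "ʼ"], font_codepoints("SomeFont.ttf")) lists the letters the font lacks; the letter list and file name are illustrative only, not a real orthography specification.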

Another interesting site is https://fontdrop.info/. It provides information from the font's metadata, including a list of the languages the font supports. But I wonder: what do "language support" or "supported languages" mean in these contexts? Where do these lists of languages come from? Where are each language's character requirements catalogued?

Topic Modeling in Python

What if the data I import into a pandas (pd) data frame were metadata records following the OLAC metadata schema?
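A sketch of the first step: turning OLAC records into one text per record that a topic model can consume. It assumes the OLAC 1.1 and Dublin Core namespaces shown and uses only the free-text dc:title, dc:subject, and dc:description fields; the sample record is invented. The resulting strings could become a column of a pandas data frame, as in the tutorials below.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

NS = {
    "olac": "http://www.language-archives.org/OLAC/1.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented example record; real ones arrive inside OAI-PMH <metadata> elements.
SAMPLE_RECORD = """<olac:olac xmlns:olac="http://www.language-archives.org/OLAC/1.1/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:title>Navajo verb paradigms</dc:title>
  <dc:subject xsi:type="olac:language" olac:code="nav"/>
  <dc:subject>morphology</dc:subject>
  <dc:description>Elicited verb forms with audio.</dc:description>
</olac:olac>"""


def record_text(olac_xml: str) -> str:
    """Join the free-text title, subject, and description fields of one record."""
    root = ET.fromstring(olac_xml)
    parts = []
    for field in ("dc:title", "dc:subject", "dc:description"):
        for elem in root.findall(field, NS):
            # Coded subjects (e.g. olac:code="nav") have no text and are skipped.
            text = (elem.text or "").strip()
            if text:
                parts.append(text)
    return " ".join(parts)


def tokenize(text: str) -> List[str]:
    """Lowercase runs of letters (Unicode-aware, so precomposed accented and IPA letters survive)."""
    return re.findall(r"[^\W\d_]+", text.lower())
```

Keeping the coded olac:code values out of the text is a design choice: they could instead be added as separate features if language codes should influence the topics.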

https://www.semanticscholar.org/paper/A-Gentle-Introduction-to-Topic-Modeling-Using-Saxton/38742c56eadfdf11fb7218f7702c8fccfc78bd95

https://gist.github.com/umbertogriffo/5041b9e4ec6c3478cef99b8653530032

https://towardsdatascience.com/contextualized-topic-modeling-with-python-eacl2021-eacf6dfa576

Beginners Guide to Topic Modeling in Python

How to Use BERTopic for Topic Modeling and Content Analysis?

https://melaniewalsh.github.io/Intro-Cultural-Analytics/05-Text-Analysis/08-Topic-Modeling-Text-Files.html

https://asandeepc.bitbucket.io/courses/inls613_summer2019/lectures/08-lda_topic_modeling.pdf

http://derekgreene.com/slides/topic-modelling-with-scikitlearn.pdf

https://ourcodingclub.github.io/tutorials/topic-modelling-python/

https://stackabuse.com/python-for-nlp-topic-modeling/

https://www.toptal.com/python/topic-modeling-python

https://towardsdatascience.com/end-to-end-topic-modeling-in-python-latent-dirichlet-allocation-lda-35ce4ed6b3e0
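Libraries such as gensim and scikit-learn do the fitting for you. To show what latent Dirichlet allocation (LDA) does underneath, here is a small collapsed Gibbs sampler in plain Python. It is a teaching sketch (slow, no convergence check), and the defaults for alpha, beta, and the iteration count are arbitrary, not recommendations.

```python
import random
from typing import List, Sequence


def lda_gibbs(docs: Sequence[Sequence[str]], k: int, iterations: int = 200,
              alpha: float = 0.1, beta: float = 0.01, seed: int = 0,
              top_n: int = 5) -> List[List[str]]:
    """Fit LDA by collapsed Gibbs sampling; return the top words of each topic."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * k for _ in docs]       # n(d, t): tokens in doc d with topic t
    topic_word = [[0] * V for _ in range(k)]  # n(t, w): times word w has topic t
    topic_total = [0] * k                     # n(t): tokens with topic t
    topics = range(k)
    assignments = []
    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        z = []
        for w in doc:
            t = rng.randrange(k)
            z.append(t)
            doc_topic[d][t] += 1
            topic_word[t][index[w]] += 1
            topic_total[t] += 1
        assignments.append(z)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, t = index[w], assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][t] -= 1
                topic_word[t][wi] -= 1
                topic_total[t] -= 1
                # p(z = j | rest) is proportional to
                # (n(d,j) + alpha) * (n(j,w) + beta) / (n(j) + V * beta)
                weights = [(doc_topic[d][j] + alpha) * (topic_word[j][wi] + beta)
                           / (topic_total[j] + V * beta) for j in topics]
                t = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = t
                doc_topic[d][t] += 1
                topic_word[t][wi] += 1
                topic_total[t] += 1
    return [[vocab[i] for i in sorted(range(V), key=lambda i: -topic_word[t][i])[:top_n]]
            for t in topics]
```

Each document is a list of tokens, for example the words of one metadata record. On a real collection, remove stopwords first, or the top words of every topic will be function words.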

Platform tools for OAI harvesting

In a recent OLAC presentation I talked about enabling Omeka or Drupal, via recipes, for OAI harvesting. Here are some links to online discussion of these issues.
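Whatever the platform, an OAI-PMH harvest follows the same loop: request ListRecords, then keep re-requesting with the returned resumptionToken until it comes back empty. A minimal standard-library sketch; the base URL is whatever the repository publishes, and "olac" only works as a metadataPrefix on repositories that expose OLAC records ("oai_dc" is the format every OAI-PMH repository must support).

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Iterator, List, Optional, Tuple

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     token: Optional[str] = None) -> str:
    """Build a ListRecords request; a resumptionToken request carries only verb + token."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_page(xml_bytes: bytes) -> Tuple[List[ET.Element], Optional[str]]:
    """Return the <record> elements of one response page and the next token, if any."""
    root = ET.fromstring(xml_bytes)
    error = root.find(OAI + "error")
    if error is not None:
        if error.get("code") == "noRecordsMatch":
            return [], None  # an empty result set, not a failure
        raise RuntimeError(f"{error.get('code')}: {(error.text or '').strip()}")
    records = root.findall(f"{OAI}ListRecords/{OAI}record")
    token_elem = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
    token = (token_elem.text or "").strip() if token_elem is not None else ""
    return records, token or None


def harvest(base_url: str, metadata_prefix: str = "oai_dc") -> Iterator[ET.Element]:
    """Yield every record, following resumption tokens page by page."""
    token = None
    while True:
        url = list_records_url(base_url, metadata_prefix, token)
        with urllib.request.urlopen(url) as response:
            records, token = parse_page(response.read())
        yield from records
        if token is None:
            break
```

A production harvester would also handle HTTP 503 "Retry-After" responses and incremental harvesting with the from and until arguments; both are left out here.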

Koha

enable Items in KOHA OAI Harvesting

https://koha-community.org/manual/18.05/en/html/webservices.html

https://forums.zotero.org/discussion/38956/export-of-zotero-citation-to-marc-format-for-import-into-koha-lms

WordPress

Day 13: Harvest data with OAI-PMH

WordPress and Drupal

https://acrl.ala.org/techconnect/post/creating-an-oai-pmh-feed-from-your-website/

eHive

https://developers.ehive.com/

Catmandu

https://librecatproject.wordpress.com/tutorial/

Omeka
https://omeka.org/classic/docs/Plugins/OaiPmhRepository/

MOAI
https://pypi.org/project/MOAI/

PyOAI
https://pypi.org/project/pyoai/

SSH, Unix commands & RegEx

This summer I am sitting in on a computational linguistics course. It is the first instruction I have had in UNIX. Pretty awesome. It has required me to do some googling for terminal commands.

This is a rough sketch of where I have been.

UNIX:
http://www.osxfaq.com/Tutorials/LearningCenter/

SSH:
http://kimmo.suominen.com/docs/ssh/
http://ss64.com/osx/

TERMINAL:
http://homepage.mac.com/rgriff/files/TerminalBasics.pdf

grep:
http://www.thegeekstuff.com/2009/03/15-practical-unix-grep-command-examples/
http://en.wikipedia.org/wiki/Grep
http://www.computerhope.com/unix/ugrep.htm
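For comparison with the command-line tool, a rough Python equivalent of grep -n PATTERN FILE (print matching lines with their line numbers), with an ignore_case flag standing in for grep -i. It is a sketch, not a replacement: Python's re syntax differs in places from grep's basic and extended regular expressions.

```python
import re
from typing import Iterable, Iterator, Tuple


def grep(pattern: str, lines: Iterable[str], ignore_case: bool = False) -> Iterator[Tuple[int, str]]:
    """Yield (line number, line) for each line containing a match, like grep -n."""
    flags = re.IGNORECASE if ignore_case else 0  # roughly grep -i
    regex = re.compile(pattern, flags)
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # search, not match: grep reports a match anywhere in the line.
        if regex.search(line):
            yield number, line
```

Because a file object yields lines, a file can be passed directly: with open("corpus.txt", encoding="utf-8") as f, loop over grep(r"\bnasal", f) and print each number and line (the file name and pattern are placeholders).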

Regular Expressions:
http://www.zytrax.com/tech/web/regex.htm
http://www.regular-expressions.info/tutorial.html
http://gnosis.cx/publish/programming/regular_expressions.html

RegEx and Unicode:
One of the issues I have had with RegEx is defining a natural class, i.e. a character class such as [A-Z], [A-Za-z], or [0-9]. As a linguist I deal a lot with IPA characters, subscripts, superscripts, Unicode, and diacritics. How do I define a natural class that includes these? Can I define a natural class based on the phonology of the language?

So I did some more searching:
http://unicode.org/reports/tr18/
http://unicode.org/reports/tr18/tr18-5.1.html
http://icu-project.org/docs/papers/iuc26_regexp.pdf
http://courses.ischool.berkeley.edu/i256/f06/papers/regexps_tutorial.pdf
http://wapedia.mobi/en/Regular_expression?t=5.
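One possible answer to the natural-class question, sketched in Python: build the class from the language's own segment inventory instead of from code-point ranges, normalize text to NFD, and allow a run of combining diacritics after each segment. Python's built-in re does not support Unicode property classes such as \p{M} (the third-party regex module does), so the sketch spells out one mark range; limiting it to the Combining Diacritical Marks block is an assumption. The segment lists in the usage are made-up examples, not a real phonology.

```python
import re
import unicodedata
from typing import Iterable

# Combining Diacritical Marks block only (U+0300 to U+036F); an assumption,
# extend the range for superscript modifiers or other mark blocks.
DIACRITICS = r"[\u0300-\u036f]*"


def natural_class(segments: Iterable[str]) -> str:
    """Return a pattern matching any listed segment plus trailing combining diacritics."""
    # NFD, so precomposed "é" and "e" + U+0301 look the same to the pattern.
    normalized = {unicodedata.normalize("NFD", s) for s in segments}
    # Longest first, so a multi-character segment like "tʃ" wins over "t".
    ordered = sorted(normalized, key=lambda s: (-len(s), s))
    return "(?:" + "|".join(re.escape(s) for s in ordered) + ")" + DIACRITICS


def find_segments(pattern: str, text: str) -> list:
    """Search NFD-normalized text, matching how the class was built."""
    return re.findall(pattern, unicodedata.normalize("NFD", text))
```

With vowels = natural_class(["a", "e", "i", "ɛ", "ɔ"]), find_segments(vowels, "bɛ̃ŋé") returns the nasalized ɛ and the accented e, each with its diacritic attached, whereas [A-Za-z] finds nothing in "ɛŋ". Allowing any diacritic after a member is a design choice; a stricter class would list the permitted segment-plus-diacritic combinations explicitly.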

RegEx+PERL+Unicode:
http://perldoc.perl.org/perlretut.html

PERL:
http://www.enginsite.com/Library-Perl-Regular-Expressions-Tutorial.htm
http://www.cgi101.com/book/connect/mac.html
http://www.mactech.com/articles/mactech/Vol.18/18.09/PerlforMacOSX/index.html

Python:
http://www.amk.ca/python/howto/regex/