Topic Modeling in Python

What if the data I load into a pandas (pd) array followed the OLAC metadata schema?

Beginners Guide to Topic Modeling in Python

SSH, Unix commands & RegEx

This summer I am sitting in on a computational linguistics course. It is the first instruction I have had in UNIX. Pretty awesome.
It has also meant doing a lot of googling to look up terminal commands.

Regular Expressions:

RegEx and Unicode:
One issue I have had with RegEx is deciding what counts as a natural class, e.g. [A-Z], [A-Za-z], [0-9]. As a linguist I deal a lot with IPA characters, subscripts, superscripts, Unicode, and diacritics. How do I define a natural class that includes these? Can I define a natural class based on the phonology of the language?
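One possible approach to both questions is the following minimal sketch, assuming only Python 3's built-in re and unicodedata modules. In regex terms a "natural class" is a bracketed character class, so the idea is to list segments with their phonological features and build the bracket from every segment that has the requested features. The feature table FEATURES, its segment inventory, and the helper names natural_class and find_segments are hypothetical illustrations, not a real phonological analysis. Each class also allows trailing Combining Diacritical Marks (U+0300 to U+036F) and Spacing Modifier Letters (U+02B0 to U+02FF, which includes superscript-style ʰ, ʲ and the length mark ː). Text is normalized to NFD first so that a precomposed letter like ú is split into u plus a combining acute accent.

```python
import re
import unicodedata

# Hypothetical feature table for a handful of IPA vowels.
# The inventory and feature labels are illustrative assumptions only.
FEATURES = {
    "i": {"high", "front"},
    "y": {"high", "front", "round"},
    "ɨ": {"high", "central"},
    "u": {"high", "back", "round"},
    "e": {"mid", "front"},
    "ø": {"mid", "front", "round"},
    "o": {"mid", "back", "round"},
    "a": {"low", "central"},
}

# Zero or more trailing marks: Spacing Modifier Letters (U+02B0 to U+02FF,
# e.g. ʰ, ʲ, ː) and Combining Diacritical Marks (U+0300 to U+036F,
# e.g. combining acute, tilde, ring below).
DIACRITICS = r"[\u02b0-\u02ff\u0300-\u036f]*"


def natural_class(*required):
    """Compile a regex matching any segment that carries all `required`
    features, plus any diacritics attached to it."""
    members = sorted(seg for seg, feats in FEATURES.items()
                     if set(required) <= feats)
    if not members:
        # An empty bracket would not be a valid character class.
        raise ValueError(f"no segment has all of {required}")
    char_class = "[" + "".join(re.escape(m) for m in members) + "]"
    return re.compile(char_class + DIACRITICS)


def find_segments(pattern, text):
    """Return every match of `pattern` in `text`."""
    # NFD splits precomposed letters (ú -> u + U+0301) so the base vowel
    # and its diacritic can be matched separately.
    decomposed = unicodedata.normalize("NFD", text)
    # Recompose each match (NFC) so results print as single glyphs where possible.
    return [unicodedata.normalize("NFC", m) for m in pattern.findall(decomposed)]


if __name__ == "__main__":
    high_round = natural_class("high", "round")
    print(high_round.pattern)                            # [uy][\u02b0-\u02ff\u0300-\u036f]*
    print(find_segments(high_round, "súyo"))             # ['ú', 'y']
    print(find_segments(natural_class("low"), "paːta"))  # ['aː', 'a']
```

Building the class from features means a rule like "high round vowels" stays correct when the inventory changes. One caveat: the U+02B0 to U+02FF block also contains stress marks such as ˈ, so a stress mark written right after a vowel would be absorbed into that vowel's match; a real grammar would want a tighter list of allowed marks.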

So I did some more searching: