Venn diagrams

Posted on November 10, 2024 by Hugh Paterson III

I found the following resources really helpful with boolean operators and Venn diagrams.

Snarky Math (Director). (2021, October 21). Can you draw a Venn diagram for 4 sets? | Why Venn diagrams are not easy [Streamed]. Snarky Math. https://youtu.be/IekSOZIF5uI

Student Contributors, The University of Edinburgh School of Informatics. (n.d.). Better Informatics. Betterinformatics.com. Retrieved November 10, 2024, from https://betterinformatics.com/resources/inf1-cl/venn/

This one allows prime (complement) sets, such as A′: https://statpowers.com/venn.html

The University of Edinburgh School of Informatics. (n.d.). Venn Diagrams. The University of Edinburgh School of Informatics Teaching Aids. Retrieved November 10, 2024, from https://www.inf.ed.ac.uk/teaching/courses/inf1/cl/tools/venn/

I hadn't really thought about what Venn diagrams represent or when they are appropriate to use. Some reading on that:
https://www.sciencedirect.com/science/article/abs/pii/B9780444529374500113
https://blog.jooq.org/say-no-to-venn-diagrams-when-explaining-joins/
https://www.sciencedirect.com/topics/mathematics/venn-diagram
https://www.dubberly.com/concept-maps/visualizing-venn-diagrams.html

The Python library https://github.com/tctianchi/pyvenn is interesting for generating visualizations, provided they are accurate. I used an inaccurate visualization in my presentation on OLAC roles. Maybe this could be added to Django so the diagram updates automatically.
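A Venn diagram of n sets has to show 2^n regions, including the one outside every set, and each region corresponds to a boolean expression over the sets. Circles alone cannot show all 16 regions for four sets, which is why four-set diagrams are hard to draw. One way to check whether a drawn diagram (from pyvenn or anything else) is accurate is to compute the regions first. Here is a minimal sketch using Python's built-in set operators. The people and their assignments to a few OLAC role codes are made up.

```python
from itertools import product


def venn_regions(sets):
    """Group every element by which of the named sets it belongs to.

    Returns a dict mapping a membership pattern (one bool per set, in the
    order of `sets`) to the elements in that region. All 2**n patterns are
    present, so empty regions show up too.
    """
    names = list(sets)
    universe = set().union(*sets.values())
    regions = {pattern: set() for pattern in product((False, True), repeat=len(names))}
    for element in universe:
        pattern = tuple(element in sets[name] for name in names)
        regions[pattern].add(element)
    return regions


# Hypothetical example: people grouped by OLAC-style role codes.
roles = {
    "speaker": {"Ana", "Ben", "Cy"},
    "transcriber": {"Ben", "Dee"},
    "translator": {"Cy", "Dee", "Eve"},
}
regions = venn_regions(roles)

# Each region is a boolean expression, e.g. speaker AND transcriber AND NOT translator:
speaker_and_transcriber_only = (roles["speaker"] & roles["transcriber"]) - roles["translator"]
```

The universe here is the union of the given sets, so the all-False region is always empty. Pass in a larger universe if the "outside" region matters.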

Some python tools for NER extraction tool chains

Posted on March 23, 2024 by Hugh Paterson III

https://github.com/johnfraney/django-ner-trainer

https://johnfraney.github.io/django-ner-trainer/

https://github.com/doccano/doccano

https://prodi.gy/buy

https://github.com/GitTeaching/Django_NER_Crisis?tab=readme-ov-file
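Annotation tools such as doccano typically store entity annotations as character offsets into the text, while many NER training setups expect one tag per token. Below is a minimal sketch of converting between the two using the common BIO scheme (B = beginning of an entity, I = inside, O = outside). It assumes whitespace tokenization and (start, end, label) spans. Each tool's export format differs, so check its documentation before relying on this shape. The sentence and labels are hypothetical.

```python
import re


def spans_to_bio(text, spans):
    """Convert (start, end, label) character spans into (token, BIO tag) pairs.

    Tokens are runs of non-whitespace characters (a simplifying assumption),
    and a token belongs to an entity if it starts inside that entity's span.
    """
    pairs = []
    previous = None  # index of the span the previous token fell in, if any
    for match in re.finditer(r"\S+", text):
        start = match.start()
        current = next(
            (i for i, (s, e, _) in enumerate(spans) if s <= start < e), None
        )
        if current is None:
            tag = "O"
        elif current == previous:
            tag = "I-" + spans[current][2]  # continuing the same entity
        else:
            tag = "B-" + spans[current][2]  # first token of a new entity
        pairs.append((match.group(), tag))
        previous = current
    return pairs


text = "Ana Lopez recorded Kinyarwanda in Kigali"
spans = [(0, 9, "PERSON"), (19, 30, "LANGUAGE"), (34, 40, "PLACE")]
tagged = spans_to_bio(text, spans)
```

Punctuation attached to a word (such as "Kigali.") stays part of that token here. A real tokenizer like spaCy's would split it off.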

Python + MySQL

Posted on March 3, 2024 by Hugh Paterson III

https://realpython.com/python-mysql/

https://dev.mysql.com/doc/connector-python/en/preface.html
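MySQL Connector/Python and the standard library's sqlite3 both follow Python's DB-API 2.0 pattern: connect, execute parameterized queries, commit. The sketch below therefore uses sqlite3 as a stand-in so it runs without a MySQL server. It assumes the goal is storing references extracted from PDFs. The table name and reference strings are made up. With MySQL Connector/Python the placeholders are %s rather than ?.

```python
import sqlite3


def store_references(conn, pdf_name, references):
    """Insert reference strings extracted from one PDF; returns the number stored."""
    with conn:  # commits on success, rolls back on an exception (does not close)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS refs (pdf TEXT NOT NULL, reference TEXT NOT NULL)"
        )
        # Parameterized query: the driver handles quoting, so no string formatting.
        conn.executemany(
            "INSERT INTO refs (pdf, reference) VALUES (?, ?)",
            [(pdf_name, ref) for ref in references],
        )
    return len(references)


def references_for(conn, pdf_name):
    """Return the stored references for one PDF in insertion order."""
    rows = conn.execute(
        "SELECT reference FROM refs WHERE pdf = ? ORDER BY rowid", (pdf_name,)
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")  # stand-in for a MySQL connection
store_references(conn, "paper.pdf", ["Smith 2001. A grammar.", "Lee 2010. Tone."])
```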

https://www.geeksforgeeks.org/working-with-pdf-files-in-python/
https://www.geeksforgeeks.org/extract-text-from-pdf-file-using-python/
https://qxf2.com/blog/extracting-data-from-pdfs-python/

https://www.metachris.com/pdfx/
https://softwarerecs.stackexchange.com/questions/76210/software-to-extract-the-list-of-references-and-title-from-a-pdf-of-a-research-pa
https://pypi.org/project/refextract/
https://stackoverflow.com/questions/62365767/extract-references-from-pdf-python
https://discuss.python.org/t/pdf-extraction-with-python-wrappers/40384

OCR solutions for python toolchains

Posted on May 4, 2023 by Hugh Paterson III

https://github.com/Calamari-OCR/calamari
https://builtin.com/data-science/python-ocr
https://www.geeksforgeeks.org/python-reading-contents-of-pdf-using-ocr-optical-character-recognition/#

Extract Text from Images Quickly Using Keras-OCR Pipeline

https://github.com/usnistgov/ocr-pipeline

App Build and Deploy

Posted on April 26, 2023 by Hugh Paterson III

https://fly.io/docs/languages-and-frameworks/python/

Font queries

Posted on March 23, 2023 by Hugh Paterson III

What do I want to do?

In the context of a pipeline, I want to test a font against a known version of an orthography to find out whether the font supports that orthography.

To that end here is a start:

https://stackoverflow.com/questions/4458696/finding-out-what-characters-a-given-font-supports
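The test described above can be sketched in Python by comparing the characters of an orthography against the code points in the font's cmap table. Normalization matters here. An orthography may write á as one precomposed character or as a plus a combining acute accent, and a font may cover one form but not the other. This sketch treats a character as covered if either form is available, assuming the shaping engine handles composition and decomposition. It does not check whether the font positions combining marks well; that information lives in the font's GPOS table. Reading the cmap assumes the fontTools package is installed, and the font path is hypothetical.

```python
import unicodedata


def missing_characters(supported, orthography):
    """Return the orthography's characters that a font cannot display.

    `supported` is a set of code points (e.g. the keys of a font's cmap).
    A character counts as covered if its own code point is supported, or if
    every code point of its canonical decomposition (base + marks) is.
    """
    missing = []
    for char in unicodedata.normalize("NFC", orthography):
        if char.isspace() or ord(char) in supported:
            continue
        decomposed = unicodedata.normalize("NFD", char)
        if len(decomposed) > 1 and all(ord(c) in supported for c in decomposed):
            continue
        if char not in missing:  # report each character once
            missing.append(char)
    return missing


def font_codepoints(path):
    """Code points mapped by a font's best Unicode cmap (needs fontTools)."""
    from fontTools.ttLib import TTFont  # third-party: pip install fonttools

    font = TTFont(path)
    try:
        return set(font.getBestCmap() or {})
    finally:
        font.close()


# Made-up coverage: no precomposed á and no ŋ, but a combining acute (U+0301).
supported = {ord(c) for c in "abdeinou\u025b\u0254"} | {0x0301}
missing = missing_characters(supported, "a \u00e1 \u025b \u025b\u0301 \u0254 \u014b")
# With a real font: missing_characters(font_codepoints("MyFont.ttf"), orthography)
```

In this example only ŋ is reported. á passes because the font has both a and the combining acute, and ɛ́ never had a precomposed form to begin with.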

Another interesting site is https://fontdrop.info/. It reads information from the font's metadata, including a list of the languages the font supports. But I wonder: what do "language support" or "supported languages" mean in these contexts? Where does the list of languages come from? Where are these languages' requirements cataloged?

Fuzzy matching

Posted on November 19, 2022 by Hugh Paterson III

https://towardsdatascience.com/fuzzy-string-matching-in-python-68f240d910fe

https://www.activestate.com/blog/how-to-implement-fuzzy-matching-in-python/#:~:text=As%20mentioned%20above%2C%20fuzzy%20matching,strings%20are%20to%20one%20another.
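The standard library's difflib is a dependency-free place to start. It ranks candidate strings by a similarity ratio of 2*M/T, where M is the number of matching characters and T is the combined length of both strings. Below is a sketch that matches a misspelled language name against a list of names. The names and the 0.6 cutoff are illustrative.

```python
import difflib


def best_matches(query, candidates, n=3, cutoff=0.6):
    """Return up to n candidates most similar to query, ignoring case.

    Candidates that differ only in case collapse to one entry.
    """
    lowered = {c.lower(): c for c in candidates}  # lowercase -> original spelling
    hits = difflib.get_close_matches(query.lower(), list(lowered), n=n, cutoff=cutoff)
    return [lowered[h] for h in hits]


names = ["Kinyarwanda", "Kirundi", "Swahili"]
matches = best_matches("Kinyarwnda", names)
```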

Topic Modeling in Python

Posted on November 19, 2022 by Hugh Paterson III

What if the data array I load with pandas (imported as pd) were the OLAC metadata schema?

https://www.semanticscholar.org/paper/A-Gentle-Introduction-to-Topic-Modeling-Using-Saxton/38742c56eadfdf11fb7218f7702c8fccfc78bd95

https://gist.github.com/umbertogriffo/5041b9e4ec6c3478cef99b8653530032

https://towardsdatascience.com/contextualized-topic-modeling-with-python-eacl2021-eacf6dfa576

Beginners Guide to Topic Modeling in Python

How to Use Bertopic for Topic Modeling and Content Analysis?

https://melaniewalsh.github.io/Intro-Cultural-Analytics/05-Text-Analysis/08-Topic-Modeling-Text-Files.html

https://asandeepc.bitbucket.io/courses/inls613_summer2019/lectures/08-lda_topic_modeling.pdf

http://derekgreene.com/slides/topic-modelling-with-scikitlearn.pdf

https://ourcodingclub.github.io/tutorials/topic-modelling-python/

https://stackabuse.com/python-for-nlp-topic-modeling/

https://www.toptal.com/python/topic-modeling-python

https://towardsdatascience.com/end-to-end-topic-modeling-in-python-latent-dirichlet-allocation-lda-35ce4ed6b3e0
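Libraries like scikit-learn, gensim, and BERTopic hide the mechanics. To see what LDA (latent Dirichlet allocation) does, here is a compact collapsed Gibbs sampler in plain Python. It assumes each OLAC record has already been flattened into a list of word tokens, say from its dc:subject and dc:description fields. The four records below are made up. alpha and beta are the usual symmetric Dirichlet priors, and the values used are arbitrary starting points, not tuned settings.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word): doc_topic[d][k] estimates p(topic k | doc d),
    topic_word[k][w] estimates p(word w | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]        # tokens in doc d assigned to topic k
    nkw = [Counter() for _ in range(n_topics)]  # times word w is assigned to topic k
    nk = [0] * n_topics                         # tokens assigned to topic k overall

    # Start from a random topic for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Take this token out of the counts...
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # ...and resample its topic from the full conditional distribution.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    doc_topic = [
        [(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {w: (nkw[k][w] + beta) / (nk[k] + V * beta) for w in vocab}
        for k in range(n_topics)
    ]
    return doc_topic, topic_word


# Hypothetical OLAC records, already reduced to subject/description tokens.
docs = [
    "verb morphology tense aspect verb".split(),
    "tense aspect morphology verb".split(),
    "audio recording microphone wav".split(),
    "field recording audio microphone".split(),
]
doc_topic, topic_word = lda_gibbs(docs, n_topics=2)
top_words = [sorted(tw, key=tw.get, reverse=True)[:3] for tw in topic_word]
```

With real records you would remove stopwords first. A library implementation will also be much faster; this version is only meant to make the sampling step visible.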

Scraping archives for OLAC

Posted on November 2, 2022 by Hugh Paterson III

This post is a set of resources I am compiling for scraping a language archive in order to create a static OLAC feed.

https://www.youtube.com/watch?v=RvCBzhhydNk

https://www.kdnuggets.com/2022/02/build-web-scraper-python-5-minutes.html

The archive: http://roa.rutgers.edu/article/browse

https://www.geeksforgeeks.org/how-to-build-web-scraping-bot-in-python/

https://www.edureka.co/blog/web-scraping-with-python/

https://www.webscrapingapi.com/python-web-scraping
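A first step in scraping an archive's browse page is collecting the links to individual item pages. Here is a standard-library sketch using html.parser. The sample HTML and the /article/view/ path filter are hypothetical. Inspect the real page's markup and its robots.txt before relying on either.

```python
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href> tags whose href contains a given fragment."""

    def __init__(self, base_url, must_contain=""):
        super().__init__()
        self.base_url = base_url
        self.must_contain = must_contain
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.must_contain in href:
            absolute = urljoin(self.base_url, href)  # resolve relative links
            if absolute not in self.links:  # keep the first occurrence only
                self.links.append(absolute)


def fetch(url):
    """Download a page as text (not called below; shown for completeness)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


# Hypothetical markup; the real browse page will differ.
sample_html = (
    '<ul><li><a href="/article/view/123">Paper A</a></li>'
    '<li><a href="/about">About</a></li>'
    '<li><a href="/article/view/124">Paper B</a></li></ul>'
)
collector = LinkCollector("http://roa.rutgers.edu/article/browse", must_contain="/article/view/")
collector.feed(sample_html)
```

Each collected item page can then be fetched and parsed for the title, author, and PDF link that an OLAC record needs.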

Text Extraction
https://www.geeksforgeeks.org/extract-text-from-pdf-file-using-python/
https://towardsdatascience.com/how-to-extract-text-from-pdf-245482a96de7
https://betterprogramming.pub/how-to-convert-pdfs-into-searchable-key-words-with-python-85aab86c544f

OCR
https://www.javatpoint.com/how-to-read-contents-of-pdf-using-ocr-in-python
https://www.geeksforgeeks.org/python-reading-contents-of-pdf-using-ocr-optical-character-recognition/
https://pypi.org/project/ocrmypdf/
https://towardsdatascience.com/extracting-text-from-scanned-pdf-using-pytesseract-open-cv-cd670ee38052
https://stackabuse.com/applying-ocr-to-a-scanned-pdf-in-python-using-borb/

NER

Named Entity Recognition (NER) in Python with Spacy

File Type Detection
https://github.com/ahupp/python-magic
https://www.geeksforgeeks.org/determining-file-format-using-python/
https://stackoverflow.com/questions/10937350/how-to-check-type-of-files-without-extensions
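python-magic wraps libmagic, which identifies files by their leading bytes ("magic numbers") rather than by their extension. The same idea can be sketched without dependencies for a handful of formats common in language documentation. The signature table is a small hand-picked subset, so anything else comes back as unknown.

```python
from typing import Optional

# (leading bytes, label): a small subset, not a replacement for libmagic.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"PK\x03\x04", "zip"),  # DOCX, ODT and EPUB are ZIP containers too
    (b"<?xml", "xml"),  # misses XML with a BOM or leading whitespace
]


def sniff(header: bytes) -> Optional[str]:
    """Guess a file type from its first bytes; None if unrecognized."""
    # WAV is a RIFF container with "WAVE" at byte offset 8.
    if header.startswith(b"RIFF") and header[8:12] == b"WAVE":
        return "wav"
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return None


def sniff_file(path, size=16):
    """Read the first `size` bytes of a file and sniff them."""
    with open(path, "rb") as fh:
        return sniff(fh.read(size))
```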

XML parsing

https://pypi.org/project/defusedxml/
https://www.tutorialspoint.com/python/python_xml_processing.htm
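defusedxml is a drop-in replacement for the standard library's XML parsers that refuses entity-expansion bombs and external entities. That matters when parsing XML harvested from other people's servers. Below is a sketch that pulls language codes out of an OLAC record. The record is made up, and the code assumes the OLAC 1.1 convention of xsi:type="olac:language" with an olac:code attribute.

```python
try:
    # defusedxml rejects entity-expansion bombs and external entities.
    from defusedxml.ElementTree import fromstring
except ImportError:
    # Fallback so the sketch runs without defusedxml; only use for trusted input.
    from xml.etree.ElementTree import fromstring

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"
OLAC_CODE = "{http://www.language-archives.org/OLAC/1.1/}code"


def olac_language_codes(xml_text):
    """Return (element name, language code) pairs for elements typed olac:language.

    Simplification: compares the xsi:type string literally, so it assumes the
    record binds the OLAC namespace to the prefix "olac".
    """
    root = fromstring(xml_text)
    pairs = []
    for element in root:
        if element.get(XSI_TYPE) == "olac:language":
            local_name = element.tag.rsplit("}", 1)[-1]  # strip the namespace URI
            pairs.append((local_name, element.get(OLAC_CODE)))
    return pairs


# Hypothetical record.
SAMPLE = (
    '<olac:olac xmlns:olac="http://www.language-archives.org/OLAC/1.1/" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    "<dc:title>A wordlist</dc:title>"
    '<dc:subject xsi:type="olac:language" olac:code="kin"/>'
    '<dc:language xsi:type="olac:language" olac:code="eng"/>'
    '<dc:contributor xsi:type="olac:role" olac:code="recorder">Ana Lopez</dc:contributor>'
    "</olac:olac>"
)
codes = olac_language_codes(SAMPLE)
```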

Platform tools for OAI harvesting

Posted on October 17, 2022 by Hugh Paterson III

So, in a recent OLAC presentation I talked about enabling OAI harvesting in Omeka or Drupal via recipes. Here are some links to internet chatter on these issues.

Koha

enable Items in KOHA OAI Harvesting

https://koha-community.org/manual/18.05/en/html/webservices.html

https://forums.zotero.org/discussion/38956/export-of-zotero-citation-to-marc-format-for-import-into-koha-lms

WordPress

Day 13: Harvest data with OAI-PMH

WordPress and Drupal

https://acrl.ala.org/techconnect/post/creating-an-oai-pmh-feed-from-your-website/

eHive

https://developers.ehive.com/

Catmandu

https://librecatproject.wordpress.com/tutorial/

Omeka
https://omeka.org/classic/docs/Plugins/OaiPmhRepository/

MOAI
https://pypi.org/project/MOAI/

PyOAI
https://pypi.org/project/pyoai/
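PyOAI provides an OAI-PMH client. Underneath, a harvester runs a ListRecords loop: it requests the first page with a metadataPrefix (olac here), then keeps following the resumptionToken until the repository returns an empty one. Below is a standard-library sketch of that loop. The fetch function is a hypothetical hook passed in so the loop can be tested without a network; in practice it could wrap urllib.request.urlopen. For untrusted servers, the defusedxml parser is the safer choice.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="olac", resumption_token=None):
    """Build a ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: send it with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def harvest(base_url, fetch, metadata_prefix="olac"):
    """Yield every <record> element, following resumption tokens page by page.

    fetch is any callable that takes a URL and returns the response body as text.
    """
    token = None
    while True:
        root = ET.fromstring(fetch(list_records_url(base_url, metadata_prefix, token)))
        error = root.find(OAI + "error")
        if error is not None:
            if error.get("code") == "noRecordsMatch":
                return  # an empty result set is not a failure
            raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
        yield from root.iter(OAI + "record")
        token_element = root.find(f".//{OAI}resumptionToken")
        # The final page carries an empty resumptionToken (or none at all).
        if token_element is None or not (token_element.text or "").strip():
            return
        token = token_element.text.strip()
```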

Posted in Other Journals | Tagged OAI, OAI-PMH, OLAC, PHP, Python, R-90


One should not consider the content on this website to be an official opinion of any company associated with me. These posts are solely my opinion.

The Journeyler

A walk through: Life, Leadership, Linguistics, Language Documentation, WordPress, and OS X (and a bit of Marketing & Business Administration)

Tag Archives: Python