Lexical Database Archiving Questionnaire

It's true!

I am asking around on different mailing lists to gain some insight into the archiving habits of linguists who use lexical databases. I am specifically interested in databases created by tools like FLEx, ToolBox, Lexus, TshwaneLex, etc.

Background Story

Python + MySQL

https://realpython.com/python-mysql/

https://dev.mysql.com/doc/connector-python/en/preface.html



https://www.geeksforgeeks.org/working-with-pdf-files-in-python/
https://www.geeksforgeeks.org/extract-text-from-pdf-file-using-python/
https://qxf2.com/blog/extracting-data-from-pdfs-python/


https://www.metachris.com/pdfx/
https://softwarerecs.stackexchange.com/questions/76210/software-to-extract-the-list-of-references-and-title-from-a-pdf-of-a-research-pa
https://pypi.org/project/refextract/
https://stackoverflow.com/questions/62365767/extract-references-from-pdf-python
https://discuss.python.org/t/pdf-extraction-with-python-wrappers/40384

Need to Add LREC Workshops to aclanthology.org

From time to time I need to reference Heidi Johnson's work published as part of the LREC workshops in 2002 and 2006 under the title "International Workshop on Resources and Tools in Field Linguistics". The papers were never hosted on the official LREC website; instead, they were hosted on the MPI website.

Who do I talk to about getting these papers into the ACL Anthology (https://aclanthology.org)? There they would get the attention they deserve.

Re-implementing the OLAC validator

The OLAC validator runs on a piece of software that is affected by the Heartbleed security vulnerability. When I think about implementing a new validator, zimeon's oaipmh-validator (https://github.com/zimeon/oaipmh-validator) comes to mind. There was also an online OAI-PMH validator (https://validator.oaipmh.com/) built by a former engineer on the Europeana project; I think he is based in Greece. His solution is not open source, but he mentioned that he would consider adding the OLAC profile.

It would be good to see what other OAI-PMH validators look like and how submitters expect to interact with them.

https://validador.rcaap.pt/validator2/?locale=en
http://oval.base-search.net/
https://doi.org/10.17700/jai.2016.7.1.277
https://rdamsc.bath.ac.uk/msc/t64
https://www.openaire.eu/validator-registration-guide
https://github.com/EuroCRIS/openaire-cris-validator
https://www.fosteropenscience.eu/content/openaire-compatibility-validator-presentation
http://oai.clarin-pl.eu/
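As a starting point for thinking about a re-implementation, here is a minimal sketch, in standard-library Python, of the kind of checks a validator runs on a single OAI-PMH response: well-formedness, the envelope layout (responseDate, then request, then an error or a verb element), and one OLAC-specific rule. The namespace URIs come from the OAI-PMH 2.0 and OLAC 1.1 specifications. The function name check_oai_response, the assumption that OLAC responses are requested with metadataPrefix="olac", and the particular set of rules are my own illustrative choices, not the behaviour of the current OLAC validator or of any tool listed above.

```python
import xml.etree.ElementTree as ET

# Namespace URIs (Clark notation) from the OAI-PMH 2.0 and OLAC 1.1 specifications.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
OLAC = "{http://www.language-archives.org/OLAC/1.1/}"

# Protocol verbs; a response carries one of these elements or <error>.
VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "ListIdentifiers", "ListRecords", "GetRecord"}


def check_oai_response(xml_text: str) -> list[str]:
    """Hypothetical, partial check of one OAI-PMH response.

    Returns a list of problems (empty if none were found). It checks
    well-formedness, the envelope layout and, for metadataPrefix="olac"
    (an assumption), that each record payload is an olac:olac element.
    It is NOT a substitute for validating against the XML Schemas.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    if root.tag != OAI + "OAI-PMH":
        return [f"root element is {root.tag}, expected OAI-PMH in the OAI namespace"]

    problems = []
    # Local names of the envelope's children, in document order.
    names = [child.tag.replace(OAI, "", 1) for child in root]
    if names[:2] != ["responseDate", "request"]:
        problems.append("envelope must start with <responseDate> then <request>")
    if len(names) < 3:
        problems.append("missing <error> or verb element")
    elif names[2] != "error" and names[2] not in VERBS:
        problems.append(f"unexpected element <{names[2]}> after <request>")

    # OLAC profile rule: every <metadata> must wrap exactly one olac:olac element.
    request = root.find(OAI + "request")
    if request is not None and request.get("metadataPrefix") == "olac":
        for metadata in root.iter(OAI + "metadata"):
            payload = list(metadata)
            if len(payload) != 1 or payload[0].tag != OLAC + "olac":
                problems.append("a <metadata> payload is not a single olac:olac element")
    return problems
```

A real validator would also issue requests for each verb, validate the responses against the OAI-PMH and OLAC schemas, and report results in a form submitters can act on; this sketch only shows where profile-specific rules could plug in.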

DSpace and Dataverse have a bug in parsing OLAC XML

Tonight I read about a bug in XOAI, a foundational library for DSpace and Dataverse, which uses the lxml library for parsing.

Since the OLAC XML profile of OAI-PMH requires the use of the XSI (XML Schema instance) namespace, it seems that the bug described in https://github.com/DSpace/xoai/issues/67 and discussed in https://github.com/gdcc/xoai/issues/141 would apply.
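To make the XSI dependency concrete: OLAC records put attributes such as xsi:type="olac:language" on Dublin Core elements. The value of xsi:type is a QName, so the olac: prefix inside the attribute value only means something through the namespace declarations in scope. XML parsers resolve prefixes in element and attribute names but hand attribute values back as plain strings, so any library that parses and re-serializes a record has to preserve those declarations. The minimal sketch below (standard-library Python; resolve_xsi_types is a hypothetical helper) collects declarations with iterparse's "start-ns" events and expands each xsi:type value. I have not confirmed that this is the exact mechanism behind the XOAI issues linked above; it only illustrates why XSI usage makes OLAC XML sensitive to namespace handling.

```python
import io
import xml.etree.ElementTree as ET

# Clark-notation name of the xsi:type attribute (XML Schema instance namespace).
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"


def resolve_xsi_types(xml_text: str) -> list[tuple[str, str]]:
    """Return (element tag, expanded xsi:type) pairs, in document order.

    Simplification: prefix declarations are collected for the whole document
    rather than tracked per element scope, so a prefix that is re-bound to a
    different URI somewhere in the document could be resolved incorrectly.
    """
    prefixes: dict[str, str] = {}
    found: list[tuple[str, str]] = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = item  # "" stands for the default namespace
            prefixes[prefix] = uri
            continue
        value = item.get(XSI_TYPE)
        if value is None:
            continue
        # To the parser the attribute value is just a string: split the QName ourselves.
        prefix, _, local = value.rpartition(":")
        if prefix:
            if prefix not in prefixes:
                raise ValueError(f"xsi:type={value!r} uses undeclared prefix {prefix!r}")
            uri = prefixes[prefix]
        else:
            uri = prefixes.get("", "")  # unprefixed QName: default namespace, if any
        found.append((item.tag, f"{{{uri}}}{local}" if uri else local))
    return found
```

For example, on a record containing `<dc:language xsi:type="olac:language" olac:code="eng"/>`, the helper pairs the Dublin Core language tag with the expanded type `{http://www.language-archives.org/OLAC/1.1/}language`. If the olac prefix declaration is lost during processing, the document still parses, but the helper raises ValueError because the type can no longer be resolved.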