I am asking around on different mailing lists to gain some insight into the archiving habits of linguists who use lexical databases. I am specifically interested in databases created by tools such as FLEx, Toolbox, Lexus, and TshwaneLex.
There are some really nice LaTeX templates out there... I need to look at XLingPaper, then decide what I want to do and whether I want to create a look-alike template in LaTeX. All things considered, XLingPaper still pulls data nicely from FLEx. But I haven't used FLEx in a while.
I really like CharisSIL and Linux Libertine. The ACM two-column format looks really nice in Libertine, but I haven't checked recently how it handles linguistic symbols.
Defining the aboutness of a collection is a challenge. From a philosophical point of view, this is even harder for collections in anthropological linguistics. These collections are not assembled for the sake of their "about-ness" but for the sake of their "is-ness". A museum collection might be about 19th-century trains, but such collections rarely contain the trains themselves. So, does this mean that linguistic collections are really about the people groups whose speech they represent, and that the of-ness is the speech? Then linguists come along and write about the grammar of the language; is that writing about the language? Original stories often carry an aboutness that is never recorded in metadata. This needs to change.
This thought needs to be explored with the MARC 655 $x and $v subfields. See: https://www.loc.gov/marc/bibliographic/bd655.html
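To make the idea a little more concrete, here is a rough sketch of what a 655 field might look like if a recorded story's topic went into a general subdivision ($x) and its physical form into a form subdivision ($v), rendered as MARCMaker-style mnemonic text. The DataField class is a hypothetical stand-in (not pymarc), and the terms "Folk tales", "Hunting", and "Sound recordings" are illustrative only, not taken from LCGFT or any other thesaurus. Second indicator 4 marks the source of the term as not specified.

```python
from dataclasses import dataclass, field


@dataclass
class DataField:
    """Hypothetical minimal stand-in for a MARC variable data field."""
    tag: str
    ind1: str = " "
    ind2: str = " "
    subfields: list[tuple[str, str]] = field(default_factory=list)

    def to_mnemonic(self) -> str:
        # MARCMaker-style mnemonic text: blank indicators are written as "\".
        inds = (self.ind1 + self.ind2).replace(" ", "\\")
        body = "".join(f"${code}{value}" for code, value in self.subfields)
        return f"={self.tag}  {inds}{body}"


# Illustrative genre/form heading for a recorded story:
#   $a genre/form term, $x topic ("aboutness"), $v form of the item.
# ind1 " " = basic heading; ind2 "4" = source not specified.
story_655 = DataField(
    tag="655",
    ind1=" ",
    ind2="4",
    subfields=[
        ("a", "Folk tales"),
        ("x", "Hunting"),
        ("v", "Sound recordings"),
    ],
)

print(story_655.to_mnemonic())
```

Whether $x is really the right home for a story's aboutness is exactly the open question; the sketch only shows where the pieces would sit.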
See email: https://mail.google.com/mail/u/0/#sent/FMfcgzGsmrDLzSSBqXVPfKphwmdGhcZC
The massive pre-print industry has influenced Zotero to add a specific item type for pre-prints. This is a cognitive fallacy that only exacerbates the citation and reference chaos.
Pre-prints are manuscripts... There are handwritten manuscripts, typescript manuscripts, and computer-generated manuscripts. Zotero already has Manuscript as an item type, so there is no need to add a new category.
To make matters worse, Zotero imports PDFs when it can find open access versions of them. The problem is that it attaches them to the article/publication item even when they are pre-prints, rather than to a pre-print item. This makes authority version management in Zotero a nightmare. A classic case (try importing it): https://doi.org/10.1177/0964663914565848
I am still hopeful that Zotero staff will find a clean and easy way to automatically link pre-prints to their authority version records within Zotero.
Sometimes as a parent one has to encourage one's child to do something the child doesn't want to do. That time came today. I had to pull teeth to get Katja to come to the pool with me. I told her she only needed to swim 3 laps. After much cajoling we got to the car. By the time she got to the pool, she had a kickboard and was off. I got a few laps in, and she met me at the far wall of the 25-yard lane. She says to me, "I want to swim 12 laps." And so she did. So... from poolside observer 4 years ago to swim partner today.
This might be a way forward to an OAI-PMH repository: https://github.com/discourse/discourse-sitemap. Another option is to use a query mechanism in the JSON API to get all threads and treat these threads as resources for description: https://meta.discourse.org/t/discourse-rest-api-documentation/22706
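If threads are treated as resources, each one needs a record an OAI-PMH provider could serve. A minimal sketch, assuming a topic has already been fetched from the JSON API as a dict with `id`, `title`, `slug`, `created_at`, and optionally `tags` (the sample topic and forum URL below are hypothetical), maps it to a simple oai_dc element. It does no network access, omits the xsi:schemaLocation an OAI-PMH provider would normally include, and is not validated against the oai_dc schema.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def topic_to_oai_dc(topic: dict, base_url: str) -> ET.Element:
    """Map one Discourse topic dict to an (unvalidated) oai_dc:dc element."""
    root = ET.Element(f"{{{OAI_DC}}}dc")

    def add(name: str, text: str) -> None:
        ET.SubElement(root, f"{{{DC}}}{name}").text = text

    add("title", topic["title"])
    # Topic pages live at /t/{slug}/{id} on a Discourse site.
    add("identifier", f"{base_url.rstrip('/')}/t/{topic['slug']}/{topic['id']}")
    add("date", topic["created_at"][:10])  # keep only YYYY-MM-DD
    add("type", "Text")
    for tag in topic.get("tags", []):
        # Assumption: tags may arrive as plain strings or as objects with "name".
        add("subject", tag["name"] if isinstance(tag, dict) else str(tag))
    return root


# Hypothetical topic, shaped like one entry of topic_list.topics in /latest.json.
sample_topic = {
    "id": 42,
    "title": "Tone sandhi notes",
    "slug": "tone-sandhi-notes",
    "created_at": "2023-03-14T09:26:53.589Z",
    "tags": ["phonology", "tone"],
}

record = topic_to_oai_dc(sample_topic, "https://forum.example.org")
print(ET.tostring(record, encoding="unicode"))
```

Tags map to dc:subject here simply because they are the only topical signal a thread carries; a richer mapping would depend on how tag groups are actually used on the forum.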
I wonder how many layers a tag-group can have... https://docs.discourse.org/#tag/Tags/operation/updateTagGroup
Subject analysis is very interesting. In a recent investigation into a theory of subject analysis, I was introduced to the concepts of "about-ness", "is-ness", and "of-ness".
Sometimes I wonder whether linguists defy standard practices in subject representation, or whether they define what the general population finds challenging about subject analysis in cataloging.
I harken back to the OLAC application profile, which is based on Dublin Core. Dublin Core does not scope the subject element to "about-ness" analysis. The UNT curriculum is informed by, and structurally based on, Steven J. Miller's Metadata for Digital Collections: A How-To-Do-It Manual. The issue at hand is that for linguists, about-ness is only relevant for information resources representing analysis. For other kinds of resources, such as primary oral texts or narratives captured on video (which are often the very objects analyzed and discussed in analytical resources), the primary view on subjecthood is through of-ness. As far as I know, no one has discussed audio in terms of of-ness descriptions.
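The OLAC profile already hints at this split: dc:language records the language a resource is spoken or written in, while dc:subject with xsi:type="olac:language" records the language a resource is about, and the olac:linguistic-type vocabulary separates primary_text from language_description. A rough sketch with ElementTree builds one record of each kind; the titles are made up, and "qaa" is used as a placeholder because it falls in the ISO 639 range reserved for local use. This is an illustration of the idea, not a complete or validated OLAC record.

```python
import xml.etree.ElementTree as ET

OLAC = "http://www.language-archives.org/OLAC/1.1/"
DC = "http://purl.org/dc/elements/1.1/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
for prefix, uri in (("olac", OLAC), ("dc", DC), ("xsi", XSI)):
    ET.register_namespace(prefix, uri)


def olac_term(parent, element, vocab, code):
    """Add a dc:<element> carrying an OLAC controlled-vocabulary code."""
    return ET.SubElement(parent, f"{{{DC}}}{element}", {
        f"{{{XSI}}}type": f"olac:{vocab}",
        f"{{{OLAC}}}code": code,
    })


def olac_record(title, content_lang, linguistic_type,
                subject_lang=None, fields=()):
    root = ET.Element(f"{{{OLAC}}}olac")
    ET.SubElement(root, f"{{{DC}}}title").text = title
    # dc:language: the language the resource is spoken or written in (of-ness).
    olac_term(root, "language", "language", content_lang)
    # dc:subject + olac:language: the language the resource is about.
    if subject_lang:
        olac_term(root, "subject", "language", subject_lang)
    olac_term(root, "type", "linguistic-type", linguistic_type)
    for f in fields:
        olac_term(root, "subject", "linguistic-field", f)
    return root


# A story recorded in the language: no about-ness subject at all.
story = olac_record("A fishing story", "qaa", "primary_text")
# A grammar written in English about that language.
grammar = olac_record("A sketch grammar", "eng", "language_description",
                      subject_lang="qaa", fields=("phonology", "morphology"))

print(ET.tostring(grammar, encoding="unicode"))
```

Note that the story record ends up with no dc:subject at all, which is the gap described above: whatever the story itself is about has nowhere obvious to go.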
It also makes me wonder whether genre is mostly about utility rather than binding style. To this end, a scholar looking for a phonology corpus is looking for a combination of things: a MIME type, with a relationship to another MIME type, with an of-ness of a kind, and a subject of "phonology".
Splitting up the concepts of "about-ness", "is-ness", and "of-ness" provides analytical space for more articulate descriptions in the dc:description field. But when it comes to language materials, the question is: is language a subject by virtue of "of-ness" or by virtue of "about-ness"? There are several implications here:
The description field ought to be re-thought.
The subject field ought to be re-thought.
Some searches by linguists are likely the combination of two or three factors: a relationship between two records, a subject of one kind, and a subject of a different kind.
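The phonology-corpus search described above can be sketched as a conjunction of factors over records. This is a toy model under loose assumptions: the Record fields, the MIME types, and the of-ness labels are hypothetical, and a real catalog would express the relationship through something like dc:relation rather than a set of ids.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    id: str
    mime_type: str
    of_ness: str                                     # what the resource captures
    subjects: set[str] = field(default_factory=set)  # what it is about
    related: set[str] = field(default_factory=set)   # ids of related records


def find(records, mime_type, related_mime, of_ness, subject):
    """Return ids of records matching all four factors at once."""
    by_id = {r.id: r for r in records}
    hits = []
    for r in records:
        if r.mime_type != mime_type or r.of_ness != of_ness:
            continue
        if subject not in r.subjects:
            continue
        # Relationship factor: some linked record has the other MIME type.
        if any(by_id[x].mime_type == related_mime for x in r.related if x in by_id):
            hits.append(r.id)
    return hits


records = [
    Record("a1", "audio/wav", "elicited word list", {"phonology"}, {"t1"}),
    Record("a2", "audio/wav", "elicited word list", {"phonology"}),
    Record("a3", "audio/wav", "narrative", {"phonology"}, {"t1"}),
    Record("t1", "application/xml", "transcription", {"phonology"}, {"a1"}),
]

print(find(records, "audio/wav", "application/xml", "elicited word list", "phonology"))
# ['a1']
```

Only a1 satisfies every factor: a2 has no related transcription, and a3 has the wrong of-ness. No single field would have found it.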
Variation in accuracy, completeness, or consistency can contribute to lower quality metadata records. Hughes (2006), when looking at OLAC records, rightly points out that coverage (quantity of elements per record) is one way to estimate record quality. However, coverage speaks mainly to completeness, while all three factors impact end-user perceptions of records and their associated resources.
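A bare-bones version of a coverage measure might count how many of the 15 Dublin Core elements carry a non-blank value. This is only a sketch of the general idea, assuming a record is a dict of element name to list of strings; it is not a reproduction of Hughes's method or of any scoring OLAC actually uses, and it says nothing about accuracy or consistency.

```python
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def element_coverage(record):
    """Fraction of the 15 DC elements with at least one non-blank value."""
    used = sum(
        1 for el in DC_ELEMENTS
        if any(v.strip() for v in record.get(el, []))
    )
    return used / len(DC_ELEMENTS)


rec = {
    "title": ["Frog story"],
    "creator": ["A. Speaker"],
    "date": ["2006-05-01"],
    "language": ["qaa"],
    "subject": ["narrative"],
    "description": ["   "],   # present but blank: does not count
}
print(round(element_coverage(rec), 3))
# 0.333
```

A record with a wrong date or a misspelled language name would score exactly the same, which is why coverage alone cannot stand in for quality.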
For OLAC, the question is how it can reward data contributors for high-quality metadata while also detecting low-quality metadata and correcting or enhancing it.
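Detecting and enhancing could, in principle, happen in the same pass. A hypothetical sketch, again assuming a record is a dict of element name to list of strings: it drops blank values, normalises slash-separated dates to the YYYY-MM-DD form, flags language values that are not shaped like three-letter ISO 639-3 codes, and flags records with too few populated elements. The checks and the `min_elements` threshold are illustrative choices, not OLAC rules.

```python
import re

DATE_SLASHED = re.compile(r"^(\d{4})/(\d{1,2})/(\d{1,2})$")
ISO639_3_SHAPE = re.compile(r"^[a-z]{3}$")


def review(record, min_elements=5):
    """Return a lightly corrected copy of record plus a list of issues."""
    # Drop blank values, then drop elements left with no values at all.
    fixed = {k: [v.strip() for v in vals if v.strip()] for k, vals in record.items()}
    fixed = {k: v for k, v in fixed.items() if v}
    issues = []

    # Enhancement: normalise YYYY/M/D dates to YYYY-MM-DD.
    dates = []
    for d in fixed.get("date", []):
        m = DATE_SLASHED.match(d)
        if m:
            y, mo, da = m.groups()
            dates.append(f"{y}-{int(mo):02d}-{int(da):02d}")
            issues.append(f"normalised date {d!r}")
        else:
            dates.append(d)
    if dates:
        fixed["date"] = dates

    # Detection (consistency): language values not shaped like ISO 639-3 codes.
    for code in fixed.get("language", []):
        if not ISO639_3_SHAPE.match(code):
            issues.append(f"language {code!r} is not a 3-letter code")

    # Detection (completeness): too few populated elements.
    if len(fixed) < min_elements:
        issues.append(f"only {len(fixed)} populated elements")
    return fixed, issues


rec = {"title": ["Frog story"], "date": ["2006/5/1"],
       "language": ["English"], "description": ["  "]}
fixed, issues = review(rec)
print(fixed)
print(issues)
```

The same issue list could double as the basis for rewarding contributors: records that come back with no issues are the ones worth highlighting.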