Dichotomy of Lexical Resources

This post attempts to lay out the range of resources that fall into the lexical resource category. This is hard to do in a straightforward manner.

All Lexical Resources

Resource Types

A consistent typology of lexical resources is challenging for several reasons. One of those reasons is that a lexical resource usually sits on several intersecting continuums at once. Some of these continuums are presented below, each as a pair of end points.

Wordlists ↔ Encyclopedic multilingual resources

Print ↔ Non-Print (Oral)

Physical ↔ Digital

Edited ↔ Non-Edited

Single Author ↔ Collaborative Production

Corpus Based ↔ Non-Corpus Based

Beyond these continuums there is also purpose, which establishes how the resource is ideally meant to be used. If we take the dictionary as an example, then there is the "Learner's dictionary", the "Bilingual dictionary", the "Picture dictionary", the "Domain specialist dictionary", etc.


Database Types

Beyond describing the thing-ness of lexical databases using the continuums above, there is the technical description of the database. We can talk about character encodings (UTF-8, UTF-16, etc.), and we can also describe "the thing" by the application we used to create it. So it might be a ToolBox database or a FLEx database, etc. But even within these descriptions there are issues like database schemas or customizations, which need to be documented if we are going to think about passing our data on to other users.

---------------------------------------------------------------

A second thing to think about is data licensing --- talk here about the onion model

---------------------------------------------------------------
So, what is this "thing" we need to actually submit to the archive?

In a complete ToolBox project file one should expect to find the following.

Some-zipped-toolbox-project.zip
├── .typ - File defining the database structure
├── .lng - File defining the language encoding
├── .prj - Project file
└── Datafile - with one of the following file types
   ├── .db
   ├── .dic
   ├── null - meaning no file ending
   ├── .txt
   └── .xml
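
The listing above can be turned into a rough completeness check for a zipped project. What follows is only a sketch in Python: the function names are hypothetical, the extension sets are copied from the listing, and treating any file without an ending as a possible datafile is a heuristic assumption (it will also match things like a README), so the result should be read as a hint rather than a verdict.

```python
import zipfile
from pathlib import PurePosixPath

# Extensions copied from the listing above; "" stands for "no file ending".
REQUIRED_EXTENSIONS = {".typ", ".lng", ".prj"}
DATAFILE_EXTENSIONS = {".db", ".dic", "", ".txt", ".xml"}


def check_toolbox_members(names):
    """Report which expected project parts appear in a list of archive member names."""
    # Zip directory entries end with "/", so skip them.
    files = [PurePosixPath(name) for name in names if not name.endswith("/")]
    suffixes = {path.suffix.lower() for path in files}
    missing = sorted(REQUIRED_EXTENSIONS - suffixes)
    # Heuristic: any file with one of the datafile endings is only a candidate.
    datafiles = sorted(
        str(path) for path in files if path.suffix.lower() in DATAFILE_EXTENSIONS
    )
    return {
        "missing": missing,
        "datafile_candidates": datafiles,
        "complete": not missing and bool(datafiles),
    }


def check_toolbox_zip(zip_path):
    """Run the same check against the members of a zip file on disk."""
    with zipfile.ZipFile(zip_path) as archive:
        return check_toolbox_members(archive.namelist())
```

Calling check_toolbox_zip("Some-zipped-toolbox-project.zip") would return a small report naming any of the .typ, .lng or .prj parts that are absent, which could be attached to the deposit notes.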

In a complete FLEx 6 or earlier project file one should expect to find the following.

In a complete FLEx 7 or newer project file one should expect to find the following.

---------------------------------------------------------------
Is a dictionary a lexical database? Are they the same thing? If they are not, should they be put in the same record (Item), or should they be independent Items with a relationship connecting them?

Archive Institution
└── DSpace
   ├── Community 1
   │   ├── Collection 1
   │   │   ├── Item 1
   │   │   │   ├── Bitstream 1
   │   │   │   └── Bitstream 2
   │   │   └── Item 2
   │   └── Collection 2
   └── Community 2

What does a dictionary entry look like for archiving?

Best practices for archiving lexical databases and dictionaries in SIL's Archive.

All dictionaries should have a lexical database associated with them.
All dictionaries should have a PDF with them.
All dictionaries should have the cover or jacket PDF.
All fonts and scripts used to format the lexical data into the PDF should be included.
All dictionaries should have a write-up of which materials in the lexical database were included in the dictionary and how this was decided.
All dictionaries with more than lexical content should include source files for those pages of the dictionary.

All ShoeBox files should have _____ file ending
All ToolBox files should have _____ file ending
All ToolBox files should include ______ components.
A remark about SFM vs. MDF

All FLEx databases should have _____ file ending
All FLEx databases should have a remark about the FLEx version.

Not all lexical data sets have a dictionary output. All lexical data sets should have a .lift output (even though .lift does not capture everything in a FLEx data set).

Data maintenance strategy
All Shoebox, ToolBox and FLEx databases should be archived once a year, at the project's end, and prior to conversion to another format (or to another version of the same format), such as a FLEx database.

All data conversion should first be attempted by the active project. All data from inactive projects should be updated annually, in step with the release cycle of newer versions of FLEx. This could be scripted and carried out as a collaboration between the SIL Archive and the SIL Lexicography Data Conversion Service.

Lexical content Browser

Anatomy of archived lexical data sets

In a complete ToolBox project file one should expect to find the following.
.zip
├── Database structure - .typ
├── Datafile - with one of the following file types
│   ├── .db
│   ├── .dic
│   ├── null - meaning no file ending
│   ├── .txt
│   └── .xml
├── Language encoding file - .lng
└── Project file - .prj

Last thing
