Understanding Archive Resource Publicity

This post gives an overview of the options available within the language archiving enterprise for making resources discoverable and accessible.

All Archivable Resources

Un-archived Resources

Un-archived resources may be:

  • known (common knowledge, grant funded, etc.)
  • unknown (e.g. individual research projects that only a select few know exist)
  • discoverable (posted on a personal or departmental website)
  • privately kept (not publicly discoverable)

However, no instance (and therefore no record) of these resources exists in the curated catalogues of professional libraries or institutional archives dedicated to the care and stewardship of language resources. Furthermore, in the above scenarios there is no long-term preservation plan for these resources, even if a backup copy of the data exists.


Archived but Private Resources

These resources are severely restricted. Most people (including specialists in the language family, some archive staff and even some community members) do not know about them.

  • Meta-data is hidden (not shared publicly).
  • Archived objects have restricted access.

While archives may not be able to directly report on these objects, they can indirectly report what percentage of the archive's total content these items comprise.
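As a minimal sketch of how such an indirect report could be computed, assume each catalogue record carries a corpus name and a boolean restriction flag. The field names `corpus` and `severely_restricted` are hypothetical, not taken from any particular archive's schema. Only aggregate percentages leave the functions, so no individual record is disclosed.

```python
from collections import Counter


def restricted_share(records):
    """Return the percentage of records flagged as severely restricted."""
    records = list(records)
    if not records:
        return 0.0
    restricted = sum(1 for r in records if r["severely_restricted"])
    return 100.0 * restricted / len(records)


def share_by_corpus(records):
    """Return {corpus name: percentage of severely restricted records}."""
    totals = Counter()
    restricted = Counter()
    for r in records:
        totals[r["corpus"]] += 1
        if r["severely_restricted"]:
            restricted[r["corpus"]] += 1
    return {corpus: 100.0 * restricted[corpus] / totals[corpus] for corpus in totals}


# Hypothetical sample: ten records in one corpus, one of them restricted.
sample = [{"corpus": "A", "severely_restricted": i == 0} for i in range(10)]
print(restricted_share(sample))   # 10.0
print(share_by_corpus(sample))    # {'A': 10.0}
```

An archive with very small corpora might prefer to report ranges rather than exact figures, since a precise percentage over a handful of items can itself reveal that a specific item exists.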

Example of an indirect report: 10% of XYZ archive's total contents are severely restricted. Most corpora contain less than 0.1% severely restricted content.

Such reporting is healthy for:

  • Funders - to help them understand how various communities view language data. It also communicates that the archiving institution is being as transparent as possible with the data it does have - a mark of faithful stewardship.
  • Archive administrators - to monitor basic trends across individual corpora, across their entire archive's submissions, and across the larger language archiving community.
  • Language and linguistic specialists - to realize that these options exist and, where they need to be exercised, that they are used within industry "norms". To this end, linguists also need some example use cases.
  • Communities - to realize that archives have not forgotten their connection with communities, including those which are not listed in more public places.

Some restrictions are necessary. They help to build trust in archiving institutions and appropriate expectations for various stakeholders.
Note: The reasons for these restrictions should be documented so that when archive staff change, the rationale for the restrictions is not lost. Additionally, the archive staff and the depositors should be in contact at a pre-determined interval to confirm that this level of resource suppression is still necessary. Frequency of communication can vary (but 3-5 years is a long time in today's world).

Archived but Restricted Resources

Meta-data is publicly advertised via a clean, navigable website, is discoverable by industry-leading search engines, and is exposed through archiving and linguistics industry-standard venues like OLAC. In contrast to the high visibility of the meta-data, the archived objects have restricted (permissions-based) access.

  • Meta-data is open and discoverable.
  • Items have restricted access.

To maintain trust in this context, items should have a stable endpoint (URI address) for citation purposes and a contact method for requesting access to the item (not necessarily the whole corpus). Resource items also need to be able to display their relationship to (1) the corpus as a whole and (2) other items in the corpus (especially those which are needed to function together). Additionally, archives should have in place a stewardship protocol granting them authority to administer the deposits in cases where the original depositor is no longer alive or no longer in contact with the archive.
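The requirements above can be sketched as a small record type with a completeness check. This is an illustration only: the class name, field names, and the `example.org` addresses are hypothetical and do not reflect any archive's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RestrictedItem:
    uri: str                # stable endpoint used for citation
    access_contact: str     # where to send an access request for this item
    corpus_uri: str         # the corpus this item belongs to
    related_item_uris: List[str] = field(default_factory=list)  # items it functions with


def trust_gaps(item):
    """Return the names of required fields that are empty on this item."""
    required = ("uri", "access_contact", "corpus_uri")
    return [name for name in required if not getattr(item, name).strip()]


# Hypothetical item with no contact method recorded.
item = RestrictedItem(
    uri="https://example.org/items/123",
    access_contact="",
    corpus_uri="https://example.org/corpora/7",
    related_item_uris=["https://example.org/items/124"],
)
print(trust_gaps(item))  # ['access_contact']
```

Related items are left optional here because not every item depends on others; an archive could tighten that rule for item types, such as a recording and its transcript, that only make sense together.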

Archived and Open Resources

Meta-data is publicly advertised, and the resource is openly available.

  • Meta-data is open and discoverable.
  • Items are open to public access, either through direct click and download or after an automated human-verification step (such as a login or reCAPTCHA).
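The four categories in this post can be summarized by three yes/no properties: whether the resource is archived at all, whether its meta-data is public, and whether its items are open. The sketch below encodes that reading; the enum and function names are hypothetical. One combination the post does not describe (hidden meta-data with open items) is treated here as private, which is an assumption.

```python
from enum import Enum


class Publicity(Enum):
    UNARCHIVED = "un-archived"
    PRIVATE = "archived but private"
    RESTRICTED = "archived but restricted"
    OPEN = "archived and open"


def classify(archived, metadata_public, items_open):
    """Map the three properties onto the four categories described above."""
    if not archived:
        return Publicity.UNARCHIVED
    if not metadata_public:
        # Assumption: hidden meta-data means private, whatever the item access.
        return Publicity.PRIVATE
    if not items_open:
        return Publicity.RESTRICTED
    return Publicity.OPEN


print(classify(archived=True, metadata_public=True, items_open=False).value)
# archived but restricted
```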
