VRA Core and its use of xml:lang

Some information professionals might be confused about the use of language identification metadata in larger bibliographic metadata standards. For example, VRA Core (Visual Resources Association) is a metadata standard used to describe visual artifacts. It is implemented in XML and therefore takes on all the descriptive power of XML, including the use of the xml:lang attribute.

The following observations are based on the VRA (Visual Resources Association) Core 4 XML Schema, version 0.42. This schema implements the final VRA Core 4.0 guidelines, 2007-04-09. It is important to note that metadata standards implemented by memory institutions really have two parts: the "guidelines", and the "implementation" of those guidelines (in this case an XSD validation file). These two documents may not always be congruent, even if that is the intention. Where they diverge, I argue that the technical implementation takes precedence over the guidelines, because the schema is what actually decides which records validate.

The XSD validation document contains the following annotation about the use of the xml:lang attribute:

VRA Core metadata attributes which can be applied to virtually any element. Note that xml:lang should contain ISO 639 language codes, not the English names of languages. Although the XML Schema defines xml:lang as allowing ISO 639-2 (three-letter) codes, some validators will only accept ISO 639-1 (two-letter) codes.

This annotation is misleading. First, the VRA Core authors are trying to alert catalogers and technologists that they should not use the full English name of a language, as might be done in other "library oriented standards", but should use language codes instead. In general this is a good thing. However, the VRA authors misread the XML specification. Specifically, they indicate that xml:lang should contain ISO 639 language codes. This is not accurate. The values of xml:lang are BCP-47 language tags, as stated in the XML 1.0 (Fifth Edition) specification, §2.12: https://www.w3.org/TR/xml/#sec-lang-tag. It is true that BCP-47 currently draws its language subtags from ISO 639 codes, but this might not always be true.
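To make the point concrete, here is a minimal sketch in Python of how a consumer might read xml:lang values from a record. The sample record is a simplified, hypothetical stand-in for VRA Core markup (real records declare the VRA namespace and carry more structure), and the function name titles_by_language is my own. Note that the second title carries a tag with a script subtag, which is a valid BCP-47 tag but not a bare ISO 639 code.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is bound to this namespace by the XML specification itself,
# so ElementTree reports xml:lang under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical, simplified record for illustration only.
SAMPLE = """<work>
  <titleSet>
    <title xml:lang="en">The Great Wave off Kanagawa</title>
    <title xml:lang="ja-Latn">Kanagawa oki nami ura</title>
  </titleSet>
</work>"""


def titles_by_language(xml_text):
    """Map each title's xml:lang value to the title text."""
    root = ET.fromstring(xml_text)
    return {t.get(XML_LANG): t.text for t in root.iter("title")}


if __name__ == "__main__":
    for tag, title in titles_by_language(SAMPLE).items():
        print(tag, "->", title)
```

A consumer that only expects two-letter or three-letter ISO 639 codes would mishandle the "ja-Latn" value, even though the document is perfectly valid XML.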

A second issue is how the annotation distinguishes between the use of ISO 639-2 and ISO 639-1 codes. If there are VRA Core data consumers or producers who are not consuming or producing valid XML, then this is a transmission machinery issue, not a protocol issue. BCP-47 does not call for the use of an ISO 639-2/3 tag when there is an equivalent ISO 639-1 tag. If data ingest processes have only implemented ingest of ISO 639-1 codes, then they haven't implemented VRA Core, because VRA Core stands on XML, which stands on BCP-47. BCP-47 is an algorithm which calls upon different standards at different times. Understanding the fallback nature of the algorithm would have clarified this point for the VRA authors.
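The fallback behaviour can be sketched in a few lines of Python. This is only an illustration, under the assumption that we start from a three-letter ISO 639-3 code: the lookup table is a tiny hand-written excerpt and the function name bcp47_language_subtag is hypothetical. A real implementation would consult the IANA Language Subtag Registry rather than a hard-coded dictionary.

```python
# Illustrative excerpt only: three-letter codes that also have a
# two-letter ISO 639-1 equivalent. Not a complete table.
ISO_639_3_TO_1 = {
    "eng": "en",
    "fra": "fr",
    "deu": "de",
    "jpn": "ja",
}


def bcp47_language_subtag(iso_639_3_code):
    """Return the primary language subtag to use for a language.

    If a two-letter code exists, it is used; otherwise the
    three-letter code is used as the fallback.
    """
    code = iso_639_3_code.lower()
    return ISO_639_3_TO_1.get(code, code)


if __name__ == "__main__":
    print(bcp47_language_subtag("eng"))  # two-letter code exists
    print(bcp47_language_subtag("haw"))  # Hawaiian has no two-letter code
```

Under this rule English is tagged "en" and never "eng", while Hawaiian, which has no two-letter code, is tagged "haw". An ingest process that accepts only two-letter codes therefore cannot represent Hawaiian at all.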

The following resources are useful for a better understanding of Language Tags in XML:
