This post is an open draft! It was originally started on April 23, 2011. Almost two years later it makes its public debut. It might be updated at any time, but it was last updated on December 18, 2013 at 8:36 pm.
In a team framework, where a research team has several members, the job calls for sharing both bibliographic data (about the materials referenced) and the actual resources being referenced. In this environment there needs to be a central repository for sharing both kinds of data. This is true for small, geographically localized groups as well as large distributed research teams. New researchers joining an existing team need to be able to “plug in” to existing foundational work on the project and to access bibliographic data as well as the resources those bibliographic details point to. It is my point here to outline some of the current challenges involved in trying to overcome this collaborative obstacle when working in the fields of Linguistics and Language Documentation[ref 1]. This sentiment is echoed by many in the world of science. Here is someone on Zotero’s forums [INSERT LINK]. (Though Zotero does claim to combat some of these issues.)
Bibliographic Data vs. Citation Data
It is important here to make a distinction between citations (and the data which make up those citations) and bibliographic data, as I use the term here. In the creation of a bibliography certain fields are required, and their order is determined by a citation style. There are thousands of styles: APACite, APA, MLA, and almost every academic journal has a different stylesheet governing citations. These stylesheets dictate the format and the content of the bibliography. What is contained in a bibliography is not what I am terming bibliographic data. Bibliographic data is more like the sum of all the information which might be called for by every stylesheet in the world, if a particular resource were to be cited in every stylesheet. It is not, however, all the meta-data about a given resource. So bibliographic data is a subset of all meta-data about a resource, but may be more than what a single stylesheet would call for.
It is also important to note that bibliographic data is resource-type dependent. Resource types are distinctions such as those between a Grant, a Book, a Journal article, a Data Set, a Standard, a Web-page, a Web-site, a Manuscript, etc. So, a book might not have a URL associated with it, and if it did, most citation stylesheets would not call for its inclusion in a bibliography. However, a Journal article might have both a DOI and a paper-based reference (journal name, volume, issue, page numbers, etc.); some journals would call for the inclusion of both in their citation stylesheet, others for only one of the two. So, when considering bibliographic data, both the paper-based citation and the DOI should be included.
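The distinction can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `BibRecord` class, the `select_fields` helper, the field names, and the two "stylesheets" are hypothetical and do not come from any citation tool, and the DOI is a placeholder rather than the real identifier of the article.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BibRecord:
    """Bibliographic data: every field that some stylesheet might ask for."""
    resource_type: str  # e.g. "journal-article", "book", "data-set"
    fields: Dict[str, str] = field(default_factory=dict)


def select_fields(record: BibRecord, wanted: List[str]) -> Dict[str, str]:
    """Return only the fields one stylesheet calls for, skipping missing ones."""
    return {name: record.fields[name] for name in wanted if name in record.fields}


# The record keeps both the paper-based reference and the DOI.
article = BibRecord(
    resource_type="journal-article",
    fields={
        "title": "Mood management across affective states: "
                 "The hedonic contingency hypothesis",
        "journal": "Journal of Personality & Social Psychology",
        "volume": "66",
        "pages": "1034-1048",
        "year": "1994",
        "doi": "10.0000/placeholder",  # hypothetical value, for illustration only
    },
)

# Two hypothetical stylesheets asking for different subsets of the same record.
PRINT_ONLY_STYLE = ["title", "journal", "volume", "issue", "pages", "year"]
DOI_ONLY_STYLE = ["title", "year", "doi"]

print(select_fields(article, PRINT_ONLY_STYLE))
print(select_fields(article, DOI_ONLY_STYLE))
```

The point of the sketch is that the record is stored once and is larger than what either stylesheet asks for; each citation is just a view over it.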
One other infamous example is the case of et al. More explicitly: how many authors are listed before et al. is used? Is et al. used with more than three authors, or with more than four? Consider the following examples from the APA and the MLA.
The APA style calls for the following according to the Purdue Online Writing Lab:
- Two Authors
List by their last names and initials. Use the ampersand instead of “and.”
Wegener, D. T., & Petty, R. E. (1994). Mood management across affective states: The hedonic contingency hypothesis. Journal of Personality & Social Psychology, 66, 1034-1048.
- Three to Seven Authors
List by last names and initials; commas separate author names, while the last author name is preceded again by ampersand.
Kernis, M. H., Cornell, D. P., Sun, C. R., Berry, A., Harlow, T., & Bach, J. S. (1993). There's more to self-esteem than whether it is high or low: The importance of stability of self-esteem. Journal of Personality and Social Psychology, 65, 1190-1204.
- More Than Seven Authors
Miller, F. H., Choi, M. J., Angeli, L. L., Harland, A. A., Stamos, J. A., Thomas, S. T., . . . Rubin, L. H. (2009). Web site usability for the blind and low-vision user. Technical Communication, 57, 323-335.
Now consider the same kinds of resources, but in the MLA style, according to Cornell University Library.
- Two authors
Cross, Susan, and Christine Hoffman. Bruce Nauman: Theaters of Experience. New York: Guggenheim Museum; London: Thames & Hudson, 2004. Print.
- Three authors
Lowi, Theodore, Benjamin Ginsberg, and Steve Jackson. Analyzing American Government: American Government, Freedom and Power. 3rd ed. New York: Norton, 1994. Print.
- More than three authors
Gilman, Sandor, et al. Hysteria beyond Freud. Berkeley: U of California P, 1993. Print.
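The two sets of rules quoted above can be sketched as a pair of Python functions working on the same author data. This is a minimal sketch that covers only the author-list rules exactly as they appear in the examples above; the function names and the `(family, given)` tuple layout are my own assumptions, and both functions assume at least one author.

```python
from typing import List, Tuple

# (family name, given names or initials); this layout is an assumption.
Author = Tuple[str, str]


def initials(given: str) -> str:
    """Reduce given names to initials: 'Susan' -> 'S.', 'D. T.' -> 'D. T.'."""
    return " ".join(part[0] + "." for part in given.split())


def apa_authors(authors: List[Author]) -> str:
    """APA rule as quoted above: list up to seven; beyond that, six, dots, last."""
    names = [f"{family}, {initials(given)}" for family, given in authors]
    if len(names) == 1:
        return names[0]
    if len(names) <= 7:
        return ", ".join(names[:-1]) + ", & " + names[-1]
    return ", ".join(names[:6]) + ", . . . " + names[-1]


def mla_authors(authors: List[Author]) -> str:
    """MLA rule as quoted above: list up to three; beyond that, first plus et al."""
    first_family, first_given = authors[0]
    lead = f"{first_family}, {first_given}"
    if len(authors) == 1:
        return lead
    if len(authors) > 3:
        return lead + ", et al."
    rest = [f"{given} {family}" for family, given in authors[1:]]
    return ", ".join([lead] + rest[:-1]) + ", and " + rest[-1]


three = [("Lowi", "Theodore"), ("Ginsberg", "Benjamin"), ("Jackson", "Steve")]
print(apa_authors(three))
print(mla_authors(three))
```

Neither function could work from a finished citation in the other style: the APA output has already thrown away the full given names, and the MLA et al. form has thrown away every author but the first. That is why the shared record needs to hold the full author list.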
INSERT EXAMPLE OF ET.AL HERE
The et al. Case. & the Case of the Middle initial.
SIL Electronic Working Papers Citation Format
Here is a suggested format for bibliographic citation of papers published in the SILEWP series (parts in italics are replaced by specific information):
Author. Year. Article Title. SIL Electronic Working Papers Issue Number. Dallas: SIL International. Online. URL: URL. Access date.
Constable, Peter and Gary Simons. 2000. Language Identification and IT: Addressing Problems of Linguistic Diversity on a Global Scale. SIL Electronic Working Papers 2000-001. Dallas: SIL International. Online. URL: http://www.sil.org/silewp/2000/001/. Accessed on 20 September, 2000.
The URL may also be enclosed by angled brackets, like this:
ALSO INCLUDE AN EXAMPLE FROM AILLA
To cite this resource
Zavala Maldonado, Roberto (Researcher), Juan Gervasio (Speaker). (1986). “Juan Gervasio”. MesoAmerican Languages. The Archive of the Indigenous Languages of Latin America: www.ailla.utexas.org. Media: audio. Access: restricted. Resource: COK002R003.
Woodbury, Anthony C. (Researcher), Emiliana Cruz (Researcher), Flavia Mateo Mejía (Speaker), Chatino Language Documentation Project (Researcher). (2009). “Verbs”. Chatino Language Documentation Project Collection. The Archive of the Indigenous Languages of Latin America: www.ailla.utexas.org. Media: audio, text. Access: public. Resource: CTA001R015.
Because bibliographic data is resource-type dependent, the data structure for sharing this information needs to be versatile. Additionally, the scope of what information is available to be packaged for reuse is a consideration. The assessment of what is, or what should be, considered bibliographic data is a question which needs to be answered from a perspective which is larger than just a single use of the data. This is not just “how am I (or we) using this data”, but “are these the right details, and are these details sufficient for the larger community of users?” In the field of Linguistics and Language Documentation, one field necessary for inclusion with the bibliographic data is a field for identifying the language a particular resource references or represents. Fortunately, there is a current best practice for associating language data and resources with an index of languages. This involves using the ISO 639-3 code set for identifying languages. However, I know of no current stylesheet in any journal which requires that the ISO 639-3 codes of the languages a resource references be included in the bibliography of a published item.
Working in the field of Linguistics and Language Documentation, the need to efficiently and effectively share and reuse bibliographic data is no less real than in any other scientific field; it just so happens that our resources revolve around language data. To date I have encountered several strategies used by linguists for resolving this challenge. These are:
Each of these processes of sharing calls for different containers. In order to share something (digitally) it must be in some sort of container, i.e. a file, a folder, a database, a hard drive, or a website. So it is easy to see that there are potentially even containers for containers.
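As a rough illustration of such a language field, here is a minimal Python sketch. The `iso639_3` key, the helper names, and the sample record are all hypothetical. The check only tests the shape of a code (three lowercase letters); it does not confirm that the code is actually assigned in the ISO 639-3 code set.

```python
import re
from typing import Dict, List

# ISO 639-3 identifiers are three lowercase letters; this checks shape only.
ISO_639_3_SHAPE = re.compile(r"[a-z]{3}")


def well_formed_iso639_3(code: str) -> bool:
    """True if the string has the shape of an ISO 639-3 code."""
    return ISO_639_3_SHAPE.fullmatch(code) is not None


def attach_languages(record: Dict[str, object], codes: List[str]) -> Dict[str, object]:
    """Return a copy of the record with a de-duplicated, sorted language field."""
    bad = [code for code in codes if not well_formed_iso639_3(code)]
    if bad:
        raise ValueError(f"not shaped like ISO 639-3 codes: {bad}")
    updated = dict(record)
    updated["iso639_3"] = sorted(set(codes))  # hypothetical field name
    return updated


# Hypothetical record for a recording involving Spanish and English.
recording = {"resource_type": "data-set", "title": "Example field recording"}
print(attach_languages(recording, ["spa", "eng", "spa"]))
```

A list is used rather than a single value because one resource can reference more than one language.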
Each of these processes ignores the re-usability principle for data as mentioned by Gary Simons in Bird and Simons 2003 http://www.sil.org/~simonsg/preprint/Seven%20dimensions.pdf (Cite volume 3.)
3.5 Citation. By citation we mean the problems associated with making bibliographic citations of electronic language documentation and description. The area of citation involves four key concepts: the ability to cite a resource in a bibliography, the persistence of electronic resource identifiers, the immutability of materials that are cited, and the granularity of what may be cited.
BIBLIOGRAPHY. Research publications are normally required to provide full bibliographic citation of the materials used in conducting the research. Citation standards are usually high when citing conventional publications, but are much lower for citations of digital language resources. Many scholars do not know how to cite electronic resources; thus the latter are often incorrectly cited, or not cited at all. Consequently, it is difficult to discover what resource was used in conducting the research or, following the linkage in the reverse direction, to consult a citation index to discover all the ways in which a given resource has been used.
PERSISTENCE. Often a language resource is available on the web, and it is convenient to identify the resource by means of its uniform resource locator (URL) since this may offer the most convenient way to obtain the resource. However, URLs are notorious for their lack of persistence. They break when the resource is moved or when some piece of the supporting infrastructure, such as a database server, ceases to work.
IMMUTABILITY. Even if a URL does not break, the item that it references may be mutable, changing over time. Language resources published on the web are usually not versioned, and a third-party description based on some resource may cease to be valid if that resource is changed. This problem can be solved by archiving each version and ensuring that citations reference a particular version. Publishing a digital artifact, such as a CD, with a unique identifier, such as an ISBN, also avoids this problem.
GRANULARITY. Citation goes beyond bibliographic citation of a complete item. We may want to cite some component of a resource, such as a specific narrative or lexical entry. However, the format of the resource may not support durable citations to internal components. For instance, if a lexical entry is cited by a URL which incorporates its lemma, and if the spelling of the lemma is altered, then the URL will not track the change. In sum, the portability of a language resource suffers when incoming and outgoing links to related materials are fragile.
Mendeley: Desktop and Hosted (limited size)
These strategies all fail to meet several functional requirements:
Writing linguistic papers in the third wave
Write a response to using third wave tools to produce second wave products.
Sharing bibliographic data
Organizing bibliographic data, http://www.lx.ugent.be/toolbox/biblio.html
Sharing bibliographic data, http://www.lx.ugent.be/toolbox/biblio.html
Citation styles: http://subjectguides.library.american.edu/citation
CSL Citation style language: http://blogs.plos.org/mfenner/2010/09/24/citation-style-language-an-interview-with-rintze-zelle-and-ian-mulvany/
epub wordpress. http://blogs.plos.org/mfenner/2011/02/01/epub-wordpress-plugin-released-today/
Open Citation Data
Why do we need Open citation data?
Why researchers don’t publish Data
How to use CiTO in blogposts
The Trouble with Bibliographies