This paper is motivated by an experience in collecting, analyzing, and then redeploying (sharing while making relevant to other corporate SIL functions) corporate intellectual assets. These assets are relevant both to SIL products and services and to corporate processes. This paper attempts to document some of the current challenges facing the SIL staff person and to present some items for consideration in overcoming these challenges.
The Ethnologue [M. Paul Lewis (ed.). 2009. Ethnologue: Languages of the World, 16th edn. Dallas, Tex.: SIL International.], as an academic book, is somewhat of a straw man in linguistics. Many people who write grants for language documentation projects (generally on underdescribed or endangered languages) will cite the Ethnologue and some other resources, or the lack of resources [Steven A. Marlett. 2011. Documenting the Me’phaa genus. DEL-NEH fellowship proposal. http://www.neh.gov/grants/guidelines/pdf/DEL_NEH_Marlett.pdf. [PDF] [DEL Awards] [Accessed: 15 February 2011]; Sadaf Munshi. 2011. Archive of Annotated Burushaski Texts. NSF grant proposal. http://www.neh.gov/grants/guidelines/pdf/DEL_NSF_Munshi.pdf. [PDF] [DEL Awards] [Accessed: 15 February 2011]; Monica A. Macaulay. 2011. Potawatomi Documentation, Lexical Database, and Dictionary. NEH grant proposal. http://www.neh.gov/grants/guidelines/pdf/DEL_NEH_Macaulay.pdf. [PDF] [DEL Awards]]. These funding efforts are usually an attempt to gather more language data. The rationale for this is twofold:
Because so little is known that we cannot tell whether the Ethnologue is correct.
Because there is a conflict between other published sources and the Ethnologue [Roger Blench. n.d. Introduction to the Temein languages. http://www.rogerblench.info/Language/Nilo-Saharan/Eastern%20Sudanic/Temein%20cluster/Blench%20Temein%20language%20NM%20proceedings.pdf [PDF] [Accessed: 15 February 2011]].
SEO for standard websites is pretty straightforward. I happen to be working on a website redesign (in Drupal) which presents linguistic resources, both published and unpublished. I recently came across two specialized SEO options which are useful:
On January 4-5, 2012, I had the opportunity to participate in the LSA's Satellite Workshop for Sociolinguistic Archival Preparation in Portland, Oregon. I learned a great many things there; here are only a few thoughts.
Part of the discussion at the workshop was on how we can make corpora which are collected by sociolinguists available to the larger sociolinguistic community. In particular, the discussion I am referencing revolved around the standardization of metadata in the corpora. (In the discussion it was established that there are two levels of metadata, "event level" and "corpus level".) While OLAC gives us some standardization of the corpus-level metadata, the event-level metadata is still unique to each investigation, and arguably this is necessary. However, it was also pointed out that not all "event level" metadata need to be encoded or tracked uniquely. That is, data like date of recording, names of participants, location of recording, and gender (male/female) of participants can all be regularized across the community.
With the above as preface, it is important to realize that there are still various kinds of metadata which need to be collected. In the workshop it was acknowledged that the field of language documentation is about 10 years ahead of this community of sociolinguists. What was not well defined in the workshop was the distinction between a language documentation corpus and a sociolinguistics corpus. It seems to me, as a new practitioner, that the chief difference between these two types of corpora is the self-identification of the researcher: does the researcher self-identify as a sociolinguist or as a language documenter? Both types of corpora attempt to get at the vernacular, and both collect sociolinguistic facts. It would seem that both corpora are essentially the same (give or take a few metadata attributes). So I will take an example from the metadata write-up I did for the Meꞌphaa language documentation project. In that project we collected metadata about:
Equipment settings during recording
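Fields like these, together with the regularizable event-level items mentioned above (date of recording, participants, gender, location), can be sketched as a simple record structure. This is only an illustrative Python sketch, not the project's actual schema; all field names and example values here are my own invention.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Location:
    # Sub-kinds of location metadata: coordinates plus human-readable place info
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    altitude: Optional[float] = None
    datum: str = "WGS84"
    country: str = ""
    place_name: str = ""

@dataclass
class RecordingEvent:
    # Event-level fields that can be regularized across a research community
    date: str = ""                      # ISO 8601 date of recording
    participants: List[str] = field(default_factory=list)
    participant_genders: List[str] = field(default_factory=list)
    location: Location = field(default_factory=Location)
    equipment_settings: dict = field(default_factory=dict)  # e.g. sample rate, mic model

# A hypothetical event record
event = RecordingEvent(
    date="2010-06-15",
    participants=["Speaker A"],
    participant_genders=["female"],
    location=Location(country="Mexico", place_name="(village name)"),
    equipment_settings={"sample_rate_hz": 48000, "bit_depth": 24},
)
```

Regularizing the outer fields while leaving `equipment_settings` open-ended mirrors the workshop's point: some event metadata can be standardized community-wide, the rest stays project-specific.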
In the following diagram I illustrate the cross-cutting of a corpus with these "kinds" of metadata. The heavier, darker line represents the corpus, the medium lines represent the "kinds" of metadata, and the lighter lines represent the sub-kinds of metadata, where the sub-kinds might be the latitude, longitude, altitude, datum, country, and place name of the location.
Corpora metadata categories with some sub-categories
This does not mean that the corpus does not also need to be cross-cut with these other "sub-kinds". However, these sub-kinds are significantly greater in number and will vary from project to project. Some of these metadata kinds will be collected in a speaker profile questionnaire, but some can only be provided with reflection on the event. To demonstrate the cross-cutting of these metadata elements on a corpus I have provided the following diagram. It uses categories which were mentioned in the workshop and is not intended to be comprehensive. In this second diagram, the cross-cutting elements might themselves be taxonomies: they may have controlled vocabularies, they may have an open set of possible values, or they may represent a scale.
Taxonomies for social demographics and social dynamics for speakers in corpora
Both of these diagrams illustrate what in this workshop was referred to as "event level" metadata, rather than "corpus level" metadata.
A note on corpus-level metadata vs. descriptive metadata
There is one more thing I would like to say about "corpus level" metadata. Metadata is often separated out by function: what does the metadata allow us to do, and why is it there?
I have been exposed to the following taxonomy of metadata types through course work and in working with photographs and images. [Photometadata.org. 2011. Classes Of Metadata. http://www.photometadata.org/node/46. [Link] [Accessed: 18 January 2012]] These classes of metadata are also similar to those posted by JISC Digital Media as they approach issues with metadata for digital audio. [JISC Digital Media. 07 January 2010. Metadata and Audio Resources. http://www.jiscdigitalmedia.ac.uk/audio/advice/metadata-and-audio-resources [Link] [Accessed: 19 March 2012]]
Descriptive metadata: supports discovery, attribution, and identification of resources created.
Administrative metadata: supports management, preservation, and appropriate usage of resources created.
Technical metadata: describes the machinery used to create the resource and the technical aspects of the resource.
Use and rights metadata: copyright, license, and moral ownership of the items.
Structural metadata: maintains relationships between the parts of complex, multi-part resources (Spanne 2008). [Spanne, Joan. 2008. Metadata: Why, What and How (the "Who" is You). Presentation for Audio and Video Techniques. Dallas: GIAL. 29 July 2008.]
Situational metadata: describes the events around the creation of the work, asking questions about the social setting or the precursory events. It follows ideas put forward by Bergqvist (2007). [Bergqvist, Henrik. 2007. The role of metadata for translation and pragmatics in language documentation. In Peter K. Austin (ed.), Language Documentation and Description, vol. 4, 163-73. London: SOAS.]
Use metadata: metadata collected from or about the users themselves (e.g. user annotations, number of people accessing a particular resource). [JISC Digital Media. 07 January 2010. An Introduction to Metadata. http://www.jiscdigitalmedia.ac.uk/crossmedia/advice/an-introduction-to-metadata/ [Link] [Accessed: 19 March 2012]]
I think it is only fair to point out to archivists and librarians that linguists and language documenters do not see a difference between descriptive and non-descriptive metadata in their workflows. That is, sometimes we want to search all the corpora by license or by a technical attribute. This elevates these attributes to the function of discovery metadata. It does not remove descriptive metadata from its role in finding things, but it does functionally mean that the other metadata is also viable as discovery metadata.
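To make the point concrete, here is a minimal Python sketch (the records and attribute names are invented, not drawn from any real archive catalog) showing how rights metadata and technical metadata can serve the discovery function just as descriptive metadata does.

```python
# A toy catalog: each archived item carries metadata of several functional
# classes (descriptive title, rights license, technical sample rate).
items = [
    {"title": "Narrative recording 01", "license": "CC BY-NC", "sample_rate_hz": 48000},
    {"title": "Wordlist elicitation",   "license": "CC BY",    "sample_rate_hz": 44100},
    {"title": "Interview 03",           "license": "CC BY",    "sample_rate_hz": 96000},
]

def discover(items, **criteria):
    """Treat any metadata attribute -- rights, technical, or descriptive --
    as discovery metadata, by filtering on it directly."""
    return [i for i in items if all(i.get(k) == v for k, v in criteria.items())]

print(discover(items, license="CC BY"))        # rights metadata used for discovery
print(discover(items, sample_rate_hz=48000))   # technical metadata used for discovery
```

The design point is simply that discovery is a function applied over attributes, not a property of one privileged class of attributes.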
While I was in Malaysia, I had the honor to meet and talk quite a bit with Professor Emeritus Howard McKaughan. We talked about his linguistics work in Mexico, the Philippines, and Malaysia. He can tell stories, interesting stories.
Howard - Story Telling
There is something unique about his generation of Americans (currently in their 80s and 90s): their ability to craft and tell stories. I feel that this is a cultural point I don't have. It could be because I am third culture, or because I talk too much about the macro-details, or it might simply be because I am long-winded.
In October, Becky and I were invited to present FLEx at Universiti Malaysia Sabah as part of a workshop for compiling native dictionaries and managing cultural data. I learned a lot about dictionaries, about using FLEx to organize dictionary data, about Webonary, and about Malaysia.
One of the things this workshop helped me to clearly articulate was that there are four knowledge content areas which dictionary creators need:
Knowledge about Theoretical Linguistics to understand the language being described and the categories possible in the dictionary.
Knowledge about the language being analyzed and described so that they can apply the appropriate options available to this situation.
Knowledge about how to manage the editorial process for the dictionary (including entry submission).
Knowledge about how to use the software to implement the editorial process.
This workshop’s focus was only on the software used to implement the editorial process (mostly the data collection part). So in some ways it felt like we weren't giving the participants all the tools they will need (or even showing them all the tools they will need). But we had to realize that it is not our responsibility to give them all the tools they need or to expose them to all of these issues; they need local contacts for that. Regardless of these issues, we were still ecstatic that there were about 80 people in attendance.
About 80 people
Opening ceremonies at UMS
Becky took most of the sessions on FLEx. She presented on using FLEx as a tool for collecting words and various things about words. We covered several input methods and features in the application.
Becky talking about FLEx as a tool
Becky helping people doing exercises
I presented a session on how to get data out of FLEx. We talked about putting dictionary data on the web and turning it into .epub files.
Hugh presenting on getting things out of FLEx
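For readers curious what "getting data out of FLEx" can look like programmatically: FLEx can export its lexicon as LIFT XML, and a tiny script can turn entries into HTML on the way to the web or an .epub. This is a hedged sketch; the element names follow the LIFT format, but the fragment, language code, and function are my own inventions, and a real export is far richer.

```python
import xml.etree.ElementTree as ET

# A toy fragment in the shape of a LIFT export (real exports carry many more fields).
lift_xml = """
<lift version="0.13">
  <entry id="e1">
    <lexical-unit><form lang="xx"><text>example-headword</text></form></lexical-unit>
    <sense><gloss lang="en"><text>an example gloss</text></gloss></sense>
  </entry>
</lift>
"""

def lift_to_html(xml_text):
    """Render each LIFT entry as a one-line HTML paragraph: headword + gloss."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.iter("entry"):
        headword = entry.findtext("lexical-unit/form/text", default="")
        gloss = entry.findtext("sense/gloss/text", default="")
        rows.append(f"<p><b>{headword}</b> {gloss}</p>")
    return "\n".join(rows)

print(lift_to_html(lift_xml))
```

From HTML like this, packaging into an .epub is mostly a matter of zipping the pages with the required manifest files.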
I think one of the more interesting things that I learned was about expectations, culture and photographs.
Many people wanted photographs with us (or of us). This is not totally unexpected. What was unexpected was that rather than taking one photo and sharing it (passing it around), everyone wanted their own picture. Not just their own picture with us, but a picture with us made with their own camera! It was in that moment that I had an epiphany. Having training in language documentation, I am aware of and concerned with rules and laws concerning privacy. In the U.S., when dealing with issues of informed consent and intellectual property, it cannot be assumed that if I want to take a picture of you, then I, the owner of the camera, own the picture. Furthermore, it cannot be assumed that I have the right to do with that picture as I please, e.g. post it to the internet. This may be in part because our laws are based on our semantics; it may be in part our culture. But there I realized that, in this context, if the photo is taken with your camera, you own the photo and can do with it as you please. Asking for permission is simply asking for permission to take the photo.
Taking our picture
Taking their picture, while they were taking a picture of us. Since he who owns the camera, owns the picture...
I took this last picture at about the same time I had the epiphany.
An archival version of an audio file is a file which represents the original sound faithfully. In archiving we want to keep a version of the audio which can be used to make other products and can also be used directly itself if needed. This is usually done through PCM (pulse-code modulation). There are several file types which are associated with PCM or raw, uncompressed, faithful (to the original signal) digital audio. These are:
Broadcast Wave Format (BWF)

One way to understand the difference between audio file formats is understanding how different formats are used. One place which has been helpful to me has been the DOBBIN website, as they explain their software and how it can change audio from one PCM-based format to another.
Each one of these file types has the flexibility to have various kinds of components; i.e. several channels of audio can be in the same file, or one can have .wav files with different bit depths or sampling rates. But they are each an archive-friendly format. Before one says that a file is suitable for archiving simply based on its file format, one must also consider things like sample rate, bit depth, embedded metadata, channels in the file, etc. I was introduced to DOBBIN as an application resource for audio archivists by a presentation by Rob Poretti. [Rob Poretti. 2011. Audio Analysis and Processing in Multi-Media File Formats. ARSC 2011. http://www.arsc-audio.org/conference/audio2011/extra/48-Poretti.pptx [Link] [Accessed: 24 October 2011]] One additional thing worth noting in terms of archival versions of digital audio pertains to born-digital materials. Sometimes audio is recorded directly to a lossy compressed audio format. It would be entirely appropriate to archive such a born-digital file type based on its content, though it should be noted that ideally the recording would have been made in a PCM file format in the first place.
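The point that suitability for archiving depends on more than the file extension can be illustrated with Python's standard-library `wave` module, which exposes exactly the properties an archivist should check (channels, bit depth, sample rate). A small sketch:

```python
import wave

def inspect_wav(path):
    """Report the properties to check before accepting a WAV file into an archive."""
    with wave.open(path, "rb") as w:
        return {
            "channels": w.getnchannels(),
            "bit_depth": w.getsampwidth() * 8,
            "sample_rate_hz": w.getframerate(),
            "duration_s": w.getnframes() / w.getframerate(),
        }

# Write half a second of silence as 16-bit / 44.1 kHz stereo, then inspect it.
with wave.open("example.wav", "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)            # 2 bytes per sample = 16-bit
    w.setframerate(44100)
    w.writeframes(b"\x00\x00\x00\x00" * 22050)   # 22050 stereo frames

print(inspect_wav("example.wav"))
```

Two files can both end in `.wav` and still differ on every one of these properties, which is why an archive cannot accept "it's a WAV" as sufficient.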
What is a presentation version? (of an audio file)
A presentation version is a file created with a content use in mind. There are several general characteristics of this kind of file:
It is one that does not retain the whole PCM content.
It is usually designed for a specific application (use on a portable device, or in a personal audio player).
It can be thought of as a derivative product from an original audio or video stream.
In terms of file formats, there is not just one file format which is a presentation format. There are many formats. This is because there are many ways to use audio. For instance there are special audio file types optimized for various kinds of applications like:
3G and WiFi Audio and A/V services
Internet audio for streaming and download
Digital Satellite and Cable
Portable players

A brief look at an explanation by Cube-Tec might help to get the gears moving. It is part of the inspiration for this post.
This means there is a long list of potential audio formats for the presentation form.
Amiga IFF/SVX8/SV16 (iff)
Audio Visual Research (avr)
CDXA, like Video-CD (dat)
Ensoniq PARIS (paf)
FastTracker2 Extended (xi)
Midi Sample dump Format (sds)
Monkey’s Audio (ape/mac)
Mpeg 1&2 container (mpeg/mpg/vob)
Mpeg 4 container (mp4)
Mpeg audio specific (mp2/mp3)
Mpeg video specific (mpgv/mpv/m1v/m2v)
Portable Voice format (pvf)
Sound Designer 2 (sd2)
Windows Media (asf/wma/wmv)
Aside from just the file format difference in media files (.wav vs. .mp3) there are three other differences to be aware of:
Media stream quality variations
Media container formats
Possibilities with embedded metadata
Media stream quality variations
Within the same file type there might be variation in the quality of the audio. For instance, MP3 files can have variable bit rate encoding or constant bit rate encoding, and a constant bit rate can be high or low. WAV files can likewise have a high or a low bit depth and a high or a low sample rate. Some file types can have more channels than others: AAC files can have up to 48 channels, whereas MP3 files can only have up to 5.1 channels. [Various Contributors. 21 October 2011. Wikipedia: Advanced Audio Coding, AAC's improvements over MP3. http://en.wikipedia.org/wiki/Advanced_Audio_Coding#AAC.27s_improvements_over_MP3 [Link]]
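These quality variations translate directly into data rates, which is worth a quick back-of-the-envelope calculation (a simple sketch):

```python
def pcm_bit_rate(sample_rate_hz, bit_depth, channels):
    """Uncompressed PCM data rate in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000

# CD-quality WAV: 44.1 kHz, 16-bit, stereo
print(pcm_bit_rate(44100, 16, 2))   # 1411.2 kbps -- versus e.g. a 128 kbps constant-rate MP3

# A high-resolution field recording: 96 kHz, 24-bit, stereo
print(pcm_bit_rate(96000, 24, 2))   # 4608.0 kbps
```

The gap between 1411.2 kbps and a typical 128 kbps MP3 is exactly the information the lossy encoder throws away, which is why the presentation version cannot stand in for the archival version.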
One argument I have heard in favor of saving disk space is to use lossless compression rather than WAV files for archive-quality (and archive-version) recordings. As far as archiving is concerned, these lossless compression formats are still product-oriented file formats. One thing to realize is that not every file format can hold the same kind of audio. Some formats have limits on the bit depth of the samples they can contain, or a limit on the number of audio channels they can have in a file. This is demonstrated in the table below, taken from Wikipedia. [Various Contributors. 21 October 2011. Wikipedia: Comparison of audio formats, Technical Details of Lossless Audio Compression Formats. http://en.wikipedia.org/wiki/Comparison_of_audio_codecs#Technical_Details_of_Lossless_Audio_Compression_Formats [Link]] This is where understanding the relationship between a file format, a file extension, and a media container format is really important.
Media container formats can look like file types, but they really are containers of file types (think of a folder with an extension). Often they allow for the bundling of audio and video files with metadata, and then enable this set of data to act like a single file. On Wikipedia there is a really nice comparison of container formats.
MP4 is one such container format. Apple Lossless data is stored within an MP4 container with the filename extension .m4a; this extension is also used by Apple for AAC audio data in an MP4 container (same container, different audio encoding). However, Apple Lossless is not a variant of AAC (which is a lossy format), but rather a distinct lossless format that uses linear prediction similar to other lossless codecs such as FLAC and Shorten. [Various Contributors. 6 October 2011. Wikipedia: Apple Lossless. http://en.wikipedia.org/wiki/Apple_Lossless [Link]] Files with an .m4a extension generally do not have a video stream, even though MP4 containers can also hold a video stream.
MP4 can contain:
Video: MPEG-4 Part 10 (H.264) and MPEG-4 Part 2
Other compression formats are less used: MPEG-2 and MPEG-1
Audio: Advanced Audio Coding (AAC)
Also MPEG-4 Part 3 audio objects, such as Audio Lossless Coding (ALS), Scalable Lossless Coding (SLS), MP3, MPEG-1 Audio Layer II (MP2), MPEG-1 Audio Layer I (MP1), CELP, HVXC (speech), TwinVQ, Text To Speech Interface (TTSI) and Structured Audio Orchestra Language (SAOL)
Other compression formats are less used: Apple Lossless
Subtitles: MPEG-4 Timed Text (also known as 3GPP Timed Text).
Nero Digital uses DVD Video subtitles in MP4 files. [Various Contributors. 11 October 2011. Wikipedia: MPEG-4 Part 14. http://en.wikipedia.org/wiki/.m4a [Link]]
This means that MP3 audio can be contained inside of an .mp4 file, and that audio files are not always what they seem to be on the surface. This is why I advocate that an archive of digital files (one which archives for a digital publishing house) also use technical metadata as discovery metadata. File type alone is not enough to know what a file is.
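Because the extension does not reliably identify the content, a file's opening bytes are a better first clue. This rough Python sketch (not exhaustive, and not a substitute for a real identification tool) distinguishes an MP4-family container from RIFF/WAVE and from an MP3 stream by their magic numbers:

```python
def sniff_container(first_bytes):
    """Guess a media container from its opening bytes (a rough sketch).

    MP4-family files carry an 'ftyp' box at offset 4; RIFF/WAVE files start
    with 'RIFF'...'WAVE'; MP3 files typically start with an 'ID3' tag or an
    0xFFEx frame-sync pattern.
    """
    if first_bytes[4:8] == b"ftyp":
        return "MP4 container (could hold AAC, ALAC, MP3, video, ...)"
    if first_bytes[:4] == b"RIFF" and first_bytes[8:12] == b"WAVE":
        return "RIFF/WAVE"
    if first_bytes[:3] == b"ID3" or (
        len(first_bytes) >= 2
        and first_bytes[0] == 0xFF
        and (first_bytes[1] & 0xE0) == 0xE0
    ):
        return "MP3 stream"
    return "unknown"

print(sniff_container(b"\x00\x00\x00\x18ftypM4A \x00\x00"))   # MP4 container
print(sniff_container(b"RIFF\x24\x08\x00\x00WAVE"))           # RIFF/WAVE
print(sniff_container(b"ID3\x04\x00\x00\x00\x00\x00\x00"))    # MP3 stream
```

Note that the MP4 branch can only say "container": which codec sits inside requires parsing further boxes, which is precisely the point about technical metadata.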
Possibilities with embedded metadata
Audio files also vary greatly in what kinds of embedded metadata and metadata formats they support. MPEG-7, BWF, and MP4 all support embedded metadata. But this does not mean that audio players in the consumer or prosumer market respect this embedded metadata. ARSC has an interesting report on the support for embedded metadata in audio recording software. [Chris Lacinak, Walter Forsberg. 2011. A Study of Embedded Metadata Support in Audio Recording Software: Summary of Findings and Conclusion. ARSC Technical Committee. http://www.arsc-audio.org/pdf/ARSC_TC_MD_Study.pdf [Link]] Aside from this disregard for embedded metadata, there are various metadata formats which are embedded in different file types; one common type, ID3, is popular with .mp3 files. But even ID3 comes in different versions.
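As an illustration of one embedded-metadata format, the ID3v2 tag at the front of many .mp3 files begins with a 10-byte header. The layout below follows the ID3v2 specification ('ID3', two version bytes, a flags byte, then a 4-byte syncsafe size), though the sample bytes here are synthetic:

```python
def parse_id3v2_header(data):
    """Parse the 10-byte ID3v2 header found at the start of many .mp3 files."""
    if data[:3] != b"ID3":
        return None
    major, revision, flags = data[3], data[4], data[5]
    size = 0
    for b in data[6:10]:
        size = (size << 7) | (b & 0x7F)   # syncsafe integer: 7 bits per byte
    return {"version": f"2.{major}.{revision}", "flags": flags, "tag_size": size}

# A minimal synthetic header: ID3v2.3.0, no flags, tag size 257 bytes
header = b"ID3\x03\x00\x00\x00\x00\x02\x01"
print(parse_id3v2_header(header))
```

The version byte is why "even ID3 comes in different versions" matters in practice: an ID3v2.2 frame layout differs from v2.3/v2.4, so software that only understands one version may silently ignore the others.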
In archiving language and culture materials, our complete package often includes audio but rarely is just audio. However, understanding the audio components of the complete package helps us understand what it needs to look like in the archive. In my experience working with the Language and Culture Archive, most contributors are not aware of the difference between archival and presentation versions of audio formats, and those who think they are generally are not aware of the differences in codecs used (sometimes with the same file extension). From the archive's perspective this is a continual point of user/submitter education. This past week I have taken the time to listen to a few presentations by audio archivists from the 2011 ARSC convention. These show that the kinds of issues I have been dealing with in the Language and Culture Archive are not unique to our context.
Various Contributors. 21 October 2011. Wikipedia: Comparison of audio formats, Technical Details of Lossless Audio Compression Formats. http://en.wikipedia.org/wiki/Comparison_of_audio_codecs#Technical_Details_of_Lossless_Audio_Compression_Formats [Link]
Chris Lacinak, Walter Forsberg. 2011. A Study of Embedded Metadata Support in Audio Recording Software: Summary of Findings and Conclusion. ARSC Technical Committee. http://www.arsc-audio.org/pdf/ARSC_TC_MD_Study.pdf [Link]
I have recently been reading the blog of Martin Fenner and came upon the article Personal names around the world. [Martin Fenner. 14 August 2011. Personal names around the world. PLoS Blog Network. http://blogs.plos.org/mfenner/2011/08/14/personal-names-around-the-world [Link] [Accessed: 16 September 2011]] His post is in fact a reflection on a W3C paper of the same title, Personal names around the world. (Several other reflections are here: http://www.w3.org/International/wiki/Personal_names.) This is apparently coming out of the i18n effort and is meant to help authors and database designers make informed decisions about names on the web.
I read Martin's post with some interest because in language documentation, getting someone's name as a source or for informed consent is very important (from a U.S. context). Working in an archive dealing with language materials, I see a lot of names. One of the interesting situations which came to me, from an Ecuadorian context, was different from what I have seen in the w3.org paper or in the w3.org discussion. The naming convention went like this:
The elder was known by the younger’s name plus a relationship.
My suspicion is that it is taboo to name the dead. So, to avoid possibly naming the dead, the younger person was referenced and the relationship was invoked. This affected me in the archive, as I am supposed to note who the speaker is on the recordings. In lieu of the speaker's name, I have the young son's first name (he is well known in the community and is in his 30s or so), and I have the relationship. In English this might sound like "John's mother". Now what am I supposed to put in the metadata record for the audio recordings I am cataloging? I do not have a name, but I do have a relationship to a person known to the community.
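One way to catalog such a case is to let the participant field hold either a personal name or a relationship to a named anchor person. A hypothetical Python sketch (the names and field names are invented, not from the actual recordings):

```python
# A participant record that permits relational naming when no direct
# personal name is available (e.g. because of a naming taboo).
speaker = {
    "name": None,                    # no direct name available
    "referenced_person": "John",     # the well-known younger relative (hypothetical)
    "relationship": "mother of",     # so the speaker is "John's mother"
    "role": "speaker",
}

def display_name(record):
    """Prefer a direct name; otherwise render the relational reference."""
    if record["name"]:
        return record["name"]
    return f'{record["relationship"]} {record["referenced_person"]}'

print(display_name(speaker))   # mother of John
```

Keeping the relationship and the anchor person in separate fields (rather than a free-text string) would let the archive later link the record to the anchor person's own records.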
I inquired with a literacy consultant who has worked with indigenous people in Ecuador for some years. She informed me that in one context she worked in, everyone knew what family line they were from, and all the names were derived from that family line by position. It was such that to call someone by their name was an insult.
It sort of reminds me of this sketch by Fry and Laurie.
There is a myriad of difficulties in overlaying language data with geographical data. But it has been done and can be done. While I was working in México on a language documentation project, I learned that some of the language mixing (not quite diglossia, but rather two people groups with different languages living in the same spaces) was due to geographical and economic factors pulling them into the same locations. In the particular case I am thinking of, there was a mountain pass and a valley on the way to the major center of trade. In this sort of context the interesting things are displayed not when a polygon is drawn showing a territorial overlay of where various language speakers live, but when something is drawn showing the density or dispersion of those speakers relative to the general population. Some of the most detailed (in terms of global perspective) language maps can be found in the Ethnologue. [Lewis, M. Paul (ed.). 2009. Ethnologue: Languages of the World, 16th edn. Dallas, Tex.: SIL International.]
Western Central Mexico from the Ethnologue
However, as I was working on the language documentation project I found out how much effort actually goes into that sort of map. ArcGIS, the software used to create the maps, cannot auto-generate a polygon a certain distance around a combined set of given points. A set of points can be selected and each point given a 5-mile radius, which means that each polygon has to be hand drawn. The sort of graphical overlay used in the Ethnologue [Map of Languages in Western Mexico in the Ethnologue. http://www.ethnologue.com/show_map.asp?name=MX&seq=30 [Link] [Accessed: 9 September 2011]] does not show the density of speakers of a language in an area relative to the total population (in the Ethnologue's defense, I am not sure it is supposed to). For instance, if I wanted to know "What is the density of speakers in the Me’phaa area of México relative to speakers of other languages?", that would show me some dispersion and, by implication, the peopling of the area. This sort of geographical overlay may be closer to displaying social networks than bilingualism or diglossia. There might be some bilinguals or some average level of bilingualism there, but the heat-map method of plotting still looks at the density of speakers in an area. A similar map might be created of New York City, where certain languages are given a color based on their distribution density in the area. Additionally, these sorts of data overlays are probably more prone to lend insights on language attrition patterns or language speaker migration patterns. Also, these hand-drawn polygons change (a little) from edition to edition. Because the data used to create the polygons is not referenced (cited), it is hard to tell whether the changes are keeping pace with language attrition and/or population movement, or whether they are due to a better linguistic understanding of a particular area.
When looking at the large-area maps in the Ethnologue, [Map of Languages in the Americas in the Ethnologue. http://www.ethnologue.com/show_map.asp?name=Americas&seq=10 [Link] [Accessed: 9 September 2011]] it is hard to tell if the red dots represent the "traditional" language area (or the geographical center thereof) or the current geographical center of the speaking area. Either way, the plotting functions as if it were a heat map showing the diversity of languages over a geographical area.
Americas Map from the Ethnologue
I am generally on the lookout for web apps and APIs which can be used to overlay data, bringing new insights to situations through graphical representations. I recently found a tool for overlaying data on Google Maps: it creates heat maps given data from another source, and it is called gHeat. It was brought to my attention by Ben O'Steen, who modified gHeat to display prices for student properties in the UK. [Ben O'Steen. 2011. Student Property Heatmap. Random Hacks: Hacks, code and other things. http://benosteen.wordpress.com/2011/07/26/student-property-heatmap [Link] [Accessed: 2 September 2011]] My initial thought was: "Wow, how can we do language maps like this?"
Student Property Heat Map
Obviously I still think that language-based heat maps could give language workers worldwide access to visualizations of data that could really add clarity to the language vitality situation.
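As a proof of concept, the core of such a heat map is just binning point data into a grid: shading the counts (or the counts divided by the total population per cell) gives exactly the density view argued for above. A sketch with invented coordinates, assuming NumPy is available:

```python
import numpy as np

# Hypothetical survey points: (latitude, longitude) of recorded speakers of one language
speaker_points = np.array([
    [17.20, -98.74], [17.21, -98.75], [17.19, -98.73],   # a cluster near a valley
    [17.45, -98.90],                                     # an outlier settlement
])

# Bin the points into a coarse 5x5 grid; the count per cell is what a heat
# map shades. Dividing each cell by total population there would give the
# relative density discussed above, rather than a hand-drawn polygon.
density, lat_edges, lon_edges = np.histogram2d(
    speaker_points[:, 0], speaker_points[:, 1], bins=5
)
print(density.sum())   # every point lands in some cell
```

A renderer like gHeat (or any tile-drawing library) then only has to map the cell counts to colors, so the linguistic work reduces to collecting honest, citable point data.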