On January 4-5, 2012, I had the opportunity to participate in the LSA's Satellite Workshop for Sociolinguistic Archival Preparation in Portland, Oregon. I learned a great many things there; here are a few thoughts.
Part of the discussion at the workshop was on how we can make corpora collected by sociolinguists available to the larger sociolinguistic community. In particular, the discussion I am referencing revolved around the standardization of metadata in the corpora. (In the discussion it was established that there are two levels of metadata, "event level" and "corpus level".) While OLAC gives us some standardization of corpus-level metadata, event-level metadata is still unique to each investigation, and arguably this is necessary. However, it was also pointed out that not all "event level" metadata need to be encoded or tracked uniquely. That is, data like the date of recording, the names of participants, the location of recording, and the gender (male/female) of participants can all be regularized across the community.
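To make that concrete, here is a minimal sketch of what a regularized set of event-level fields could look like. The field names and values are my own hypothetical choices for illustration, not a standard agreed on at the workshop.

```python
# A minimal sketch of "event level" metadata fields that could be regularized
# across sociolinguistic corpora. Field names and values are hypothetical.
recording_event = {
    "date_of_recording": "2012-01-04",       # ISO 8601 date
    "location_of_recording": "Portland, Oregon",
    "participants": [
        {"name": "Speaker A", "gender": "female"},
        {"name": "Speaker B", "gender": "male"},
    ],
}
```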
With the above as preface, we need to understand that there are still various kinds of metadata which need to be collected. In the workshop it was acknowledged that the field of language documentation is about 10 years ahead of this community of sociolinguists. What was not well defined in the workshop was the distinction between a language documentation corpus and a sociolinguistics corpus. It seems to me, as a new practitioner, that the chief difference between these two types of corpora is the self-identification of the researcher: does the researcher self-identify as a sociolinguist or as a language documenter? Both types of corpora attempt to get at the vernacular, and both collect sociolinguistic facts. It would seem that both corpora are essentially the same (give or take a few metadata attributes). So, I will take an example from the metadata write-up I did for the Meꞌphaa language documentation project. In that project we collected metadata about:
People
Equipment
Equipment settings during recording
Locations
Recording Environments
Times
Situations
Linguistic Dynamics
Sociolinguistic Attitudes
In the following diagram I illustrate the cross-cutting of a corpus with these "kinds" of metadata. The heavier, darker line represents the corpus, while the medium-heavy lines represent the "kinds" of metadata. Finally, the lighter lines represent the sub-kinds of metadata, where the sub-kinds might be the latitude, longitude, altitude, datum, country, and place name of the location.
Corpora metadata categories with some sub-categories
This does not mean that the corpus does not also need to be cross-cut with these other "sub-kinds". However, these sub-kinds are significantly more numerous and will vary from project to project. Some of these metadata kinds will be collected in a speaker profile questionnaire, but some of this metadata can only be provided with reflection on the event. To demonstrate the cross-cutting of these metadata elements on a corpus, I have provided the following diagram. It uses categories which were mentioned in the workshop and is not intended to be comprehensive. In this second diagram, the cross-cutting elements might themselves be taxonomies: they may have controlled vocabularies, they may have an open set of possible values, or they may represent a scale (a small sketch of these possibilities follows the diagram).
Taxonomies for social demographics and social dynamics for speakers in corpora
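As a small sketch of those three possibilities, consider how a project might declare its cross-cutting elements. The categories and values below are hypothetical illustrations, not the categories used at the workshop.

```python
# Three cross-cutting metadata kinds: one with a controlled vocabulary,
# one expressed as a scale, and one left as an open set of values.
# Categories and values here are hypothetical illustrations only.
recording_environment = {
    "kind": "controlled_vocabulary",
    "values": ["indoor", "outdoor", "studio", "public gathering"],
}
language_attitude = {
    "kind": "scale",
    "range": (1, 5),   # e.g. 1 = strongly negative, 5 = strongly positive
}
situation = {
    "kind": "open_set",  # free-text description of the speech situation
}
```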
Both of these diagrams tend to illustrate what in this workshop was referred to as "event level" metadata, rather than "corpus level" metadata.
A note on corpus level metadata vs. descriptive metadata
There is one more thing which I would like to say about "corpus level" metadata. Metadata is often separated out by function: what does the metadata allow us to do, or why is the metadata there?
I have been exposed to the following taxonomy of metadata types through course work and in working with photographs and images. These classes of metadata are also similar to those posted by JISC Digital Media as they approach issues with metadata for digital audio.
Descriptive metadata: supports discovery, attribution, and identification of resources created.
Administrative metadata: supports management, preservation, and appropriate usage of resources created.
Technical metadata: describes the machinery used to create the resource and the technical aspects of the resource.
Use and Rights metadata: copyright, license, and moral ownership of the items.
Structural metadata: maintains relationships between the parts of complex, multi-part resources (Spanne 2008).
Situational metadata: describes the events around the creation of the work, asking questions about the social setting or the precursory events. It follows ideas put forward by Bergqvist (2007).
Use metadata: metadata collected from or about the users themselves (e.g. user annotations, the number of people accessing a particular resource).
I think it is only fair to point out to archivists and to librarians that linguists and language documenters do not see a difference between descriptive and non-descriptive metadata in their workflows. That is, sometimes we want to search all the corpora by license or by a technical attribute. This elevates these attributes to the function of discovery metadata. It does not remove descriptive metadata from its role in finding things, but it does mean that the other metadata is also viable as discovery metadata.
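A toy illustration of the point, with hypothetical records and field names: a rights field or a technical field can drive discovery in exactly the same way a descriptive field like a title can.

```python
# Toy example: any metadata field, not just a descriptive one, can serve the
# discovery function. Records and field names are hypothetical.
records = [
    {"title": "Corpus A", "license": "CC BY-NC", "sample_rate_hz": 44100},
    {"title": "Corpus B", "license": "CC BY", "sample_rate_hz": 48000},
]

def find(records, **criteria):
    """Return the records matching every given field=value criterion."""
    return [r for r in records
            if all(r.get(field) == value for field, value in criteria.items())]

print(find(records, license="CC BY"))        # discovery via a rights field
print(find(records, sample_rate_hz=48000))   # discovery via a technical field
```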
Over the last several months I have been looking for and comparing digitization services for audio, film, and images (slides and more). I have been doing this as part of the ongoing work at the Language and Culture Archive to preserve the linguistic and cultural heritage of the people groups SIL International has encountered and served. I have not come to any hard and fast conclusions on “what is the best service provider”. This is partially because we are still looking at various outsourcing options, and looking at multiple mediums is time consuming. Then there is also the issue of looking for archival standards and the creation of corporate policy for the digitization of these materials. I am presenting several names here as the results of several searches for digitization service providers.
Another option the Archive has been looking at is to determine whether the quantity of the work makes it cost prohibitive to have it professionally done, meaning that we would be better served by buying the equipment and doing the work in house. So in the process I have also been looking at people’s experience with various kinds of equipment and technology used in scanning.
I have been reading a lot of user stories like Dave Dyer’s reflection on Slide Transfer and MacIntouch Reader Reports from 26 April 2006 on Slide Digitization.
For a couple of months I have been looking into options for presenting academics' CVs on the web in semantic xHtml. Of the options out there, hResume rises to the surface. There are several reasons for this:
The popularity of hResume in presenting CVs and resumes.
Microformats are about the interoperability of data through semantic markup - Academics generally want people to cite them, and resume publishers usually hope to have resume users and readers.
Semantic markup of content allows for the semantic styling of content.
The largest challenge in implementing academic CVs in hResume, versus business resumes in hResume format, is citations of publications and presentations, along with the semantic markup of those citations with standards like COinS (though COinS might not be true semantic markup). However, there are other challenges too, for instance how to categorize the sections of a CV. I work mostly with linguists and with the CV sections that linguists use; therefore, I may be missing some crucial section of a CV as used by another academic discipline. CVs, like resumes, are unique to each individual, so these categories are an abstraction, and not all sections will be in every CV. These abstractions are included in the following chart, along with a mapping of how these sections are (in my opinion) best expressed in the hResume microformat. hResume builds on other microformats, like hCalendar and hCard, so I have also mapped the elements of an hResume back to the building-block microformats (per this list on the Microformats website). These dependency formats are also presented. In the last column I have presented some remarks specific to that section.
| Sections of a Linguist's CV | Sections in hResume | Inclusion in hResume | Building block | Microformat status | Outstanding issue or question |
| --- | --- | --- | --- | --- | --- |
| Contact info | Contact info | Obligatory | Must use hCard; should use <address> + hCard | hCard is a Recommendation | How do the adr and the geo relate to the contact info or hCard? |
| Personal Info | Not designated | | | | What would fit into this section which would not fit into the Contact info section? (Marital status?) But that might be able to be expressed through XFN, unless the spouse is unnamed. |
| Education | Education | | One or more hCalendar events with the class name 'education', with an embedded hCard indicating the name of the school, address of the school, etc. | hCalendar is a Recommendation | |
| Education Abroad (could be considered a sub-category of "Education") | Education | | One or more hCalendar events with the class name 'education', with an embedded hCard indicating the name of the school, address of the school, etc. | hCalendar is a Recommendation | |
| Research interests | Not designated | | | | One could argue that this might be related to "skills", or marked with the rel-tag format. |
| Positions Held | Experience | Optional | One or more hCalendar events with the class name 'experience', with an embedded hCard indicating the job title, name of the company, address of the company, etc. | hCalendar and hCard | The [hResume] draft should describe a way to handle a series of assignments at various employers within the context of one job working for a contracting, consulting, or temporary firm/agency, per mfreeman (2009). |
| Field Work | Experience | Optional | One or more hCalendar events with the class name 'experience', with an embedded hCard indicating the job title, name of the company, address of the company, etc. | hCalendar and hCard (my recommendation is to also consider using hGeo) | Field work often has a geo-location and a language involved, so I am not sure whether it shouldn't also be marked up with hGeo and some rel-tag to the language. Embedding hCard for the job title leads to ambiguities, per TobyInk (2010). |
| Awards & Honors | Not designated | | | | Support for Awards and for Service sections is not currently implemented, per jeffmcneill (2007). |
| Grants Received | Not designated | | | | This might be considered similar to Awards and Honors, but in most CVs I have seen it is given its own section. |
| Publications | publications | Optional | | | A lot of work has gone into describing an hCite type of format, but nothing has evolved yet. To this end I have resolved to use COinS, although the official recommendation is to use the <cite> tag. |
| Peer Reviewed | publications | Optional | | | |
| Articles (PR) | publications | Optional | | | |
| Chapters (PR) | publications | Optional | | | |
| Books (PR) | publications | Optional | | | |
| Monographs (PR) | publications | Optional | | | |
| Edited Volumes (PR) | publications | Optional | | | |
| Not Peer Reviewed | publications | Optional | | | |
| Articles (NPR) | publications | Optional | | | |
| Chapters (NPR) | publications | Optional | | | |
| Books (NPR) | publications | Optional | | | |
| Papers (NPR) | publications | Optional | | | |
| Presentations | Not designated | | | | Generally these are cited like a publication but put in their own section. |
| Invited Talks | Not designated | | | | Generally these are cited like a publication but put in their own section. |
| Dissertations and theses supervised | Not designated (but possibly like publications) | | | | Generally these should be treated like publications. |
| Professional Associations | Affiliations | Optional | The class name 'affiliation' along with an hCard of the organization | | I am not clear on how XFN can be used in this context, but it seems that this is the sort of thing XFN was created for. There is also still the same objection as mentioned by TobyInk (2010), because there is no way to tell who the primary hCard on the page refers to. |
Last year I wrote about Selected Works™ & BePress because I was looking at how SIL International might best display the professional abilities of their personnel. This means putting their CVs and past project activity in an accessible portfolio. I have also been looking at apps like BibApp, which pulls info from DSpace. Since sil.org is looking at Drupal as a CMS, I recently ran across OpenScholar, with an example by Harvard.
Because I have been on the team doing the SIL.org redesign, I have been surveying the Open Source landscape to see what is available to connect Drupal with DSpace data stores. We are planning on making DSpace the back-end repository, with another CMS running the presentation and interactive layers. I found a module, still in development, which parses DSpace's XML feeds. However, this is not the only thing that I am looking at. I am also looking at how we might deploy Omeka. Presenting the entire contents of a Digital Language and Culture Archive, and citations for its physical contents, is no small task. In addition to past content there is also future content; that is to say, archiving is not devoid of publishing, so there is also the PKP (Public Knowledge Project). (SIL also currently has a publishing house, whose content needs version control and editorial workflows, which interact with archiving and presentation functions.)
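For readers curious what consuming a DSpace feed can look like, here is a minimal sketch that pulls Dublin Core records from an OAI-PMH endpoint, which DSpace instances typically expose. The endpoint URL below is hypothetical; the Drupal module mentioned above does something analogous on the PHP side.

```python
# Minimal sketch: harvest Dublin Core records from a DSpace OAI-PMH feed.
# The endpoint URL is hypothetical; point it at the repository in question.
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

url = ("https://repository.example.org/oai/request"
       "?verb=ListRecords&metadataPrefix=oai_dc")
with urllib.request.urlopen(url) as response:
    tree = ET.parse(response)

for record in tree.iter(OAI + "record"):
    titles = [t.text for t in record.iter(DC + "title")]
    identifiers = [i.text for i in record.iter(DC + "identifier")]
    print(titles, identifiers)
```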
Omeka
Wally Grotophorst has a really good reflection on Omeka and DSpace. I am not sure that it is current, but it does present the problem space quite well. Tom Scheinfeldt at Omeka also has a nice write-up on why Omeka exists, titled "Omeka and Its Peers". It is really important to understand Omeka's place in the ecosystem of content delivery to content consumers by qualified site administrators.
Over the past several months I have been wrestling with academic expression on the web. I have been trying to think through what it should look like. What do I want my footprint to be? How do I want to participate in the discussions I am involved in? Part of the struggle has been with content distribution vs. content publishing. In using the web as a content distribution platform, the web technology question looks more like "how are we going to arrange these PDFs?", whereas the web publishing question looks more like blog posts published directly to web-browser-oriented venues. Academic writing has traditionally been the written discussion between professionals in various pursuits of life. But as the web has shaped how we communicate, academia must consider (and is considering) how it is going to participate in the discussion. If social media and its various forms are where the discussion is happening, then how is academia going to stay relevant or connected? This is most relevant in the area of citations and links. These questions are not just relevant for the individual but are also relevant to academic institutions like SIL International, the Linguistic Society of America, the Academy of the Sciences, etc.
What we should be considering is this: in a world where academic writing is reduced, where is their place? The LSA has a journal, Language, which I enjoy reading. But are they the center of academic thought that they once were? Is their presentation of knowledge really the medium of use today?
While I agree that the web is radically changing the way information is disseminated, I doubt that the structure of argumentation will change. We may have to find new ways of expressing the points of the argument, but an argument will still have points. So, until our professors stop making us write papers and allow us to tweet our contradictions and assertions of scientific fact, how to build an argument and how to write a paper are still important.
I have come across some interesting resources. One of them reminded me of something taught in my undergraduate degree. My philosophy professor made us learn an outline for paper writing which has proven most helpful.
Here is the original outline:
1. Issue: What is integration?
2. Position: for integration
+3. Argument 1: social pluses
-4. Objection 1: social negatives
+5. Reply 1: without the negatives of life, are we really preparing students for life?
-15. Objection to the position: separation
-16. Argument for the objection to the position: separation is necessary for lower salaries on school budgets
+17. Objection to the objection to the position:
-18. Reply to the argument for the objection to the position: even with higher budgets on salaries, lowering the student-to-teacher ratio and paying more would help all students overall.
This argument still needed to demonstrate the dichotomy of a paragraph.
Introduction :: Why should the reader read this? (the grab)
Definition :: What are you talking about?
Relationship to higher-level thought :: How does this relate to what the reader knows?
Conclusion :: What does your claim imply?
Transition :: What question does this lead us to ask?
In this outline he showed that one needs a:
Proposition
Some supporting elements
Some supporting elements
Some supporting elements
Then to strengthen the argument a counter proposition is needed.
One could choose to be very crafty and make the counter argument a counter to one of the supporting elements of the original proposition. But regardless of the quality of the counter proposition, it still needs several supporting elements.
Element supporting counter position
Element supporting counter position
Element supporting counter position
Then the author needs some discourse to deconstruct the counter supporting elements and explain why they are not valid contradictions supporting the counter position. During this discourse the opposing opinion is clearly presented. Eventually, this discourse will refute the counter proposition, at which time a second counter proposition is needed.
Second counter proposition
Element supporting counter position
Element supporting counter position
Element supporting counter position
More discourse.... and the process repeats itself until a point is proven or considered well laid out.
The importance of explaining the opposing side better than the opposing side can was recently brought back into focus as I read a post by Nagesh Belludi. Recently I have also encountered several works of interest regarding academic discourse. The following presentation from Beyond the PDF has a really good breakdown, in the first 10 minutes, of the discourse structure of an academic paper.
From time to time, I read an academic paper, or journal article which really shines. It is engaging, it tells a compelling story, presents new insights and knowledge, and it brings me to a new conclusion or awareness of my surroundings.
I recently had the pleasure of reading a paper by Alexandre François on some phonological aspects of a language he was researching. He did a marvelous job of presenting an issue, the evidence to be considered, and then the propositions and the objections. He brought the reader with him as he explained the issues. The level of background knowledge needed was minimized, yet the work was not focused on presenting just the background issues and story. It is a recommended read if you are interested in phonology, but also if you are interested in looking at the presentation of argumentation.
I have recently been reading the blog of Martin Fenner and came upon the article Personal names around the world. His post is in fact a reflection on a W3C paper of the same title, Personal Names around the World. Several other reflections are here: http://www.w3.org/International/wiki/Personal_names. This is apparently coming out of the i18n effort and is intended to help authors and database designers make informed decisions about names on the web.
I read Martin’s post with some interest because in language documentation getting someone’s name as a source or for informed consent is very important (from a U.S. context). Working in an archive dealing with language materials, I see a lot of names. One of the interesting situations, which came to me from an Ecuadorian context, was different from anything I have seen in the w3.org paper or in the w3.org discussion. The naming convention went like this:
The elder was known by the younger’s name plus a relationship.
My suspicion is that it is taboo to name the dead. So to avoid possibly naming the dead, the younger person was referenced and the relationship was invoked. This affected me in the archive, as I am supposed to note who the speaker is on the recordings. In lieu of the speaker’s name, I have the young son’s first name (he is well known in the community and is in his 30s or so), and I have the relationship. So in English this might sound like “John’s mother”. Now what am I supposed to put in the metadata record for the audio recordings I am cataloging? I do not have a name, but I do have a relationship to a person known to the community.
I inquired with a literacy consultant who has worked in Ecuador with indigenous people for some years. She informed me that in one context she was working in, everyone knew what family line they were from, and all the names were derived from that family line by position. It was such that to call someone by their name was an insult.
It sort of reminds me of this sketch by Fry and Laurie.
httpvh://youtu.be/hNoS2BU6bbQ
Jetpack is in no way new… but I have never installed it (it seems that half a million other people have, though). The only service I have used from Automattic is Akismet. Then about a month ago I installed After the Deadline as a Google Chrome plugin to help me with my spelling mistakes. It seemed to work, so I thought I would give it a go as a WordPress plugin.
What was new was that I had not integrated a sharing solution for readers of my blog. So as of now there is a “share this” option at the end of my posts.
Sharing options
Of course Sharedaddy, the sharing plugin, did not have a Google +1 sharing option, nor a del.icio.us sharing option. So I had to find some solutions. I found a fork of Sharedaddy on GitHub which had added Google+ and LinkedIn. (I am not on Google+, but I just joined LinkedIn last week as I was redoing my resume.)
To add delicious I followed a post by Ryan Markel to find the right share service URLs.
Menus
The other thing I figured out this week was how to use the Menus feature under the Appearance tab. I have been using K2 since 2005 and have always thought that the menus in the default theme were sufficient; I have usually not had complex menu desires, so there was no real need to learn these new features. However, now I wanted to put several picture pages under the same menu. So voilà, it is done now.
New menu settings
Others
(Mostly RDFa and HTML5)
I also have a plugin that is adding Open Graph RDFa tags to my theme. My current version of K2 is HTML5, but it is not validating with the RDFa tags in it. So I was trying to validate them but have not been successful. I looked at this answer, which said to add something to the doctype, but then there are more answers too. Sometimes these answers are beyond me. I wish I had some structured learning in this subject area.
Why RDF?
RDFa is the basis of Open Graph, the technology used to sync Facebook Likes between my site and Facebook.
I have had some ideas I wanted to try out for using the iPad as a tool for collecting photo metadata. Working in a corporate archive, I have become aware of several collections of photos without much metadata associated with them.
The photos are the property of (or are in the custodial care of) the company I work at (in their corporate archive).
The subject of the photos are either of two general areas:
The minority language speaking people groups that the employees of a particular company worked with, including anthropological topics like ways of life, etc.
Photos of operational events significant to telling the story of the company holding the photos.
Archives in more modern contexts are trying to show their relevance not only to academics, but also to general members of communities. In the United States there is a whole movement of social history. There are community preservation societies which take on the task of collecting old photographs and their stories, and preserving and presenting them for future generations.
The challenge at hand is: "How do we enrich photos by adding metadata to the photos in the collections of archives?" There are many solutions to this kind of task. The refining, distilling, and reduction of stories and memories to writing, and even to metadata fields, is no easy task, nor is it a task that one person can do on their own. One solution, often employed by community historians, is the personal interview. Interviewing the photographers or people who were at an event, and asking them questions about a series of photos, creates an atmosphere of inquisitiveness, one where the story-teller is valued because they have a story-listener. This basic personal connection allows interactions to occur across generational and technological barriers.
The crucial question is: "How do we facilitate an interaction which is positive for all the parties involved?" The effort and thinking behind answering this question has more to do with shaping human interactions than with anything else. But we are also talking about using technology in this interaction. This is true UX (User Experience).
Past Experience
This past summer I had several experiences with facilitating one-on-one interactions between knowledgeable parties working with photographs and someone acting on behalf of the corporate archive. To facilitate this interaction, a GoogleDoc Spreadsheet was set up, and the person acting on behalf of the archive was granted access to the spreadsheet. The individual conducting the interview and listening to the stories brought their own netbook (small laptop) from which to enter any collected data. They were also given a photo album full of photos, which the interviewee would look through. This set-up required overcoming several local environmental challenges. As discussed below, some of these challenges were better addressed than others.
Association of Data to a Given Photo
The first challenge was keeping up to 150 photos organized during an interview so that metadata about any given photo could be collected and associated with only that photo. This was addressed by adhering an inventory sticker to the back of each photo and assigning each photo a single row in the GoogleDoc Spreadsheet (a small sketch of this ID-to-row data shape follows the two lists below). Using GoogleDocs was not the ideal solution, but rather a solution of some compromises:
Strengths of GoogleDocs
One of the great things about GoogleDocs is that the capability exists for multiple people to edit the spreadsheet simultaneously.
Another strength of GoogleDocs is that there is a side-bar chat feature, so that if there was a question during the interview, help could be had very quickly from management (me, who was offsite).
The data can be exported in the following formats: .xlsx, .xls, .csv, .pdf.
There was no cost to deploy the technology.
It is accessible through a web-browser in an OS neutral manner.
The document is available wherever the internet is available.
A single solution could be deployed and used by people digitizing photos, recording written metadata on the photos, and gathering metadata during an interview.
Most people acting on behalf of the archive were familiar with the technology.
Pitfalls of GoogleDocs
More columns exist in the spreadsheet than can be practically managed (the columns are presented below in a table). There are about 48 values in a record, and there are about 40,000 records.
More columns than can be practically managed
Does not display the various levels of data as levels in the user interface.
Cannot remove unnecessary fields from the UI of various people. (No role-based support.)
Only available when there is internet.
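As a sketch of the underlying data shape, each inventory sticker ID maps to exactly one row, and the rows can be exported to CSV when the spreadsheet is pulled out of GoogleDocs. The column names below are illustrative only; the real sheet has roughly 48 columns.

```python
# Sketch of the photo-ID-to-row mapping used in the interview spreadsheet.
# Column names are illustrative; the real sheet has roughly 48 columns.
import csv

rows = {
    "A-0001": {"photographer": "", "subjects": "", "date_taken": "", "place": ""},
    "A-0002": {"photographer": "", "subjects": "", "date_taken": "", "place": ""},
}

with open("photo_metadata.csv", "w", newline="") as f:
    writer = csv.DictWriter(
        f, fieldnames=["photo_id", "photographer", "subjects", "date_taken", "place"]
    )
    writer.writeheader()
    for photo_id, data in rows.items():
        writer.writerow({"photo_id": photo_id, **data})
```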
Maximizing of Interview Time
To maximize time spent with the interviewee, the photos and any metadata written or known about a photo were put into the GoogleDoc Spreadsheet prior to the interview. Sometimes this was not done by the interviewer but rather by someone else working on behalf of the archive. During the interview, the interviewer could tell which data fields were empty by looking for the grey cells in the spreadsheet. However, just because the cells were empty did not mean that the interviewee was more prone to provide the desired, unknown, information.
Grey Areas Show Metadata fields which are empty
Data Input Challenges
One unanticipated challenge encountered in the interviews was that, as the interviewer brought out an album or two of photos, the interviewees were able to cover more photos than the interviewer could record.
Let me spell it out. There is one interviewer and two interviewees, and there are 150 photos in an album lying open on the table. All three participants are looking at the photo album. Interviewee A says, "Look, that is so-and-so", and then interviewee B (because the other page is closer to them) says, "And this is so-and-so!" This happens for about 8 of the 12 facing photos. Because the interviewer is still typing the first name mentioned, they ask, "And when do you think that was?" But the metadata still comes in faster, as the second interviewee did not hear the question and the first one did but is still thinking. The bottom line is that more photos are viewed and commented on, faster than they can be recorded.
Something that could help this process would be a way to slow down (or moderate) the interviewees' access to the photos, something that could synchronize the processing times with the viewing times. Scanning the photos and then displaying them on a tablet slows down the viewing process and integrates the recording of data with the viewing of photos.
Positional Interaction Challenges
An interview is, at some level, an interaction. One question which comes up is: how does the technology used affect that interaction? What we found was that the laptop usually ended up situated between the interviewer and the interviewees. This positioned the parties in an opposing manner. Rather than the content becoming the central focus of both parties, the content was either in front of the interviewer or in front of the interviewees. A tablet changes this dynamic in the interaction. It brings both parties together over a single set of content, both positionally and cognitively. When the photo is displayed on a laptop, the laptop has to be rotated so that the interviewees can see the image and then turned back so that the interviewer can input the data. This is not the case for a tablet.
Content Management Challenges
When paper is used for collecting metadata, it is ideal to have one piece of paper for each photo. Sometimes this method is preferable to using a single computer. I used this method when I had a photo display with about 20 albums and about 200 people all filling out details at once.
People filling out metadata forms in front of a photo display.
People came and went as they pleased. When someone recognized a person or a place they knew, they wrote down the picture ID and the info they were contributing, along with their name. However, when carrying around photo albums and paper, there is the challenge of keeping all the photos from getting damaged and maintaining the order of the photos and associated papers.
Connectivity Challenges
When there is no internet, there is no access to GoogleDocs. We encountered this when we went to someone's apartment expecting internet, because the internet is available on campus and this apartment was also on campus. Fortunately we did have a back-up plan, and paper and pen were used. But this meant that we then had to type out the data which had been written down on paper, in effect doing the same recording work twice.
Size of Devices
Photo albums have a certain bulk and cumbersomeness which is multiplied when carrying more than one album at a time. Add to this a laptop computer, and one might as well add a hand truck to the list of required items with which to carry everything. A tablet is, all in all, a lot smaller and lighter.
Laptop and tablet. This image is credited to Alia Haley.
Proof of Concept Technology
As I mentioned before, I had an iPad in my possession for a few days. So, to capitalize on the opportunity, I bought a few apps from the App Store, as I mentioned that I would, and tried them out.
Software which does not work for our purposes
Photoforge2
The first app I tried was Photoforge2. It is a highly rated app in the app store. I found that it delivered as promised. One could add or edit the IPTC and EXIF metadata. One could even edit where the photo was taken with a pin drop interface.
iPad Fotoforge Location Data
iPad Fotoforge Metadata Editor
Meta Editor
Meta Editor, another iPad app which was also highly acclaimed, performed the task almost as well. Photoforge2 had some photo-editing features which were not needed in our project, whereas Meta Editor was focused only on metadata elements.
MetadataEditor Location Data
After using both applications it became apparent that neither would work for this project for at least two reasons:
Both applications edit the standards-based IPTC and EXIF metadata fields in photos. We have some custom metadata which does not fit into either of these fields. One aspect of the technology being discussed, which might be helpful for readers to understand, is that these iPad applications actually embed the metadata into the photos, so when the photos are then taken off of the iPad the metadata travels with them. This is a desirable feature for presentation photos (a small sketch of this kind of embedding follows this list).
Even if we do embed the metadata with these apps, the version of the photo being enriched is not the archival version of the photo; it is the presentation version of the photo. We still need the data to become associated with the archival version of the photo.
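To make the embedding point concrete, here is a minimal sketch of writing a caption and a photographer into a presentation copy of a photo, assuming the exiftool command-line utility is installed. The file name and field values are hypothetical, and note that our custom fields would still need a home outside of IPTC and EXIF.

```python
# Minimal sketch: embed an IPTC caption and photographer into a presentation
# copy of a photo, using the exiftool command-line utility (assumed installed).
import subprocess

def embed_basic_metadata(path, caption, photographer):
    subprocess.run(
        [
            "exiftool",
            f"-IPTC:Caption-Abstract={caption}",
            f"-IPTC:By-line={photographer}",
            "-overwrite_original",
            path,
        ],
        check=True,
    )

embed_basic_metadata("presentation_copy.jpg", "Village school, 1972", "Unknown")
```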
Software with some really functional features
So we needed something with a mechanism for capturing our customized data. Two options were found which seemed suitable for the task: one ideal, the other rapidly deployable. Understanding the iPad's place in the larger corporate architecture, its relationship to the digital repository, and the process of data flow from the point of collection to dissemination will help us to visualize the particular challenges that the iPad presents solutions for. Once we see where the iPad sits in relationship to the rest of the digital landscape, I think it will be fairly obvious why one solution is ideal and the other rapidly deployable.
Placement of the iPad in the Information Architecture Picture
In my previous post on Social Metadata Collection I used the below image to show where the iPad was used in the metadata collection process.
Meta-data Collection Model
Since that time, as I have shown this image when talking about this idea, I have become aware that the image is not detailed enough. Because it is not detailed enough, it can lead to some wrong assumptions about how the proposed iPad use actually works. So I am presenting a new image with a greater level of detail to show how the iPad interacts with other corporate systems and workflows.
iPad Team as they fit with other digital elements.
There are several things to note here:
The Member Diaspora as represented here is not just members; it is their families, the people with whom these members worked, the members currently working, and the members living close at hand on campus, not just those in diaspora.
It is a copy of the presentation file which is pushed out to the iPad or to the website for the Member Diaspora. This copy of the file does not necessarily need to be brought back to the archive as long as the metadata is synced back appropriately.
The Institutional Repository for other corporate items is currently in a DSpace instance. However, it has not been decided for sure that photos will be housed in this same instance, or even in DSpace.
That said, it is important that the metadata be embedded in the presentation file of the image, as well as accessible to the Main container for the archival of the photos. The metadata also needs to sync between the iPad application and the Member Diaspora website. Metadata truly needs to flow through the entire system.
FileMaker Pro with File Maker Go
FileMaker Pro is a powerful database app. It could drive the Member Diaspora website and then also sync with the iPad. This would be a one-stop solution and therefore an ideal solution. It is also complex and takes more skill to set up than I currently have, or can currently spare to acquire. Both FileMaker Pro and its younger cousin Bento enable photos to be embedded in the actual database. Several tips from the Bento forums on syncing photos which are part of the database: Syncing pictures from Bento-Mac to Bento-iPad; Sync multiple photos or files from desktop to iPad. This is important with regard to syncing with the iPad. To the best of my knowledge (and googling), no other database apps for the iPad or Android platforms allow for the syncing of photos within the app.
Rapid reuse of data: because the interview process naturally lends itself to eliciting the same kind of data over a multitude of photos, a UX/UI element which allows the rapid reuse of data would be very practical. The kinds of data which would lend themselves to rapid reuse would be people's names, locations, dates, photographer, etc. This may mean being able to query a table of already-entered data values with an auto-suggest type function, as sketched below.
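Here is a minimal sketch of the kind of auto-suggest lookup I have in mind, purely illustrative, over values already entered during the interview.

```python
# Sketch of an auto-suggest lookup over values already entered during the
# interview, so that names, places, and dates can be rapidly reused.
def suggest(previous_values, prefix, limit=5):
    prefix = prefix.lower()
    matches = sorted(v for v in set(previous_values)
                     if v.lower().startswith(prefix))
    return matches[:limit]

entered_names = ["Maria Lopez", "Mark Smith", "Maria Lopez", "Quito"]
print(suggest(entered_names, "ma"))  # ['Maria Lopez', 'Mark Smith']
```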
Custom iPad App
Of course there is also the option to develop a custom iPad app for just our purposes. This entails some other kinds of planning, including but not limited to:
Custom App development
Support plan
Deploy or develop possible Web-backend - if needed.
Kinds of custom metadata being collected.
The table in this section shows the kinds of questions we are asking in our interviews. It is provided for reference only, as a discussion of the Information Architecture for the storage and elements of the metadata schema is out of the scope of this discussion. The list of questions and values presented in the table was derived as a minimal set of questions based on issues of image workflow processing, intellectual property and permissions, and academic merit, with input from Controlled Vocabulary's Caption and Keywording Guidelines, which is part of their series on metalogging. The table also shows corresponding IPTC and EXIF data fields. (Though they are currently empty, because I have not filled them in.) Understanding the relationships of XMP, IPTC, and EXIF also helps us to understand why and how the iPad tool needs to interact with other archiving solutions. However, it is not within the scope of this post to discuss these differences. Some useful resources on these issues are noted here:
Photolinker Metadata Tags has a nice display outlining where XMP, IPTC, and EXIF data overlap. This is not authoritative, but rather practical.
A list of IPTC fields. However, a list is not enough; we also need to know what the fields mean, so that we know we are using them correctly.
EXIF and IPTC Header Comments. Here is another list of IPTC fields. This list also includes a list of EXIF fields. (Again, without definitions.)
It is sufficient to note that there is some, and only some, overlap.
Overlap of EXIF, XMP, and IPTC metadata for images. This image is taken from the Metadata Working Group's Guidelines for Handling Image Metadata.
| Metadata Element | Purpose | Explanation | Dublin Core | IPTC Tags | EXIF Tags |
| --- | --- | --- | --- | --- | --- |
| Photo Collection | This is the name of the collection in which the photos reside | | | | |
| Sub Collection | This is the name of the sub-collection in which the photos reside | | | | |
| Letter of Collection | Each collection is given an alpha character or a series of alpha characters; if the collection pertains to one people group, then the alpha characters given to that collection are the three-letter ISO 639-3 code | | | | |
| Who input the metadata | This is the name of the person inputting the metadata | | | | |
| Photo Number | This is the number of the photo as we have inventoried the photo | | | | |
| Negative Number | This is the number of the photo as it appears on the negative (film strip) | | | | |
| Roll | This is the ID of the roll | Most sets of negatives are cut into strips of 5 or fewer; this allows us to group these sets together to ID a "set" of photos | | | |
| Section Number | If the items are in a book or a scrapbook and that scrapbook has a section, this is where that is recorded | | | | |
| Page # | If a scrapbook has a set of pages, then this is where they are recorded | | | | |
| Duplicates | This is where the Photo ID of a duplicate item is referenced | | | | |
| Old Inventory Number(s) | This is the inventory number of an item if it was part of another inventory system | | | | |
| Photographer | This is the name of the photographer | | | | |
| Subject 1 (who) | Who is in the photo; this should be an unlimited field, that is, several names should be able to be added to it | | | | |
| Subject 2 | Who is in the photo; this should be an unlimited field, that is, several names should be able to be added to it | | | | |
| Subject 3 | Who is in the photo; this should be an unlimited field, that is, several names should be able to be added to it | | | | |
| Subject 4 | Who is in the photo; this should be an unlimited field, that is, several names should be able to be added to it | | | | |
| Subject 5 | Who is in the photo; this should be an unlimited field, that is, several names should be able to be added to it | | | | |
| People group | This is the name of the people group as identified by the ISO 639-3 codes | | | | |
| ISO 639-3 Code | This is the ISO 639-3 code of the people group being photographed | | | | |
| When was the photo taken? | The date the photo was taken | | | | |
| Country | The country in which the photo was taken | | | | |
| District/City | This is the city where the photo was taken | | | | |
| Exact Place | The exact place name where the photo was taken | | | | |
| What is in the Photo (what) | This is an item in the photo | | | | |
| What is in the Photo | Additional "what is in the photo" | | | | |
| What is in the Photo | Additional "what is in the photo" | | | | |
| Why was the Photo Taken? | This is to help metadata providers think about how events get communicated | | | | |
| Description | This is a description of the photo's contents | This is not a caption but could be used as a caption | | | |
| Who Provided This Metadata? And when? | We need to keep track of who is the source of certain metadata to understand its authority | | | | |
| Who Provided This Metadata? And when? | We need to keep track of who is the source of certain metadata to understand its authority | | | | |
| Who Provided This Metadata? And when? | We need to keep track of who is the source of certain metadata to understand its authority | | | | |
| Who Provided This Metadata? And when? | We need to keep track of who is the source of certain metadata to understand its authority | | | | |
| Who Provided This Metadata? And when? | We need to keep track of who is the source of certain metadata to understand its authority | | | | |
| I am in this photo and I approve it to be on the internet. Put in "Yes" or "No" and write your name in the next column. | Permission to distribute | | | | |
| Name: | Name of the person releasing the photo | | | | |
| How was this photo digitized? | Method of digitization and the tools used in digitization | | | | |
| Who digitized this photo | This is the name of the person who did the digitization | | | | |
The importance of knowing about the datum recently came to my attention as I was working with GIS data on a language documentation project. We were collecting GPS coordinates with a handheld GPS unit and comparing these coordinates with data supplied by the national cartographic office. The end goal was to compare the data samples we collected with conclusions proposed by the national cartographic office.
So, what am I talking about?
GIS data is used in a Geographical Information System. Basically, you can think of maps and what you might want to show with a map: rivers, towns, roads, language features, dialect markers, etc. Well, maps are shapes with a grid superimposed, and coordinates are a way of naming where on a particular grid a given point is located.
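As an illustration of why the datum matters, here is a minimal sketch using the pyproj library: the same point expressed in two geodetic datums. The coordinates are hypothetical, and NAD27 is used only as an illustrative second datum, not as the datum the cartographic office actually used.

```python
# Minimal sketch: the same point expressed in two different datums.
# WGS84 is what handheld GPS units report; NAD27 is only an illustrative
# second datum, and the coordinates below are hypothetical.
from pyproj import Transformer

transformer = Transformer.from_crs("EPSG:4326", "EPSG:4267", always_xy=True)  # WGS84 -> NAD27

lon, lat = -99.5, 17.3                    # hypothetical point (longitude, latitude)
lon_nad27, lat_nad27 = transformer.transform(lon, lat)
print(lon_nad27, lat_nad27)               # differs by tens to hundreds of metres
```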