Category Archives: Meta-data
The Data Management Space for Linguists
This week I have been outlining the types of data that linguists need to be able to use and relate to each other as they do Language Documentation and Linguistic Research. I try to express these things graphically and then also express where some of the leading tools which SIL International is offering sit in the problem space.
Useful or Not?
This post is an open draft! It might be updated at any time.
The online version of the SIL Bibliography contains a subset of over 29,000 citations from the more than 40,000 publications representing 75 years of SIL International's language research in over 2,700 languages.
Finding resources through SIL.org's Bibliography (as of 2 August 2012) can be a challenge at times, and maybe even a time-wasting endeavor, because the online Bibliography may not turn out to be very useful to consult.
The challenge to its usefulness is primarily threefold:
- Items known by SIL to have been created by SIL staff may or may not be listed. (The online Bibliography is a subset.)
- Items listed in the Bibliography may or may not have digitally accessible resources.
- Items created by SIL staff may or may not be in the bibliography because they have not been submitted to the Language and Culture Archive (managing division of the SIL Bibliography).
The Citation Problem
This post is an open draft! It might be updated at any time.
When a research team has several members, the job requirements call for sharing both the bibliographic data (of materials referenced) and the actual resources being referenced. In this environment there needs to be a central repository for sharing both kinds of data. This is true for small, geographically localized groups as well as for large distributed research teams. New researchers joining an existing team need to be able to “plug in” to existing foundational work on the project and to access the bibliographic data as well as the resources those bibliographic details point to. My point here is to outline some of the current challenges involved in overcoming this collaborative obstacle when working in the fields of Linguistics and Language Documentation. [1]
Bibliographic Data vs. Citation Data
Notes
- ↑1 This sentiment is echoed by many in the world of science. Here is someone on Zotero’s forums [INSERT LINK]. (Though Zotero does claim to combat some of these issues.)
Socio-linguistic Profiles for Language Documentation
Some researchers in linguistics (in my acquaintance) have been less than excited about the notion of asking for socio-linguistic data or socio-personal data from language informants. The objection has been that it is just bad form. While I am a great advocate of personal privacy (especially in digital formats), I see that one of the most informative parts of the language documentation process is understanding who the speakers being recorded or worked with are. Language variation is fundamentally connected with identity. While crucial elements of how a community segments itself along identity lines may not be known for several years, having a robust socio-cultural or socio-personal questionnaire about the language informants will later help place the documentation data in the perspective of the larger waves of variation in the community.
This is to say, I am thoroughly convinced that a socio-linguistic questionnaire is an important part of the language documentation process. It might not need to be done first, but it will help researchers and future users of archived material understand where to place these speech samples in the context of that speaker's society.
The outstanding question, and one with a variable answer, is how to appropriately approach the questions in the questionnaire. Should the questionnaire be approached formally? Or should it be asked in a conversational format? Should it be elicited digitally? One of the interesting things about eliciting answers digitally is that the process may appear less intrusive because it is less formal. While I have no empirical evidence based on years of cross-cultural work, I do have the Facebook phenomenon. That is, minority language users all over the world are using Facebook. And Facebook is collecting (and allowing the users to volunteer) and then verifying the users’ provided data.
Below is a list of elements which Facebook is collecting (it is also collecting log-in locations and times). So, some of these questions are certainly in-scope of what language documenters would minimally like to know about their indigenous language speaking informants and collaborators. Others of these questions are certainly not in-scope for the recommended socio-linguistic profile from language documenters or socio-linguists.
Facebook data categories on user profiles.
Data Facebook collects about users through their profile and activities. From: https://www.facebook.com/help/326826564067688 on 23 August 2012.
| What info is available? | What is it? | Where can I find it? |
|---|---|---|
| About Me | Information you added to the About section of your timeline like relationships, work, education, where you live and more. It includes any updates or changes you made in the past and what’s currently in the About section of your timeline. | Activity Log |
| Account Status History | The dates when your account was reactivated, deactivated, disabled or deleted. | Expanded Archive |
| Address | Your current address or any past addresses you had on your account. | Expanded Archive |
| Alternate Name | Any alternate names you have on your account (ex: a maiden name or a nickname). | Expanded Archive |
| Apps | All of the apps you subscribe to. | Expanded Archive |
| Birthday Visibility | How your birthday appears on your timeline. | Expanded Archive |
| Chat | A history of the conversations you’ve had on Facebook Chat. | Downloaded Info |
| Check-ins | All of the places you’ve checked into. | Downloaded Info, Activity Log |
| Connections | The people who have liked your Page or Place, RSVPed to your event, installed your app or checked in to your advertised place within 24 hours of viewing or clicking on an ad or Sponsored Story. | Activity Log |
| Currency | Your preferred currency on Facebook. If you use Facebook Payments, this will be used to display prices and charge your credit cards. | Expanded Archive |
| Current City | The city you added to the About section of your timeline. | Downloaded Info |
| Date of Birth | The date you added to Birthday in the About section of your timeline. | Downloaded Info |
| Deleted Friends | The people you’ve unfriended. | Expanded Archive |
| Education | Any information you added to Education in the About section of your timeline. | Downloaded Info |
| Emails | Email addresses added to your account (even those you may have removed). | Expanded Archive |
| Events | Events you’ve joined or been invited to. | Activity Log |
| Family | Friends you’ve indicated are family members. | Expanded Archive |
| Favorite Quotes | Information you’ve added to the Favorite Quotes section of the About section of your timeline. | Downloaded Info |
| Friend Requests | Pending sent and received friend requests. | Expanded Archive |
| Friends | A list of your friends. | Downloaded Info |
| Gender | The gender you added to the About section of your timeline. | Downloaded Info |
| Groups | A list of groups you belong to on Facebook. | Downloaded Info |
| Hidden from News Feed | Any friends, apps or pages you’ve hidden from your News Feed. | Expanded Archive |
| Hometown | The place you added to hometown in the About section of your timeline (profile). | Downloaded Info |
| IP Addresses | A list of addresses where you’ve logged into your Facebook account. | Expanded Archive |
| Last Location | The last location associated with an update. | Activity Log |
| Likes on Other’s Posts | Posts, photos or other content you’ve liked. | Activity Log |
| Likes on Your Posts from others | Likes on your own posts, photos or other content. | Activity Log |
| Likes on Other Sites | Likes you’ve made on other sites off of Facebook. | Activity Log |
| Locale | The language you see on Facebook is based on where you’re located. | Expanded Archive |
| Logins | IP address, date and time associated with logins to your Facebook account. | Expanded Archive |
| Logouts | IP address, date and time associated with logouts from your Facebook account. | Expanded Archive |
| Messages | Archive of messages you’ve sent and received on Facebook. | Downloaded Info |
| Name | The name on your Facebook account. | Downloaded Info |
| Name Changes | Any changes you’ve made to the original name you used when you signed up for Facebook. | Expanded Archive |
| Networks | Networks (affiliations with schools or workplaces) that you belong to on Facebook. | Expanded Archive |
| Notes | Any notes you’ve written and published to your account. | Activity Log |
| Notification Settings | A list of all your notifications and whether you have email and text enabled or disabled for each. | Expanded Archive |
| Pages You Admin | A list of pages you admin. | Expanded Archive |
| Phone Numbers | Mobile phone numbers you’ve added to your account. | Expanded Archive |
| Photos | Any photos you’ve uploaded to your account. | Downloaded Info |
| Physical Tokens | Badges you’ve added to your account. | Expanded Archive |
| Pokes | A list of who’s poked you and who you’ve poked. | Expanded Archive |
| Political Views | Any information you added to Political Views in the About section of timeline. | Downloaded Info |
| Your Posts | Anything you posted to your own timeline, like photos, videos and status updates. | Activity Log |
| Posts by Others | Anything you posted to someone else’s timeline (profile), like photos, videos and status updates. | Activity Log |
| Recent Activities | Actions you’ve taken and interactions you’ve recently had. | Activity Log |
| Registration Date | The date you joined Facebook. | Activity Log |
| Religious Views | The information you added to Religious Views in the About section of your timeline. | Downloaded Info |
| Screen Names | The screen names you’ve added to your account, and the service they’re associated with. You can also see if they’re hidden or visible on your account. | Expanded Archive |
| Searches | Searches you’ve made on Facebook. | Activity Log |
| Spoken Languages | The languages you added to Spoken Languages in the About section of your timeline. | Expanded Archive |
| Status Updates | Any status updates you’ve posted. | Activity Log |
| Subscribers | A list of people who are subscribed to you. | Expanded Archive |
| Subscriptions | A list of people you subscribe to. | Activity Log |
| Tag Suggestions Template | A unique number based on a comparison of the photos you're tagged in. We use this template to help your friends tag you in the photos they upload. | Expanded Archive |
| Work | Any information you’ve added to Work in the About section of your timeline. | Downloaded Info |
| Videos | Videos you’ve posted. | Activity Log |
Grants being aggregated in OLAC Search results
I have been doing some thinking about what would make OLAC search more valuable to its current users and to its targeted users. One thing that would make it more useful would be if the NSF, a partial funder of OLAC and OLAC search, would aggregate its language-related grants, scholarships, fellowships and awards through OLAC.
Some of these grant proposals are really well-written and well-cited documents which explain a certain snapshot of the language situation. Even the announcement that a grant like From Endangered Language Documentation to Phonetic Documentation has been awarded would allow other researchers to know that someone has applied for or been awarded a block of funding to work on a particular language situation.
I was particularly happy to find that NSF does have a grant offering and grant awarded search section. But aggregating this knowledge with prior research would really give interested parties in particular languages the integrated perspective.
The role of relationships in a data-centric industry
iPhone geo-data
I have been playing around with data available from the iPhone (and also separately visualizing Map data).
I came across a project, iPhoneTracker, which was done to show iPhone users the kind of data that the iPhone collects about a user's travel and whereabouts. I downloaded the app and ran it. It looks like it shows about a complete history since I activated the phone… The interesting thing for me was that this app did not collect the data from my phone directly but rather from my computer.
DOIs and URLs same or different?
A document’s DOI (http://www.doi.org/ or on Wikipedia under Digital Object Identifier) is an important part of the citation of a document. Many style sheets allow for just the DOI of a paper as the citation. Because DOIs are unique they can act as URIs which are resolvable and look like URLs. However, a DOI is different than a URL for where a digital object might be located. It might be well argued that a DOI should be tracked in the metadata schemes of archives which collect language and linguistic data.
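To make the distinction concrete, here is a minimal Python sketch of how an archive's metadata record might track a DOI separately from the URL where the object currently lives. The `ArchiveRecord` class and its field names are hypothetical, invented only for this illustration, and the location URL is a placeholder. The sketch assumes the public resolver at https://doi.org/ and uses 10.1000/182 (the DOI of the DOI Handbook) as sample input.

```python
from dataclasses import dataclass
from urllib.parse import quote

# Assumption: the public DOI resolver. The DOI itself is only the "10.xxxx/suffix" part.
DOI_RESOLVER = "https://doi.org/"
KNOWN_PREFIXES = ("doi:", "https://doi.org/", "http://doi.org/", "http://dx.doi.org/")


def doi_to_url(doi: str) -> str:
    """Turn a DOI (bare, or with a common prefix) into a resolvable URL."""
    doi = doi.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Very light sanity check: DOIs start with "10." and have a "/" before the suffix.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"does not look like a DOI: {doi!r}")
    # Percent-encode characters that are not safe in a URL path.
    return DOI_RESOLVER + quote(doi, safe="/")


@dataclass
class ArchiveRecord:
    """Hypothetical metadata record: the identifier and the location are separate fields."""
    title: str
    doi: str           # persistent identifier, stays the same if the file moves
    location_url: str  # where the object sits today, may change


record = ArchiveRecord(
    title="DOI Handbook",
    doi="10.1000/182",
    location_url="https://example.org/files/handbook.pdf",  # placeholder location
)
print(doi_to_url(record.doi))
```

Keeping the two fields apart means the location can be updated without touching the identifier that citations rely on.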
From Folksonomies to Taxonomies with Linguistic Metadata
This post is an open draft! It might be updated at any time.
Metadata is very important; everyone agrees. However, there is some discussion when it comes to how to develop metadata and also how to ensure that the metadata is accurate. Taxonomies are limited vocabularies (a set number of items) where each term has a predefined definition. A folksonomy is a vocabulary where people, usually users of data, assign their own useful words or metadata to an item. Folksonomies are like taxonomies in that both are sets of terms, but they differ in that a folksonomy is an open set while a taxonomy is a closed set.
An example of a taxonomy might be the colors of a traffic light: Red, Yellow, and Green. If this were a folksonomy, people might also suggest the colors Amber, Orange, Blue-Green and Blue. These additional terms may be accurate to some viewers of traffic lights, or in some cases, but they do not fit the stereotypical model of what the colors of traffic lights are.
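The closed-set versus open-set contrast can be sketched in a few lines of Python. This is only an illustration built on the traffic light example: the names `TRAFFIC_LIGHT_TAXONOMY`, `tag_with_taxonomy` and `Folksonomy` are made up for the sketch, and counting how often users supply a term is an assumption about how one might later decide which folksonomy terms deserve a place in the taxonomy.

```python
from collections import Counter

# Closed set: the vocabulary is fixed in advance.
TRAFFIC_LIGHT_TAXONOMY = frozenset({"red", "yellow", "green"})


def tag_with_taxonomy(term: str) -> str:
    """Accept a term only if it is already in the closed vocabulary."""
    normalized = term.strip().lower()
    if normalized not in TRAFFIC_LIGHT_TAXONOMY:
        raise ValueError(f"{term!r} is not a term in the taxonomy")
    return normalized


class Folksonomy:
    """Open set: any term a user supplies is accepted and counted."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def tag(self, term: str) -> str:
        normalized = term.strip().lower()
        self.counts[normalized] += 1
        return normalized

    def terms_outside(self, taxonomy: frozenset) -> set:
        """User-supplied terms that the closed vocabulary does not contain."""
        return set(self.counts) - set(taxonomy)


folk = Folksonomy()
for user_term in ["Green", "Amber", "amber", "Blue-Green"]:
    folk.tag(user_term)

print(tag_with_taxonomy(" Red "))
print(sorted(folk.terms_outside(TRAFFIC_LIGHT_TAXONOMY)))
```

The taxonomy rejects "Amber" outright, while the folksonomy records it; the terms that fall outside the closed set are the ones a curator would have to review.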



