I have found the following two links helpful when considering data anonymization and privacy issues in general.
Some researchers in linguistics (in my acquaintance) have been less than excited about the notion of asking language informants for socio-linguistic or socio-personal data. Their objection has been that it is simply bad form. While I am a great advocate of personal privacy (especially in digital formats), I see understanding who the speakers being recorded or worked with are as one of the most informative parts of the language documentation process. Language variation is fundamentally connected with identity. The crucial elements of how a community segments itself along identity lines may not become clear for several years. Even so, a robust socio-cultural or socio-personal questionnaire about the language informants will later help place the documentation data in the context of the larger waves of variation in the community.
That is to say, I am thoroughly convinced that a socio-linguistic questionnaire is an important part of the language documentation process. It might not need to be done first, but it will help researchers and future users of archived material place these speech samples in the context of the speaker's society.
The outstanding question, and one with a variable answer, is how to appropriately approach the questions in the questionnaire. Should the questionnaire be administered formally? Or should the questions be asked in a conversational format? Should the answers be elicited digitally? One of the interesting things about eliciting information digitally is that it may appear less intrusive because it is less formal. I have no empirical evidence drawn from years of cross-cultural work, but I do have the Facebook phenomenon: minority language users all over the world are using Facebook, and Facebook is collecting (and allowing users to volunteer) and then verifying the data those users provide.
Below is a list of the elements that Facebook collects (it also collects log-in locations and times). Some of these questions are certainly within the scope of what language documenters would minimally like to know about their indigenous-language-speaking informants and collaborators. Others are certainly outside the scope of the socio-linguistic profile recommended by language documenters or socio-linguists.
[table id=13 /]
I have been playing around with data available from the iPhone (and, separately, visualizing map data).
I came across a project, iPhoneTracker, which was built to show iPhone users the kind of data the iPhone collects about a user's travel and whereabouts. I downloaded the app and ran it. It looks like an almost complete history since I activated the phone… The interesting thing for me was that the app did not read the data from my phone directly but rather from my computer.
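For anyone curious how such a history can be pulled out of a file sitting on a computer, here is a minimal Python sketch. It assumes the location records live in a SQLite database copied over when the phone is backed up. The table name `CellLocation`, its `Timestamp`/`Latitude`/`Longitude` columns, and the use of seconds since 2001-01-01 UTC for timestamps are all my assumptions for illustration, not a documented format, and the sample rows are made up so the sketch runs without a real backup.

```python
import sqlite3
from datetime import datetime, timezone

# Assumption: timestamps count seconds since 2001-01-01 00:00:00 UTC
# (Apple's reference date). This constant is that date's offset from the Unix epoch.
MAC_EPOCH_OFFSET = 978307200


def mac_time_to_utc(ts: float) -> datetime:
    """Convert seconds since 2001-01-01 UTC to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ts + MAC_EPOCH_OFFSET, tz=timezone.utc)


def load_locations(conn: sqlite3.Connection):
    """Return (utc_datetime, latitude, longitude) tuples, oldest first.

    Hypothetical schema: a table CellLocation with Timestamp, Latitude
    and Longitude columns.
    """
    rows = conn.execute(
        "SELECT Timestamp, Latitude, Longitude FROM CellLocation ORDER BY Timestamp"
    )
    return [(mac_time_to_utc(ts), lat, lon) for ts, lat, lon in rows]


if __name__ == "__main__":
    # Throwaway in-memory database with made-up rows standing in for a backup file.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE CellLocation (Timestamp REAL, Latitude REAL, Longitude REAL)"
    )
    conn.executemany(
        "INSERT INTO CellLocation VALUES (?, ?, ?)",
        [(0.0, 51.5, -0.12), (86400.0, 48.85, 2.35)],
    )
    for when, lat, lon in load_locations(conn):
        print(when.isoformat(), lat, lon)
```

Against a real backup you would open the copied database file with `sqlite3.connect(path)` instead, after checking what tables it actually contains. Once the points are in (time, latitude, longitude) form they can be handed to whatever map visualization you prefer.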