Real Data, Live Data, Not just Ethnologue maps

There have been several interesting projects over the last few years which have created language use visualizations. The Ethnologue project produces a particular kind of visualization. In the past I have talked about the need to socialize the data on which the Ethnologue apps are based, and to make that data more accurate with respect to WGS 84. I talk about that need in two places: on Insite, under Geographical Data, and on my non-Insite blog: https://hugh.thejourneyler.org/2012/some-current-challenges-in-using-gis-information-in-the-sil-international-corporate-knowledge-system/

There are several challenges with the basic assumptions put forward with the current Ethnologue visualizations. 

  1. They project a language homogeneity which is not necessarily accurate to real life.
  2. They project a geographical display which is not indicative of real language use. That is, language use may actually take place in digital mediums, which cannot be heard at any particular location.
  3. Ethnologue maps make no overt claims about digital communications devices and their use by minority language speakers. However, my general feeling is that SIL (especially in our training programs) does not assume a minority language speaker who uses digital devices.

One of the tools which SIL could use to inform its business intelligence is the language of use in digital social mediums. For instance, Wikipedia allows any ISO 639-3 language community to form its own Wikipedia. All of the IP edits are recorded and public, which would give us a language use location based on IP addresses. This can then be superimposed on additional data collected from geo-enabled tweets. With such information, the pre-survey data available about language use (in certain contexts) just got more interesting, provided of course that the survey is about questions of language use.
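As a rough sketch of the idea above, the following Python separates edits attributed to an IP address from edits by named accounts, then counts them per region. Everything here is an assumption made for illustration: the shape of the edit records, the sample addresses (taken from the reserved documentation ranges), and the `ip_to_region` lookup table, which stands in for whatever geolocation source one might actually use.

```python
import ipaddress
from collections import Counter


def is_ip_user(user: str) -> bool:
    """Return True when the recorded user name parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(user)
        return True
    except ValueError:
        return False


def count_edits_by_region(edits, ip_to_region):
    """Count IP-attributed edits per region.

    `edits` is an iterable of dicts with a "user" key (hypothetical shape).
    `ip_to_region` is a hypothetical lookup table from IP string to region name.
    """
    counts = Counter()
    for edit in edits:
        user = edit["user"]
        if is_ip_user(user):
            counts[ip_to_region.get(user, "unknown")] += 1
    return counts


# Invented sample data, for illustration only.
sample_edits = [
    {"user": "203.0.113.7", "title": "Example A"},
    {"user": "SomeRegisteredUser", "title": "Example B"},
    {"user": "198.51.100.23", "title": "Example C"},
    {"user": "203.0.113.7", "title": "Example D"},
]
sample_lookup = {"203.0.113.7": "Region X", "198.51.100.23": "Region Y"}

edit_counts = count_edits_by_region(sample_edits, sample_lookup)
```

IP-based location is coarse at best, so counts like these would be one more pre-survey signal rather than a measurement.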

Some people have taken to mapping Wikipedia edits. Such a map shows that there are a lot of people in a lot of places, speakers of minority languages included, who are able to edit centrally hosted content like that found on Wikipedia. Here is a map created from the English language Wikipedia, which is available from http://www.dailydot.com/society/wikipedia-conflict-map-flame-wars/.

As I stated previously, the homogeneity of language use within a given geographical region is difficult to map. There are questions of speaker population density, and questions of social environments. While the Ethnologue maps are very detailed given their global scope, one of the challenges for this kind of visualization is expressing diversity. Below is a map of language diversity based on tweets in New York City. The power of using tweets to measure the linguistic diversity of a region is that tweets usually connect two or more people and reveal the social connection between those people. This is a powerful bit of information. SIL could leverage this data in several ways; one way would be to make this data available to its scripture use partners. Language may not always be a barrier to understanding the gospel, but I have yet to see it fail to be an inroad to relationships in and through which the gospel can be shown or presented.

Language Diversity as demonstrated on Twitter

Image from http://ny.spatial.ly/

If our conceptualization of language and its geographical distribution is at all shaped by the way that we look at Ethnologue maps, then we can often miss the wide distribution that many language communities have. For instance, this language map shows the use of Irish by Twitter users. Notice that the language is not bound to Ireland.

Irish language Twitter conversations, Kevin Scannell (CC-BY-SA) http://indigenoustweets.blogspot.com/2013/12/mapping-celtic-twittersphere.html

Something fantastic with Webonary data

The UK data explorer has a very interesting setup using a powerful (free and open) visualization software tool called D3.js. The tool allows you to type in a word and see how it is spelled in a variety of languages. It uses Google Translate. Check it out here: http://ukdataexplorer.com/european-translator/?word=man

WordPress is equally capable of serving up Webonary data if it is configured correctly.

Man Across Europe

Some other thoughts on linguistic cartography and the display of language vitality.

Back in 2011 Lars Huttar and I played around with a heat mapping JavaScript tool called gheat. The idea was to plot heavily populated towns with a higher gradient than less populated towns, based on speaker population densities I had from Mexican statistics data. The aim was to incorporate two important aspects of analysis: remoteness and vitality. I talk about remoteness on my blog here: https://hugh.thejourneyler.org/2012/remoteness-index/, and I talk about the visualization here: https://hugh.thejourneyler.org/2011/language-maps-like-heat-maps/. The data may not be perfect, but it was a start. The paper has not gone anywhere since that time. I still have the draft paper, and would like to pursue this with a co-author. If there is someone else who might be interested, please comment; I can give more details and the Paterson & Huttar paper draft.
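The weighting idea can be sketched in a few lines of Python. This is not the gheat code, only a minimal illustration of the principle: towns are binned into grid cells, speaker counts are summed per cell, and each cell is scaled against the busiest one. The coordinates, speaker counts, and the one-degree cell size below are invented for the example.

```python
from collections import defaultdict


def heat_grid(towns, cell_size=1.0):
    """Bin (lat, lon, speakers) tuples into square cells and scale each cell to 0..1.

    The cell with the most speakers gets intensity 1.0; other cells are
    scaled relative to it.
    """
    grid = defaultdict(float)
    for lat, lon, speakers in towns:
        # Floor division keeps cells consistent for negative coordinates too.
        cell = (int(lat // cell_size), int(lon // cell_size))
        grid[cell] += speakers
    peak = max(grid.values(), default=0.0)
    if peak == 0:
        return {cell: 0.0 for cell in grid}
    return {cell: total / peak for cell, total in grid.items()}


# Invented sample towns: (latitude, longitude, number of speakers).
towns = [
    (17.06, -96.72, 1200),
    (17.40, -96.10, 300),
    (16.20, -95.20, 600),
]

intensities = heat_grid(towns)
```

A remoteness measure could be folded in as a second weight per town, but how to combine the two is exactly the kind of question the draft paper would need to settle.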

If you just like looking at language maps you might enjoy this post: https://hugh.thejourneyler.org/2012/types-of-linguistic-maps-the-mapping-of-linguistic-features/

One final thought

Here is an interesting set of maps for language use. While the Ethnologue maps first language use, second language use remains a mystery. These efforts are trying to visualize the second most commonly spoken language for a geographical region.

A second way to look at the earth is to ask: what are the places? This has been a recent hot topic in language documentation circles. At the level of a single language there may or may not be a lot of information that interests a lot of people. However, looking at the earth in terms of which languages are talking about certain places is interesting. One point of large interaction for this conversation is Wikipedia.

Client-Side Content Restrictions for Archives and Content Providers

Twice since the launch of the new SIL.org website, colleagues of mine have contacted me about the new requirement on SIL.org to log in before downloading content from the SIL Language and Culture Archive. Both know that I relate to the website implementation team. I feel as if they expect me to be able to speak into this situation (as if I even have this sort of power). I only work with the team in a loose affiliation (from a different sub-group within SIL); I don't make design decisions or social impact decisions, or negotiate the politics of content distribution.

However, I think web users have some real concerns about being required to log in prior to downloading, and there are some real considerations which web users are not aware of.

I want to reply to these concerns.


Software Needs for a Language Documentation Project

In this post I take a look at some of the software needs of a language documentation team. One of my ongoing concerns about linguistic software development teams (like SIL International's Palaso or LSDev, or MPI's archive software group, or a host of other niche software products adapted from mainstream open-source projects) is the approach they take in communicating how to use the various elements of their software together to create useful workflows for linguists participating in field research on minority languages. Many of these software development teams do not recognize that potential users coming to their website want to be oriented to how these software solutions work together to solve specific problems in the language documentation problem space. Now, it is true that every language documentation program is different and will have different goals and outputs, but many of these goals are the same across projects. New users of software want to know the top-level organizational assumptions made by software developers. That is, they want to evaluate how software will work in a given scenario (problem space) and to make informed decisions based on the ecosystem that the software will lead them into. This is not unlike users asking which is better, Android or iPhone, and then deciding not just what works with a given device but where they will buy their music and their digital books, and how they will get those digital assets to a new device when the phone they are about to buy no longer serves them. These digital consequences are not in the mind of every consumer... but they are nonetheless real consequences.

Audio Dominant Texts and Text Dominant Audio

As linguistics and language documentation interface with digital humanities, there has been a lot of effort to time-align texts and audio/video materials. At one level this is rather trivial to do and has the backing of commercial media processes like subtitles in movies. However, at another level this task is often done in XML slightly differently for every project (digital corpus curation). At the macro-scale the argument is that if the annotation of the audio is in XML and someone wants to do something else with it, then they can just convert the XML to whatever schema they desire. This is true.
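To make the conversion argument concrete, here is a minimal Python sketch that turns a time-aligned XML annotation into SubRip (SRT) subtitles. The XML layout is invented for the example (an `annotation` root holding `segment` elements with `start` and `end` attributes in seconds); real projects each use their own schema, so the element and attribute names would need adjusting.

```python
import xml.etree.ElementTree as ET


def to_srt_time(seconds: float) -> str:
    """Format a time in seconds as the HH:MM:SS,mmm stamp used in SRT files."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def xml_to_srt(xml_text: str) -> str:
    """Convert <segment start=".." end="..">text</segment> elements to SRT blocks."""
    root = ET.fromstring(xml_text)
    blocks = []
    for index, seg in enumerate(root.iter("segment"), start=1):
        start = to_srt_time(float(seg.get("start")))
        end = to_srt_time(float(seg.get("end")))
        text = (seg.text or "").strip()
        blocks.append(f"{index}\n{start} --> {end}\n{text}")
    return "\n\n".join(blocks) + "\n"


# Hypothetical annotation in the invented schema described above.
SAMPLE = """<annotation>
  <segment start="0.0" end="2.5">First utterance</segment>
  <segment start="2.5" end="5.25">Second utterance</segment>
</annotation>"""

srt = xml_to_srt(SAMPLE)
```

The conversion itself is mechanical; what it cannot recover is any decision the original schema never recorded.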

However, one anecdotal point that I have not heard in discussions of time-aligned texts is specifications for Audio Dominant Text vs. Text Dominant Audio. This may not initially seem very important, so let me explain what I mean.

The SIL archive and its two-sided markets

I have been thinking about the language data marketplace (exchange, if one prefers), and the role of archives in a world where minority language speakers are also internet users and digital file consumers. In particular I have been thinking about SIL’s Language and Culture Archive and the economic model called a two-sided market. SIL, as “Partners in Language Development”, seems to be well situated for two-sided market analysis (matching linguists and professionals with language development skills, and persons with language development skills with parties interested in developing their language). On the surface, it seems that the SIL archive would also benefit from being the center of exchange between these same two groups. This is the subject of one of my slides for an upcoming presentation, so I sketched out the interactions various SIL staff might have with the archive to see if I could diagram the social interactions around language data in SIL’s two-sided market. To my surprise, the two-sided nature of access to data in the archive is not supported, thereby blocking a data-centric archiving service. It makes me wonder what the perceived value of the archive really is, and if the perceived value is low, then why bother? What is the return on investment (ROI) for users on either side of the market?

I tried to summarize the relationships between the various clients of the archive in the following image.

Media and relationships among different roles in SIL projects.


What do I want users to say?

I have been working with SIL team members to help create a better experience on SIL.org. So, I am constantly looking at how people on different web projects talk about user experience making a difference. Today I was visiting the Noun Project. There were some things I didn’t like about the website, so I tried to give them some feedback. I found out that my ideas had already been suggested and that they were under review by the management and implementation team. A+ to the management team of the Noun Project – not for being perfect, but for communicating through imperfection, for being concerned enough with users to add a feedback loop, and for listening to user suggestions. The Noun Project has the edge on being the Wikipedia for icons. However, it is the project's and the organization's commitment to User Experience and User Interaction which will make them succeed. As I looked at what they are doing, I noticed this quote by their co-founder:

I find working on The Noun Project inspiring because I know what we’re doing is making a difference. I constantly get emails from teachers, designers, architects…and it’s never about how much they just “like” the service. People who use The Noun Project fall in love with it, and that’s when you know you’ve built something worthwhile.
– Sofya, Cofounder

At the end of the day, I want people to fall in love with the things I help build.

Some videos I like…

I was looking through Facebook to see if I could generate a list of videos which I have shared from YouTube… I wanted to see what I have “liked”. It would appear that though this information is available to businesses it is not available to me as a user… Sad… I kinda wanted to see what my longitudinal tastes were for videos, how much YouTube watching I do… and whether it has increased over time…
Branding and video provider

In some respects this is motivated by wanting to become more able to communicate in video forms. The videos I have enjoyed span various videographic styles and various content genres. I have noticed that some of the creative videos I like to watch have soundtracks from MTV culture and music with which I have never been acquainted, but Becky has.

I think this stop motion video of headphones is an example:


The Power of Interns

Today I was reading about how an intern at Facebook created their new mobile ad interface. For those of you who watch the business news, Facebook being able to monetize their mobile market has been a big concern for their investors. I think this really speaks to several things in the corporate culture at Facebook:

  1. They are willing to listen to the ideas of young, fresh people.
  2. They are willing to work with temporary staff.
  3. They are willing to mentor.
  4. They are willing to trust (with things like project goals and budding technologies).

Each of the things listed above is a social issue within the context of the corporate environment. Additionally, the company has to be conscious of them to the point that it implements HR processes to allow these sorts of things to happen. In this respect these four things have to be fought for in order to maintain them as part of the corporate culture. I currently look at the NGO I work for and wonder: what would it take to harness the power of interns? We don’t currently have the corporate culture to facilitate interns, but why is that? Is our walled garden so well constructed with bricks from the baby-boomer generation that we forget the power which comes when we can run with young people? For businesses, even for NGOs, if we don’t fight for relevance within the social networks of the upcoming generation then we will marginalize our significance.

Modular Courses for Linguistics

In 2008 I was contacted by a professor who wanted to be able to share various linguistics exercises with fellow professors. He asked for a website to be built so that if a professor were to translate the directions of these exercises, they could in turn put the translated versions back into the “set of exercises”.