This paper is motivated by an experience in collecting, analyzing, and then redeploying corporate intellectual assets (that is, sharing them while making them relevant to other corporate SIL functions). These assets are relevant both to SIL products and services and to corporate processes. The paper documents some of the current challenges presented to the SIL staff person and offers some items for consideration in overcoming these challenges.

The Context

In preparation for the Me’phaa Language Documentation Project (Mexico), partially sponsored by the NSF [ref 1], our team has done some research on GIS data and on mapping the geographical distribution of the languages being investigated. This research has involved contacting the Ethnologue cartographers Ireene Tucker and Matt Benjamin. Both have been very helpful, providing the Ethnologue’s data points for inhabited places and the polygons (shape files) showing the distribution of the languages being investigated. It is our team’s hope that through our research and collaboration with the Ethnologue department we might improve the geographical accuracy of Ethnologue maps.[ref 2] In addition, should our research result in a change to the ISO 639-3 codes, such as the addition or combination of languages in the code set, we hope to be able to provide the GIS data relevant to those changes. However, we recognize that neither the ISO 639-3 registrar nor the standard itself keeps track of language points or language area polygons. This is a function of the Ethnologue, not of the ISO 639-3 standard.

Some research questions

To reach these collaborative objectives at an academic level of quality we have had to ask several questions:

  1. If an SIL staff researcher (or non-SIL staff researcher) has new GIS data, how do they submit that data to SIL? Then once it is submitted to SIL, how does the Ethnologue editorial team access and use the data?
  2. If a researcher wants to obtain GIS data from SIL, how do they go about getting that data?
  3. When that researcher wants to update the data that SIL has, how do they go about submitting these edits to SIL?
  4. How does SIL process and track the edits to the map and GIS data? Are these edits referenced to a research document? Yesterday’s polygons might have been accurate yesterday, and new shapes may reflect language shift. How is this change communicated to the end user of the polygons?
  5. How are the sources for the maps tracked, and how do we, as academics, cite these data sources? (We could cite the Ethnologue, but the Ethnologue is not always original research. As academics we are interested in and concerned with the Ethnologue’s data sources. These sources cover not just the linguistic facts but also the place names, dialect or language variant names, latitude, longitude, altitude, datum, epoch and sources.) It might appear that geographers, cartographers and GIS practitioners do not generally cite their data (Hoch and Hayes 2010, pp. 23-24).[ref 3]

Because I am an SIL staff researcher and a person familiar with (some of the) SIL business processes, these questions have led me to ask some questions about SIL corporate processes.

  1. Does SIL collect, track, curate, store, and otherwise handle GIS data related to its language projects, and does it treat this data as valuable intellectual property as it does other kinds of intellectual property? This would assume that SIL International holds, as a corporate value, that intellectual property is worth valuing. Intellectual property could be seen as either an asset or a liability.
  2. Are SIL International’s corporate data systems prepared to exchange data with field teams and other researchers or communities?
  3. Does SIL manage and deploy this data? Or is that solely the responsibility of the Ethnologue under its business department (an organizational unit within SIL International)?

The Current Process in SIL of creating Ethnologue maps

As I looked for ways to share and improve language data, and to verify the sources of the data used to create SIL’s maps, I learned some very interesting things, mostly about the business model employed to create the maps used in the Ethnologue, but also about map and GIS data in general.
Maps are made up of layers, with a certain kind of detail applied on each layer. So the rivers might be in one layer, the county borders in another, the national borders in yet another, etc.
All this data does not by itself make up a map. A map is a selection of layers presented in an image. A map is a product, not a data set. In a sense, a map is a visual analysis of data, a selection of sets of details. If a researcher wants to reuse that data or to verify that it is accurate, then the data, not just the analysis, needs to be accessible, usable, and citable. For the most part this was not possible with the Ethnologue maps. Let me describe in general terms the data gathering and analysis process. This process is roughly approximated in the diagram below and may be somewhat simplified from what actually takes place.

SIL GIS Data Processes


What this process roughly looks like is:

  • A researcher does some sort of linguistic investigation and collects location and place data about where speakers of minority languages live.
  • Name and approximate place data would be passed on to the appropriate administrators in the form of reports. The data might also be published in a journal article or some other such academic venue.
  • Next, a conversation would occur with the SIL cartographers who work for the Ethnologue on a specific area of the world.
  • Cartographers would take the place names provided by the researchers and look them up in GMI’s data set of places in the world. Two issues present themselves at this stage of the communication flow:
    1. Not all place names are in the GMI data set of populated place locations.
    2. Some of the coordinates in the GMI data set are rounded, whereas today GPS technology can provide more accurate coordinates.
  • The next stage in the flow of data is for the cartographers to take the data they have gleaned from their conversations and create shape files (polygons) out of it. This seems to have been common practice for language cartographers as of 2006.[ref 4]
  • These shape files are then loaded together and made into maps, which become part of a final publication such as the Ethnologue.
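The effect of the rounded coordinates mentioned in the list above can be made concrete with a little arithmetic. The following is a minimal sketch, not a description of how GMI or the Ethnologue stores its points: the sample coordinate is hypothetical, the source does not say how coarsely the GMI coordinates are rounded, and the distance uses a spherical approximation of the Earth (the haversine formula) rather than a proper geodetic datum.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def rounding_offset_m(lat, lon, decimals):
    """Distance between a coordinate and the same coordinate rounded to `decimals` places."""
    return haversine_m(lat, lon, round(lat, decimals), round(lon, decimals))


# Hypothetical GPS reading for illustration only; not taken from any real data set.
gps_lat, gps_lon = 17.2137, -98.8462

for decimals in (0, 1, 2):
    offset = rounding_offset_m(gps_lat, gps_lon, decimals)
    print(f"rounded to {decimals} decimal place(s): about {offset:.0f} m from the GPS reading")
```

Under these assumptions, rounding to whole degrees moves the point by tens of kilometres, and even two decimal places leaves an offset of several hundred metres, which matters when villages speaking different language variants lie close together.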

With regard to the collection of GIS data concerning minority language use, the fundamental question being asked by the corporate cartography service is "How do I (as a cartographer) create an accurate map for an SIL product?" and not "How do I (as a cartographer) enable people to visualize language, culture, population and social attitude related data on geographical overlays and thereby foster collaboration among interested parties?" In that sense, SIL runs a map-making operation which is product centric rather than one which is service (consumer) and sharing centric. Now, SIL does enable its maps to be shared (for a price, through GMI), and one can hire an SIL cartographer to create custom maps. So this venue of sharing might be considered service centric at a different level (SIL provides a service to GMI so that GMI can serve individual clients). However, this is not the same level of data sharing and enabling that, say, Google Maps or LL-Maps offers its users as they endeavor to share and use GIS data. The saddest part is that this affects SIL’s own efficiency, because it limits the ability of SIL staff researchers to collaborate on the maintenance and use of GIS data.
Is it a fault that in corporate information structures GIS data is not considered a corporate asset?
The current organizational structures prevent internal researchers from using cartographers without cost. This cost is restrictive both to field research and to publishing by these internal researchers, ultimately affecting impact. But more to the point, the service being offered is not really what linguists want or need. What is truly needed is a method for linguists to interact with the data they are providing and exchanging, and to create their own maps which tell the stories they are trying to convey.

Assuming a social context and a social layer around GIS data are made available to the general public and the research community, there are still the issues of citation and accuracy. If this socially interactive platform is what the online version of the Ethnologue becomes, and the Ethnologue eventually presents data and maps based on data offered through such an interactive service, then the challenge for accuracy is to ensure that points are not rounded and that the datums of inbound data and attested data are congruent. The challenge for citation is that inbound sources need to be traced. This applies to data over time: old maps may be based on incomplete data, inaccurate data, or data that was accurate in a different period of global history. These sorts of changes to attested data sets need to be noted and provided to researchers who are using the GIS data. As for the results of the language documentation project in Me'phaa, they can be viewed in SIL Mexico's electronic working paper series, particularly Las Conexiones Externas e Internas.[ref 5] This is partly because the data belongs in two places: in a dynamic GIS system and also in the language documentation corpus.
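One way to picture the accuracy and citation requirements discussed above is as a point record that carries its own datum, epoch and source, together with a check that inbound data uses the same datum as the attested data. This is only a sketch under stated assumptions: the record layout, the field names, the function name and the sample records are all hypothetical and do not describe any existing SIL or Ethnologue system. The fields simply follow the list of details named in the research questions (place name, language variant name, latitude, longitude, altitude, datum, epoch and source).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PlaceRecord:
    """A single attested or inbound point (hypothetical layout)."""
    place_name: str
    language_variant: str
    latitude: float
    longitude: float
    altitude_m: Optional[float]
    datum: str   # geodetic datum the coordinates are expressed in, e.g. "WGS84"
    epoch: str   # date or period for which the observation holds
    source: str  # citable source of the observation


def datum_mismatches(attested: List[PlaceRecord],
                     inbound: List[PlaceRecord]) -> List[PlaceRecord]:
    """Return inbound records whose datum differs from the attested record of the same place."""
    attested_datum = {r.place_name: r.datum for r in attested}
    return [r for r in inbound
            if r.place_name in attested_datum
            and r.datum != attested_datum[r.place_name]]


# Hypothetical records for illustration only.
attested = [
    PlaceRecord("Village A", "Variant 1", 17.21, -98.85, None,
                "NAD27", "1990", "Older survey report (hypothetical)"),
]
inbound = [
    PlaceRecord("Village A", "Variant 1", 17.2137, -98.8462, 1450.0,
                "WGS84", "2010", "Field GPS reading (hypothetical)"),
    PlaceRecord("Village B", "Variant 2", 17.3001, -98.7003, 1600.0,
                "WGS84", "2010", "Field GPS reading (hypothetical)"),
]

for record in datum_mismatches(attested, inbound):
    print(f"{record.place_name}: inbound datum {record.datum} differs from attested data; "
          f"source: {record.source}")
```

Because each record keeps its own epoch and source, an older point does not have to be overwritten when a newer one arrives. Both can be retained and cited, which is the kind of change tracking over time that the paragraph above calls for.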

