Types of Linguistic Maps: The Mapping of Linguistic Features and Researcher Interactivity

A couple of years ago I had a chance meeting with a cartographer in North Dakota. It was interesting because he asked us (a group of linguists), "What is a language or linguistic map?" So I grabbed a few examples and put them into a brief for him. This past January at the LSA meeting in Portland, Oregon, I had several interesting conversations with the folks at the LL-Map Project under the LINGUIST List. It occurred to me that such a presentation of various kinds of language maps might be useful to a larger audience. So this will be a bit unpolished but should show a wide selection of language and linguistic maps, and in the last section I will also talk a bit about interactive maps.

Some current challenges in using GIS Information in the SIL International Corporate Knowledge System

Preface

This paper is motivated by an experience in collecting, analyzing, and then redeploying (sharing while making relevant to other corporate SIL functions) corporate intellectual assets. These assets are relevant both to SIL products and services and to corporate processes. This paper attempts to document some of the current challenges presented to the SIL staff person and to present some items for consideration in overcoming these challenges.

The Context

In preparation for the Me’phaa Language Documentation Project (Mexico), partially sponsored by the NSF, our team has done some research related to GIS data and mapping the geographical distribution of the languages being investigated. This research has involved contacting the Ethnologue cartographers Ireene Tucker and Matt Benjamin. Both have been very helpful, providing the Ethnologue’s data points for inhabited places and the polygons (shape files) showing the distribution of the languages being investigated. It is our team’s hope that through our research and collaboration with the Ethnologue department we might improve the geographical accuracy of Ethnologue maps. In addition to the improved accuracy, in the event that our research results in a change to the ISO 639-3 codes, such as the addition or combination of languages in the code set, we hope to be able to provide the GIS data relevant to those changes. However, we realize that neither the ISO 639-3 registrar nor the standard itself keeps track of language points or language area polygons. This is a function of the Ethnologue, not the ISO 639-3 standard.

Some research questions

To reach these collaborative objectives at an academic level of quality we have had to ask several questions:

  1. If an SIL staff researcher (or non-SIL staff researcher) has new GIS data, how do they submit that data to SIL? Then once it is submitted to SIL, how does the Ethnologue editorial team access and use the data?
  2. If a researcher wants to obtain GIS data from SIL, how do they go about getting that data?
  3. When that researcher wants to update the data that SIL has, how do they go about submitting these edits to SIL?
  4. How does SIL process and track the edits to the map and GIS data? Are these edits referenced to a research document? Yesterday’s polygons might have been accurate yesterday, and new shapes may reflect language shift issues. How is this change reflected to the end user of the polygons?
  5. How are the sources for the maps tracked, and how do we, as academics, cite these data sources? (We could cite the Ethnologue, but the Ethnologue is not always original research. As academics we are interested in and concerned with the Ethnologue’s data sources. These sources are not just the linguistic facts but also the place names, dialect or language variant names, latitude, longitude, altitude, datum, epoch and sources.)1
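To make question 5 concrete, here is a minimal sketch of what a citable point record might look like if each of the details listed there travelled with the coordinate. The class name, field names, and sample values are all hypothetical illustrations; they do not describe any existing SIL or Ethnologue data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LanguagePoint:
    """One inhabited place tied to a language variant, with its provenance.

    Hypothetical structure: the fields simply mirror the details named in
    question 5 (place name, variant name, coordinates, datum, epoch, source).
    """
    place_name: str
    variant_name: str            # dialect or language variant name
    iso_639_3: str               # three-letter language code
    latitude: float              # decimal degrees
    longitude: float             # decimal degrees
    altitude_m: Optional[float]  # metres, if recorded
    datum: str                   # e.g. "WGS84"
    epoch: str                   # when the observation was made
    source: str                  # who or what the record should be cited to

    def citation(self) -> str:
        """Return a short human-readable citation string for this point."""
        return f"{self.place_name} ({self.iso_639_3}), {self.source}, {self.epoch}"


# Made-up sample values, for illustration only.
example = LanguagePoint(
    place_name="Example Village",
    variant_name="Example variant",
    iso_639_3="und",
    latitude=17.0,
    longitude=-99.0,
    altitude_m=None,
    datum="WGS84",
    epoch="2010",
    source="Hypothetical field notes",
)
print(example.citation())
```

The point of the sketch is only that a record shaped like this can be cited on its own, independently of any map that was later drawn from it.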

Because I am an SIL staff researcher, and a person familiar with (some of the) SIL business processes, these questions have led me to ask some questions about SIL corporate processes.

  1. Does SIL collect, track, curate, store, and otherwise handle GIS data related to its language projects and treat this data as valuable intellectual property as it does other kinds of intellectual property?2
  2. Are SIL International's corporate data systems prepared to exchange data with field teams and other researchers or communities?
  3. Does SIL manage and deploy this data? Or is that solely the responsibility of the Ethnologue under its business department (an organizational unit within SIL International)?

The Current Process in SIL of creating Ethnologue maps

As I looked for ways to share and improve language data, and to verify the sources for the data used to create SIL’s maps, I learned some very interesting things, mostly about the business model which is employed to create the maps used in the Ethnologue, but also about map and GIS data in general.
Maps are made up of layers, with certain kinds of details applied on each layer. So the rivers might be in one layer, the county borders in another layer, the national borders in yet another layer, etc.
All this data does not make up a map. A map is a selection of layers presented in an image. A map is a product, not a data set. In a sense, a map is a visual analysis of data, a selection of sets of details. If a researcher wants to reuse that data or to verify that the data is accurate, then the data, not just the analysis, needs to be accessible, usable, and citable. For the most part this was not possible with the Ethnologue maps. Let me generally describe the data gathering and analysis process. This process is roughly approximated in the diagram below and may be somewhat simplified from what actually takes place.

SIL GIS Data Processes


What this process roughly looks like is:

  • A researcher does some sort of linguistic investigation and collects location and place data about where speakers of minority languages live.
  • Name and approximate place data would be passed on to appropriate administrators in the form of reports. The data might also be published in a journal article or some other such academic venue.
  • Finally a conversation would occur with SIL cartographers, working for the Ethnologue for a specific area of the world.
  • Cartographers would look for the place names provided by the researchers and then find the place names on GMI’s dataset of places in the world. There are two issues which present themselves with this stage of the communication flow:
    1. Not all place names are in the GMI data set of populated place locations.
    2. Some of the coordinates in the GMI data set are rounded and today with GPS technology, more accurate data coordinates can be found.
  • The next stage in the flow of data is for the cartographers to take the data they have gleaned from their conversations and to create shape files (polygons) out of it. 3
  • These shape files are then loaded together and made into maps, which become part of a final publication like the Ethnologue.

With regard to the collection of GIS data concerning minority language use, the fundamental question being asked is "How do I create an accurate map for an SIL product?" and not "How do I enable people to visualize language-related data on geographical overlays and thereby foster collaboration among interested parties?" In that sense, SIL runs a map-making operation which is product centric rather than one which is service and sharing centric. Now, SIL does enable its maps to be shared (for a price through GMI), and one can hire an SIL cartographer to create custom maps. So this might be considered service centric at a different level. However, this is not the same level of data sharing and enabling that, say, Google Maps or LL-Map offers its users for sharing and using GIS data. The saddest part is that this affects SIL’s efficiency, because SIL staff researchers cannot readily collaborate on the maintenance and use of GIS data.
Is it a fault in the current corporate information structures that this data (GIS data) is not considered a corporate asset?
The current organizational structures prevent the use of cartographers without cost to internal researchers. This cost is restrictive both to field researchers and to corporate publishing. But more to the point, the service being offered is not really what linguists want or need. What is truly needed is a method for linguists to interact with the data they are providing and exchanging, and to create their own maps which tell the stories they are trying to convey. Then, if the Ethnologue presents data offered through such an interactive service and platform, knowledge provided from fieldwork can be appropriately cited. As for the results from the language documentation project in Me'phaa, these can be viewed in SIL Mexico's electronic working paper series, particularly Las Conexiones Externas e Internas.

Notes

  1. ↑1 It might appear that geographers, cartographers and GIS practitioners do not generally cite their data. (Hoch and Hayes 2010 p.23-24)
  2. ↑2 This would assume that SIL International has a corporate value for valuing intellectual property. Intellectual property could be seen as either an asset or a liability.
  3. ↑3 This seems to be common practice for language cartographers as of 2006.

Review of Garmin eTrex Venture HC for Language Documentation

In a recent (2010-2011) Language Documentation Project we decided to also collect GIS data (GPS coordinates) about our consultants (place of origin and place of current dwelling) and about our recording locations, and to geo-tag photos. We used a Garmin eTrex Venture HC to collect the data and then we compared this data with GIS information from Google Maps and the National GIS information service. This write-up and evaluation of the Garmin eTrex Venture HC is based on this experience.

The Technical Context:

  • The Device: eTrex Venture HC
  • Some good information, including reviews of the device can be found on gpstracklog.com and on Garmin's website.

  • I use my Garmin with:
    • OS X 10.6.5 via a USB cable
    • Garmin BaseCamp 3.1.2 (for downloading and editing GPS Tracks, Waypoints and other data.)
    • GPS BabelFE 1.4.0 (for downloading "original/archivable" copies of my GPS data.)
    • PhotoLinker 2.2.7 (for embedding GPS Data into the photos. )

Purpose in using the Device:

  • Geo-tagging Photos:
  • What is involved in geo-tagging: I match the time I took the photos to a time mark on my GPS tracks. I use PhotoLinker to do this. I can then display the photos on a map or in conjunction with other geo-tagged files. Although Garmin BaseCamp does offer the feature of geo-tagging photos, its implementation is not as robust as PhotoLinker's. With PhotoLinker I can process hundreds of photos at a time. In addition to geo-tagging, PhotoLinker can also write to other fields in the photo's meta-data, e.g. the photographer's contact info, copyright and license info, info about the subject of the photo, IPTC tags, etc. (These additional features are not available in iPhoto but are supposedly available in Aperture. I have not yet purchased a license of Aperture to test how it compares with PhotoLinker.) These additional meta-data1 editing features give PhotoLinker a more central part in my photo processing workflow.

  • Language Documentation:
  • Tasks in Language Documentation:

    Mapping language boundaries: Mapping language boundaries is difficult. If a region is marked as speaking a particular language, what does that mean? Does it mean that the ground there speaks that language? No, language is incarnate in people. So mapping languages is really taking a geo-point and declaring what language is spoken by the people who live at that point. Obviously, one of the challenges is that people are not stationary like sediment or layers of rock in the earth. So in a sense there is a degree of ambiguity in mapping language boundaries before one even considers issues like diglossia.

    Plotting Language Data: This is a complex issue because we are not plotting the land owned, farmed, or lived on by the people who speak a particular language. We are plotting the language as it is incarnate in people, people who might be mobile, so in some sense the relationship between language data and land coordinates is temporal. What is easier to plot is where a language consultant was born, where they live at the time of the elicitation, where the elicitation took place, or the presence or absence of a linguistic feature. These linguistic features might be in the form of a phonological process, a syntactic construction, or the use of a lexical item to refer to a particular concept, etc.2

    Geo-Photos and the role in elicitation: Comparatively, the mapping of geo-tagged photos is a relatively simple thing to add during the course of a language documentation project because each photo is representative of a single time and place. Photos of objects and events can also be a real asset in describing the environment in which a language is spoken, and the concepts to which lexical items refer. The ability for a researcher to send a GPS unit and a camera with a language consultant to the consultant's village allows the researcher to see (upon the successful return of the GPS unit and camera) geo-spatial relationships from a perspective which might not be possible or practical otherwise. The photos from such a trip also provide material for discussion and the elicitation of lexical items and/or concepts. One particular area of elicitation which can be explored through this method is ethno-botany. In traditional field linguistic research there is often an under-described wealth of ethnic knowledge and linguistic terms relating to plants and their uses.

    Data types created with the Venture HC: There are basically three different data types which the Venture HC can create. These are:

    • Waypoints: Particular points in time with a specialized name and icon.
    • Activity Log: A series of coordinates over time.
    • Track Files: Series of coordinates without time.

    For a discussion on these data types in other models of Garmin GPS's see these hints.

    These types of data are all transmitted to the computer via a .gpx file (more on .gpx files). However, GPS BabelFE can directly access the data on the Venture HC, and it supposedly downloads the data as any of a variety of formats. It is unclear if the data is first accessed in the GPX format and then converted or if it is accessed directly and then converted. GPX is the standard GPS data exchange format. So, unless my workflow dictates otherwise I always download my data as .gpx files.

    Under testing, this particular Garmin model allowed for about 9 hours of continuous recording when recording a coordinate every 15 seconds. This equates to about 2200 points in an activity log. Garmin's website describing the Venture HC says that the device has room for 10,000 points. This is also confirmed by a very helpful manual put together by the Virginia Geospatial Extension Program. However, the Garmin manual does not explicitly say that all of the memory can be used to store a really long Activity Log file. It is my experience that not all of these points can be used for the active Activity Log. It is my assumption that some must be reserved for saved track logs (up to 10), routes, and waypoints (up to 500).3
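The arithmetic behind the "9 hours at 15 seconds" figure can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are mine, and it assumes the device takes exactly one reading per interval and stops when the usable part of the log is full.

```python
def points_logged(hours: float, interval_s: float) -> int:
    """Number of readings taken in `hours` at one reading every `interval_s` seconds."""
    return int(hours * 3600 // interval_s)


def hours_until_full(capacity_points: int, interval_s: float) -> float:
    """Hours of recording before `capacity_points` readings are used up."""
    return capacity_points * interval_s / 3600


print(points_logged(9, 15))        # 2160
print(hours_until_full(2160, 15))  # 9.0
print(hours_until_full(2160, 30))  # 18.0
```

Nine hours at a 15-second interval gives 2160 readings, which matches the "about 2200 points" observed. It also shows that the usable Activity Log is well short of the advertised 10,000 points.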

Experience Updating the Software and Firmware:

Using Garmin Web Updater: The unit I have came with software version 2.30 installed. On the third attempt to update the software with Web Updater, it succeeded and updated the unit to version 3.30 (Update on Garmin's Website). Then Web Updater said there was a GPS Chipset Type M2 (Region File) update that needed to be applied. So I clicked "Apply" and let it run for 8 hours overnight with no results. Very scary, when one considers the many reports out there on various forums which say something to the effect of "I turned my GPS into a brick by disconnecting it while it was updating...". However, I disconnected the unit after 8 hours and took the batteries out (total power off). I then reconnected the unit and tried the update again, and it took in under 3 minutes.

There are some reports of Web Updater not working with some units (also eTrex Venture HC models). There is an alternative place to find software updates and apply them to a unit using Windows XP or a newer Windows OS. However, this is not the Garmin-recommended method.
The Garmin updates can be found at: http://www.tramsoft.ch/gps/garmin_etrex-venture-hc-firmware-upgrades_en.html

Taking photos of your screen: If the Garmin unit is used with a Windows OS (XP or newer), Garmin has a utility for taking screen shots. In this way tutorials can be created for working with language consultants. This utility is called xImage and is freely available from Garmin.

Example Screenshot

Custom Maps:

  • Putting Maps on:
  • For more advanced Garmin GPS units there are two methods for changing the onboard maps included in each device.

    However, there are several things to consider when looking at these two options. First, map coverage from maps purchased from Garmin usually does not include significant detail in the areas of the world where language documentation tends to take place (remote parts of Africa, highlands or islands of Asia, or in my case the highlands of Guerrero State, Mexico). Aside from issues of cost, availability, and detail, Garmin maps are not as accurate as open source maps available through projects like OpenStreetMap (where data is available from both sources on the same location).

    There are two cross-platform utilities for loading open source maps onto the Venture HC:

    However, both of these utilities require more skill to operate than iTunes requires. I was not able to successfully install an open source map on my Venture HC. This may have been due to the software, my proficiency with the utilities, or the hardware. It is supposedly possible with the Venture HC; I have found several references to this ability on forums. Also to be considered is that the Venture HC only has 24MB of room for maps. Given the challenge of adding new maps to this unit, I figured that my efforts would be better spent if I were to wait to attempt a map install on a device which could handle a larger quantity (MB) of maps.

  • Restoring the Default Map:
  • Because I was not able to install custom maps I was also not able to test the ability to restore Garmin's maps to the device. However, there appear to be two points of consideration here.

    • One could use Garmin Mapsource to "re-install" the default maps.
    • Or one could just un-install the custom maps they have loaded. Some discussions in the forums indicate that the base/default map is not actually deleted but rather not accessible when a custom map is installed.
  • Acquiring Maps:
  • Aside from OpenStreetMap there are some other places to find maps or images of maps. The best bet is to get connected with a local cartography group. One more international group, which I have found helpful in finding resources for working with OS X is www.maps-gps-info.com.

Conflated/Simplified Tracks:

After a 9 hour trip I decided to save the trip to the track log. This had the effect of deleting the time coordinate from the track log while maintaining some of the GPS coordinates. I say "some GPS coordinates" because it also limited the file to 300 coordinate readings from the 2000+ readings I had in the Activity Log (further reading). This was not discovered until I had downloaded the new activity log and the track file (which I thought was the full version of the Activity Log). Since the time stamp details are crucial for photo tagging and for relating a GPS coordinate to other events in the flow of the research event, this "feature" in the Venture HC limits the useful life of the GPS unit to 9 hours before one must download the activity log, unless one takes fewer GPS readings. For example, if a GPS reading were only taken every 30 seconds rather than every 15, then the life of the device would theoretically double to 18 hours. This "feature" is discussed on this forum as well.
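To illustrate why the lost time stamps matter for photo tagging, here is a minimal sketch of matching a photo's capture time to the nearest timed track point in a GPX file. It is not a description of how PhotoLinker or BaseCamp work internally. The sample track, the 60-second tolerance, and the function names are assumptions for illustration, and the sketch assumes GPX 1.1 with UTC time stamps and a camera clock that agrees with the GPS clock.

```python
import bisect
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

# GPX 1.1 namespace (assumed; older exports may use GPX 1.0 instead).
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

# Made-up sample track: three readings taken 15 seconds apart.
SAMPLE_GPX = """<gpx version="1.1" creator="example"
     xmlns="http://www.topografix.com/GPX/1/1">
  <trk><trkseg>
    <trkpt lat="17.0000" lon="-99.0000"><time>2011-01-15T10:00:00Z</time></trkpt>
    <trkpt lat="17.0010" lon="-99.0010"><time>2011-01-15T10:00:15Z</time></trkpt>
    <trkpt lat="17.0020" lon="-99.0020"><time>2011-01-15T10:00:30Z</time></trkpt>
  </trkseg></trk>
</gpx>"""


def timed_points(gpx_text):
    """Return sorted (time, lat, lon) tuples for track points that carry a <time>."""
    root = ET.fromstring(gpx_text)
    points = []
    for pt in root.iterfind(".//gpx:trkpt", GPX_NS):
        t = pt.find("gpx:time", GPX_NS)
        if t is None or not t.text:
            continue  # points without a time stamp cannot be matched to photos
        when = datetime.strptime(t.text.strip(), "%Y-%m-%dT%H:%M:%SZ")
        when = when.replace(tzinfo=timezone.utc)
        points.append((when, float(pt.get("lat")), float(pt.get("lon"))))
    points.sort()
    return points


def locate(photo_time, points, max_gap=timedelta(seconds=60)):
    """Return (lat, lon) of the track point nearest in time, or None if too far off."""
    if not points:
        return None
    times = [p[0] for p in points]
    i = bisect.bisect_left(times, photo_time)
    # Only the readings just before and just after the photo can be nearest.
    candidates = points[max(i - 1, 0):i + 1]
    nearest = min(candidates, key=lambda p: abs(p[0] - photo_time))
    if abs(nearest[0] - photo_time) > max_gap:
        return None
    return nearest[1], nearest[2]


track = timed_points(SAMPLE_GPX)
photo_taken = datetime(2011, 1, 15, 10, 0, 20, tzinfo=timezone.utc)
print(locate(photo_taken, track))
```

A saved track whose points have no `<time>` element yields an empty list from `timed_points`, so `locate` returns `None` for every photo. That is the practical consequence of the simplification described above.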

Recommendation:

At this point I am not recommending the Venture HC. Rather, I am recommending the eTrex Legend HCx or the eTrex Vista HCx. This recommendation is not based on any experience with these devices, but on two recommendations: R1, R2. I was rather depressed when I realized that I had lost those 9 hours of data. So for me, in my workflow, being able to write the activity log to a MicroSD card became a high-ranking constraint. I was not able to test the accuracy of the Venture HC relative to other hand-held GPS units. But in looking at the exported tracks in GoogleEarth, the offset was measurable in feet. In my test as shown below, I marked the waypoint outside of town and then walked down the dirt path. When one considers that the margin of error for the Venture HC is ±24' (because it is a GPS device and there is an introduced amount of inaccuracy) and that there might also be a slight margin of error in GoogleEarth, the results were acceptable and impressive.

GoogleEarth Displaying GPS Trace from Garmin Venture HC
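For anyone who wants to put a number on an offset like this rather than eyeballing it, the great-circle (haversine) distance between a logged coordinate and a reference coordinate gives the difference in feet. This is a sketch under stated assumptions: it treats the Earth as a sphere with the mean radius, and the sample coordinates are made up, not my actual readings.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; a spherical approximation
FEET_PER_METER = 3.28084


def offset_feet(lat1, lon1, lat2, lon2):
    """Great-circle distance in feet between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    meters = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
    return meters * FEET_PER_METER


# Made-up example: a logged point and a reference point 0.00005 degrees apart.
logged = (17.00000, -99.00000)
reference = (17.00005, -99.00000)
offset = offset_feet(*logged, *reference)
print(f"offset: {offset:.1f} ft, within 24 ft: {offset <= 24}")
```

At these scales the spherical approximation is far smaller than the ±24' margin of error, so it should not change the conclusion either way.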


When comparing the Venture HC to other Garmin hand-held devices, I have noticed that it finds its position via triangulation faster and in more areas (like in buildings, under dense foliage, or between skyscrapers).

Other Garmin Devices I have used:

Geko 201.
eTrex Legend.
eTrex 12 channel.

What to Look for in a GPS unit:

  1. Seamless exchange of data with your other devices (most often your computer.) I recommend USB, but Bluetooth is sometimes an option. Avoid serial connections when possible.
  2. Long, even Expandable Activity Logs. (Expandable via microSD cards)
    Make sure that parts of the activity log are not removed upon storage. (Some devices do not store the altitude or the time data).
  3. High Sensitivity GPS Receivers.
  4. Replaceable Batteries.
  5. Long Battery Life. (This may mean a black and white screen vs. a color screen.)

Understand how you are going to be using the data collected so that you can master your workflow. Your workflow may require the use of a particular feature, like Custom Maps, so that feature might become an evaluation point for you.

This is all OS X talk... is there anything for Windows?
Well, I don't use Windows when I can avoid it. But I have seen people doing similar things to what I am doing with various applications. One application I have seen for adding metadata to photos is iTag. Another application I have seen is PhotoME. There is a Firefox Plugin called Opanda. I cannot comment on the usability or results of any of these applications, because I have not used them. However, if you want to write up a workflow or a recommendation for doing this kind of thing on a different OS then email me and I will consider what you have written and post it here.

Notes

  1. ↑1 There are several major meta-data formats for photos. Before deciding to use any of them, one should attempt to become familiar with the technical aspects of each, and with any particular meta-data standard used by the archive with which the depositor wishes to deposit the results of the Language Documentation project. Also not to be overlooked is the way the tools one is considering using actually implement that particular meta-data standard. (Example Case with Aperture.) The three meta-data standards I have worked with are IPTC, XMP, and Exif.
  2. ↑2 Plotting information about speakers of a language is important from a language documentation perspective. However, the public availability of this information can raise some privacy concerns. Consider some points raised in: Using Google Earth to Access Language Resources (p.7)
    • Languages are spoken in areas that can be unclear, may overlap, and may be contested; is it appropriate or possibly advantageous to represent language locations by a point on a map?
    • In projects that study a number of languages, or that record the same language in different countries, how can the projects and languages best be represented?
    • Could placemarks that reveal locations of where recordings were made cause problems, for example in difficult political situations?
    • The exact location for many old recordings is not known, so should a “prototypical” place be selected?
  3. ↑3 Although it is common practice in Language Documentation to preserve the original files and send them to the archive as they were created, it has been my experience that sometimes extra Waypoints from a trip might be accidentally added to an Activity Log or that the Waypoints from a previous trip were not deleted. This adds "noise" to a .gpx file. This "noise" should be considered and perhaps removed when archiving a .gpx file (or at least a noise-free version should be included in the set of files sent to the archive). It has also been my experience in working between the Venture HC and BaseCamp that the icons attached to the Waypoints are not always transferred from the device to BaseCamp correctly. This is rather frustrating and I am not sure how to correct the problem. Therefore icons alone should not be the basis for describing a Waypoint. Editing software like BaseCamp can add longer names to Waypoints than one can add directly from the device. These longer names can make more sense to people using the GPS data and may make more sense in the final presentation form of the data.