Legal issues in Linguistics

I've been looking at the growing scholarship around legal issues linguists face in their research. The following is a list of links.

  • https://www.clarin.eu/content/legal-information-platform
  • http://www.lrec-conf.org/proceedings/lrec2016/workshops/LREC2016Workshop-Legal-Issues.pdf
  • http://www.elra.info/en/dissemination/legal-issues-papers/legal-issues-webcrawling-report/
  • http://www.elra.info/en/dissemination/legal-issues-papers/
  • http://www.elra.info/en/tag/207/
  • http://www.elra.info/en/elra-events/legal-issues-workshop-lrec2016/

git for field linguistics

https://www.protocols.io/
https://royalsocietypublishing.org/doi/10.1098/rsta.2020.0210
https://oad.simmons.edu/oadwiki/Data_repositories
https://codeocean.com/
https://esajournals.onlinelibrary.wiley.com/doi/full/10.1002/bes2.1801
https://www.ncbi.nlm.nih.gov/books/NBK547546
https://link.springer.com/article/10.1007/s00799-020-00288-2
https://www.pachyderm.com/

https://www.carlboettiger.info/2013/06/03/DOI-citable.html
https://mclm2022.github.io/git/cheatsheet.html
https://git-scm.com/docs/gitattributes
https://stackoverflow.com/questions/66845434/gitattributes-linguist-language-declaration
https://www.hiramring.com/posts/using-git-for-linguistics
https://www.dolthub.com/blog/2020-03-06-so-you-want-git-for-data/

Using Git for Database: Why It Works and How to Persist Where It Doesn’t


https://towardsdatascience.com/a-guide-to-git-for-data-scientists-fd68bc1c729
https://gitforteams.com/
https://pepa.holla.cz/wp-content/uploads/2016/01/Git-for-Teams.pdf
The book is a good introduction to basic workflows; however, if you are looking for help with managing complex projects, look elsewhere. It does not even mention submodules (or their alternatives), which are the cornerstone of managing independent but shared subprojects.
https://michaelstepner.com/blog/git-vs-dropbox/#:~:text=In%20git%20you%20have%20versions,Did%20you%20get%20them%20all%3F

https://dvc.org/

The Guide to Data Versioning

Data Version Control – A Data Engineering Best Practice You Must Adopt

https://research.aimultiple.com/data-versioning/

https://towardsdatascience.com/git-for-data-engineers-a8b979d8b2ab

https://dagshub.com/blog/data-version-control-tools/

https://neptune.ai/blog/dvc-alternatives-for-experiment-tracking

https://terminusdb.com/blog/git-for-data/

https://terminusdb.com/

https://www.kdnuggets.com/git-for-data-science-cheatsheet.html

http://karthik.github.io/git_intro/#/slide-title

https://towardsdatascience.com/git-a-complete-guide-d49675d02a5d

https://valohai.com/blog/git-for-data-science/

Git for Data Science – A Guide For Data Scientists

https://phoenixnap.com/kb/how-to-use-git

https://gucorpling.org/gitdox/

https://www.nimirea.com/blog/2019/05/10/git-for-social-scientists/

Minorities in US Law

Granted, there are different parts of US law, but I'm just reading this for the first time and find the definitions interesting, and far from how linguistics departments often think of minorities...

https://www2.ed.gov/about/offices/list/ocr/edlite-minorityinst.html

These MSIs outlined in law were brought to my attention through the work of the Department of Homeland Security. Opinions on the role and mission of DHS aside, it makes me wonder whether this "research" is meant to better serve DHS regardless of how it is directed. If so, are minorities more severely impacted because of this research, and what is its ethical component? I may not fully understand the context of DHS's purpose, though.

Originally from: https://www.zintellect.com/Opportunity/Details/DHS-SRTMSI-2023-FacultyApp

The U.S. Department of Homeland Security (DHS) Summer Research Team (SRT) Program for Minority Serving Institutions (MSIs) is now accepting applications from faculty at Minority Serving Institutions (MSI) interested in participating in a summer research team experience. Selected Faculty will be invited to submit a Team Application including a Research Project Proposal developed in collaboration with a DHS Center researcher and applications from one or two qualified students.

The program seeks to increase and enhance the scientific leadership at MSIs in research areas that support the mission and goals of DHS. This program provides faculty and student research teams with the opportunity to conduct research at the university-based DHS Centers of Excellence (DHS Centers). At the end of the ten week appointment, faculty collaborate with center to apply for up to $100,000 in follow-on funding to continue research during the 2023-2024 academic year at the faculty’s home academic institution.

This year’s participating Centers of Excellence are:

Center for Accelerating Operational Efficiency (CAOE)
Center of Excellence for Cross-Border Threat Screening and Supply Chain Defense (CBTS)
Criminal Investigations and Network Analysis (CINA)
Critical Infrastructure Resilience Institute (CIRI)
National Counterterrorism Innovation, Technology, and Education (NCITE)
Soft-target Engineering to Neutralize the Threat RealitY (SENTRY)

Capitalization in indigenous writing systems

I was recently visiting a small remote village. There were large sorghum fields all around. This village was notable for some of the environmental literacy which one could find in the area, particularly the use of capitalization in names. In fact, the name of the village had two capital letters.

Village name sign

This sort of word-medial capitalization has met objections among onomastists. The suggestion has been that English does not allow names to contain two capital letters, and therefore reference materials written in English that contain non-English names should normalize capitalization so that only the first letter of a name is capitalized. Obviously this is an uninformed but principled position to take. It is a serious matter to regularize a reference resource because it gives users a filtered (and biased) view.

Is and am

Katja has two verbs of note: is and am. Last night we heard “am” for the first time. We heard the response “me am” to “you should be laying on the pillow”. In contrast to the new verb, “is” has been a long-standing verb of location: “Me is up”, “me is down”, or “me blanket is”. “Is” is used almost exclusively alongside ideas of location, and is often in phrase-final position, as in “is mommy?” for something like “where is it mommy?”, whereas “mommy is?” would be “where is mommy?”

It is cute how her language choice evolves. The new lexicon is displayed, the old home speak word diaper, pronunciation evolves. In one way I loathe the change. It is sad to lose the old forms. They are often so straightforward and morphologically simple. Part of me says I should be recording this speech, but I’ll never review it. Maybe 5-10 minutes of it the night before she gets married. But in reality, not really ever.

Color terms

I’m not really sure how my daughter started to learn color terms. How does one learn that the word is not a noun (the name of the object) and is the name of a color? I mean, think about it: as a parent you point to something and say “blue” or “red”. How does the child know that the parent is talking about color?

For about 4 months Katja has correctly identified and labeled blue objects. This is likely due to blue bear being such a prominent part of her life.

I think it was back in May and June that we started coloring with crayons and writing with pens. This increased her exposure to color terms.

Somewhere along the line we started talking about the colors of the handholds on her climbing wall. I think her second color term about two months ago was “Black”.

“Black” quickly became overshadowed by “pink”. Katja has a special blanket which is pink. But there is her pink backpack, and shoes, and tonnes of other pink things.

Along the way “yellow” has been identified and pronounced in short sessions, like reading Curious George. But as a color term it has not been a stable, self-produced word. This may be due in part to the challenge Katja has with pronouncing liquids.

This week “green” became a stable word, and within a few hours “red” did too, though there is a lot of phonological reduction going on right now.

Real Data, Live Data, Not just Ethnologue maps

There have been several interesting projects which have created language use visualizations over the last few years. The Ethnologue project produces a particular kind of visualization. In the past I have talked about the need to socialize and make the data which the Ethnologue apps are based on more accurate to WGS 84. I talk about that need in two places, on insite here: Geographical Data and on my non-insite blog: https://hugh.thejourneyler.org/2012/some-current-challenges-in-using-gis-information-in-the-sil-international-corporate-knowledge-system/

There are several challenges with the basic assumptions put forward with the current Ethnologue visualizations. 

  1. They project a language homogeneity which is not necessarily accurate to real life.
  2. They project a geographical display which is not indicative of real language use. That is, language use may actually be in digital mediums which cannot be heard at a given location.
  3. Ethnologue maps make no overt claims about digital communication devices and their use by minority language speakers. However, my feeling in general is that SIL (especially in our training programs) does not assume that minority language users use digital devices.

One of the tools which SIL could use to inform its business intelligence is the language of use in digital social mediums. For instance, Wikipedia allows any ISO 639-3 language community to form its own Wikipedia. This means that all of the IP edits are recorded and public, which in turn would give us a language use location based on IP addresses. This can then be superimposed on additional data collected from geo-enabled tweets. With such information, the pre-survey data available about language use (in certain contexts) just got more interesting, provided of course that the survey is about questions of language use.
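As a rough illustration of the idea above, here is a minimal Python sketch of tallying edits per language and country. It assumes the hard part (resolving each editor's IP address to a country) has already been done by some other tool. The record layout, the language codes, and the country codes are invented for illustration and are not taken from any real Wikipedia export.

```python
from collections import Counter

# Hypothetical, already-geolocated edit records:
# (language code of the Wikipedia edition, country code resolved from the IP).
# All values are invented for illustration only.
edits = [
    ("gle", "IE"),
    ("gle", "US"),
    ("gle", "IE"),
    ("cym", "GB"),
    ("cym", "AR"),
]


def edits_by_language_and_country(records):
    """Count how many edits were seen for each (language, country) pair."""
    return Counter(records)


counts = edits_by_language_and_country(edits)
for (language, country), n in sorted(counts.items()):
    print(language, country, n)
```

A table like this could then be joined with counts from geo-enabled tweets before a survey, so that both sources can be compared place by place.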

Some people have taken to mapping Wikipedia edits. Such a map shows that there are a lot of people in a lot of places, speakers of minority languages included, who are able to edit centrally hosted content like that found on Wikipedia. Here is a map created from the English language Wikipedia, which is available from http://www.dailydot.com/society/wikipedia-conflict-map-flame-wars/.

As I stated previously, the homogeneity of language use within a given geographical region is difficult to map. There are questions of speaker population density, and questions of social environments. While the Ethnologue maps are very detailed in terms of their global scope, one of the challenges for this kind of visualization is expressing diversity. Below is a map of language diversity based on tweets in New York City. The power of using tweets to measure the linguistic diversity of a region is that tweets usually connect two or more people and reveal the social connection between those people. This is a powerful bit of information. SIL could leverage this data in several ways; one way would be to make this data available to its scripture use partners. Language may not always be a barrier to understanding the gospel, but I have yet to see it not be an inroad to a relationship in and through which the gospel can be shown or presented.

Language Diversity as demonstrated on Twitter

Image from http://ny.spatial.ly/
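One way to put a number on the kind of diversity such a map shows is the Shannon index, computed over the share of tweets in each language for an area. The sketch below is only an assumption about how one might do this; the makers of the map above may have used a different measure, and the language labels and neighbourhoods are invented.

```python
import math
from collections import Counter


def shannon_diversity(language_labels):
    """Shannon index: sum of p * ln(1/p) over the share p of each language."""
    counts = Counter(language_labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((n / total) * math.log(total / n) for n in counts.values())


# Invented language labels for tweets in two hypothetical neighbourhoods.
neighbourhood_a = ["eng", "eng", "eng", "eng"]
neighbourhood_b = ["eng", "spa", "rus", "kor"]

print(shannon_diversity(neighbourhood_a))
print(shannon_diversity(neighbourhood_b))
```

An area where every tweet is in one language scores zero, and the score rises as tweets are spread more evenly over more languages.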

If our conceptualization of language and its geographical distribution is at all shaped by the way that we look at Ethnologue maps, then we can often miss the wide distribution that many language communities have. For instance, this language map shows the use of Irish by Twitter users. Notice that the language is not bound to Ireland.

Irish language Twitter conversations, Kevin Scannell (CC-BY-SA) http://indigenoustweets.blogspot.com/2013/12/mapping-celtic-twittersphere.html

Something fantastic with Webonary data

The UK data explorer has a very interesting setup using a powerful (free and open) visualization software tool called D3.js. The tool allows you to type in a word and see how it is spelled in a variety of languages. It uses Google Translate. Check it out here: http://ukdataexplorer.com/european-translator/?word=man

WordPress is equally capable of serving up Webonary data if it is configured correctly.

Man Across Europe

Some other thoughts on linguistic cartography and the display of language vitality.

Back in 2011 Lars Huttar and I played around with a heat mapping JavaScript tool called gheat. The idea was to plot the heavily populated towns with a higher gradient than less populated towns, based on speaker population densities I had from Mexican statistics data. The goal was to incorporate two important aspects of analysis, remoteness and vitality. I talk about remoteness on my blog here: https://hugh.thejourneyler.org/2012/remoteness-index/, and I talk about the visualization here: https://hugh.thejourneyler.org/2011/language-maps-like-heat-maps/. The data may not be perfect, but it was a start. The paper has not gone anywhere since that time. I still have the draft paper, and would like to pursue this with a co-author. If there is someone else who might be interested, please comment; I can give more details and the Paterson & Huttar paper draft.
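To make the weighting idea concrete, here is a small Python sketch that turns speaker counts per town into heat weights between 0 and 1. It is not the code from the 2011 experiment and it does not use gheat; the town names, coordinates, and counts are invented, and the log scaling is simply one assumed way to keep a few large towns from swamping the small ones.

```python
import math

# Hypothetical towns: (name, latitude, longitude, speaker count).
# All values are invented for illustration only.
towns = [
    ("Town A", 17.06, -96.72, 12000),
    ("Town B", 17.21, -96.80, 800),
    ("Town C", 16.95, -96.47, 50),
]


def heat_weights(records):
    """Return (lat, lon, weight) with weight = log(1 + speakers), scaled to 0..1."""
    if not records:
        return []
    logs = [math.log1p(speakers) for _, _, _, speakers in records]
    top = max(logs)
    if top == 0:
        return [(lat, lon, 0.0) for _, lat, lon, _ in records]
    return [
        (lat, lon, value / top)
        for (_, lat, lon, _), value in zip(records, logs)
    ]


for lat, lon, weight in heat_weights(towns):
    print(lat, lon, round(weight, 3))
```

The resulting triples are in the shape most heat map layers expect (a point plus an intensity), so they could be handed to whichever mapping tool is in use.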

If you just like looking at language maps you might enjoy this post: https://hugh.thejourneyler.org/2012/types-of-linguistic-maps-the-mapping-of-linguistic-features/

One final thought

Here is an interesting set of maps for language use. While the Ethnologue maps first language use, second language use remains a mystery. These efforts are trying to add visualizations of the second most popularly spoken language for a geographical region.

A second way to look at the earth is to ask: what are the places? This has been a recent hot topic in language documentation circles. On the single-language level there may or may not be a lot of information that is interesting to a lot of people. However, looking at the earth in terms of which languages are talking about certain places is interesting. One point of large interaction for this conversation is Wikipedia.