In 2008 I was contacted by a professor who wanted to be able to share various linguistics exercises with fellow professors. He asked for a website to be built so that a professor who translated the directions of these exercises could in turn put the translated versions back into the “set of exercises”.
This week I have been outlining the types of data that linguists need to be able to use and relate to each other as they do Language Documentation and Linguistic Research. I try to express these things graphically and then also express where some of the leading tools which SIL International is offering sit in the problem space.
This post is an open draft! It might be updated at any time... But it was last updated on <?php the_modified_date(); ?> at <?php the_modified_time(); ?>.
The online version of the SIL Bibliography contains a subset of over 29,000 citations from the more than 40,000 publications representing 75 years of SIL International's language research in over 2,700 languages.
Finding resources through SIL.org's Bibliography (as of 2 August 2012) can be a challenge at times, maybe even a time-wasting endeavor, because consulting the online Bibliography might not turn out to be very useful.
The challenge which affects usefulness is primarily threefold:
- Items known by SIL to have been created by SIL staff may or may not be listed. (The on-line Bibliography is a sub-set.)
- Items listed in the Bibliography may or may not have digitally accessible resources.
- Items created by SIL staff may or may not be in the bibliography because they have not been submitted to the Language and Culture Archive (managing division of the SIL Bibliography).
In July I presented a paper at CRASSH in Cambridge. It was a small conference, but being in Europe it was good to see many of the various kinds of projects which are going on in Digital Humanities and Linguistics, and in Cloud Computing and Linguistics. One particular project, TypeCraft, presented by Dorothee Beermann Hellan, stands out as being rather well done and promising. I think the ideas presented in this project are well thought out and seem to be well implemented. It would be nice to see this product integrated with some other linguistics and language documentation cloud offerings, e.g. Project Lego from the Linguist’s List or the Max Planck Institute’s LEXUS project. While TypeCraft does allow for round tripping of data with XML, what I am talking about is a consolidated User Experience for both professional linguists and for minority language users.
A note on foundational technologies:
- It appears that Lexus is built on BaseX with Cocoon and XML.
- The front page of TypeCraft has a very Wikipedia-like feel, but this might not be the true foundational technology.
- Linguist’s List often does their work in ColdFusion and the LEGO project definitely has this feel about it.
I have been thinking through some of the presentation issues for presenting SIL International’s work on the web. As part of this I have also been looking at other organizations which are part of the language documentation and minority language revitalization movement. I recently ran across several nicely done web sites.
I once listened to a Creative Commons Salon titled: What Does it Mean to Be Open in a Data-Driven World? In that salon there was a great discussion on what it means to have data which flows and is open. (Minute 50 has a really interesting comment about sharing scientific data.)
A document’s DOI (http://www.doi.org/ or on Wikipedia under Digital Object Identifier) is an important part of the citation of a document. Many style sheets allow for just the DOI of a paper as the citation. Because DOIs are unique they can act as URIs which are resolvable and look like URLs. However, a DOI is different than a URL for where a digital object might be located. It might be well argued that a DOI should be tracked in the metadata schemes of archives which collect language and linguistic data.
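To make the distinction between a DOI and a location URL concrete, here is a minimal sketch in Python. It assumes the public resolver at https://doi.org/ and a deliberately loose pattern for what counts as a DOI. The function names, the `record` fields and the archive URL are hypothetical illustrations, not part of any archive's actual metadata scheme.

```python
import re
from urllib.parse import quote

# Public DOI resolver; a DOI appended to this prefix forms a resolvable link.
DOI_RESOLVER = "https://doi.org/"

# Assumed, loose pattern: "10.", a numeric registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Common ways a DOI shows up in citations and metadata.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Strip any resolver or 'doi:' prefix and return the bare DOI."""
    doi = raw.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"does not look like a DOI: {raw!r}")
    return doi


def doi_to_url(raw: str) -> str:
    """Build the resolver link for a DOI, percent-encoding unusual characters."""
    return DOI_RESOLVER + quote(normalize_doi(raw), safe="/")


# Hypothetical archive record: the DOI is stored as its own field,
# separate from wherever the digital object currently lives.
record = {
    "title": "Example item",
    "doi": normalize_doi("doi:10.1000/182"),
    "location": "https://archive.example.org/items/42",
}
```

Storing the bare DOI rather than a full link means the identifier stays valid even if the item's location, or the preferred resolver address, changes later.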
As I work with a particular NGO, one of the interesting questions which has come up in discussions is whether or not the NGO should put their logo on their web page with instructions for proper use. There were two main questions asked:
- Is this something which needs to be on the web publicly (as opposed to privately on an intranet)?
- Is this even a common practice?
I am listing a few use cases here to show some of the variety and breadth of the kinds of people who are sharing their logos and providing display and license guidelines to potential users of their logos.
I think there are two primary reasons for organizations to provide access to branding information in a public venue:
- Help partners accurately visually display the offering organization’s brand.
- Help staff have a visible, consistent and authoritative reference point when communicating with partners. Because this conversation with partners is about the partners displaying their affiliation with the NGO it is something which can be facilitated publicly.
I go through some of the use cases in the video below. The blog post in that video about teaching in Malaysia can be read here.
However, the IBM logo is text based and does not meet the threshold for copyright originality. (This information is what is provided on Wikipedia about the IBM icon used here.) Even so, it is still a logo and is covered under registered trademark rules.
Another organization with a rather popular logo among internal and external users is the U.S. military. This would include logos like that of the U.S. Air Force. They also have specific guidelines posted for different uses of their logo, as well as a page explaining the symbology of the logo.
Apple is another popular company with several programs and logos specifically designed for use by business partners. One of the things which is required in these kinds of relationships is for the organization granting the logo’s use to be firm in their organizational identity. This means: defining the relationship – who is the NGO and who is not the NGO. For some organizations it means defining what items are trademarks, products and logos.
The next three brands have a particularly visual representation and presentation of their branding guidelines.
WordPress logos are made freely available under their about section: http://wordpress.org/about/logos
While WordPress is an open source product, it is also a community. About a year and a half ago there was quite a stir made by Automattic about proper logo usage. The community had some who were less than thrilled with the emphasis Automattic brought on branding an open source project, but in the end even the controversy made the brand stronger. The consistent iconization of the product also made the brand more recognizable. Today the WordPress project has a lot of logo options which conform to established branding guidelines. This gives the community flexibility and continuity at the same time.
Adobe is a company whose name is almost synonymous with the term digital art. It is well known for products like Photoshop and for files like PDFs. When we think of PDFs we often think of the Acrobat Logo on the image of a file. Part of this visibility is due to Adobe Icons and Logos which it has made available.
Perhaps my favorite logo explanation is the simple (yet detailed) approach that Twitter has taken on its page Twitter.com/logo. Here are some screenshots.
Over the last few weeks I have been contemplating how multi-lingual content could work on sil.org. (I have had several helpful conversations to direct my thinking.)
As I understand the situation there are basically three ways in which multi-lingual content could work.
First let me say that there is a difference between multi-lingual content, multi-lingual taxonomies, and multi-lingual menu structures. We are talking about content here, not menu and navigation structures or taxonomies. Facebook has probably presented the best framework to date for utilizing the power of crowds to translate navigation structures. In just under two years they added over 70 languages to Facebook. However, Facebook has had some bumps along the way, as Dropbox points out in their post talking about their experience in translating their products and services.
- Use a mechanism which shows all the available languages for content and highlights which ones are available to the user. Zotero has an implementation of this on their support forums.
- Basically create a subsite for each language and then only show which pages have content in that language. Wikipedia does this. Wikipedia has a menu on the left side with links to articles with this same title in other languages. Only languages which have an article started in them on that title are shown in the menu.
- Finally, create a cascading structure for each page or content area. So there is a primary language, a secondary language, a tertiary language, a quaternary language, etc., based on the browser language of choice with country IP playing a secondary role. If there is no page for the primary language then the next in preference will show. This last option has been preferred by some because if an organization wants to present content to a user, then obviously it would be in the user’s primary language. But if the content is not available in the primary language then the organization would still want to let the user know that the content exists in another language.
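The cascading idea in the third option can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the browser's preference order is read from an `Accept-Language` style header, the country-based guess is passed in as a plain language tag (how it would be derived from an IP address is left out), and the function names and the `site_default` parameter are hypothetical.

```python
from typing import Iterable, List, Optional


def parse_accept_language(header: str) -> List[str]:
    """Return language tags from an Accept-Language header, most preferred first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0  # a tag without an explicit q-value counts as fully preferred
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality > 0:  # q=0 means "not acceptable", so skip it
            weighted.append((-quality, position, tag.strip().lower()))
    # Sort by descending quality, keeping header order for ties.
    return [tag for _, _, tag in sorted(weighted)]


def choose_language(
    preferred: Iterable[str],
    available: Iterable[str],
    country_default: Optional[str] = None,
    site_default: str = "en",
) -> Optional[str]:
    """Walk the cascade: browser languages, then country guess, then site default."""
    translations = set(available)
    candidates = list(preferred)
    if country_default:
        candidates.append(country_default)
    candidates.append(site_default)
    for tag in candidates:
        if tag in translations:
            return tag
        base = tag.split("-")[0]  # fall back from "fr-ca" to "fr"
        if base in translations:
            return base
    return None  # the page has no translation anywhere in the cascade


# A page that exists only in English and Spanish, requested by a browser
# that prefers Canadian French, then French, then Spanish.
page_languages = {"en", "es"}
browser_order = parse_accept_language("fr-CA,fr;q=0.8,es;q=0.5")
shown = choose_language(browser_order, page_languages)
```

In this sketch the visitor would be shown the Spanish page, and the site could still note that no French version exists yet.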
It would also be good to understand the concepts used in Drupal 7 (and Drupal 8) for multi-lingual content. There are several resources which I have found helpful:
- Localized and Multi-Lingual Content in Drupal 7
- Drupal 7’s new multilingual systems (part 4) – Node translation
- Drupal 7’s new multilingual systems compilation
- Drupal 8 Multilingual Initiative
From this list of resources it would appear that Drupal’s default behavior is more in line with the second of the three options given above.