I have a PDF that I would like to crop to the text and then add consistent white space (margins). The PDF was generated by a Bookeye 4 scanner, which exported the content straight to PDF, so I am trying to do the cropping with Adobe Acrobat 9.2. SIL Americas Area Publishing suggested that I use ScanTailor, an excellent program, but one which I find crashes on OS X.
Some days I am more clever than others. Today I was working on digitizing about 50 older (30-year-old) cassettes for a linguist. To organize the data I needed to create a folder for each tape, and each folder needed to be sequentially numbered. It is a lot of tedious work, not something I enjoy.
So I looked up a few things in Terminal to see if I could speed up the process. Since I needed to create a batch of folders, I looked around on Mac OS X Hints at Macworld and found the mkdir command, which creates new folders (directories). It uses the following syntax:
mkdir folder1 folder2 folder3
Now I needed a list of the folder names I wanted... something like 50 of them.
So I created a formula in a Google spreadsheet using the CONCATENATE function: in one column I added the alpha characters I needed, and in the next column I added the sequential numbers.
Now I had a list of my 50 folder names, but I still needed to remove the return characters separating them so that the mkdir command could take them all in one line. So I opened TextEdit, searched for the return characters in the document, and replaced them with spaces.
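If you want to stay in the terminal for this step too, tr can do the same newline-to-space conversion. A sketch, assuming the folder names were saved one per line to a file I'll call names.txt (my name, not from the original workflow):

```shell
# Replace each return (newline) in names.txt with a space,
# so all the folder names come out on one line, ready to
# paste after mkdir.
tr '\n' ' ' < names.txt
```

Replacing the returns with spaces, rather than simply deleting them, keeps the names from running together into one long string.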
Now I could just paste the 50 folder names into Terminal after mkdir, hit Enter, and it created all 50 folders... But I wonder if there is a way to add sequential numbers to a base folder name in Terminal without using a Google spreadsheet...
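It turns out there is: bash and zsh have brace expansion, which generates a numbered sequence of names with no spreadsheet at all. A sketch, assuming the tapes should be named "Tape 01" through "Tape 50" (my naming scheme, not necessarily yours); note that zero-padded ranges like {01..50} need a reasonably recent bash (4.0+) or zsh, and the older bash that shipped with OS X drops the leading zeros:

```shell
# Brace expansion generates Tape 01, Tape 02, ... Tape 50;
# the quotes keep the space inside each folder name, and
# mkdir creates them all in one shot.
mkdir "Tape "{01..50}
```

One line replaces the spreadsheet, the TextEdit find-and-replace, and the paste.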
Twice since the launch of the new SIL.org website, colleagues of mine have contacted me about the new requirement on SIL.org to log in before downloading content from the SIL Language and Culture Archive. Both know that I am connected to the website implementation team, and I feel as if they expect me to be able to speak into this situation (as if I even have that sort of power). In reality I only work with the team in a loose affiliation, from a different sub-group within SIL; I don't make design decisions or social-impact decisions, and I don't negotiate the politics of content distribution.
However, I think web users have some real concerns about being required to log in before downloading, and there are also some real considerations that web users are not recognizing.
I want to reply to these concerns.
As linguistics and language documentation interface with the digital humanities, there has been a lot of effort to time-align texts with audio/video materials. At one level this is rather trivial to do and has the backing of commercial media processes like subtitling in movies. At another level, though, this task is done slightly differently, in XML, for every project (digital corpus curation). At the macro scale the argument is that if the annotation of the audio is in XML and someone wants to do something else with it, they can just convert the XML to whatever schema they desire. This is true.
However, one anecdotal point that I have not heard raised in discussions of time-aligned texts is the need for specifications distinguishing Audio Dominant Text from Text Dominant Audio. This may not initially seem very important, so let me explain what I mean.
I feel that in the language and culture documentation community there is a tension between “documenting” and “globalizing”, in the sense that what we as digital natives and cultural technologists think of as “living” is in part “documenting”.
Now, in some sense “Language Documentation” is an academic pursuit in its own right, independent of linguistics, if it has a plan and tries to capture elements of the expression of the culture and language as it is spoken or acted out. I think there is a bit of confusion in the literature as linguists move from linguistics to language development and community development. This is particularly evident with the use of video in language documentation.
This is the start of a cross-language archive look at the current state of UX design for presenting content generated in language documentation.