Working in an archive, one can imagine that letting go of materials is a real challenge. It is hard to do because of policy, and it is also hard because of the emotional “pack-rat” nature of archivists. This is no less the case at the archive where I work. We were recently working through a set of items and getting rid of the duplicates. (Physical space has its price, and the work should soon be available via JASOR.) However, one of the items we were getting rid of was a journal issue on a people group/language. The journal issue has three articles, and only one of them was written by someone who worked for the same organization I am working for now. So the “employer” and owner-operator of the archive only has rights to one of the three works. (Rights by virtue of “work-for-hire” laws.) We have the off-print, which is what we have rights to share, so we keep and share that. It all makes sense. However, only what we keep is catalogued and inventoried. Our catalogue is shared with the world via OLAC. With this tool someone can search for a resource on a language, by language. It occurs to me that the other two articles on this people group/language will not show up in OLAC's aggregated results. This is a shame, as they would be really helpful in many ways. I wish there were a groundswell, open source, grassroots, web-facilitated effort where various researchers could go and enter metadata (citations) for articles, which would then be added to the OLAC search.
MiniCard Review
I have been looking at the WP Theme, MiniCard. It is really cool. The design follows a Tim van Damme style layout.
I have been playing around with MiniCard for some time. I have used it as my splash page for about a year. (I have been using K2 since 2005, so any change in theme is a big step.) There are some things I really like and some things I think could be improved upon. (Granted, I am using and looking at the free version.) I really like the minimalist business card design. However, one of the things that I find difficult is separating what is too much info from what is just enough. Right now I have quite a few social networks loaded on my front page so, even though it is minimalist, it is almost not a business card. Most of my suggestions have to do with the options page, but a few have to do with layout.
Options:
- A place to store a Child Theme.
K2 has a really cool way of letting the author select where to store their child theme, so that when the theme is upgraded the child theme is not overwritten. Because the code is GPL’d, I think this code could be copied from K2 into the GPL’d version of MiniCard. Since the whole Tim Van Damme (TVD) idea is to be unique with style, it seems that MiniCard would benefit from embracing child themes: not just giving a user the option to use a child theme, but also helping them choose where to store it.
- More Networks.
It seems that it would be really easy for users of MiniCard to want social networks which are not on the list provided on the options page. I think it is crazy for any user to expect a developer to have anticipated all the possible social networks out there. I went through TVD’s wall of fame just to get some inspiration and noticed a few networks that MiniCard does not offer out of the box:
- iusethis.com
- ffffound.com
- vi.sualize.us
- corkd.com
- wikipedia.org
- www.colourlovers.com
- soundcloud.com
- filmreviewfriday.com
- github.com
- pandora.com
- themeforest.net
One of the past revisions of the MiniCard theme introduced an easy way to add a custom social network. This is much improved over earlier versions of the theme. (I think this is still the case in the current 2011 release.)
The plug-in Find Me On has an interesting interface for adding a new network. It is sort of drag and drop. I use this plug-in on hugh.thejourneyler.org.
- A Contact Info page separate from my social networks page.
One thing that might be helpful too is separating messaging and contact info from social networks. “Messaging and Contact info” is usually treated differently from “social networks”. That is, services like Skype, AOL, Google Chat, IRC, etc. are not really conceptualized as “social networks” in the minds of the people on the TVD wall of fame. If this information is provided, it most often falls under the “contact” section rather than the “my social networks” section. Out of the box MiniCard does not have a contact section, so I can understand how this info gets lumped together with the user’s social networks. Perhaps one solution is to add an optional template page for contact info (included in the theme by default but not active by default) which could also pull data from the hCard data.
This contact info page might also display the online status of the messaging accounts. One caveat pertains to AIM vs. iChat. The syntax to open these protocols is a little different, and if the website admin wants iChat to open, it just is not going to work on a Windows machine. I am wondering if a little JavaScript magic might be able to sniff out an OS X machine visiting the site, as opposed to a Windows machine, and put in the proper syntax for opening up iChat.
Interestingly enough, ThemeForest had a theme much like MiniCard.
There are more color options “out of the box” in the pro version of the theme. However, the color options are not as user selectable as they could be. I have seen option panels that use color wheels and palettes to suggest associated colors when selecting CSS values. A color selector for the background and for the various parts of the theme would be nice.
MiniCard does support hCard, but as I was looking over the hCard format I came to think that more can be embedded in hCard content than what MiniCard allows for out of the box. That is, I think that MiniCard could be improved with more fields in the admin section for the site admin to input their data. There is an hCard creator on the Microformats website. It shows the supported values in the hCard spec.
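To make the "more fields" idea concrete, here is a minimal sketch in Python of what a fuller hCard fragment could look like. The function name and its parameters are my own invention for illustration and are not part of MiniCard; only the class names (vcard, fn, url, org, email, tel, adr, locality, country-name) come from the hCard format.

```python
from html import escape


def render_hcard(name, url=None, org=None, email=None, tel=None,
                 locality=None, country=None):
    """Return an hCard HTML fragment (hypothetical helper, not MiniCard code)."""
    parts = ['<div class="vcard">']
    # "fn" is the formatted name; when a URL is given the link carries both classes.
    if url:
        parts.append('  <a class="url fn" href="%s">%s</a>' % (escape(url), escape(name)))
    else:
        parts.append('  <span class="fn">%s</span>' % escape(name))
    if org:
        parts.append('  <span class="org">%s</span>' % escape(org))
    if email:
        parts.append('  <a class="email" href="mailto:%s">%s</a>' % (escape(email), escape(email)))
    if tel:
        parts.append('  <span class="tel">%s</span>' % escape(tel))
    if locality or country:
        # "adr" groups the address sub-properties.
        parts.append('  <span class="adr">')
        if locality:
            parts.append('    <span class="locality">%s</span>' % escape(locality))
        if country:
            parts.append('    <span class="country-name">%s</span>' % escape(country))
        parts.append('  </span>')
    parts.append('</div>')
    return "\n".join(parts)


if __name__ == "__main__":
    # Placeholder values, purely for demonstration.
    print(render_hcard("Jane Doe", url="http://example.org",
                       email="jane@example.org", locality="Dallas", country="USA"))
```

Each optional argument would correspond to one extra field on the theme's options page; fields left empty simply do not appear in the markup.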
Importing YouTube Comments to WordPress
When I uploaded my first YouTube video, I got some comments and I wanted to reflect them on my blog, where I was also displaying the video. Traffic to my blog is important to me, and so is having a permanent record of these comments in my own database. However, I need a two-way solution. If someone comments on the video on my blog, I want that comment to appear on my YouTube account. I need to sync comments between my blog and YouTube.
The Genki YouTube Comments plug-in works well for pulling comments from YouTube to a WordPress blog (I use it on my site now):
http://wordpress.org/extend/plugins/genki-youtube-comments/
However, this is only half the sync. I have yet to find a solution for when someone comments on a WordPress blog that the comment is then sync’d over to YouTube. If someone knows a solution for this then please share.
Dictionary term markup on Wiktionary
XHTML elements:
- <dt> Definition Term
- <dl> Definition List
- <dd> Definition Description
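As a small illustration of how these three elements nest, here is a sketch in Python that builds a definition list from dictionary entries. The function name and the sample entry are made up for the example; this shows the general XHTML pattern and is not a claim about how Wiktionary itself generates its markup.

```python
from html import escape


def render_definition_list(entries):
    """Turn [(term, [definition, ...]), ...] into <dl> markup.

    Hypothetical helper: one <dt> per term, followed by one <dd> per definition.
    """
    lines = ["<dl>"]
    for term, definitions in entries:
        lines.append("  <dt>%s</dt>" % escape(term))
        for definition in definitions:
            lines.append("  <dd>%s</dd>" % escape(definition))
    lines.append("</dl>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_definition_list([
        ("archive", ["A place where records are kept.", "To store records."]),
    ]))
```

The <dl> wraps the whole list, and each <dt> may be followed by several <dd> elements when a term has more than one sense.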
TIFFs, PDFs and OCR
In the course of my experience I have been asked about PDFs and OCR several times. The questions usually follow the main two questions of this post.
So is OCR built into PDFs, or is there a need for independent OCR?
In particular an image based PDF, is it searchable?
The short answer is yes. Adobe Acrobat Pro has an OCR function built in. And to the second part: no, an image by itself is not searchable. But Adobe Acrobat Pro can run OCR on an image such as a .tiff file and then add a layer of text (the output of the OCR process) behind the image. Then when the PDF is searched, the search actually runs against the text layer behind the image and tries to find the match. The OCR process is usually between 80-90% accurate on texts in English. This is usually good enough for finding words or partial words.
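To get a feel for what imperfect OCR output means for searching, here is a minimal sketch in Python using only the standard library. It is my own illustration of tolerant matching against an OCR text layer, not a description of how Acrobat's search works; the function name, the 0.8 cutoff, and the sample text are all assumptions.

```python
import difflib


def find_in_ocr_text(query, ocr_text, cutoff=0.8):
    """Return words from the OCR text that closely resemble the query.

    Hypothetical helper: a cutoff of 0.8 tolerates roughly one wrong
    character in a seven or eight letter word.
    """
    # Normalize case and strip surrounding punctuation from each word.
    words = [w.strip(".,;:!?()\"'").lower() for w in ocr_text.split()]
    return difflib.get_close_matches(query.lower(), words, n=5, cutoff=cutoff)


if __name__ == "__main__":
    # Simulated OCR output with two misread characters.
    text_layer = "The arch1ve holds many langnage recordings."
    print(find_in_ocr_text("archive", text_layer))
    print(find_in_ocr_text("language", text_layer))
```

An exact search for "archive" would miss the misread word "arch1ve", while a tolerant comparison still finds it. That is the sense in which a partly wrong text layer can remain useful.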
The Data Conversion Laboratory has a really nice and detailed write up on the process of converting from images to text with Adobe Acrobat Pro.
Daily Designer has a tutorial on how to do it on OS X.
David R. Mankin explains on his blog what the process looks like using Windows.
One of the beauties of Adobe Acrobat Pro is that this process can be scripted and the TIFFs processed in batches.
[On Windows] :: [On OS X using AppleScript] :: [Cross platform help from Adobe]
The University of Illinois at Chicago explains how to use Adobe Acrobat Pro and OCR with a scanner using a TWAIN driver.
The better OCR option
Since I work in an industry where we are dealing with multiple languages and the need to professionally OCR thousands of documents I thought I would provide a few links on the comparison of OCR software on the market.
Lifehacker has short write up of the top five OCR tools.
Two of those top five, ABBYY FineReader and Adobe Acrobat, are compared side by side on both OS X and Windows in this article.
Are all files used to create an original PDF included in the PDF?
One thing to remember, which I have said before, is that not all PDFs are created equal. This manual talks a bit about the different settings available when using Adobe’s PDF printer.
The short answer is no. But the long answer is yes. Depending on the settings of the PDF creator, the original files might be altered before they are wrapped in a PDF wrapper.
So the objection sometimes comes up, usually in the form of a question:
Is the PDF file just using the PDF framework as a wrapper around the original content? Therefore, to archive things “properly” do I still need to keep the .tiff images if they are included in the PDF document?
The answer is: “it depends”. It depends on several things, one of which is what program created the PDF and how it created the PDF. (Did it send the document through PostScript first?) Another thing it depends on is what else one might want to do with the .tiff files.
In an archiving mentality, the real question is: “Should the .tiff files also be saved?” The best practice answer is yes. The reason is that the PDF is viewed as a presentation version and the .tiff files are viewed as the digital “originals”.
Agricola
This past weekend I had the opportunity to play Agricola for the first time. It is an interesting game. I have played it a few times now: twice with 4 people and twice with 2 people. It takes a while to wrap one’s head around the game play. But once I got it I had to evaluate it. The verdict is in. I would rather play Ticket to Ride than Agricola. Agricola is a great game, don’t get me wrong. It has many intricate plays and a lot of variety. It is a completely different game with two players than it is with four. It is just that it takes too long to put the pieces back in the box.
Seriously though, for the brain strain that it causes (and I enjoy brain strain), I would rather play Ticket to Ride.
Interoperability of online dictionary data: A test case using WordPress as a CMS
Linked data is an effort to enhance applications and thereby lives with structured knowledge. This structure at its core is developed by human interaction. The challenge to consumers of linked data is to convince holders of unstructured data to structure it into actionable, manipulatable knowledge.
Metadata Magic
The company I work for has an archive for many kinds of materials. In recent times this company has moved to start a digital repository using DSpace. To facilitate contributions to the repository the company has built an Adobe AIR app which allows for the uploading of metadata to the metadata elements of DSpace as well as the attachment of the digital item to the proper bitstream. Totally awesome.
However, one of the challenges is that just because the metadata is curated, collected and properly filed, it does not mean that the metadata is embedded in the digital items uploaded to the repository. PDFs are still being uploaded with the PDF’s author attribute set to Microsoft-Word. (More about the metadata attributes of PDF/A can be read on pdfa.org.) Not only are the correct metadata and the wrong metadata in the same place at the same time (and being uploaded at the same time), but later, when a consumer downloads the digital file, only the wrong metadata will travel with the file. This is not just happening with PDFs but also with .mp3, .wav, .docx, .mov, .jpg and a slew of other file types. This saga of bad metadata in PDFs has been recognized since at least 2004: James Howison & Abby Goodrum. 2004. Why can’t I manage academic papers like MP3s? The evolution and intent of Metadata standards.
So, today I was looking around to see if Adobe AIR can indeed use some of the available tools to write the correct metadata into the files before upload, so that when the files arrive in DSpace they will have the correct metadata.
- The first step is to retrieve metadata from files. It seems that Adobe AIR can do this with PDFs. (One would hope so as they are both brain children of the geeks at Adobe.) However, what is needed in this particular set up is a two way street with a check in between. We would need to overwrite what was there with the data we want there.
- However, as of 2009, there were no tools in AIR which could manipulate EXIF data (for photos).
- But it does look like the situation is more hopeful for working with audio metadata.
One way around the limitations of JavaScript itself might be to use JavaScript to call a command-line tool or execute a Python, Perl, or shell script, or even use a library. There are some technical challenges which need to be bridged when using these kinds of tools in a cross-platform environment. (Anything from flavors of Linux to OS X 10.4-10.7 and Windows XP through current.) This is mostly because scripts are implemented in different ways on different platforms.
The technical challenge is that Adobe AIR is basically a JavaScript environment. As such there are certain technical challenges around the implementation of command-line tools like Xpdf from fooLabs and Coherent PDF Tools or Phil Harvey’s ExifTool, Exifv2, pdftk, or even TagLib. One of the things that Adobe AIR can do is call an executable via something called ActionScript. There are even examples of how to do this with PDF metadata. This method uses PurePDF, a complete ActionScript PDF library. ActionScript is powerful in and of itself; it can be used to read the XMP metadata of a PDF, though one could also use it to call on Java to do the same “work”.
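To sketch the "two way street with a check in between" idea, here is a minimal Python example that reads the embedded metadata with ExifTool, compares it to the curated record, and builds a command that overwrites only the fields that differ. This is an assumption-laden illustration and not the company's packager: the function names are mine, it assumes ExifTool is installed and on the PATH, and Python here stands in for whatever script the AIR app would end up calling. The `-json` and `-overwrite_original` options and the `-TAG=VALUE` syntax are ExifTool's own.

```python
import json
import subprocess


def read_embedded(path, tags, exiftool="exiftool"):
    """Read selected tags from a file as a dict (requires ExifTool installed)."""
    cmd = [exiftool, "-json"] + ["-%s" % tag for tag in tags] + [path]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # ExifTool's JSON output is a list with one object per file.
    return json.loads(result.stdout)[0]


def fields_to_fix(embedded, curated):
    """The check in between: keep only curated values that differ from the file."""
    return {tag: value for tag, value in curated.items()
            if embedded.get(tag) != value}


def build_write_command(path, metadata, exiftool="exiftool"):
    """Build (but do not run) the ExifTool command that embeds the metadata."""
    cmd = [exiftool, "-overwrite_original"]
    for tag, value in sorted(metadata.items()):
        cmd.append("-%s=%s" % (tag, value))
    cmd.append(path)
    return cmd


if __name__ == "__main__":
    # Hypothetical curated record and embedded values, for demonstration only.
    curated = {"Author": "Jane Doe", "Title": "Field Notes"}
    embedded = {"Author": "Microsoft-Word", "Title": "Field Notes"}
    changes = fields_to_fix(embedded, curated)
    print(build_write_command("report.pdf", changes))
    # To apply for real: subprocess.run(build_write_command(...), check=True)
```

Passing the command as a list rather than a single shell string sidesteps most of the quoting differences between Linux, OS X, and Windows shells, which is one of the cross-platform headaches mentioned above.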
Three Lingering Thoughts
- Even if the Resource and Metadata Packager has the abilities to embed the metadata in the files themselves, it does not mean that the submitters would know about how to use them or why to use them. This is not, however, a valid reason to not include functionality in a development project. All marketing aside, an archive does have a responsibility to consumers of the digital content, that the content will be functional. Part of today’s “functional” is the interoperability of metadata. Consumers do appreciate – even expect – that the metadata will be interoperable. The extra effort taken on the submitting end of the process, pays dividends as consumers use the files with programs like Picasa, iPhoto, PhotoShop, iTunes, Mendeley, Papers, etc.
- Another thought that comes to mind is that when one is dealing with large files (over 1 GB), there is a reason for making a “preview” version of a couple of MB. That is, if I have a 2 GB audio file, why not make a 4 MB .mp3 file for rapid assessment, to see if it is worth downloading the .wav file? It seems that a metadata packager could also create a presentation file on the fly. This is no less true with photos or images. If a command-line tool like ImageMagick could be used, that would be awesome.
- This problem has been addressed in the open source library science world. In fact a nice piece of software does live out there. It is called the Metadata Extraction Tool. It is not an end-all for all of this archive’s needs but it is a solution for some needs of this type.
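The preview idea from the second thought above can be sketched as a small Python helper that picks a command-line tool based on the file type. The function name, the "_preview" naming scheme, and the default sizes are assumptions of mine. ImageMagick is the tool mentioned above; ffmpeg is my own substitution for the audio case, since the post does not name an audio tool.

```python
from pathlib import Path

IMAGE_TYPES = {".tif", ".tiff", ".jpg", ".jpeg", ".png"}
AUDIO_TYPES = {".wav", ".aif", ".aiff"}


def preview_command(source, max_pixels=1024, audio_bitrate="64k"):
    """Build (but do not run) a command that makes a small preview file."""
    src = Path(source)
    suffix = src.suffix.lower()
    if suffix in IMAGE_TYPES:
        target = src.with_name(src.stem + "_preview.jpg")
        # The trailing ">" tells ImageMagick to shrink only images that are larger.
        # ImageMagick 7 installs this tool as "magick"; "convert" is the older name.
        return ["convert", str(src), "-resize",
                "%dx%d>" % (max_pixels, max_pixels), str(target)]
    if suffix in AUDIO_TYPES:
        target = src.with_name(src.stem + "_preview.mp3")
        # A low audio bitrate keeps the preview to a few MB.
        return ["ffmpeg", "-i", str(src), "-b:a", audio_bitrate, str(target)]
    raise ValueError("no preview recipe for %s files" % (suffix or "extensionless"))


if __name__ == "__main__":
    print(preview_command("page01.tif"))
    print(preview_command("interview.wav"))
```

The packager would run the returned command (for example with subprocess.run) and upload the small file alongside the archival original, so a consumer can judge the content before committing to the large download.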
The British Kerfuffle
I was rather shocked when I heard about the riots in the UK. The UK‽ Really‽ That’s different. Kind of disturbing really. Jamie Oliver’s restaurant was burnt. Why? This is perhaps the saddest news out of England in the last 48 hours. (Not that the BBC’s photographs don’t make the situation look like a scene from V for Vendetta.)
I got to thinking about some of the implications of the reporting coming out of the BBC.
“Officers believe some rioters have used BlackBerry Messenger – a service allowing users to send free real-time messages – to organise violence.”
I often see the words gang and masked youths used in reporting. One must keep in mind that gangs in the UK do not mean the Crips and Bloods, but rather a more generic idea of a swarm of people. Also keep in mind that the UK has more street video cameras and video surveillance than any other country, so wearing a mask is logical for this kind of activity.
And I was told that there was some use of social media platforms like Facebook and Twitter, in particular Facebook groups. Allegedly, locals on Facebook were reporting the groups as fast as they were being created. An interesting social dynamic as opposing social forces collide. I got to thinking about these alleged locals using social media and wondered: what if someone who was not in the UK was organizing or creating the Facebook groups?
That would be something. It would be a way that sympathizers who were not local or in the UK could lend a hand to people on the ground. This led me to think about how law enforcement would react to this sort of help to the rioters.
Law in the U.S.A. is different than in the U.K. But nonetheless, if it were in the U.S. and people were to lend a hand, then those lending a hand might be considered to be terrorists. Using the definition preferred by the State Department, terrorism is: “Premeditated, politically motivated violence perpetrated against noncombatant* targets by subnational groups or clandestine agents, usually intended to influence an audience.” [Quote taken from The definition of terrorism]
It seems that this definition agrees with what I read in the Patriot Act some years ago. It also agrees with Wikipedia and About.com. One thing that I see in the current news is that there is no political motivation. So some might say that this is not terrorism because it lacks motive.
But if it were terrorism, and the individuals were known through Facebook, would the UK ask for their extradition to the UK, to put them on trial as terrorists? Would UK law apply outside of the UK? If a person creates a Facebook group from outside of the UK and several blokes join the group, does that make the creator of the group a terrorist, and also guilty of breaking UK law? Can someone break another country’s law without leaving their current country, even their country of citizenship? What gives the UK the legal right to prosecute?
The other thing which is disturbing in this social media business is the global social reaction to the difference between what is happening in the UK right now and what happened in Libya, Morocco, Egypt, and Syria earlier this year. Social media was part of that unrest as well. Political motivation was definitely part of that unrest, yet the rioting was deemed to be for a “worthy” cause. And yet if the same actions were taken in the U.S. we would call it terrorism. I think The Guardian really points out some of the idiosyncrasy of the U.S. definition and the application of the term terrorism as it relates to crime.
AKAI GX-220D
This past weekend I walked across the street from our house with Becky and went to an estate sale. In one of the rooms there was a stack of electronics. There on a dresser was a Reel-to-Reel machine in almost perfect condition.
Specifications
Track system: 4 track, 2 channel stereo/monaural system
Maximum reel capacity: 7″ reel
Wow and flutter: 0.08% at 7.5 ips
Frequency response: 30Hz to 24kHz at 7.5ips
Distortion: less than 1.5%
Signal to noise ratio: >50dB
Heads: 3 heads
Motors: 3 motors
Dimensions: 430 x 425 x 230mm
Weight: 19kg
One of my concerns is with the 30Hz lower limit on playback.
Was this standard for reel-to-reels, or were better recordings made that I will never hear properly because of the model of playback machine I have?
Another concern I have is about the heads:
If they are magnetized will they erase my tape? How do I tell if they are magnetized?
Here is a listing of Reel-to-Reel machines by AKAI.
Here is a YouTube video of the same model I have.





