xhtml elements - <dt> Definition Term
- <dl> Definition List
- <dd> Definition
In the course of my experience I have been asked about PDFs and OCR several times. The questions usually follow the main two questions of this post.
In particular: is an image-based PDF searchable?
The short answer is yes. Adobe Acrobat Pro has an OCR function built in. And to the second part: no, an image is not searchable. What can happen is that Adobe Acrobat Pro performs OCR on an image, such as a .tiff file, and then adds a layer of text (the output of the OCR process) behind the image. When the PDF is searched, the search actually runs against the text layer behind the image and tries to find a match there. The OCR process is usually between 80% and 90% accurate on texts in English. This is usually good enough for finding words or partial words.
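To illustrate why an imperfect text layer can still be good enough for finding words, here is a minimal sketch in Python. It is not how Acrobat searches; it only shows approximate matching against a made-up OCR text layer using the standard library's difflib. The function name, the sample text, and the 0.8 cutoff are all assumptions chosen for illustration.

```python
import difflib


def find_in_ocr_text(query, ocr_text, cutoff=0.8):
    """Return words from an OCR text layer that approximately match the query.

    cutoff is a similarity threshold between 0 and 1 (an assumed value, not
    anything Acrobat documents).
    """
    words = ocr_text.lower().split()
    return difflib.get_close_matches(query.lower(), words, n=5, cutoff=cutoff)


# Hypothetical OCR output in which a few characters were misread.
ocr_layer = "The arch1ve holds scanned docurnents from the field"

print(find_in_ocr_text("documents", ocr_layer))
print(find_in_ocr_text("archive", ocr_layer))
```

An exact search for "documents" would miss the misread word, while a tolerant comparison still finds it. Lowering the cutoff finds more damaged words at the cost of more false matches.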
The Data Conversion Laboratory has a really nice and detailed write up on the process of converting from images to text with Adobe Acrobat Pro.
Daily Designer has a tutorial on how to do it on OS X.
David R. Mankin explains on his blog what the process looks like using Windows.
One of the beauties of Adobe Acrobat Pro is that this process can be scripted and the TIFFs processed in batches.
[On Windows] :: [On OS X using AppleScript] :: [Cross platform help from Adobe]
University Illinois Chicago explains how to use Adobe Acrobat Pro and OCR with a scanner using a TWAIN driver.
Since I work in an industry where we deal with multiple languages and need to professionally OCR thousands of documents, I thought I would provide a few links comparing the OCR software on the market.
Lifehacker has a short write-up of the top five OCR tools.
In this article, two of those top five, ABBYY FineReader and Adobe Acrobat, are compared side by side on both OS X and Windows.
One thing to remember, which I have said before, is that not all PDFs are created equal. This manual talks a bit about the different settings inside of PDFs when using Adobe’s PDF printer.
The short answer is no. But the long answer is yes. Depending on the settings of the PDF creator, the original files might be altered before they are wrapped in a PDF wrapper.
So the objection, usually in the form of a question, sometimes comes up:
Is the PDF file just using the PDF framework as a wrapper around the original content? Therefore, to archive things “properly” do I still need to keep the .tiff images if they are included in the PDF document?
The answer is: “it depends”. It depends on several things. One is what program created the PDF and how it created the PDF: did it send the document through PostScript first? Another is what else one might want to do with the .tiff files.
In an archiving mentality, the real question is: “Should the .tiff files also be saved?” The best practice answer is yes. The reason is that the PDF is viewed as a presentation version and the .tiff files are viewed as the digital “originals”.
This past weekend I had the opportunity to play Agricola for the first time. It is an interesting game. I have played it a few times now: twice with 4 people and twice with 2 people. It takes a while to wrap one’s head around the game play. But once I got it, I had to evaluate it. The verdict is in. I would rather play Ticket to Ride than Agricola. Agricola is a great game, don’t get me wrong; it has many intricate plays and a lot of variety. It is a completely different game as a two-player game than it is as a four-player game. It is just that it takes too long to put the pieces back in the box.
Seriously though, for the brain strain that it causes (and I enjoy brain strain), I would rather play Ticket to Ride.
Linked data is an effort to enhance applications and thereby lives with structured knowledge. This structure at its core is developed by human interaction. The challenge to consumers of linked data is to convince holders of unstructured data to structure it into actionable, manipulatable knowledge.
The company I work for has an archive for many kinds of materials. In recent times this company has moved to start a digital repository using DSpace. To facilitate contributions to the repository, the company has built an Adobe AIR app which allows for the uploading of metadata to the metadata elements of DSpace as well as the attachment of the digital item to the proper bitstream. Totally awesome.
However, one of the challenges is that just because the metadata is curated, collected and properly filed, it does not mean that the metadata is embedded in the digital items uploaded to the repository. PDFs are still being uploaded with the PDF’s author attribute set to Microsoft-Word. (More about the metadata attributes of PDF/A can be read on pdfa.org.) Not only are the correct metadata and the wrong metadata in the same place at the same time (and being uploaded at the same time), but later, when a consumer downloads the digital file, only the wrong metadata will travel with the file. This is happening not just with PDFs but also with .mp3, .wav, .docx, .mov, .jpg and a slew of other file types. This saga of bad metadata in PDFs has been recognized since at least 2004; see James Howison & Abby Goodrum. 2004. Why can’t I manage academic papers like MP3s? The evolution and intent of Metadata standards.
So, today I was looking around to see if Adobe AIR can indeed use some of the available tools to propagate the correct metadata into the files before upload, so that when the files arrive in DSpace they will have the correct metadata.
The technical challenge is that Adobe AIR is basically a JavaScript environment. As such, there are certain technical challenges around the implementation of command-line tools like Xpdf from fooLabs and Coherent PDF Tools or Phil Harvey’s ExifTool, Exifv2, pdftk, or even TagLib. One of the things that Adobe AIR can do is call an executable via something called ActionScript. There are even examples of how to do this with PDF metadata. This method uses PurePDF, a complete ActionScript PDF library. ActionScript is powerful in and of itself; it can be used to call the XMP metadata of a PDF, though one could also use it to call on Java to do the same “work”.
One way around the limitations of JavaScript itself might be to use JavaScript to call a command-line tool, execute a Python, Perl, or shell script, or even use a library. There are some technical challenges which need to be bridged when using these kinds of tools in a cross-platform environment (anything from flavors of Linux to OS X 10.4-10.7 and Windows XP through current). This is mostly because of the various ways of implementing scripts on different platforms.
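As a rough sketch of the "call a command-line tool from a script" idea, here is what a small Python helper wrapping ExifTool might look like. This is not the AIR app's code, and I have not wired it into AIR; the function names and the sample file name are hypothetical. It assumes ExifTool is installed and on the PATH, and it uses ExifTool's `-Author=`, `-Title=` and `-overwrite_original` options.

```python
import shutil
import subprocess


def build_exiftool_command(path, author, title):
    """Build the ExifTool argument list that sets Author and Title on a file."""
    return [
        "exiftool",
        f"-Author={author}",
        f"-Title={title}",
        "-overwrite_original",  # do not leave a "_original" backup copy behind
        path,
    ]


def embed_metadata(path, author, title):
    """Run ExifTool on the file; raises if ExifTool is missing or fails."""
    if shutil.which("exiftool") is None:
        raise RuntimeError("exiftool was not found on the PATH")
    subprocess.run(build_exiftool_command(path, author, title), check=True)


# Example call (hypothetical file and values):
# embed_metadata("report.pdf", "Jane Researcher", "Field Notes 2011")
```

Passing the arguments as a list, rather than one shell string, sidesteps most of the quoting differences between Windows, OS X and Linux shells, which is one of the cross-platform headaches mentioned above.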
I was rather shocked when I heard about the riots in the UK. The UK‽ Really‽ That’s different. Kind of disturbing really. Jamie Oliver’s restaurant was burnt. Why? This is perhaps the saddest news out of England in the last 48 hours. (Not that the BBC’s photographs don’t make the situation look like a scene from V for Vendetta.)
I got to thinking about some of the implications of the reporting coming out of the BBC.
“Officers believe some rioters have used BlackBerry Messenger – a service allowing users to send free real-time messages – to organise violence.”
I often see the words gang and masked youths used in reporting. One must keep in mind that gangs in the UK do not mean the Crips and Bloods, but rather a more generic idea of a swarm of people. Also keep in mind that the UK has more street video cameras and video surveillance than any other country, so wearing a mask is logical for this kind of activity.
And I was told that there was some use of social media platforms like Facebook and Twitter, in particular Facebook groups. Allegedly, locals on Facebook were reporting the groups as fast as they were being created. It is an interesting social dynamic as opposing social forces collide. I got to thinking about these alleged locals using social media and wondered: what if someone who was not in the UK was organizing or creating the Facebook groups?
That would be something. It would be a way that sympathizers who were not local or in the UK could lend a hand to people on the ground. This led me to think about how law enforcement would react to this sort of help to the rioters.
Law in the U.S.A. is different than in the U.K. But nonetheless, if it were in the U.S. and people were to lend a hand, then those lending a hand might be considered to be terrorists. Using the definition preferred by the State Department, terrorism is: “Premeditated, politically motivated violence perpetrated against noncombatant* targets by subnational groups or clandestine agents, usually intended to influence an audience.” [Quote taken from The definition of terrorism]
It seems, though, that this definition agrees with what I read in the Patriot Act some years ago. It also agrees with Wikipedia and About.com. One thing that I see in the current news is that there is no political motivation. So some might say that this is not terrorism because it lacks motive.
But if it were terrorism, and the individuals were known through Facebook, would the UK ask for their extradition to the UK to put them on trial as terrorists? Would UK law apply outside of the UK? If a person creates a Facebook group from outside of the UK and several blokes join the group, does that make the creator of the group a terrorist, and also guilty of breaking UK law? Can someone break another country’s law without leaving their current country, even their country of citizenship? What gives the UK the legal right to prosecute?
The other thing which is disturbing in this social media business is the global social reaction to the difference between what is happening in the UK right now and what happened in Libya, Morocco, Egypt, and Syria earlier this year. Social media was part of that unrest as well, however, political motivation was definitely part of that. Yet the rioting was deemed to be for a “worthy” cause. Yet if the same actions were taken in the U.S. we would call it Terrorism. I think The Guardian really points out some of the idiosyncrasy of the U.S. definition and the application of the term Terrorism as it relates to crime.
This past weekend I walked across the street from our house with Becky and went to an estate sale. In one of the rooms there was a stack of electronics. There on a dresser was a Reel-to-Reel machine in almost perfect condition.
Track system: 4 track, 2 channel stereo/monaural system
Maximum reel capacity: 7″ reel
Wow and flutter: 0.08% at 7.5 ips
Frequency response: 30Hz to 24kHz at 7.5ips
Distortion: less than 1.5%
Signal to noise ratio: >50dB
Heads: 3 heads
Motors: 3 motors
Dimensions: 430 x 425 x 230mm
Weight: 19kg
One of my concerns is with the 30Hz playback.
Was this standard for reel-to-reels, or were better recordings made, which I will never know about because of the model of playback machine I have?
Another concern I have is about the heads:
If they are magnetized will they erase my tape? How do I tell if they are magnetized?
Here is a listing of Reel-to-Reel machines by AKAI.
Here is a YouTube video of the same model I have.
I found this image which I think explains many of the kinds of things that website builders need to think through. For some in the industry this is like... Duh! But for others this kind of layout really helps us see the complexity and the parts we need to be thinking through to implement the website.
Who would have thunk that I would do a post on cars? Mostly, I avoid car discussions. I have avoided these kinds of discussions since high school. When all my friends raved about NASCAR and North American muscle cars, I regularly reminded them that the best cars in the world prove their excellence in a race by negotiating turns in multiple directions, as well as negotiating changes in track elevation. By the way, RACECAR is a palindrome.
This posture has landed me quite a bit of criticism over the years. Aside from this, my practical side has suggested that a good car is one that allows one to get from ‘point A’ to ‘point B’.
My first car was a Renault, followed by a 1983 Honda Prelude, which in turn was followed by a 1996 Honda Accord LX. I currently drive a 2005 Kia Spectra, which my wife bought before we were married.
But recently Becky and I have been looking at cars, not to buy one, just to look at them… and so I have been looking at some cars with more of an artistic eye rather than a functional or economical (esp. fuel efficiency) eye.
Several cars have caught my eye. I have long been a fan of the Porsche 911 and Porsche Boxster. But here are some other cars that look rather appealing. Most of these cars I have never even ridden in. Concerning color, I really like yellow. However, two-tone color schemes, like those common on the Toyota FJ and the Mini Cooper, are also really cool. Of course blue and white, blue and golden (yellow), or black and white all go nicely together.
The Union Jack looks really cool on either the roof or the hood of this car. But being that my family color is blue, I thought that perhaps a Scottish flag might fit better.
I always find the Dodge Challenger an aggressive and smart looking car.
The Subaru Baja is also a nice car.
Matching IPTC to Dublin Core:
http://metadatadeluxe.pbworks.com/w/page/25784393/W3C,-IPTC,-Dublin-Core,-and-Adobe
© 2005-2026 Hugh Paterson III All Rights Reserved.