In a recent (2010-2011) language documentation project we decided to also collect GIS data (GPS coordinates) about our consultants (place of origin and place of current dwelling) and about our recording locations, and to use it for geo-tagging photos. We used a Garmin eTrex Venture HC to collect the data and then compared it with GIS information from Google Maps and the national GIS information service. This write-up and evaluation of the Garmin eTrex Venture HC is based on this experience.
Category Archives: OS X
OS X Error -36
I had an OS failure while I was in Mexico. I managed to reinstall the combo update and things started working again. However, some of my big files (movies) will not copy, Time Machine fails, and some PDFs now fail to copy. It always comes back to a -36 error. I cannot find the error report for this online. It seems to be some sort of I/O error. I left a comment over on this blog.
I tried the command-line mv command:
prompt%> mv /Users/phil/Desktop/movie.avi
The command line told me Input/Output error.
But I can play the file on my hard drive. I have the same error if I try to copy the file to somewhere else on my internal hard drive. I am using OS X 10.6.5 on a 15″ MBP and moving the file to a WD My Passport via USB. The file in question is a .mts movie file. I can move other .mts movie files, which were made with the same camera at the same time and are in the same folder, to the external hard drive.
I can’t figure out why I have this error or how to solve it.
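One way to narrow down an Input/Output error like this is to read the file in fixed-size chunks and note which offsets fail. The following is only a sketch in Python, not something I ran at the time; the function name and the chunk size are my own choices for illustration.

```python
import errno
import os


def find_unreadable_chunks(path, chunk_size=1024 * 1024):
    """Return the byte offsets of chunks that fail with an I/O error."""
    bad_offsets = []
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        for offset in range(0, size, chunk_size):
            try:
                # pread reads at an absolute offset, so one bad chunk
                # does not stop us from probing the chunks after it.
                os.pread(fd, chunk_size, offset)
            except OSError as exc:
                if exc.errno != errno.EIO:
                    raise  # only collect genuine input/output errors
                bad_offsets.append(offset)
    finally:
        os.close(fd)
    return bad_offsets
```

Calling find_unreadable_chunks on the movie file would return an empty list for a healthy file and a list of offsets for a file with damaged regions, which at least tells you whether the problem is in one spot or spread throughout the file.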
OS X login return to login screen
Last night at about 9 pm Camino crashed… and the whole house came down with it.
Camino had been asking me to update it for several days. I was running 2.0.0.6 and it wanted to update to version 2.0.0.7. This crash led another app to crash, which is very unusual considering how isolated apps are from each other on OS X (I am running OS X 10.6.5). So I decided to reboot the OS. OS X booted and got as far as initializing the mouse; the cursor could be seen. However, instead of proceeding, the screen would switch between blue and a lighter blue (this is the blue after the gray screen in the OS X boot sequence). Not cool. So I reset the PRAM.
Hold down ⌘+⌥+P+R during start-up and wait till the second chime is heard. Then let go.
Still no progress. So I booted up in safe mode and went to Disk Utility to repair permissions. This is where I found out that several of the OS X language packs and Java had permission errors. So I repaired those permissions. Still no joy.
So I looked for an alternative and found out about booting OS X in single user mode.
Hold down ⌘+S during start-up.
Then I followed the instructions on running fsck. I tried this several times and had no joy.
So I then tried to boot to Target disk mode.
Hold down the T key during boot.
It booted to target disk mode. But then would not appear as an external disk on any Mac I plugged it into. I tried three different Macs.
Then I tried to boot from a retail OS X Snow Leopard install DVD.
Hold down the C key during boot.
It could not see the disk.
Then I tried to select the boot disk by booting to an option menu.
Hold down the ⌥ key during boot.
I can see the install disk in the options, but when I select it the machine freezes.
I then called AppleCare. I got on Skype and called the 1-800 AppleCare number from Mexico. I told them the problem and what I had done, and asked if an archive and install, done by reinstalling OS X 10.5 and then upgrading to 10.6, would work. They said that was not the recommended way but might be a possibility. I asked them if there was anything more that I could do. They said, no, you have tried everything that we would have suggested over the phone. Knowing what you know, you might have worked for Apple before. They then asked me if I wanted to schedule an appointment with a genius. I replied that I was in Mexico and would not be back in the States for another month.
So I decided to try to get a Time Machine copy of my hard drive and repair the hard drive. No luck: Time Machine stops with a corrupted file. Then I decided to use Carbon Copy Cloner to copy the drive and fix the cloned drive. It gets to a corrupt file and stops too. But at least it says that the file /Applications/Camino.app/Contents/MacOS/libssl3.dylib was corrupt. Someone else has had this problem too. I downloaded and reinstalled the current version of Camino. I rebooted and it got past the blue and gray screen to the login options. I log in and it starts the startup items but then kicks me out to the login screen again. Better, but no fix.
I am still struggling with this. I found this on the Apple forums and have cleared the Cache folder indicated and have removed everything from the startup menu. Still no joy. This was in response to the following thread on Apple’s forums: After 10.6.5 cannot login or boot from DVD or clone.
A friend loaned me a USB hard drive, installed OS X on it, and made it a bootable disk. It boots just fine. So it does not appear to be a hardware issue. However, when I tried to use Migration Assistant to pull my files over to the external drive, Migration Assistant had made no progress after about 10 hours.
We have yet to see what the solution will be… I am also looking at MySQLCOM. It appears that some startup items’ permissions got mixed up. This post explains how to reset the permissions to root.
I got my computer working again by re-installing OS X with the 10.6.5 Combo installer and resetting the permissions for the Startup Items via the terminal.
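For anyone wanting to check startup item permissions before changing them, here is a small Python sketch. It assumes (my assumption, not something stated in the linked post) that the items should be owned by root and not writable by group or others; the function name is made up for illustration, and it only reports problems, it changes nothing.

```python
import os
import stat


def find_insecure_items(root, expected_uid=0):
    """List paths under root that are not owned by expected_uid
    or that are writable by group or others."""
    problems = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            info = os.lstat(path)
            if stat.S_ISLNK(info.st_mode):
                continue  # symlink modes are not meaningful here
            wrong_owner = info.st_uid != expected_uid
            too_open = bool(info.st_mode & (stat.S_IWGRP | stat.S_IWOTH))
            if wrong_owner or too_open:
                problems.append(path)
    return sorted(problems)
```

Running find_insecure_items("/Library/StartupItems") from an admin account would list the candidates to look at; uid 0 is root.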
Finding that Apple command symbol
I have always wanted to be able to type the ⌘ symbol for various reasons, including writing tutorials, but I have not known how to access it through my keyboard. A few general, related notes:
- There is a nice write-up, including some history, on the Command Key (⌘) on Wikipedia.
- How Apple Keyboards Lost a Logo and Windows PCs Gained One
- PopChar is an application which helps users find obscure characters.
This functionality is built in to OS X with Character Viewer, though it is likely that PopChar extends the user experience in some way.
- This discussion on the Apple Forums talks about a way to put these symbols in Pages’ auto correction so that Pages will auto correct a set of characters typed to the symbol desired. I have seen this used in MS Word too.
- A table of Unicode characters corresponding to Macintosh keyboard symbols, as they commonly appear in menus.
- Special Key Symbols
- Apple Keyboard Symbols
- Multi-stroke Key Bindings
- Keystroke mapping explained by SIL’s NRSI.
The Next two Links are more detailed but like the above.
Marginally relevant:
It is Unicode code point U+2318 (the HTML hex entity is &#x2318;), so you can find it in the character palette under:
- Code Tables>Unicode>2300>2318
- All Characters>Symbols>Technical Symbols
or you can go into
.
On OS X, if you switch your keyboard to Unicode Hex Input, then holding down Option while typing the four hex digits of a Unicode code point produces the character, e.g. 2318 for ⌘.
The Alt/Option symbol has also been elusive. It can be found at Unicode code point U+2325.
Unicode and Hex Keyboard symbols
⌘ – &#x2318; – &#8984; – the Command Key symbol
⌥ – &#x2325; – &#8997; – the Option Key symbol
⇧ – &#x21E7; – &#8679; – the Shift Key (really just an outline up-arrow, not Mac-specific)
⇥ – &#x21E5; – &#8677; – the Tab Key symbol
⏎ – &#x23CE; – &#9166; – the Return Key symbol
⌫ – &#x232B; – &#9003; – the Delete Key symbol
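If you would rather generate these codes than look them up, a few lines of Python will do it. This is just a sketch; the dictionary of key names and the describe function are my own naming, and the code points are the ones for the symbols listed above.

```python
KEY_SYMBOLS = {
    "Command": 0x2318,
    "Option": 0x2325,
    "Shift": 0x21E7,
    "Tab": 0x21E5,
    "Return": 0x23CE,
    "Delete": 0x232B,
}


def describe(code_point):
    """Return the character, its U+ notation, and its hex and decimal HTML entities."""
    return (
        chr(code_point),
        f"U+{code_point:04X}",
        f"&#x{code_point:04X};",
        f"&#{code_point};",
    )


for name, code_point in KEY_SYMBOLS.items():
    print(name, *describe(code_point))
```

The same describe function works for any other symbol once you know its code point.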
Network Needs for Poly-lingual Language Documentation Project
The diagram above roughly illustrates our network setup. This set-up might be typologically rare in terms of language documentation field stations for several reasons. But we had reasonable power (both in quality and quantity), though there were some power outages. And we had high-speed internet.
In terms of network setup, we needed a direct connection out to the internet so that we could have a team network, and then a separate network for language consultants, who would bring their own computers and have a “drop box” with us. To fill this need we could open our network to each of the consultants or we could use an outside service like Dropbox. I am not sure why we did not use Dropbox. Eventually we did use Google spreadsheets for collecting word frames. Our consultants might have been atypical in that they had their own computers and had some familiarity with computer use.
Single FLEx Datastore for all languages
Microsoft SQL Server for running FLEx on the network. This was achieved by running XP in a virtual machine via VirtualBox on the OSX Server. We had multiple entry points of data to the “FLEx System”. We also did not completely solve network access to the databases; that is, only one person at a time could access the database with write access. Since this project, the current version of FLEx has moved from a Microsoft SQL Server backend to an XML backend. But perhaps it would have been better to use FLExBridge or LiftBridge.
Server and data store Backup
Best practice for backup calls for a three way backup plan.
- An onsite backup.
- An “across town” backup. Where a (at least weekly) backup is held by a friend or colleague across town.
- And an out of country back-up.
This three way backup is to:
- Protect from mistakes or equipment failure.
- Protect from theft.
- Protect from catastrophic events.
Our onsite backup was handled by Time Machine.
We would switch out our Backup drive every week and give it to a colleague across town.
We attempted to use KKoncepts for our offsite backup. (KKoncepts did not work out because it was based on a simple rsync script, and every time we tried to reorganize folders in our corpus it would try to re-sync all of the gigabytes of data which lived under those folders.) The Dropbox service is much more efficient: it looks at the block level (inside the file) and only updates things that have changed. It then looks at the tree structure and mirrors what is currently on the client’s computer, rather than re-uploading the content.
Not yet well defined are the network settings needed to run Windows XP in the virtual machine, OS X, and Windows 7, and to establish a DNS server with the AirPort Extreme.
Note: Although the title/URL says “Multi-lingual”, this is to be understood as meaning that multiple languages are being documented. The term poly-lingual also fits this particular project because the language of communication and authorship was Spanish, yet many of the network issues were resolved in English.
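The reorganizing problem comes down to identifying files by path rather than by content. Here is a minimal Python sketch of the alternative: index files by a content hash, so that a file which has only moved is recognized as a rename rather than a new upload. This is my own illustration and not how Dropbox or the KKoncepts script actually work; the function names are hypothetical.

```python
import hashlib
import os


def index_by_digest(root):
    """Map SHA-256 digest -> relative path for every file under root.
    Files with identical content collapse to one entry in this sketch."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Reads the whole file at once; large recordings would
            # need to be hashed in chunks instead.
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            index[digest] = os.path.relpath(path, root)
    return index


def plan_sync(backup_index, local_index):
    """Split local files into renames (content already backed up) and uploads."""
    renames, uploads = [], []
    for digest, local_path in local_index.items():
        backed_up_path = backup_index.get(digest)
        if backed_up_path is None:
            uploads.append(local_path)
        elif backed_up_path != local_path:
            renames.append((backed_up_path, local_path))
    return sorted(renames), sorted(uploads)
```

With an index of the backup and an index of the local corpus, plan_sync returns the moves that can be replayed on the backup side and the short list of files that genuinely need to cross the wire.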
Chronicles of DNS & Sub-Domains on OSX Server
The Intended Setup:
We want to be able to run the OSX provided Wiki, Calendar and Blog features of the WebService. In addition we want to also run Mercurial (http://mercurial.selenic.com/) and RefBase (http://www.refbase.net/).
We want to run:
- OSX services at mephaa.xyz
- Mercurial at hg.mephaa.xyz
- Refbase at ref.mephaa.xyz
These sites are for our work group only; they need not be accessible to the outside world. But if in the process we can set things up so that an invited guest could collaborate with us on our project and view our workgroup’s collaboration area, that would be ok. We will be using the MacPorts version/method of running mercurial.
Aside: Since I originally started out to resolve this challenge I have acquired mephaa.org as a real registered domain.
Network layout:
We have a dynamic IP from our neighbor’s router. (We share the line and they are up stream. That is just the way things work in this location in Mexico. They are in turn connected to the ISP.)
The connection from the neighbor is hardwired to the WAN port on the Airport Extreme. The Airport Extreme is using NAT & DHCP (see settings below). I have 9 machines connected to the Airport. One of which is the MacMini server. It is the only one that is hard wired to the Airport. The rest are laptops that connect wirelessly.
The MacMini is assigned a stable IP address by the Airport Extreme based on its MAC address. The IP address for the server behind the firewall: 10.0.1.5.
The Settings on the AirPort Extreme:
The Challenge: As I presented it and discussed it on Apple’s forums, on Nov. 20th.
Status: (Nov 20th)
We do not have an outside domain name that we have purchased. We are just using the name of the computer as it was set up during the install of OSX.
I have the Wiki, Calendar and Blog features running at macminimarlett.local.
I can type macminimarlett.local in any web browser on the server side of the Airport Extreme and access the OSX provided WebServices (aka the wiki, blog and calendar).
I would like to make the mercurial repository available at: hg.macminimarlett.local
I would like to make the refbase instance available at: ref.macminimarlett.local
These “additional” websites are hosted on the same machine as MacMiniMarlett.
§1. So what must I do to get hg.macminimarlett.local to resolve at all to anything?
§2. So what must I do to get hg.macminimarlett.local to resolve to my mercurial instance?
Currently I can not get hg.macminimarlett.local to resolve at all. “Safari can not find the server”. But browsers do find macminimarlett.local.
This leads me to think that it is a problem with my OSX server settings, not with my install of Mercurial.
Suggestions offered on the 20th:
- Do not use .local.
- Do not use .private.
- Change the domain to something other than the computer name.
- Computers on the LAN can find macminimarlett.local because of Bonjour, not because of any special DNS entry.
We dropped the .local and the .private and switched to “mephaa.” instead of using “macminimarlett.”. I left the “macminimarlett.” zone in the DNS records just in case. This leads us to the server settings on Nov. 21st.
To this point I had been assuming that .private in the DNS registry was being translated to .local in the browsers. This was an errant assumption.
Server Settings: Nov 21st
Suggestions received from John on Nov. 21st:
Your DNS is incorrect. Run the terminal command:
sudo changeip -checkhostname
You need to get this sorted out because it affects a lot of services.
Here’s the crash course version: the . (period) at the end of the domain name means it is a fully qualified domain name (meaning that it is a real domain that real people use, like google.com.). Also, the primary domain record should be like this: macminimarlett.com. or macminimarlett.private. or macminimarlett.local. (be aware that Microsoft Server 2008+ is dropping .local support and you need a real domain name and public IP/dedicated IP, which means using .local isn’t future proofing).
One thing to know, the primary domain record doesn’t have to be a fully qualified domain, but it should be as everything is heading that way in the future.
At the moment your server is thinking the macminimarlett. and mepaa. are the .com part of the domain name.
Yeah there will be a lot of confusion in the mepaa domain record as there isn’t any reverse mapping for it. And the cname record is at the .com level (layer 1) which won’t resolve very well for clients.
Next, what is the forwarder settings set to? These should be set to the ISP DNS and then to the router (you can add as many DNS servers as you like for redundancy).
What is doing DHCP to the clients? What DNS are they getting? The clients need to know where your subdomains are in the network. For example if a pc is typing in hg.macminimarlett (which is a bad idea – it should be hg.macminimarlett.private or something like that) then the pc client checks the DNS server for which server (IP address) has the subdomain hg.macminimarlett – but if the DNS server doesn’t have a record of hg.macminimarlett then the DNS server will reply with not a real address (because it doesn’t know who that is).
Regards,
Nov. 22nd.
I now realize that the syntax of my DNS entries (when and only when I am not using a registered domain name) needs to be:
- For a zone: <Some Name>.<Something Unique>.
Where the above corresponds to the following: (domain name level).(TLD level).
- For the domain root, which is an entry in the zone: <Some Name>.<Something Unique>.
Where the above corresponds to the following: (domain name level).(TLD level).
- For a subdomain, which is also an entry in the zone: <Some Name>.<Some Name>.<Something Unique>.
Where the above corresponds to the following: (subdomain prefix).(domain name level).(TLD level).
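To make the pattern concrete, here is a tiny Python sketch that assembles names in this form, trailing dot included. The function name is my own; the labels are the ones from this setup.

```python
def fqdn(*labels):
    """Join DNS labels into a fully qualified name with the trailing dot."""
    cleaned = [label.strip(".") for label in labels if label.strip(".")]
    return ".".join(cleaned) + "."


# Zone and domain root: (domain name level).(TLD level).
print(fqdn("macminimarlett", "private"))
# Subdomain: (subdomain prefix).(domain name level).(TLD level).
print(fqdn("hg", "macminimarlett", "private"))
```

The point of the trailing dot is that the name is complete as written, so nothing else gets appended to it when it is looked up.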
My current entries in my DNS are not set up this way. I need to change them. Before I do that I should likely run changeip -checkhostname as suggested by John.
I ran sudo changeip -checkhostname and this is what I got:
Now my question is: is this message saying I need to run this again? What am I to do with the results of the message from changeip? I read the manuals but that did not yield any profound insights.
- Mac OS X Server 10.4.6 or later: changeip now requires fully qualified domain names
- changeip(8) Mac OS X Server Manual Page
I added .private to all the DNS records in the DNS service, in order to fix the syntax of the DNS records as indicated by John. After that I ran changeip again. It now shows that there is nothing needing to be changed. I think this part is now resolved.
Now this is what the server settings are (Noon 22nd):
Aside: I corrected a spelling error in the DNS records where mephaa was misspelled as mepaa. All the records now read with the mephaa spelling, as indicated in the second picture.
I got hg.macminimarlett.private to resolve from the server to a test index.html page on the server. But I could not get it to resolve from a client on the network.
- Is this because I have the wrong type of records?
- Is this because I am not passing the DNS records to where the clients are looking for the records?
- How do I pass these DNS entries to my clients?
- Is this something I have to enter in the Airport Extreme? If so which entries on which lines?
From Camalot via the Apple forum post:
A hostname is a record of a host within the domain. For example, hg.macminimarlett.private is the hostname for the host hg within the macminimarlett.private domain.
I don’t see anything in the Server Admin titled “hostname”… There is one thing under the Primary zone that says “hostname” but what should this be set to? the IP of the computer on this LAN?
Server Admin doesn’t know what additional hostname you want for your domain. It’s up to you to create them. You create additional records (either ‘A’ (address) records for physical machines, or ‘CNAME’ records for additional hostnames that you want to map to an existing machine).
§7. Ok so in what manner do I add hg.macminimarlett.private. to that zone? Do I add it as a CNAME, as a secondary zone, or as a Machine (A) record?
In this case it sounds like you want to add three records to your existing zone.
One A record for your server (call it whatever you want, but server.macminimarlett.private seems to make sense). Give this the IP address of your server.
Two CNAME records – one for hg and one for ref – that both point to server.macminimarlett.private.
Now you’ll be able to resolve all three hostnames, and they’ll all point to the same physical IP address. From there it’s just Apache’s configuration telling it how to deal with the different requests.
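A quick way to see whether a given machine can resolve all three names is to ask its resolver directly. The following Python sketch does that; the function name and the injectable resolver parameter are my own additions for illustration, and the hostnames are the ones proposed above.

```python
import socket


def check_hostnames(hostnames, resolver=socket.gethostbyname):
    """Return {hostname: IP address, or None if it does not resolve}."""
    results = {}
    for name in hostnames:
        try:
            results[name] = resolver(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[name] = None
    return results


if __name__ == "__main__":
    names = [
        "server.macminimarlett.private",
        "hg.macminimarlett.private",
        "ref.macminimarlett.private",
    ]
    for name, address in check_hostnames(names).items():
        print(name, "->", address or "does not resolve")
```

Run on the server and then on a client laptop, this shows at a glance whether the records exist at all and whether the clients are actually asking the server for them. If everything is set up as described, all three names should come back with the server's address, 10.0.1.5.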
From John:
Yeah there will be a lot of confusion in the mepaa [sic] domain record as there isn’t any reverse mapping for it. And the cname record is at the .com level (layer 1) which won’t resolve very well for clients.
Next, what is the forwarder settings set to? These should be set to the ISP DNS and then to the router (you can add as many DNS servers as you like for redundancy).
What is doing DHCP to the clients? What DNS are they getting? The clients need to know where your subdomains are in the network. For example if a pc is typing in hg.macminimarlett (which is a bad idea – it should be hg.macminimarlett.private or something like that) then the pc client checks the DNS server for which server (IP address) has the subdomain hg.macminimarlett – but if the DNS server doesn’t have a record of hg.macminimarlett then the DNS server will reply with not a real address (because it doesn’t know who that is).
I am not sure what is giving DNS to the clients. I did have to put something (I think it is the IP of my neighbor’s router, see the image above) in the DNS settings of the Airport Extreme in order to get the Internet to be passed to the clients. So what I did was put the internal IP address of the MacMini Server in the DNS field on the Airport Extreme. I also found this interesting: http://www.dyndnscommunity.com/questions/4567/custom-dns-with-subdomain-and-airport-extreme
It seems that an AirPort Extreme will always identify itself as the DNS server. If I want the network to look for a DNS server elsewhere, then I need to follow one of these options: http://discussions.apple.com/thread.jspa?threadID=121990, http://wiki.amahi.org/index.php/Airport_express or http://discussions.apple.com/thread.jspa?threadID=2288123&tstart=0. (A restart might be required. Also I might be looking for something called “split horizon DNS”.)
http://www.dyndns.com/support/kb/apple_airport_with_custom_dns.html
http://www.dyndnscommunity.com/questions/1087/apple-airport-does-not-create-global-dynamic-hostname-in-custom-dns-zone
SSH, Unix commands & RegEx
This summer I am sitting in on a computational linguistics course. It is the first instruction I have had about UNIX. Pretty Awesome.
This has required me to do some googling, looking for terminal commands.
This is kind of a sketch of where I have been.
UNIX:
http://www.osxfaq.com/Tutorials/LearningCenter/
SSH:
http://kimmo.suominen.com/docs/ssh/
http://ss64.com/osx/
TERMINAL:
http://homepage.mac.com/rgriff/files/TerminalBasics.pdf
grep:
http://www.thegeekstuff.com/2009/03/15-practical-unix-grep-command-examples/
http://en.wikipedia.org/wiki/Grep
http://www.computerhope.com/unix/ugrep.htm
Regular Expressions:
http://www.zytrax.com/tech/web/regex.htm
http://www.regular-expressions.info/tutorial.html
http://gnosis.cx/publish/programming/regular_expressions.html
RegEx and Unicode:
One of the issues that I have had with RegEx has been what is a natural class? i.e. [A-Z], [A-Za-z], [0-9], etc. As a linguist I deal a lot with IPA characters, subscripts, superscripts, unicode, and diacritics. How am I to define a natural class with these? Can I define a natural class based on the phonology of the language?
So I did some more searching:
http://unicode.org/reports/tr18/
http://unicode.org/reports/tr18/tr18-5.1.html
http://icu-project.org/docs/papers/iuc26_regexp.pdf
http://courses.ischool.berkeley.edu/i256/f06/papers/regexps_tutorial.pdf
http://wapedia.mobi/en/Regular_expression?t=5.
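On the natural class question: one approach is to define the class yourself, from the phonology of the language, as an explicit character set, and to allow combining diacritics after each base character. Here is a sketch in Python (standard library only). The vowel inventory is a made-up example and not from any particular language, and the names are my own.

```python
import re
import unicodedata

# Hypothetical vowel inventory; replace with the language's own.
VOWELS = "aeiouɛɔəɪʊ"
# The Combining Diacritical Marks block (tone marks, nasalization, etc.).
COMBINING = "\u0300-\u036f"

# One vowel "segment" = a base vowel plus any diacritics stacked on it.
VOWEL_SEGMENT = re.compile(f"[{VOWELS}][{COMBINING}]*")


def vowel_segments(word):
    """Return the vowel segments of word, in decomposed (NFD) form."""
    # NFD splits precomposed letters like á into a + combining acute,
    # so the base letter can be matched by the character class.
    decomposed = unicodedata.normalize("NFD", word)
    return VOWEL_SEGMENT.findall(decomposed)
```

Calling vowel_segments on a transcribed word gives back each vowel with its diacritics attached, so a class like "nasalized vowels" or "high-tone vowels" becomes a matter of filtering on which combining marks are present.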
RegEx+PERL+Unicode:
http://perldoc.perl.org/perlretut.html
PERL:
http://www.enginsite.com/Library-Perl-Regular-Expressions-Tutorial.htm
http://www.cgi101.com/book/connect/mac.html
http://www.mactech.com/articles/mactech/Vol.18/18.09/PerlforMacOSX/index.html
SSH and Terminal
I used an ssh connection from the Terminal today for the first time!
I feel like a real man now.
I needed to transfer a 106MB folder from one subdomain to another subdomain on my DreamHost webserver. It has been my experience that whenever I copy or move folders with a lot of sub-folders, something does not get copied all the time or all the way. So I needed to archive my files and move them as a single object. But I do not think it is possible to zip files with an FTP client (at least not with Interarchy). For a solution I turned to ssh and a lot of googling.
So to ssh into my webhost I had to enable a user from the DreamHost panel.
Then I had to open Terminal and create a key. I found some sensible directions in the knowledge base.
To generate a secure public/private key pair to log in securely, and without a password (if you want):
- In Terminal type:
ssh-keygen -d
Hit the “enter” key three times.
Replacing “username” and “yourdomain” with your FTP username and your-domain,
- copy & paste/type the following into Terminal:
ssh username@ftp.yourdomain.com 'test -d .ssh || mkdir -m 0700 .ssh ; cat >> .ssh/authorized_keys && chmod 0600 .ssh/*' < ~/.ssh/id_dsa.pub
Press return/enter key again.
Wait for it to ask for the Password:
Enter the password of the FTP user whose username you inserted in place of the example USERNAME@ftp.yourdomain.com above.
If it asks you for the password multiple times, type in the same correct password each time.
Then you will be at the root in your Terminal window.
- type:
ssh username@ftp.yourdomain.com
You're logged in!
Now any time you want to log using SSH you can just repeat
ssh username@ftp.yourdomain.com
from the command line (Terminal), no need to repeat the other steps.
So from here on I was in my webhost but still didn't know how to get around. Evidently I needed to use long paths, so $ cd /home/username/directory would move me from directory to directory. I could not just $ cd /directory.
Once I was able to get to the directory I needed to archive, I still needed the archive commands.
I thought I wanted to use zip as my archive utility. The zip command to do that would be:
$ zip -r folder.zip folder
Though my friend Daniel said that I should perhaps have used tar with gzip (tar.gz) instead of the zip command: "Zip compresses each file separately and then archives. Tar+gzip or tar+bzip2 archives first and then compresses."
The commands to use the tools Daniel suggested would be like the following:
tar+gzip
$ tar -cf blah.tar folder/
$ gzip -9 blah.tar
gzip-compressed tar: this combines the two commands above into one step, since the z flag tells tar to run the archive through gzip. I didn't try it.
$ tar czvf folder.tgz folder
bzip2
$ tar jcvf filename.tbz folder
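Daniel's point about the order of archiving and compressing can be seen with a small experiment. This Python sketch (standard library only) builds a zip and a tar.gz of the same files in memory and compares the sizes; the function name and the sample file names are made up for illustration.

```python
import io
import tarfile
import zipfile


def archive_sizes(files):
    """Return (zip_size, tar_gz_size) in bytes for a {name: bytes} mapping."""
    # zip: every file is compressed on its own, then stored.
    zip_buffer = io.BytesIO()
    with zipfile.ZipFile(zip_buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)

    # tar.gz: files are concatenated first, then compressed as one stream.
    tar_buffer = io.BytesIO()
    with tarfile.open(fileobj=tar_buffer, mode="w:gz") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))

    return len(zip_buffer.getvalue()), len(tar_buffer.getvalue())


# 200 small files with near-identical content, like templated web pages.
sample = {
    f"folder/page{i:03d}.html": b"<p>The same boilerplate text</p>\n" * 20
    for i in range(200)
}
zip_size, tar_gz_size = archive_sizes(sample)
print("zip:", zip_size, "bytes; tar.gz:", tar_gz_size, "bytes")
```

For many small, similar files the tar.gz comes out smaller, because the compressor can reuse what it has seen in earlier files. For a handful of large, unrelated files the difference is much less dramatic.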
After the file was compressed I used Interarchy to move the single zip file to its new location. I also needed to unzip the file. (I also read this.)
To unzip the file I navigated to the directory where the file was located and then used this command:
$ unzip folder.zip
I had to use the long path too. So it was really:
$ unzip /home/username/directory/folder.zip
What a sense of accomplishment!
Merging iLife Libraries
The Problem:
One user in a small business / family network can’t use (with metadata) all the media in a colleague’s or family member’s iTunes or iPhoto Library.
In our family there are three Macs (2 everyday machines and a server). On many work and personal tasks we function as a small workgroup. Unfortunately iTunes and iPhoto do not facilitate the sharing of media libraries (or for that matter the merging of media libraries). For instance, my wife had her own music and photo collection before we got married. Now if I want to browse that collection from my machine, there is iPhoto & iTunes sharing. But I can not add tags or other metadata to photos on her Mac. I can not create smart folders which we both can use.
iTunes
For our music we moved my collection to the server and made it like a “media center”. When we get new music we add it to the server. If we want a copy on our own machines we pull it as needed, e.g. for an iMovie project. This solution has not allowed my wife to add her collection to the server, nor has it solved the many duplicates which exist because we like many of the same songs. Now I have found a solution to this: PowerTunes.
iPhoto
Now the same problems exist for our photos. However, there is no real advantage to (or software for) hosting the family photos on our server. But we still need to define a photo capture strategy.
- When we take new photos, to which computer are we going to download the photos?
- Where will we have the master library?
I don’t have a complete solution to our photo capture, retention and access needs, but iPhoto Library Manager is the only software I have found that will let us maintain the metadata and merge our iPhoto Libraries. This is a fantastic first-step strategy:
- Consolidate the iPhoto Libraries.
- Designate a computer to be the Master Library holder.
- Share that iPhoto library across the network.
- Back that computer up.
All PDFs are not created Equal
Many of us use PDFs every day. They’re great documents to work with and read from because they are easy to use and easy to create.
I think I started to use PDFs for the first time in 2004. That’s when I got my first computer. Since that time, most PDFs I have needed to use have just worked. In the time that I have been using PDFs I have noticed that there are at least three major ways in which PDFs are not created equally:
- Validity of the PDF: Adherence to the PDF document standard.
- Resolution of contained images
- The presence and accuracy of the PDF’s meta-data.
Validity
Since 2004, there have only been a few PDFs which after creation and distribution would not render in any of my PDF readers, or in the readers my friends used (most of these PDFs were created by Microsoft Word or Microsoft Publisher on Windows, and one or two by Apple’s word processor Pages). Sometimes these errors had to do with a particular image included in the source document. The image may have been malformed, but this was not always the case. Sometimes it was the PDF creator, which was creating non-cross-platform PDFs.
Not all PDFs are created equal. (This is inherently true when one considers the PDF/A and PDF/X standards; however, let’s side-step those standards for a moment. The University of Michigan put a small flyer together on how to get something like a PDF/A to print from MS Word on OS X and Windows. [Link]) To frame this discussion, it is necessary to acknowledge that there is a difference between creating a digital document with a life expectancy of 3 weeks and one with a life expectancy of 150 years. So for some applications, what I am about to say is a moot point. However, looking towards the long term…
If an archival institution wants a document as a PDF, what are the requirements that that PDF needs to have?
What if the source document is not in Unicode? Is the font used in the source document automatically embedded in the PDF upon PDF creation? Consider this from PDFzone.com:
Embedding Fonts in a PDF
Another common area of complaint among frequent PDF users is font incompatibility and problems with font embedding. Here are some solutions and tips for putting your best font forward so to speak.
Keep in mind that when it comes to embedding fonts in a PDF file you have to make certain that you have the correct fonts on the system you’re using to make the conversion. Typically you should embed and subset fonts, although there are always exceptions.
If you just need a simple solution that will handle the heavy font work for you, the WonderSoft Virtual PDF Printer helps you choose and embed your fonts into any PDF. The program supports True Type and Unicode fonts.
The left viewing window shows you all the fonts installed on your system and the right viewing window shows the selected user fonts to embed into a newly created PDF form. A single license is $89.95.
Another common solution is the 3-Heights Optimization PDF Optimization Tool [Link Removed].
One of the best sources of information on all things font is at the Adobe site itself under the Developer Resources section.
3-Heights does have an enterprise level PDF validator. I am not sure if there is one in the open source world. But it would seem to me that any archival institution should be concerned not just with having PDFs in their archive but also with having valid PDFs in their archives. This is especially true when we consider that one of today’s major security loopholes is malformed file types, i.e. PDFs that are not really PDFs or PDFs with something malicious attached or embedded. (Here is a nice blog post about embedding a DLL in a PDF.) I am sure that there is more than one method to this madness, but it only takes one successful attempt to create a security breach. In fact there are several methods reported, some with JavaScript, some without. Here are a few:
- Embedded PDF executable hack goes live in Zeus malware attacks
- Hacker finds a way to exploit PDF files, without a vulnerability
- Escape From PDF
- Adobe suggests workaround for PDF embedded executable hack
- CVE-2010-1240 : Adobe PDF Embedded EXE Social Engineering
Apparently, several kinds of media can be embedded in PDFs. These include movies and songs, JavaScript, and forms that upload the data a user inputs to a web server. And there’s no forgetting the function within the PDF specification to launch executables.
Printing the PDF does not seem to be a fail proof method to see if the PDF is valid or even usable. See this write up from The University of Sydney School of Mathematics and Statistics:
Problem
I can read the file OK on screen, but it doesn’t print properly. Some characters are missing (often minus signs) and sometimes incorrect characters appear. What can I do?
Solution
When the Acrobat printing dialog box opens, put a tick in the box alongside “print as image”. This will make the printing a lot slower, but should solve the problem. (The “missing minus signs” problem seemed to occur for certain – by now rather old – HP printers.)
(Most of these problems with pdf files are caused by subtle errors with the fonts the pdf file uses. Unfortunately, there are only a limited number of fonts that supply the characters needed for files that involve a lot of mathematics.)
Printing a PDF is not necessarily a fail-proof way to see if a PDF is usable. Even if the PDF is usable, printing it does not ensure that it is a valid PDF either. When considering printing as a test, one should also consider that PDFs can contain video, audio, and Flash content. So how is one to print this material? Or, in an archival context, determine that the PDF is truly usable? A valid PDF should render and archive correctly because it conforms to the PDF standard (whatever version of that standard is declared in the PDF). Having a PDF conform to a given PDF standard puts the onus on the creator of the PDF viewer (software) to implement the PDF standard correctly, thus making the PDF usable (as intended by the PDF’s author).
Note: I have not checked the Digital Curation Center for any recommendations on PDFs and ensuring their validity on acceptance to an archive.
Resolution of Contained Images
A second way that PDF documents can vary is in the resolution of the images contained in them. The images inside of a PDF can be in a variety of image formats: .jpg, .tiff, .png, etc. So the types of compression and the lossiness of these compressions can make a difference in the archival “quality” of a PDF. A similar difference is noted between a raster PDF and a vector PDF. [1] Besides these two types of differences there are various PDF printers, which print materials to PDF in various formats. This manual discusses setting Adobe Acrobat Pro’s included PDF printer.
Meta-data
A third way in which PDFs are not created equally is that they do not all contain valid and accurate meta-data in the meta-data containers available in the PDF standard. PDF generators do not all add meta-data to the right places in a PDF file, and those which do add meta-data do not always add the correct meta-data.
Prepressure.com presents some clear discussion on the embedded properties of meta-data in PDFs. (Prepressure.com has a really helpful section on PDFs and various issues pertaining to PDFs and their use: http://www.prepressure.com/pdf/basics.)
Their discussion on meta-data can be found at http://www.prepressure.com/pdf/basics/metadata.
How metadata are stored in PDF files
There are several mechanisms available within PDF files to add metadata:
- The Info Dictionary has been included in PDF since version 1.0. It contains a set of document info entries, simple pairs of data that consist of a key and a matching value. Some of these are predefined, such as Title, Author, Subject, Keywords, Created (the creation date), Modified (the latest modification date) and Application (the originating application or library). Applications can add their own sets of data to the info dictionary.
- XMP (Extensible Metadata Platform) is an Adobe technology for embedding metadata into files. It can be used with a wide variety of data files. With Acrobat 5 and PDF 1.4 (2001) this mechanism was also made available for PDF files. XMP is more powerful than the info dictionary, which is why it is used in a number of PDF-based metadata standards.
- Additional ways of embedding metadata are the PieceInfo Dictionary (used by Illustrator and Photoshop for application specific data when you save a file as a PDF), Object Data (or User Properties) and Measurement Properties.
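To make the XMP mechanism described above a little more concrete, here is a minimal Python sketch that looks for an XMP packet in the raw bytes of a file and reads Dublin Core values (such as the title or subject keywords) from it. It assumes the packet is stored uncompressed and unencrypted, which is common but not guaranteed; the function names are my own and are only illustrative. This is not a PDF parser or a validator.

```python
import re
import xml.etree.ElementTree as ET

# Standard namespace URIs used inside XMP packets.
DC_NS = "http://purl.org/dc/elements/1.1/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# An XMP packet is wrapped in <?xpacket begin=...?> ... <?xpacket end=...?>.
PACKET_RE = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)


def extract_xmp_packet(data):
    """Return the body of the first XMP packet in raw bytes, or None.

    Assumption: the packet is stored as plain, uncompressed text.
    """
    match = PACKET_RE.search(data)
    return match.group(1).strip() if match else None


def dublin_core_values(packet, field):
    """Return the text of every rdf:li found under dc:<field>.

    For example, field="title" or field="subject" (keywords).
    """
    root = ET.fromstring(packet)
    values = []
    for element in root.iter("{%s}%s" % (DC_NS, field)):
        for item in element.iter("{%s}li" % RDF_NS):
            if item.text and item.text.strip():
                values.append(item.text.strip())
    return values
```

One might call `extract_xmp_packet(open(path, "rb").read())` on a PDF and pass the result to `dublin_core_values(packet, "subject")` to see whether any keywords were embedded. A `None` result only means no plain-text packet was found, not that the file has no meta-data.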
PDF metadata standards
There are a number of interesting standards for enriching PDF files with metadata. Below is a short summary:
- There are PDF substandards such as PDF/X and PDF/A that require the use of specific metadata. In a PDF/X-1a file, for example, there has to be a metadata field that describes whether the PDF file has been trapped or not.
- The GWG ad ticket provides a standardized way to include advertisement metadata into a PDF file.
- Certified PDF is a proprietary mechanism for embedding metadata about preflighting – whether a PDF file intended to be printed by a commercial printer or newspaper has been properly checked on the presence of all fonts, images with a sufficient resolution,…
The filename is metadata as well
The easiest way to add information about a PDF to the file is by giving it a proper filename. A name like ‘SmartGuide_12_p057-096_v3.pdf’ tells a recipient much more about what the file is about than ‘pages_part2_nextupdate.pdf’ does.
- Add the name of the publication and possibly the edition to the filename.
- Add a revision number (e.g. ‘v3′) if there will be multiple updates of a file.
- If a file contains part of the pages of a publication add at least the initial folio to the filename. That allows people to easily sort files in the right order. Use 2 or 3 digits for the page number (e.g. ‘009′ instead of just ‘9′).
- Do not use characters that are not supported in other operating systems or that have a special meaning in some applications: * < > [ ] = + ” \ / , . : ; ? % # $ | & •.
- Do not use a space as the first or last character of the filename.
- Don’t make the filename too long. Once you go beyond 50 characters or so people may not notice the full information or the filename may get clipped in browser windows or applications.
- Many prepress workflow systems can automatically insert files into a job based on a specific naming convention. This speeds up the processing of the job and can avoid costly mistakes. Consult with your printer – they may have guidelines for submitting files.
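The filename guidelines quoted above are mechanical enough to check automatically. Below is a small Python sketch, with names of my own choosing, that flags the discouraged characters, leading or trailing spaces, and overly long names, and that builds a zero-padded page label like the ‘p057-096’ in the example. I am assuming that the period in the character list refers to periods inside the name and not to the one before the extension, and I treat the “50 characters or so” advice as a hard limit purely for illustration.

```python
# Characters the quoted guidelines advise against. Assumption: they are
# checked in the part of the name before the final extension only.
DISCOURAGED = set('*<>[]=+"\\/,.:;?%#$|&•”')
MAX_LENGTH = 50  # "50 characters or so", treated here as a hard limit


def filename_problems(filename):
    """Return a list of human-readable problems; empty if none are found."""
    problems = []
    stem, dot, _extension = filename.rpartition(".")
    if not dot:  # no extension at all, so the whole name is the stem
        stem = filename
    bad = sorted(ch for ch in set(stem) if ch in DISCOURAGED)
    if bad:
        problems.append("discouraged characters: " + " ".join(bad))
    if filename != filename.strip(" "):
        problems.append("starts or ends with a space")
    if len(filename) > MAX_LENGTH:
        problems.append("longer than %d characters" % MAX_LENGTH)
    return problems


def page_range_label(first, last, digits=3):
    """Build a zero-padded page label so that files sort in page order."""
    return "p%0*d-%0*d" % (digits, first, digits, last)
```

For instance, `filename_problems("SmartGuide_12_p057-096_v3.pdf")` reports nothing, and `page_range_label(57, 96)` gives the `p057-096` segment used in that name.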
Even on my favorite operating system, OS X, there are several methods available to users for making PDFs of documents. These methods do not all create the same PDFs. (The difference is in the meta-data contained and in the size of the files.) This is pointed out by Rob Griffiths [2] in a Macworld article on privacy and on being aware of PDF meta-data which might transmit more personal information than the document creator might desire. What Rob points out is that there are several methods of producing PDFs on OS X, and these various methods include or exclude various meta-data details. Just as privacy concerns might motivate the removal of embedded meta-data (or perhaps the creation of PDFs without meta-data), a concern for archive quality should drive the inclusion of accurate meta-data in PDF files hosted by archives. There are two obvious ways to increase the quality of a PDF in an archive:
- The individual can enrich the PDF with meta-data prior to submission (risking that the institution will strip the meta-data embedded and input their own meta-data values)
- The archive can systemically enrich the meta-data based on the other meta-data collected on the file while it is in their “custody”.
As individuals we can take responsibility for the first point. There are several open source tools for working with embedded meta-data; one of these is pdfinfo, which displays the meta-data a PDF contains. (Another command line tool is ExifTool. ExifTool is more versatile, working with more file types than just PDF, but again this tool does not have a GUI.) I wish I could find a place to download pdfinfo on its own, but it only seems to be in Linux software repositories. However, there are several command line packages which incorporate this utility. One of these packages is Xpdf. Xpdf is available under the GPL for personal use from foolabs. The code has to be compiled from source, but there are links to several other websites with compiled versions for various OSes. There is an OS package installer available from phg-online.de. For those of us who are strong believers in GUIs and loathe the TUI (Text User Interface, or command line), there is a freely available GUI for pdfinfo from sybrex.com.
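For anyone who wants to check a whole folder of PDFs before submitting them to an archive, pdfinfo can be scripted. The Python sketch below is one hedged way to do it: it assumes a `pdfinfo` binary is installed and on the PATH, and that its output is a series of “Key: value” lines, which is how the Xpdf builds I have seen behave. The function names are my own.

```python
import subprocess


def parse_pdfinfo_output(text):
    """Turn pdfinfo's 'Key:   value' lines into a dictionary of strings."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only; date values contain colons too.
        key, separator, value = line.partition(":")
        if separator and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def read_pdf_metadata(path):
    """Run pdfinfo on one file and return its fields.

    Assumption: a `pdfinfo` executable is available on the PATH.
    Raises subprocess.CalledProcessError if pdfinfo reports an error.
    """
    result = subprocess.run(
        ["pdfinfo", path], capture_output=True, text=True, check=True
    )
    return parse_pdfinfo_output(result.stdout)


def missing_fields(fields, wanted=("Title", "Author", "Keywords")):
    """List the wanted fields that are absent or empty."""
    return [name for name in wanted if not fields.get(name)]
```

Calling `missing_fields(read_pdf_metadata("paper.pdf"))` would then list which of Title, Author and Keywords still need to be filled in; the file name here is only a placeholder.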
Because I use PDFs extensively in matters of linguistic research, I thought that I would take a look at several PDFs from a variety of sources. This would include:
- JSTOR: Steele (1976) [3]. JSTOR is a well-known archive in academic circles (especially the humanities).
- Project Muse: (Language) Ladefoged (2007) [4]. Project Muse is another well-known repository for the humanities. Language is a well-respected journal in the linguistic sciences, published by the Linguistic Society of America.
- Cambridge Journals: (Journal of the International Phonetic Association) Olson, Mielke, Sanicas-Daguman, Pebley and Paterson (2010) [5]. Cambridge Press, of which Cambridge Journals is a part, is a major publisher of linguistic content in the English academic community.
- SIL Academic Publishing: Gardner and Merrifield (1990) [6]. This PDF is found through the SIL Bibliography, but was prepared by the Academic Publishing department of SIL. It is important to note that this work was made available through SIL’s Global Publishing Service (formerly Academic Publishing), not through the Language and Culture Archives. This is evidenced by the acpub used in the URL for accessing the actual PDF: www.sil.org/acpub/repository/24341.pdf. As a publishing service, this particular business unit of SIL is more apt to be aware of and use higher PDF standards like PDF/A in their workflows.
- SIL – Papua New Guinea: Barker and Lee (n.d.) [7]. This manuscript is undated, but was made available online in 2009 by SIL – Papua New Guinea.
- SIL Mexico Branch: Benito Apolinar Antonio, et al. MWP#9a [8] and Benito Apolinar Antonio, et al. MWP#9b [9]. It is interesting to note that the production tool used to create the PDFs for the Mexico Branch Work Papers was XLingPaper. [10] [11] XLingPaper is a plugin for XMLMind, an XML editor. It is used for creating multiple products from a single XML data source. (In this case the data source is the linguistics paper.) However, advanced authoring tools like XLingPaper, and LaTeX and its flavors like XeTeX, should be able to handle the assignment of keywords and meta-data on the creation of the PDF.
- Example of a PDF made from Microsoft Word: Snider (2011) [12].
- Example of a PDF made from Apple Pages: Paterson and Olson (2009) [13].
The goal of the comparison is to look at a variety of PDF resources from a variety of locations and content handlers. I have included two linguistic journals, two repositories for journals, and several items from various SIL outlets. Additionally, I have included two different PDFs which were authored with popular word-processing applications. To view the PDFs and their meta-data I used Preview, a PDF and media viewer which is created by Apple and ships with OS X. Naturally, the scope of the available meta-data to be viewed is limited to what Preview is programmed to display. Adobe Acrobat Pro will display more meta-data fields in its meta-data editing interface.
- JSTOR:
- Project Muse:
- Cambridge Journals:
- SIL Academic Publishing (not the archive):
Among the PDFs surveyed, Academic Publishing was the only producer to use Keywords. They were also the only one to embed the ISO 639-3 code of the subject language of the item.
- SIL – Papua New Guinea:
- SIL Mexico Branch:
Work Papers #9a
Work Papers #9b
- MS Word Example:
Notice that the application used to create the PDF inserts “Microsoft Word – ” before the document title.
- Apple Pages Example:
As we can see from the images presented here, there is not widespread adoption of a systematic process for producing enriched PDFs:
- on the part of publishers,
- or on the part of developers of writing utilities, like MS Word or XLingPaper, to encourage the end user to produce enriched PDFs.
- Additionally, there is not a systematic process used by content providers to enrich content produced by publishers.
However, enriched content (PDFs) is used by a variety of PDF management applications and citation management software. That is, consumers do benefit from the enriched state of PDFs, and consumers are looking for these features. (The discussion on Yep 2’s forums highlights this point. Yep 2 is a consumer desktop media and PDF management tool. There are several other tools out there like Papers2, Mendeley, Zotero, and even EndNote.)
If I were to extend this research I would look at PDFs from more content providers. I would look for a PDF from an Open Access repository like the Rutgers Optimality Archive and a dissertation from ProQuest. I would also look for some content from a reputable archive like PARADISEC, and something from a DSpace implementation. (Xpdf can be used in conjunction with DSpace; in fact it is even mentioned in the manual.)
References
[1] Yishai. 1 July 2009. All PDF’s are not created equal. Part III (out of III). Digilabs Technologies Blog. http://digilabsblog.wordpress.com/2009/07/01/all-pdf’s-are-not-created-equal-part-iii-out-of-iii/. [Accessed: 23 January 2012]
[2] Rob Griffiths. Keep some PDF info private. Macworld.com. Mar 1, 2007 2:00 am. [Accessed: 14 March 2011]
[3] Susan M. Steele. 1976. A Law of Order: Word Order Change in Classical Aztec. International Journal of American Linguistics, vol 42 (1): 31-45.
[4] Peter Ladefoged. 2007. Articulatory Features for Describing Lexical Distinctions. Language 83.1: 161-80.
[5] Kenneth S. Olson, Jeff Mielke, Josephine Sanicas-Daguman, Carol Jean Pebley & Hugh J. Paterson III. 2010. The phonetic status of the (inter)dental approximant. Journal of the International Phonetic Association 40.02: 199-215.
[6] Richard Gardner and William R. Merrifield. 1990. Quiotepec Chinantec tone. In William R. Merrifield and Calvin R. Rensch (eds.), Syllables, tone, and verb paradigms: Studies in Chinantec languages 4, 91-105. Summer Institute of Linguistics and the University of Texas at Arlington Publications in Linguistics, 95. Dallas: Summer Institute of Linguistics and the University of Texas at Arlington.
[7] Fay Barker and Janet Lee. Available: 2009; Created: n.d. A tentative phonemic statement of Waskia. [Manuscript] 40 p.
[8] Benito Apolinar Antonio, et al. 2010. Vocabulario básico en me’phaa. SIL-Mexico Electronic Working Papers #9a.
[9] Benito Apolinar Antonio, et al. 2010. Vocabulario básico en me’phaa. SIL-Mexico Electronic Working Papers #9b. SIL International.
[10] H. Andrew Black. 2009. Writing linguistic papers in the third wave. SIL Forum for Language Fieldwork 2009-004: 11. http://www.sil.org/silepubs/abstract.asp?id=52286.
[11] H. Andrew Black and Gary F. Simons. 2009. Third wave writing and publishing. SIL Forum for Language Fieldwork 2009-005: 15. http://www.sil.org/silepubs/abstract.asp?id=52287.
[12] Keith Snider. 2011. On Discovering Contrastive Tone Melodies. Paper presented at the Berkeley Tone Workshop, 18-20 February 2011, University of California, Berkeley.
[13] Hugh Paterson III and Kenneth Olson. 2009. An unlikely retention. Paper presented at the 11th International Conference on Austronesian Linguistics, 22–26 June 2009, Aussois, France.