Audio Dominant Texts and Text Dominant Audio

As linguistics and language documentation interface with digital humanities, there has been a great deal of effort to time-align texts with audio/video materials. At one level this is rather trivial to do and has the backing of commercial media processes like subtitles in movies. However, at another level this task is often done in XML slightly differently for every project (digital corpus curation). At the macro scale the argument is that if the annotation of the audio is in XML and someone wants to do something else with it, then they can just convert the XML to whatever schema they desire. This is true.

However, one anecdotal point that I have not heard in discussions of time-aligned texts is the distinction between Audio Dominant Text and Text Dominant Audio. This may not initially seem very important, so let me explain what I mean.

Presenting Audio and Video on the Web

I have been trying to find out the best way to present audio on the web, which has led me to look at how to present video too. I do not have any conclusions on the matter, but I have been looking at HTML5 rather than JavaScript or Flash, because my platform (CMS) is WordPress.

Presentation version vs. Archival version of Digital Audio files

What is an archival version of an audio file?

An archival version of an audio file is a file which represents the original sound faithfully. In archiving we want to keep a version of the audio which can be used to make other products and can also be used directly itself if needed. This is usually done with PCM. Several file types are associated with PCM, or raw uncompressed digital audio that is faithful to the original signal:

  • Standard Wave
  • AIFF
  • Wave 64
  • Broadcast Wave Format (BWF)

One way to understand the difference between audio file formats is to understand how different formats are used. One place which has been helpful to me has been the DOBBIN website, as they explain their software and how it can change audio from one PCM-based format to another.

Each of these file types has the flexibility to hold various kinds of components: several channels of audio can be in the same file, and .wav files can have different bit depths or sampling rates. But they are each an archive-friendly format. Before one says that a file is suitable for archiving simply based on its file format, one must also consider things like sample rate, bit depth, embedded metadata, and the channels in the file. I was introduced to DOBBIN as an application resource for audio archivists by a presentation by Rob Poretti. [1]

One additional thing worth noting about archival versions of digital audio pertains to born-digital materials. Sometimes audio is recorded directly to a lossy compressed audio format. It would be entirely appropriate to archive such a born-digital file type on the basis of its content, though it should be noted that in this case the recording would ideally have been made in a PCM file format in the first place.
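Since suitability for archiving depends on properties like sample rate, bit depth and channel count rather than on the file extension alone, it can help to check those properties programmatically. Here is a minimal sketch using Python's standard `wave` module; the synthetic one-second file exists only so the example is self-contained.

```python
import io
import wave

def wav_properties(data: bytes) -> dict:
    """Read the technical properties an archivist would record for a WAV file."""
    with wave.open(io.BytesIO(data), "rb") as w:
        return {
            "channels": w.getnchannels(),
            "sample_rate": w.getframerate(),
            "bit_depth": w.getsampwidth() * 8,
            "duration_s": w.getnframes() / w.getframerate(),
        }

# Build a one-second, 16-bit, 48 kHz, stereo file in memory to demonstrate.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)        # 2 bytes per sample -> 16-bit depth
    w.setframerate(48000)
    w.writeframes(b"\x00\x00" * 2 * 48000)  # silence: 48000 stereo frames

props = wav_properties(buf.getvalue())
print(props)  # {'channels': 2, 'sample_rate': 48000, 'bit_depth': 16, 'duration_s': 1.0}
```

The same idea scales up: technical-metadata extractors used by archives do this for every deposited file rather than trusting what the depositor says about it.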

What is a presentation version (of an audio file)?

A presentation version is a file created with a particular use of the content in mind. There are several general characteristics of this kind of file:

  1. It is one that does not retain the whole PCM content.
  2. It is usually designed for a specific application. (Use on a portable device, or personal audio player)
  3. It can be thought of as a derivative product from an original audio or video stream.

In terms of file formats, there is not just one file format which is a presentation format. There are many formats. This is because there are many ways to use audio. For instance there are special audio file types optimized for various kinds of applications like:

  • 3G and WiFi Audio and A/V services
  • Internet audio for streaming and download
  • Digital Radio
  • Digital Satellite and Cable
  • Portable players

A brief look at an explanation by Cube-Tec might help to get the gears moving. It is part of the inspiration for this post.

This means there is a long list of potential audio formats for presentation versions.

  • AAC (aac)
  • AC3 (ac3)
  • Amiga IFF/SVX8/SV16 (iff)
  • Apple/SGI (aiff/aifc)
  • Audio Visual Research (avr)
  • Berkeley/IRCAM/CARL (irca)
  • CDXA, like Video-CD (dat)
  • DTS (dts)
  • DVD-Video (ifo)
  • Ensoniq PARIS (paf)
  • FastTracker2 Extended (xi)
  • Flac (flac)
  • Matlab (mat)
  • Matroska (mkv/mka/mks)
  • Midi Sample dump Format (sds)
  • Monkey’s Audio (ape/mac)
  • Mpeg 1&2 container (mpeg/mpg/vob)
  • Mpeg 4 container (mp4)
  • Mpeg audio specific (mp2/mp3)
  • Mpeg video specific (mpgv/mpv/m1v/m2v)
  • Ogg (ogg/ogm)
  • Portable Voice format (pvf)
  • Quicktime (qt/mov)
  • Real (rm/rmvb/ra)
  • Riff (avi/wav)
  • Sound Designer 2 (sd2)
  • Sun/NeXT (au)
  • Windows Media (asf/wma/wmv)

Aside from just the file format difference in media files (.wav vs. .mp3) there are three other differences to be aware of:

  1. Media stream quality variations
  2. Media container formats
  3. Possibilities with embedded metadata

Media stream quality variations

Within the same file type there can be variation in audio quality. For instance, MP3 files can have variable-rate encoding or a constant rate of encoding; when they have a constant rate of encoding, that rate can be high or low. WAV files can likewise have a high or a low bit depth and a high or a low sample rate. Some file types can also hold more channels than others: AAC files can have up to 48 channels, whereas MP3 files can only have up to 5.1 channels. [2]
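These quality variations reduce to simple arithmetic for uncompressed audio: the PCM data rate is sample rate × bit depth × channel count. A short sketch (the 128 kbit/s MP3 figure below is just a common constant-rate encoding choice, not a fixed property of the format):

```python
def pcm_bitrate(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Uncompressed PCM data rate in bits per second."""
    return sample_rate_hz * bit_depth * channels

# CD-quality stereo WAV: 44.1 kHz, 16-bit, 2 channels.
wav_bps = pcm_bitrate(44_100, 16, 2)
print(wav_bps)  # 1411200, i.e. about 1411 kbit/s

# Against a common 128 kbit/s constant-rate MP3, the compression ratio is roughly:
print(round(wav_bps / 128_000, 1))  # 11.0
```

This is why the same one-hour recording can be anywhere from a few dozen megabytes to well over half a gigabyte depending on the encoding choices.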

One argument I have heard in favor of saving disk space is to use lossless compression rather than WAV files for archive-quality recordings (and as the archival version). As far as archiving is concerned, these lossless compression formats are still product-oriented file formats. One thing to realize is that not every file format can hold the same kind of audio. Some formats have limits on the bit depth of the samples they can contain, or on the number of audio channels they can have in a file. This is demonstrated in the table below, taken from Wikipedia. [3] This is where understanding the relationship between a file format, a file extension and a media container format is really important.

Audio compression format | Algorithm | Sample rate | Bits per sample | Latency | Stereo | Multichannel
ALAC | Lossless | 44.1 kHz to 192 kHz | 16, 24 | ? | Yes | Yes
FLAC | Lossless | 1 Hz to 655,350 Hz | 8, 16, 20, 24, (32) | 4.3 ms - 92 ms (46.4 ms typical) | Yes | Yes: up to 8 channels
Monkey's Audio | Lossless | 8, 11.025, 12, 16, 22.05, 24, 32, 44.1, 48 kHz | ? | ? | Yes | No
RealAudio Lossless | Lossless | Varies (see article) | Varies (see article) | Varies | Yes | Yes: up to 6 channels
True Audio | Lossless | 0-4 GHz | 1 to > 64 | ? | Yes | Yes: up to 65,535 channels
WavPack | Lossless, Hybrid | 1 Hz to 16.777216 MHz | varies in lossless mode; 2.2 minimum in lossy mode | ? | Yes | Yes: up to 256 channels
Windows Media Audio Lossless | Lossless | 8, 11.025, 16, 22.05, 32, 44.1, 48, 88.2, 96 kHz | 16, 24 | >100 ms | Yes | Yes: up to 6 channels
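To make the table actionable when choosing a lossless target format, the multichannel limits can be captured in a small lookup. This is only an illustration built from the table above; the formats whose limit the table does not state are omitted, and the fallback of plain stereo for unlisted formats is my own assumption.

```python
# Multichannel limits from the comparison table above (lossless codecs only).
MAX_CHANNELS = {
    "FLAC": 8,
    "RealAudio Lossless": 6,
    "True Audio": 65535,
    "WavPack": 256,
    "Windows Media Audio Lossless": 6,
}

def can_store(fmt: str, channels: int) -> bool:
    """Can this format hold a recording with the given channel count?
    Unknown formats are conservatively assumed to hold only stereo."""
    return channels <= MAX_CHANNELS.get(fmt, 2)

print(can_store("FLAC", 6))                           # True
print(can_store("Windows Media Audio Lossless", 8))   # False
```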

Media container formats

Media container formats can look like file types, but they really are containers of file types (think of a folder with an extension). Often they allow for the bundling of audio and video files with metadata, and then enable this set of data to act like a single file. On Wikipedia there is a really nice comparison of container formats.

MP4 is one such container format. Apple Lossless data is stored within an MP4 container with the filename extension .m4a; this extension is also used by Apple for AAC audio data in an MP4 container (same container, different audio encoding). However, Apple Lossless is not a variant of AAC (which is a lossy format), but rather a distinct lossless format that uses linear prediction similar to other lossless codecs such as FLAC and Shorten. [4] Files with a .m4a extension generally do not have a video stream, even though MP4 containers can also carry one.

MP4 can contain:

  • Video: MPEG-4 Part 10 (H.264) and MPEG-4 Part 2
    Other compression formats are less used: MPEG-2 and MPEG-1
  • Audio: Advanced Audio Coding (AAC)
    Also MPEG-4 Part 3 audio objects, such as Audio Lossless Coding (ALS), Scalable Lossless Coding (SLS), MP3, MPEG-1 Audio Layer II (MP2), MPEG-1 Audio Layer I (MP1), CELP, HVXC (speech), TwinVQ, Text To Speech Interface (TTSI) and Structured Audio Orchestra Language (SAOL)
    Other compression formats are less used: Apple Lossless
  • Subtitles: MPEG-4 Timed Text (also known as 3GPP Timed Text).
    Nero Digital uses DVD Video subtitles in MP4 files. [5]

This means that an MP3 audio stream can be contained inside of an .mp4 file. It also means that audio files are not always what they seem to be on the surface. This is why I advocate that an archive of digital files, or a digital publishing house, also use technical metadata as discovery metadata. The file type alone is not enough to know what a file is.
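One practical consequence is that a file's real container can be sniffed from its first bytes instead of trusting its extension. The sketch below checks only three common signatures; real identification tools, and the technical-metadata extractors I am advocating for, check many more.

```python
def sniff_container(header: bytes) -> str:
    """Guess a media container from its leading bytes, ignoring the extension."""
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "RIFF/WAVE"
    if header[4:8] == b"ftyp":           # ISO base media family: MP4, M4A, MOV relatives
        return "MP4 family"
    if header[:3] == b"ID3":             # MP3 with an ID3v2 tag prepended
        return "MP3 (ID3v2 tagged)"
    return "unknown"

# Synthetic headers standing in for real files:
print(sniff_container(b"RIFF\x24\x00\x00\x00WAVEfmt "))    # RIFF/WAVE
print(sniff_container(b"\x00\x00\x00\x20ftypM4A "))        # MP4 family
print(sniff_container(b"ID3\x04\x00\x00\x00\x00\x00\x00")) # MP3 (ID3v2 tagged)
```

A file renamed from .m4a to .mp3 would fool a filename check but not this one.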

Possibilities with embedded metadata

Audio files also vary greatly in what kinds of embedded metadata and metadata formats they support. MPEG-7, BWF and MP4 all support embedded metadata. But this does not mean that audio players in the consumer or prosumer market respect this embedded metadata. ARSC has an interesting report on the support for embedded metadata in audio recording software. [6] Aside from this disregard for embedded metadata, there are various metadata formats embedded in different file types; one common format, ID3, is popular with .mp3 files. But even ID3 comes in different versions.
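The ID3 version differences are visible right in the file: an ID3v2 tag begins with the bytes "ID3" followed by a major-version byte and a revision byte. A minimal sketch (the header bytes in the example are hand-built for illustration, not taken from a real file):

```python
def id3v2_version(header: bytes):
    """Return the (major, revision) version of an ID3v2 tag header, or None."""
    if len(header) >= 10 and header[:3] == b"ID3":
        major, revision = header[3], header[4]
        return (major, revision)
    return None

# An ID3v2.3.0 tag header such as many MP3 taggers write.
print(id3v2_version(b"ID3\x03\x00\x00\x00\x00\x02\x01"))  # (3, 0)
# A WAV header has no ID3v2 tag at the front.
print(id3v2_version(b"RIFF\x00\x00\x00\x00WAVE"))          # None
```

Knowing whether a tag is ID3v2.3 or ID3v2.4 matters in practice, because some players and tools only read one of the two.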

In archiving Language and Culture Materials our complete package often includes audio but is rarely just audio. However, understanding the audio components of the complete package helps us understand what it needs to look like in the archive. In my experience working with the Language and Culture Archive, most contributors are not aware of the difference between archival and presentation versions of audio formats, and those who think they are generally are not aware of the differences in codecs used (sometimes with the same file extension). From the archive's perspective this is a continual point of user/submitter education. This past week I have taken the time to listen to a few presentations by audio archivists from the 2011 ARSC convention. These show, in general, that the kinds of issues I have been dealing with in the Language and Culture Archive are not unique to our context.

The Complete Audio Package


1 Rob Poretti. 2011. Audio Analysis and Processing in Multi-Media File Formats. ARSC 2011. [Accessed: 24 October 2011] [Link]
2 Various Contributors. 21 October 2011 at 21:44 . Wikipedia: Advanced Audio Coding, AAC’s improvements over MP3. [Link]
3 Various Contributors. 21 October 2011 at 10:26 . Wikipedia:Comparison of audio formats, Technical Details of Lossless Audio Compression Formats. [Link]
4 Various Contributors. 6 October 2011 at 03:11. Wikipedia: Apple Lossless. [Link]
5 Various Contributors. 11 October 2011 at 15:00. Wikipedia: MPEG-4 Part 14. [Link]
6 Chris Lacinak, Walter Forsberg. 2011. A Study of Embedded Metadata Support in Audio Recording Software: Summary of Findings and Conclusion. ARSC Technical Committee. [Link]


This past weekend I walked across the street from our house with Becky and went to an estate sale. In one of the rooms there was a stack of electronics. There on a dresser was a Reel-to-Reel machine in almost perfect condition.



The week prior I had been working with some reels of language data in the archive; some amazing stuff. We have two machines, but one of them does not work completely. So now the estate-sale machine sits on my desk and I am using my Marantz PMD 661 to digitize all sorts of reel-to-reels. Today I was looking up this model online and found it on eBay for $300, and on the German eBay for 500€; not bad, as I paid $35 for mine. I finally found the manuals online for free! It seems that somebody is trying to sell print-offs on eBay for $11 each. But here is what I found.


Track system: 4 track, 2 channel stereo/monaural system

Maximum reel capacity: 7″ reel

Wow and flutter: 0.08% at 7.5 ips

Frequency response: 30Hz to 24kHz at 7.5ips

Distortion: less than 1.5%

Signal to noise ratio: >50dB

Heads: 3 heads

Motors: 3 motors

Dimensions: 430 x 425 x 230mm

Weight: 19kg

One of my concerns is with the 30Hz playback limit.

Was this standard for reel-to-reel machines, or were better recordings made that I will never hear because of the model of playback machine I have?

Another concern I have is about the heads:

If they are magnetized will they erase my tape? How do I tell if they are magnetized?

Here is a listing of Reel-to-Reel machines by AKAI.

Here is a youtube video of the same model I have.

Headphones for Language Documentation

Looking at the overall audio recording picture in Language Documentation, the place of headphones is not to be taken lightly. The total recording act has a performer performing the speech act, a microphone recording the sounds, and a recording device digitizing the input from the microphone. And while it could all stop there, an additional step is needed: monitoring the input to the recording device. High-quality sound is important in the recording process. If the documenter does not have an ear for what a recording should be (training), then they are not going to be able to tell the difference between a good recording, a mediocre recording and a bad recording. I think it is well known, if not commonly acknowledged, that a good mic is needed to make quality recordings (recordings which are faithful to the input). Likewise, it is no less true that to listen to that recording and assure that it is truly faithful, a speaker or set of headphones is needed to produce a faithful output of the recording. So high-quality audio (faithfulness to the input and to the recording) is needed not only in the capture phase of the recording event but also in the monitoring phase (and the playback phase). Here enters the importance of headphones.
It is possible to record without monitoring the recording, but actively monitoring recorded audio allows for on-the-spot quality control. I have seen some tears shed when a documenter waited till later to find out that the mic record level was set too low or that the record button was not pushed.
It is also possible to record with low-quality headphones, or headphones which are not faithful to the recording. But when these kinds of speakers or headphones are used, there is no way to accurately assess the quality of the recording. I have seen a good-quality recording played through laptop speakers and questioned as being a “quality” recording. I have also seen a poor-quality recording listened to on poor-quality speakers and thought to be a fine recording, only for the true quality of the recording to come out later.
Like all gear, it is important to get to know how your headphones sound at a given level. When the recording level is set to “y” and you are used to listening/monitoring with headphone set “A” at volume level “z”, and then you switch to headphone set “B”, the temptation is to change the recording level. Really, what needs to change is the volume level from the recording device to the headphones. Every set of headphones will sound different at any given volume setting. The volume of a speaker or set of headphones is only part of the sound equation.

Recording volume (or Mic sensitivity) + Speaker volume = Total listening experience.

Just raising the volume on a speaker or a pair of headphones might make the audio more audible, but it does not mean that a good recording is being captured. Conversely, a low speaker volume and a high mic record level do not make a great recording either. (It might be the right setup for your recording session, but it does not guarantee a good recording all the time.)

Record and Volume Levels

Just as Language Documentation is about capturing the event, [1] headphones are about monitoring the recording.

Headphones vs. Speakers
Headphones have a distinct advantage over speakers in that they do not re-introduce noise into a recording, especially in a field recording environment. However, both can be used to monitor a recording. When I recorded in a professional recording studio, the mixing room was isolated from the performance rooms and we had speakers in the mixing room.

Different Uses
Different kinds of headphones serve different purposes. A good knowledge of your purpose and of the right kinds of headphones will lead to a great experience. First, let's take a look at the audio listening needs on a language documentation project.

  • Documenter listening to the audio feed while video taping.
  • Documenter listening to the audio while recording.
  • Documenter and consultant listening to audio while recording with the BOLD method. [2]
  • Documenter editing audio and video.
  • Documenter doing analysis and annotation in the field with tools like Audacity, Praat and ELAN.
  • Documenter listening to their iPod in their hammock at night.

With these differences in mind, it is easy to see that various language documentation projects will have different functional requirements depending on the project goals. Additionally, language documentation projects often have constraints on their gear-weight to area-of-service ratios. (Sometimes the documenter just can't take everything, so take the right tool for the job and hope that tool is lightweight.) Headphones come into the equation, as opposed to speakers, because they can deliver high-quality sound at a low weight. However, should the language documenter just grab their iPod headphones and think that they will be alright?

What kinds of headphones are there?
There are several basic kinds of headphones, which can be roughly broken down into four types (depending on who is consulted).

Various kinds of headphones

  • First, there are over-the-ear headphones, or circumaural headphones, which cover the wearer's ears completely.
    These come in two varieties:

    1. Open Back – which lets ambient noise into the wearer’s ear canal.
    2. Closed Back – which isolates the wearer from noise which is not in the recording.
  • Second, there are on-the-ear headphones, or supra-aural headphones. Supra-aural headphones sit on the wearer's ear rather than covering it completely.
  • Lastly, there are in-the-ear headphones. This group can actually be divided into two sub-groups:
    1. Ear buds, such as the Apple iPod earphones, which sit in the wearer's outer ear, just above the opening of the ear canal.
    2. True in-the-ear headphones, or in-ear monitors, which are actually inserted into the wearer's ear canal and isolate the wearer from exterior noise.

How do I know a good set of headphones from a bad set?
Well, there are several ways to answer the question of what a “good” set of headphones is. The most succinct answer is likely to be:

Closed back circumaural, 20-20k.

– Will Reiman

More generally, consider weight, size, durability, fit, frequency response and price.

Weight: The lower the weight the better: less strain on the neck after hours of use, and less weight to carry around with you.
Size: Do they fold up? Are they compact and easy to pack and carry with your other gear?

  • Are they comfortable for people with large ears? You might not have large ears, but if you are asking someone else to wear them, will they be comfortable, especially over a long recording session?
  • Can they be worn with an over the head mic?
  • Can they be worn by someone wearing glasses?
  • Are they adjustable? Adjustable for head circumference and pronation around the ear.

Quality of sound:

  • Frequency response: The response should minimally be 20Hz-20kHz, because that is generally the range of human hearing. I like going for 5Hz-30kHz because it gives me the full range of bass; I feel like I am hearing the whole range of sound.
  • Flat response: Just like microphones, speakers (and headphones) can color some frequencies (make them warmer). Try to get a set of headphones which is true to the signal (flat response). Read about how PC Magazine tests headphones: How We Test Headphones.

Headphone frequency response graph by PCMag comparing three sets of headphones.

Price: I think I paid $50 (USD) for the Sony MDR-V600 even though the manufacturer's suggested price is over $100. So look around. Find a deal.

Here is a look at some of the headphones I have and have used, how I used them, and what I thought about them.

Standard Apple Earbuds

These are the ones which come with my iPod.

[Apple] [Amazon] [Headphone Reviews]

Specs taken from Apple's website on April 5th, 2011.

Apple Earbuds

Headphone Type: Canal
Fit Style: Earbud
Frequency response: 20Hz to 20,000Hz
Impedance: 32 ohms per Apple for newer models, 16 ohms for older 2008-ish models.
Plug Type: 1/8-inch
Cable Type: Straight
Cord Length: Approximately 0.5 meters
Weight: 0.15 pounds

Comments: These are the ones which come with your iPod.
How I use them: I listen to my iPod with them. I also listen to music through my computer with them.

Rating by Hugh: 3.0 stars

Olympus Earbuds

They came with my Olympus WS-320M Digital Voice Recorder. No type ID is indicated, and they are not the ones currently shown on the website.

Olympus Earbuds

Headphone Type: Canal
Fit Style: Earbud
Frequency response: ?
Impedance: ?
Plug Type: 1/8-inch
Cable Type: Straight
Cord Length: Approximately 0.5 meters
Weight: 0.15 pounds
Comments: Really clear and crisp sound. It took some getting used to because they are different than the iPod earbuds.
How I use them: I use these to actively monitor the audio through the video camera (Vixia HF200).

Rating by Hugh: 4.5 stars

Samson CH700

This is the pair that came in my Zoom H4n Kit.

[Samson] [Amazon] [Headphone Reviews]
Samson CH700

Headphone Type: Closed Back
Driver Size: 40mm
Fit Style: Circumaural
Frequency Response: 20Hz-22,000Hz
Impedance: 64 ohms
Plug Type: 1/8-inch to 1/4-inch adapter
Cable Type: Straight
Cord Length: Approximately 3 meters
Weight: 1.1 pounds
Comments: These came in a packaged kit with the Zoom H4n. They are a bit on the heavy side, but they do have a nice sound to them. They have an adjustment slider to fit various sizes of heads. They are not very compact.
How I use them: I use these to actively monitor recorded audio. Mostly when using a Zoom H4n, but also with an Olympus LS-10.

Rating by Hugh: 3.0 stars

Sony MDR-V600

A wide frequency response range and a good reputation for durability. The Sony MDR-V600 is not to be confused with the Sony MDR-V6, which only has a response of 10Hz - 20,000Hz. (Many search engines do confuse the two.)

[Sony] [Amazon] [Headphone Reviews] [ProductWiki]
Sony MDR-V600

Headphone Type: Closed Back
Fit Style: Circumaural
Frequency Response : 5Hz – 30,000Hz
Sensitivity : 106dB/mW
Impedance : 45 ohms
Plug Type: 1/8-inch to 1/4-inch adapter
Cable Type: curly
Cord Length: Approximately 2 meters
Driver Unit : 40mm
Weight: 0.5 pounds
Comments: Between these and the Samson CH700, I like these much better. They are lighter and more compact, they have a wider frequency response than the CH700, and they are adjustable for various head shapes.
How I use them: I use these to actively monitor recorded audio. Mostly when using a Zoom H4n.

Rating by Hugh: 5.0 stars


1 David Nathan. 2010. Sound and unsound practices in documentary linguistics: towards an epistemology for audio. In Peter Austin (ed) Language Documentation and Description. Vol 7. London: SOAS. 262-284.
2 Will D. Reiman. 2010. Basic oral language documentation. Language Documentation & Conservation 4. 254-268. [url]