I should do a reading of Beatrix Potter with my mics... I think it would be great to see how one comes out.
I must say this open source solution looks amazing and I need to try it out!
As linguistics and language documentation interface with digital humanities, there has been a lot of effort to time-align texts and audio/video materials. At one level this is rather trivial to do and has the backing of commercial media processes like subtitles in movies. At another level, however, this task is often done in XML slightly differently for every project (digital corpus curation). At the macro scale, the argument is that if the annotation of the audio is in XML and someone wants to do something else with it, then they can just convert the XML to whatever schema they desire. This is true.
However, one anecdotal point that I have not heard in discussions of time-aligned texts is a specification for Audio Dominant Text vs. Text Dominant Audio. This may not initially seem very important, so let me explain what I mean.
I have been looking into some
Over the last several months I have been looking for and comparing digitization services for audio, film, and images (slides and more). I have been doing this as part of the ongoing work at the Language and Culture Archive to preserve the linguistic and cultural heritage of the people groups SIL International has encountered and served. I have not come to any hard and fast conclusions on “what is the best service provider”. This is partially because we are still looking at various outsourcing options, and looking at multiple media is time consuming. Then there is also the issue of looking for archival standards and creating corporate policy for the digitization of these materials. I am presenting several names here as the results of several searches for digitization service providers.
Last month I was passed a short film on the BBC highlighting one of these providers. The short is well worth the watch because it highlights the reason and madness behind some of the work of digitization.
Several companies have come to the top of the list:
- http://dijifi.com/ – does the UN’s collections
- http://www.digmypics.com/ – does work for National Geographic
- http://www.scancafe.com/ – Great consumer grade service
Doing it on our own
Another option the Archive has been looking at is to determine whether the quantity of work makes it cost prohibitive to have it professionally done, meaning that we would be better served by buying the equipment and doing the work in house. So in the process I have also been looking at people’s experience with various kinds of equipment and technology used in scanning.
The following article has a fascinating presentation of marketing for audio products in the USA during the 20th century. http://thephoenix.com/Boston/music/129722-rise-and-fall-of-the-columbia-house-record-clu/
One of the things I enjoy is reading about the licenses that CC has retired. Usually they do a great job of explaining why they are retiring the license. Understanding these use cases and their context gives a really informative view on society.
One interesting retired license is the Sampling+ License. They did a really good job of explaining why they were retiring it. One of the interesting exercises they talk about was how they had to go through the machine-readable description to describe the license, basically mapping out the assertions.
Sound+ is interesting because it is targeted for sound. It makes me wonder if sound/audio can still be licensed under Creative Commons if it is not protected by copyright.
What is an archival version of an audio file?
An archival version of an audio file is a file which represents the original sound faithfully. In archiving we want to keep a version of the audio which can be used to make other products and can also be used directly itself if needed. This is usually done through PCM. Several file types are associated with PCM, or raw, uncompressed digital audio that is faithful to the original signal. These are:
- Standard Wave
- Wave 64
- Broadcast Wave Format (BWF)
One way to understand the difference between audio file formats is to understand how the different formats are used. One place which has been helpful to me is the DOBBIN website, which explains their software and how it can change audio from one PCM-based format to another.
Each of these file types is flexible enough to hold various kinds of components; for example, several channels of audio can be in the same file, and .wav files can have different bit depths or sampling rates. Each is an archive-friendly format. But before one says that a file is suitable for archiving simply based on its file format, one must also consider things like sample rate, bit depth, embedded metadata, the channels in the file, etc. I was introduced to DOBBIN as an application resource for audio archivists by a presentation by Rob Poretti (Rob Poretti. 2011. Audio Analysis and Processing in Multi-Media File Formats. ARSC 2011).
One additional thing worth noting about archival versions of digital audio pertains to born-digital materials. Sometimes audio is recorded directly to a lossy compressed audio format. It would be entirely appropriate to archive such a born-digital file type on the strength of its content. However, it should be noted that in this case the recording should ideally have been made in a PCM file format.
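The point about looking beyond the file format (sample rate, bit depth, channels) can be sketched in a few lines of Python. This is only an illustration using the standard library `wave` module, which reads plain PCM WAV files but not Wave 64; the thresholds in `meets_policy` are hypothetical placeholders and not the Archive's actual policy.

```python
import io
import wave


def describe_wav(source):
    """Return basic technical properties of a PCM WAV file (path or file object)."""
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,  # sample width is given in bytes
            "sample_rate_hz": rate,
            "duration_s": frames / rate if rate else 0.0,
        }


def meets_policy(props, min_rate_hz=48000, min_bit_depth=24):
    """Check properties against hypothetical minimums (assumed values, not a standard)."""
    return (props["sample_rate_hz"] >= min_rate_hz
            and props["bit_depth"] >= min_bit_depth)


def make_silent_wav(rate, sample_width, channels, seconds=1):
    """Build a small in-memory WAV of silence, only so the sketch can be tried out."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(rate)
        wav.writeframes(bytes(rate * sample_width * channels * seconds))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    props = describe_wav(make_silent_wav(96000, 3, 2))
    print(props, meets_policy(props))
```

Two files with the same .wav extension can give very different answers here, which is why the extension alone is not enough to call a file archival.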
What is a presentation version? (of an audio file)
A presentation version is a file created with a content use in mind. There are several general characteristics of this kind of file:
- It is one that does not retain the whole PCM content.
- It is usually designed for a specific application. (Use on a portable device, or personal audio player)
- It can be thought of as a derivative product from an original audio or video stream.
In terms of file formats, there is not just one file format which is a presentation format. There are many formats. This is because there are many ways to use audio. For instance there are special audio file types optimized for various kinds of applications like:
- 3G and WiFi Audio and A/V services
- Internet audio for streaming and download
- Digital Radio
- Digital Satellite and Cable
- Portable players
A brief look at an explanation by Cube-Tec might help to get the gears moving. It is part of the inspiration for this post.
This means there is a long list of potential audio formats for the presentation form.
- AAC (aac)
- AC3 (ac3)
- Amiga IFF/SVX8/SV16 (iff)
- Apple/SGI (aiff/aifc)
- Audio Visual Research (avr)
- Berkeley/IRCAM/CARL (irca)
- CDXA, like Video-CD (dat)
- DTS (dts)
- DVD-Video (ifo)
- Ensoniq PARIS (paf)
- FastTracker2 Extended (xi)
- FLAC (flac)
- Matlab (mat)
- Matroska (mkv/mka/mks)
- MIDI Sample Dump Format (sds)
- Monkey’s Audio (ape/mac)
- MPEG 1&2 container (mpeg/mpg/vob)
- MPEG 4 container (mp4)
- MPEG audio specific (mp2/mp3)
- MPEG video specific (mpgv/mpv/m1v/m2v)
- Ogg (ogg/ogm)
- Portable Voice format (pvf)
- Quicktime (qt/mov)
- Real (rm/rmvb/ra)
- RIFF (avi/wav)
- Sound Designer 2 (sd2)
- Sun/NeXT (au)
- Windows Media (asf/wma/wmv)
Aside from just the file format difference in media files (.wav vs. .mp3) there are three other differences to be aware of:
- Media stream quality variations
- Media container formats
- Possibilities with embedded metadata
Media stream quality variations
Within the same file type there might be variation in the quality of the audio. For instance, MP3 files can have a variable rate of encoding or a steady rate of encoding, and a steady rate can itself be high or low. WAV files can also have a high or a low bit depth and a high or a low sample rate. Some file types can have more channels than others. For instance, AAC files can have up to 48 channels, whereas MP3 files can only have up to 5.1 channels (Wikipedia: Advanced Audio Coding, AAC’s improvements over MP3).
One argument I have heard in favor of saving disk space is to use lossless compression rather than WAV files for archive-quality (and archive-version) recordings. As far as archiving is concerned, these lossless compression formats are still product-oriented file formats. One thing to realize is that not every file format can hold the same kind of audio. Some formats have limits on the bit depth of the samples they can contain, or a limit on the number of audio channels they can have in a file. This is demonstrated in the table below, taken from Wikipedia (Wikipedia: Comparison of audio formats, Technical Details of Lossless Audio Compression Formats). This is where understanding the relationship between a file format, a file extension and a media container format is really important.
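To make the quality variation concrete, here is a small arithmetic sketch: the data rate of uncompressed PCM is the sample rate times the bytes per sample times the number of channels, while a steady-rate MP3 is fixed by its bit rate. The function names are my own, and the figures are plain arithmetic rather than measurements of any particular file.

```python
def pcm_bytes_per_second(sample_rate_hz, bit_depth, channels):
    """Uncompressed PCM data rate; samples are stored in whole bytes."""
    bytes_per_sample = (bit_depth + 7) // 8
    return sample_rate_hz * bytes_per_sample * channels


def cbr_bytes_per_second(bitrate_kbps):
    """Data rate of a constant bit rate stream (1 kbps = 1000 bits per second)."""
    return bitrate_kbps * 1000 // 8


if __name__ == "__main__":
    hour = 3600
    print(pcm_bytes_per_second(44100, 16, 2))         # 176400
    print(pcm_bytes_per_second(96000, 24, 2) * hour)  # 2073600000
    print(cbr_bytes_per_second(128) * hour)           # 57600000
```

An hour of 96 kHz, 24-bit stereo PCM is roughly 2 GB, while an hour of 128 kbps MP3 is under 60 MB, so two files of the same length can differ in size by well over an order of magnitude.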
|Audio compression format
|Bits per sample
|44.1 kHz to 192 kHz
|1 Hz to 655350 Hz
|8, 16, 20, 24, (32)
|4.3ms - 92ms (46.4ms typical)
|Yes: Up to 8 channels
|8, 11.025, 12, 16, 22.05, 24, 32, 44.1, 48 kHz
|Varies (see article)
|Varies (see article)
|Yes: Up to 6 channels
|1 to > 64
|Yes: Up to 65535 channels
|1 Hz to 16.777216 MHz
|varies in lossless mode; 2.2 minimum in lossy mode
|Yes: Up to 256 channels
|Windows Media Audio Lossless
|8, 11.025, 16, 22.05, 32, 44.1, 48, 88.2, 96 kHz
|Yes:Up to 6 channels
Media container formats
Media container formats can look like file types, but they really are containers of file types (think of a folder with an extension). Often they allow for the bundling of audio and video files with metadata and then enable this set of data to act like a single file. On Wikipedia there is a really nice comparison of container formats.
MP4 is one such container format. Apple Lossless data is stored within an MP4 container with the filename extension .m4a; this extension is also used by Apple for AAC audio data in an MP4 container (same container, different audio encoding). However, Apple Lossless is not a variant of AAC (which is a lossy format), but rather a distinct lossless format that uses linear prediction similar to other lossless codecs such as FLAC and Shorten (Wikipedia: Apple Lossless). Files with an .m4a extension generally do not have a video stream, even though MP4 containers can also have a video stream.
MP4 can contain:
- Video: MPEG-4 Part 10 (H.264) and MPEG-4 Part 2. Other compression formats are less used: MPEG-2 and MPEG-1.
- Audio: Advanced Audio Coding (AAC). Also MPEG-4 Part 3 audio objects, such as Audio Lossless Coding (ALS), Scalable Lossless Coding (SLS), MP3, MPEG-1 Audio Layer II (MP2), MPEG-1 Audio Layer I (MP1), CELP, HVXC (speech), TwinVQ, Text To Speech Interface (TTSI) and Structured Audio Orchestra Language (SAOL). Other compression formats are less used: Apple Lossless.
- Subtitles: MPEG-4 Timed Text (also known as 3GPP Timed Text). Nero Digital uses DVD Video subtitles in MP4 files (Wikipedia: MPEG-4 Part 14).
This means that an .mp3 file can be contained inside of an .mp4 file. This also means that audio files are not always what they seem to be on the surface. This is why I advocate that an archive of digital files, particularly one which archives for a digital publishing house, also use technical metadata as discovery metadata. File type alone is not enough to know about a file.
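As a rough illustration of why the extension is not enough, the following sketch looks at the first bytes of a file instead of its name. The signatures used (RIFF/WAVE, fLaC, OggS, an `ftyp` box at offset 4, a leading ID3 tag) are the well-known ones; the function name and the returned labels are my own, and the list is far from complete.

```python
def sniff_container(header):
    """Guess the container from the first 12 or more bytes of a file.

    Returns (kind, detail). This identifies the wrapper only, not the codec inside.
    """
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return ("RIFF/WAVE", None)
    if header[:4] == b"FORM" and header[8:12] in (b"AIFF", b"AIFC"):
        return ("AIFF", header[8:12].decode("ascii"))
    if header[:4] == b"fLaC":
        return ("FLAC", None)
    if header[:4] == b"OggS":
        return ("Ogg", None)
    if header[4:8] == b"ftyp":
        # The four bytes after 'ftyp' are the major brand, e.g. 'M4A ' or 'isom'.
        return ("ISO base media (MP4 family)",
                header[8:12].decode("ascii", errors="replace"))
    if header[:3] == b"ID3":
        return ("ID3v2 tag (commonly at the start of an MP3)", None)
    return ("unknown", None)


if __name__ == "__main__":
    print(sniff_container(b"\x00\x00\x00\x20ftypM4A \x00\x00\x00\x00"))
```

Even this only gets as far as the container: an .m4a file reports the same MP4-family wrapper whether it holds AAC or Apple Lossless, so the codec has to be recorded as technical metadata by a tool that reads inside the container.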
Possibilities with embedded metadata
Audio files also vary greatly in what kinds of embedded metadata and metadata formats they support. MPEG-7, BWF and MP4 all support embedded metadata. But this does not mean that audio players in the consumer or prosumer market respect this embedded metadata. ARSC has an interesting report on the support for embedded metadata in audio recording software (Chris Lacinak, Walter Forsberg. 2011. A Study of Embedded Metadata Support in Audio Recording Software: Summary of Findings and Conclusion. ARSC Technical Committee). Aside from this disregard for embedded metadata, there are various metadata formats which are embedded in different file types. One common type, ID3, is popular with .mp3 files. But even ID3 comes in different versions.
In archiving language and culture materials, our complete package often includes audio but is rarely just audio. However, understanding the audio components of the complete package helps us understand what it needs to look like in the archive. In my experience working with the Language and Culture Archive, most contributors are not aware of the difference between archival and presentation versions of audio formats, and those who think they are generally are not aware of the differences in the codecs used (sometimes with the same file extension). From the archive’s perspective this is a continual point of user/submitter education. This past week I have taken the time to listen to a few presentations by audio archivists from the 2011 ARSC convention. In general these show that the kinds of issues I have been dealing with in the Language and Culture Archive are not unique to our context.
- Anthony Seeger, Maureen Russell, David Martinelli. Ethnographic Sound Archives. http://www.arsc-audio.org/conference/audio2011/mp3/14.mp3 [Accessed 24 Oct. 2011]
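The remark that ID3 comes in different versions can be checked directly from the bytes of a file. A minimal sketch, assuming the standard layouts: an ID3v2 tag starts with a 10-byte header (the letters "ID3", a major version byte, a revision byte, a flags byte and a 4-byte "synchsafe" size using 7 bits per byte), and an ID3v1 tag is the last 128 bytes of the file beginning with "TAG". The function names are my own.

```python
def read_id3v2_header(data):
    """Parse the 10-byte ID3v2 header at the start of a file, or return None."""
    if len(data) < 10 or data[:3] != b"ID3":
        return None
    major, revision, flags = data[3], data[4], data[5]
    size = 0
    for byte in data[6:10]:
        # Synchsafe integer: only the low 7 bits of each byte carry data.
        size = (size << 7) | (byte & 0x7F)
    return {"version": f"2.{major}.{revision}", "flags": flags, "tag_size": size}


def has_id3v1(data):
    """ID3v1 lives in the final 128 bytes of the file and starts with 'TAG'."""
    return len(data) >= 128 and data[-128:-125] == b"TAG"


if __name__ == "__main__":
    sample = b"ID3\x03\x00\x00\x00\x00\x02\x01"
    print(read_id3v2_header(sample))  # version 2.3.0, tag_size 257
```

A single .mp3 can carry both an ID3v1 and an ID3v2 tag with different contents, which is one more reason embedded metadata needs to be inspected rather than assumed.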
- Wendy Sistrunk, Sandy Rodriguez. The Goldin Transcription Collection at UMKC. http://www.arsc-audio.org/conference/audio2011/mp3/16.mp3 [Accessed 24 Oct. 2011] [PDF visual of presentation]
- Birgitta Johnson. Gospel music in L.A. http://www.arsc-audio.org/conference/audio2011/mp3/39.mp3 [Accessed 24 Oct. 2011]
- Rob Poretti. 2011. Audio Analysis and Processing in Multi-Media File Formats. ARSC 2011. [Accessed: 24 October 2011] http://www.arsc-audio.org/conference/audio2011/extra/48-Poretti.pptx
- Various Contributors. 21 October 2011 at 21:44. Wikipedia: Advanced Audio Coding, AAC’s improvements over MP3. http://en.wikipedia.org/wiki/Advanced_Audio_Coding#AAC.27s_improvements_over_MP3
- Various Contributors. 21 October 2011 at 10:26. Wikipedia: Comparison of audio formats, Technical Details of Lossless Audio Compression Formats. http://en.wikipedia.org/wiki/Comparison_of_audio_codecs#Technical_Details_of_Lossless_Audio_Compression_Formats
- Various Contributors. 6 October 2011 at 03:11. Wikipedia: Apple Lossless. http://en.wikipedia.org/wiki/Apple_Lossless
- Various Contributors. 11 October 2011 at 15:00. Wikipedia: MPEG-4 Part 14. http://en.wikipedia.org/wiki/.m4a
- Chris Lacinak, Walter Forsberg. 2011. A Study of Embedded Metadata Support in Audio Recording Software: Summary of Findings and Conclusion. ARSC Technical Committee. http://www.arsc-audio.org/pdf/ARSC_TC_MD_Study.pdf
The company I work for has an archive for many kinds of materials. In recent times this company has moved to start a digital repository using DSpace. To facilitate contributions to the repository, the company has built an Adobe AIR app which allows for the uploading of metadata to the metadata elements of DSpace as well as the attachment of the digital item to the proper bitstream. Totally Awesome.
However, one of the challenges is that just because the metadata is curated, collected and properly filed, it does not mean that the metadata is embedded in the digital items uploaded to the repository. PDFs are still being uploaded with the PDF’s author attribute set to Microsoft-Word. (More about the metadata attributes of PDF/A can be read on pdfa.org.) Not only are the correct metadata and the wrong metadata in the same place at the same time (and being uploaded at the same time); later, when a consumer downloads the digital file, only the wrong metadata will travel with the file. This is not just happening with PDFs but also with .mp3, .wav, .docx, .mov, .jpg and a slew of other file types. This saga of bad metadata in PDFs has been recognized since at least 2004 (James Howison & Abby Goodrum. 2004. Why can’t I manage academic papers like MP3s? The evolution and intent of Metadata standards).
So, today I was looking around to see if Adobe AIR can indeed use some of the available tools to propagate the correct metadata into the files before upload, so that when the files arrive in DSpace they will have the correct metadata.
- The first step is to retrieve metadata from files. It seems that Adobe AIR can do this with PDFs. (One would hope so, as they are both brainchildren of the geeks at Adobe.) However, what is needed in this particular setup is a two-way street with a check in between. We would need to overwrite what was there with the data we want there.
- However, as of 2009, there were no tools in AIR which could manipulate EXIF data (for photos).
- But it does look like the situation is more hopeful for working with audio metadata.
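The "two-way street with a check in between" from the first step could look something like the sketch below, written in Python rather than AIR purely for illustration. It compares what is embedded in a file with the curated record and reports which fields would be overwritten, so that a person or a rule can approve the change before anything is written. The field names and the function are hypothetical; actually reading and writing the embedded values is left to whatever format-specific tool is available.

```python
def plan_metadata_updates(embedded, curated):
    """Return the fields whose embedded value differs from the curated record.

    Both arguments are plain dicts of field name to value. Nothing is written;
    the result is a plan that can be reviewed before overwriting.
    """
    updates = {}
    for field, wanted in curated.items():
        current = embedded.get(field)
        if wanted and current != wanted:
            updates[field] = {"from": current, "to": wanted}
    return updates


if __name__ == "__main__":
    embedded = {"Author": "Microsoft Word", "Title": "Grammar sketch"}
    curated = {"Author": "Jane Linguist", "Title": "Grammar sketch", "Subject": ""}
    for field, change in plan_metadata_updates(embedded, curated).items():
        print(f"{field}: {change['from']!r} -> {change['to']!r}")
```

Empty curated values are skipped on purpose, so that a blank field in the repository record never erases something useful that was already embedded in the file.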
Three Lingering Thoughts
- Even if the Resource and Metadata Packager has the ability to embed the metadata in the files themselves, it does not mean that the submitters would know how or why to use it. This is not, however, a valid reason to leave the functionality out of a development project. All marketing aside, an archive does have a responsibility to consumers of the digital content that the content will be functional. Part of today’s “functional” is the interoperability of metadata. Consumers do appreciate, even expect, that the metadata will be interoperable. The extra effort taken on the submitting end of the process pays dividends as consumers use the files with programs like Picasa, iPhoto, PhotoShop, iTunes, Mendeley, Papers, etc.
- Another thought that comes to mind is that when one is dealing with large files (over 1 GB), there is a reason for making a “preview” version of a couple of MB. That is, if I have a 2 GB audio file, why not make a 4 MB .mp3 file for rapid assessment, to see if it is worth downloading the .wav file? It seems that a metadata packager could also create a presentation file on the fly. This is no less true with photos or images. If a command-line tool like ImageMagick could be used, that would be awesome.
- This problem has been addressed in the open source library science world. In fact a nice piece of software does live out there. It is called the Metadata Extraction Tool. It is not an end-all for all of this archive’s needs but it is a solution for some needs of this type.
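The preview idea in the second thought above could be prototyped by having the packager build command lines for existing tools. The sketch below only constructs the commands; it assumes that `ffmpeg` (built with the libmp3lame encoder) and ImageMagick's `convert` are installed, which is my assumption and not something the packager currently does. The file names, the 60-second excerpt and the 64k bit rate are illustrative defaults.

```python
def audio_preview_command(source, target, seconds=60, bitrate="64k"):
    """Build an ffmpeg command for a short, mono, low bit rate MP3 excerpt."""
    return [
        "ffmpeg",
        "-i", source,              # input file, e.g. the large archival .wav
        "-t", str(seconds),        # keep only the first N seconds
        "-ac", "1",                # downmix to one channel
        "-codec:a", "libmp3lame",  # MP3 encoder (must be compiled into ffmpeg)
        "-b:a", bitrate,           # target audio bit rate
        target,
    ]


def image_preview_command(source, target, max_edge=1024):
    """Build an ImageMagick command that fits the image inside a square box."""
    # On ImageMagick 7 the executable is usually 'magick' instead of 'convert'.
    return ["convert", source, "-resize", f"{max_edge}x{max_edge}", target]


if __name__ == "__main__":
    print(" ".join(audio_preview_command("field_recording.wav", "preview.mp3")))
    print(" ".join(image_preview_command("scan.tif", "thumb.jpg")))
```

Either list can be handed to `subprocess.run(command, check=True)` once the tools are known to be present. A 60-second excerpt at 64 kbps comes to about half a megabyte, so a preview of a couple of MB leaves room for a longer excerpt or a higher bit rate.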