I have found this comparison of RDF/RDFa and microformats really helpful. http://manu.sporny.org/2011/uber-comparison-rdfa-md-uf/
Category Archives: Digital Archival
DSpace and the Presentation Layer
Drupal
Because I have been on the team doing the SIL.org redesign, I have been surveying the open source landscape for tools that connect Drupal with DSpace data stores. We are planning to make DSpace the back-end repository, with another CMS running the presentation and interactive layers. I found a module, still in development, which parses DSpace's XML feeds. However, this is not the only thing I am looking at; I am also looking at how we might deploy Omeka. Presenting the entire contents of a digital language and culture archive, along with citations for its physical holdings, is no small task. In addition to past content there is also future content. That is to say, archiving is not separate from publishing, so there is also the Public Knowledge Project (PKP) to consider. (SIL also currently has a publishing house whose content needs version control, such as CVS, and editorial workflows which interact with the archiving and presentation functions.)
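The module is not named here, so as a rough illustration of what "parsing DSpace's XML feeds" involves, the sketch below reads an OAI-PMH `ListRecords` response in simple Dublin Core (`oai_dc`), one of the XML interfaces DSpace provides. It assumes the feed is OAI-PMH rather than RSS or whatever the module actually consumes; the sample record and identifier are made up, and a Drupal module would of course do this in PHP rather than Python.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and oai_dc specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_oai_dc(xml_text: str) -> list[dict]:
    """Return one {"identifier", "dc"} dict per record in a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        dc = rec.find(".//oai_dc:dc", NS)
        if dc is None:
            continue  # deleted records carry a header but no metadata
        fields: dict[str, list[str]] = {}
        for el in dc:
            # ElementTree tags look like "{namespace}title"; keep the local name.
            name = el.tag.split("}", 1)[-1]
            fields.setdefault(name, []).append((el.text or "").strip())
        identifier = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        records.append({"identifier": identifier, "dc": fields})
    return records


# Hypothetical sample response (not taken from a real repository).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:123456789/1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample recording</dc:title>
          <dc:type>Sound</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

for record in parse_oai_dc(SAMPLE):
    print(record["identifier"], record["dc"]["title"])
```

Flattening each record into lists of Dublin Core values keeps the presentation layer independent of DSpace's internal schema, which is roughly the separation between repository and CMS described above.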
Omeka
Wally Grotophorst has a really good reflection on Omeka and DSpace. I am not sure that it is current, but it does present the problem space quite well.[1] Tom Scheinfeldt at Omeka also has a nice write-up on why Omeka exists, titled "Omeka and Its Peers". It is really important to understand Omeka's place in the ecosystem of content delivery, in which qualified site administrators present content to content consumers.[2]
@Mire talks about what DSpace could learn from Omeka.[3]
A DSpace mailing list discussion covers some DSpace technologies for combining DSpace with OAI-ORE and Fedora, Omeka, and Drupal.
http://omeka.org/forums/topic/omeka-and-harvesting-from-dspace
http://omeka.org/forums/topic/import-to-dspace
References
↑1 | Wally Grotophorst. 4 March 2008. DSpace And Omeka. iNODE: The weblog of Digital Programs and Systems at George Mason University Libraries. http://timesync.gmu.edu/wordpress/?p=485 [Accessed: 26 November 2011] |
---|---|
↑2 | Tom Scheinfeldt. 21 September 2010. Omeka and Its Peers. http://omeka.org/blog/2010/09/21/omeka-and-peers/ [Accessed: 26 November 2011] (also posted on Tom's blog) |
↑3 | @Mire. 20 May 2010. What DSpace could learn from Omeka. http://www.facebook.com/notes/mire/what-dspace-could-learn-from-omeka/393758568767 [Accessed: 26 November 2011] |
Metadata for Educational Materials
I have been following the Learning Resource Metadata Initiative (LRMI), a collaborative effort between Creative Commons and the Association of Educational Publishers,[1] (Creative Commons. 7 June 2011. Creative Commons & the Association of Educational Publishers to establish a common learning resources framework. http://creativecommons.org/weblog/entry/27603) with some interest as I begin to consider how the services and resources offered through SIL.org might be merged with the larger world of well-described data.
https://youtu.be/-1QEkA9qbwA
SIL has a long tradition of providing linguistic training. With the digital revolution, it only seems right that these training resources should be described appropriately in the educational arena. It will be interesting to follow LRMI as it develops over the next few months, and then to think about applying it in the context of Drupal.
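To make "described appropriately" a little more concrete: the learning-resource properties that came out of LRMI were later published as part of schema.org (`learningResourceType`, `educationalUse`, `timeRequired`, and `educationalAlignment` with an `AlignmentObject`). The sketch below assumes JSON-LD output and describes a made-up workshop; the property values are illustrative, and in Drupal this markup would more likely come from a module or theme than be built by hand.

```python
import json


def describe_training_resource(name: str, language: str, minutes: int, subject: str) -> dict:
    """Build a schema.org/LRMI-style JSON-LD description (values are illustrative)."""
    return {
        "@context": "https://schema.org",
        "@type": "CreativeWork",
        "name": name,
        "inLanguage": language,
        "learningResourceType": "workshop materials",  # free text; no vocabulary assumed
        "educationalUse": "professional development",
        "timeRequired": f"PT{minutes}M",  # ISO 8601 duration
        "educationalAlignment": {
            "@type": "AlignmentObject",
            "alignmentType": "teaches",
            "targetName": subject,
        },
    }


doc = describe_training_resource("Introduction to FLEx (hypothetical)", "en", 90, "Lexicography")
print(json.dumps(doc, indent=2))
```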
The Record Club
The following article has a fascinating presentation of marketing for audio products in the USA during the 20th century. http://thephoenix.com/Boston/music/129722-rise-and-fall-of-the-columbia-house-record-clu/
Retired License for Audio
One of the things I enjoy is reading about the licenses that CC has retired. Usually they do a great job of explaining why they are retiring a license. Understanding these use cases and their context gives a really informative view of society.
One interesting retired license is the Sampling+ license. They did a really good job of explaining why they were retiring it. One of the interesting exercises they describe is how they had to work through the machine-readable description of the license, basically mapping out its assertions.
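CC's machine-readable descriptions use the Creative Commons Rights Expression Language (CC REL), in which a license is a set of `cc:permits`, `cc:requires` and `cc:prohibits` assertions. The Sampling+ license's exact assertions are not given here, so this sketch uses Attribution-NonCommercial 3.0 as a stand-in and serializes its assertions as Turtle; the helper function is hypothetical.

```python
# Namespace of the Creative Commons Rights Expression Language (CC REL).
CC_PREFIX = "@prefix cc: <http://creativecommons.org/ns#> ."


def license_to_turtle(uri: str, permits: list[str], requires: list[str], prohibits: list[str]) -> str:
    """Serialize a license's permits/requires/prohibits assertions as Turtle."""
    statements = ["a cc:License"]
    statements += [f"cc:permits cc:{term}" for term in permits]
    statements += [f"cc:requires cc:{term}" for term in requires]
    statements += [f"cc:prohibits cc:{term}" for term in prohibits]
    body = " ;\n    ".join(statements)
    return f"{CC_PREFIX}\n\n<{uri}>\n    {body} ."


# Stand-in example: Attribution-NonCommercial 3.0, not the retired Sampling+ license.
turtle = license_to_turtle(
    "http://creativecommons.org/licenses/by-nc/3.0/",
    permits=["Reproduction", "Distribution", "DerivativeWorks"],
    requires=["Notice", "Attribution"],
    prohibits=["CommercialUse"],
)
print(turtle)
```

Laying a license out this way makes it easy to see what a retirement actually changes: each assertion is one line that either survives into a replacement license or does not.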
Sampling+ is interesting because it is targeted at sound. It makes me wonder whether sound/audio can still be licensed under Creative Commons if it is not protected by copyright.
Did we do input sanitizing?
Image found at: http://xkcd.com/327/
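The comic (xkcd 327) is about SQL injection through a student named `Robert'); DROP TABLE Students;--`. A minimal Python/SQLite sketch of the difference between pasting input into SQL text and passing it as a bound parameter; the table and function names are made up for illustration. The standard fix is parameterized queries rather than trying to "sanitize" the string by hand.

```python
import sqlite3

BOBBY = "Robert'); DROP TABLE Students;--"


def fresh_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Students (name TEXT)")
    return conn


def table_exists(conn: sqlite3.Connection, name: str) -> bool:
    row = conn.execute(
        "SELECT count(*) FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row[0] == 1


def add_student_unsafe(conn: sqlite3.Connection, name: str) -> None:
    # UNSAFE: the name is pasted into the SQL text. executescript() runs every
    # statement in the string, so the injected DROP TABLE executes too.
    conn.executescript(f"INSERT INTO Students (name) VALUES ('{name}');")


def add_student_safe(conn: sqlite3.Connection, name: str) -> None:
    # Safe: the value travels separately from the SQL as a bound parameter.
    conn.execute("INSERT INTO Students (name) VALUES (?)", (name,))


unsafe = fresh_db()
add_student_unsafe(unsafe, BOBBY)
print(table_exists(unsafe, "Students"))  # False

safe = fresh_db()
add_student_safe(safe, BOBBY)
print(safe.execute("SELECT name FROM Students").fetchone()[0])  # Robert'); DROP TABLE Students;--
```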
Teaching FLEx in Malaysia
In October, Becky and I were invited to present FLEx at Universiti Malaysia Sabah as part of a workshop on compiling native dictionaries and managing cultural data. I learned a lot about dictionaries, about using FLEx to organize dictionary data, about Webonary, and about Malaysia.
One of the things this workshop helped me to clearly articulate was that there are four knowledge content areas which dictionary creators need:
- Knowledge about Theoretical Linguistics to understand the language being described and the categories possible in the dictionary.
- Knowledge about the language being analyzed and described so that they can apply the appropriate options available to this situation.
- Knowledge about how to manage the editorial process for the dictionary (including entry submission).
- Knowledge about how to use the software to implement the editorial process.
This workshop’s focus was only on the software used to implement the editorial process (mostly the data collection part of it). So in some ways it felt like we weren’t giving the participants all the tools they will need (or even showing them all the tools they will need). But we had to recognize that it is not our responsibility to give them all those tools or to expose them to all of these issues; they need local contacts for that. Regardless of these issues, we were ecstatic that about 80 people attended.
Becky took most of the sessions on FLEx. She presented on using FLEx as a tool for collecting words and information about those words. We covered several input methods and features of the application.
I presented a session on how to get data out of FLEx. We talked about putting dictionary data on the web and turning it into .epub files.
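The actual export path out of FLEx is not described here, so this is only a sketch of the EPUB end of it, assuming the entries have already been exported as (headword, part of speech, gloss) tuples. It builds a minimal EPUB 3 package with Python's standard library: a `mimetype` file stored first and uncompressed, `META-INF/container.xml` pointing at the package document, a navigation document, and one XHTML page. The entries, title and identifier are placeholders, and a real dictionary should be checked with a validator such as EpubCheck.

```python
import io
import zipfile
from html import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""


def xhtml_page(title: str, body: str) -> str:
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>{escape(title)}</title></head>
<body>{body}</body>
</html>"""


def package_opf(title: str, lang: str, book_id: str, modified: str) -> str:
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="bookid">{escape(book_id)}</dc:identifier>
    <dc:title>{escape(title)}</dc:title>
    <dc:language>{escape(lang)}</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="dict" href="dictionary.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="dict"/>
  </spine>
</package>"""


def build_dictionary_epub(title, lang, book_id, modified, entries) -> bytes:
    """entries: iterable of (headword, part_of_speech, gloss) tuples."""
    rows = "".join(
        f"<dt>{escape(hw)}</dt><dd><i>{escape(pos)}</i> {escape(gloss)}</dd>"
        for hw, pos, gloss in entries
    )
    nav = '<nav epub:type="toc"><ol><li><a href="dictionary.xhtml">Dictionary</a></li></ol></nav>'
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # EPUB requires "mimetype" to be the first entry and stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/content.opf", package_opf(title, lang, book_id, modified))
        zf.writestr("OEBPS/nav.xhtml", xhtml_page("Contents", nav))
        zf.writestr("OEBPS/dictionary.xhtml", xhtml_page(title, f"<dl>{rows}</dl>"))
    return buf.getvalue()


# Placeholder data; a real run would use entries exported from FLEx.
epub = build_dictionary_epub(
    "Sample Dictionary (placeholder)", "en", "urn:uuid:00000000-0000-0000-0000-000000000000",
    "2011-11-01T00:00:00Z", [("word-a", "n", "gloss A"), ("word-b", "v", "gloss B")],
)
```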
I think one of the more interesting things that I learned was about expectations, culture and photographs.
Many people wanted photographs with us (or of us). This was not totally unexpected. What was unexpected was that rather than taking one photo and sharing it (passing it around), everyone wanted their own picture: not just their own picture with us, but a picture with us taken with their own camera! In that moment I had an epiphany. Because of my training in language documentation, I am aware of and concerned with rules and laws concerning privacy. In the U.S., when dealing with issues of informed consent and intellectual property, it cannot be assumed that if I take a picture of you, I, the owner of the camera, own the picture. Furthermore, it cannot be assumed that I have the right to do with that picture as I please, e.g., post it to the internet. This may be partly because our laws are based on our semantics, and partly because of our culture. But there I realized that if the photo is taken with your camera, you own the photo and can do with it as you please. Asking for permission means asking for permission to take the photo.
Presentation version vs. Archival version of Digital Audio files
What is an archival version of an audio file?
An archival version of an audio file is a file which faithfully represents the original sound. In archiving we want to keep a version of the audio which can be used to make other products and which can also be used directly if needed. This is usually done with PCM (pulse-code modulation). Several file types are associated with PCM, that is, raw, uncompressed digital audio faithful to the original signal:
- Standard Wave
- AIFF
- Wave 64
- Broadcast Wave Format (BWF)
One way to understand the difference between audio file formats is to understand how the different formats are used. One place which has been helpful to me is the DOBBIN website, where they explain how their software can convert audio from one PCM-based format to another.
Each of these file types is flexible in the components it can hold; for example, several channels of audio can be in the same file, and .wav files can have different bit depths or sampling rates. But each is an archive-friendly format. Before declaring a file suitable for archiving based simply on its file format, one must also consider its sample rate, bit depth, embedded metadata, number of channels, etc. I was introduced to DOBBIN as an application resource for audio archivists through a presentation by Rob Poretti.[1]
One additional point worth noting about archival versions of digital audio concerns born-digital materials. Sometimes audio is recorded directly to a lossy compressed format. In that case it may be entirely appropriate to archive the born-digital file in its original format, depending on the content. It should be noted, however, that such recordings ideally should have been made in a PCM file format in the first place.
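Checking the technical parameters mentioned above (sample rate, bit depth, channels) is easy to automate for plain WAV files. A minimal sketch using Python's standard `wave` module, which only understands uncompressed PCM WAV (older Python versions also reject the WAVE_FORMAT_EXTENSIBLE header many 24-bit files use) and does not read BWF metadata chunks. The 24-bit / 48 kHz thresholds are hypothetical policy values chosen for illustration; substitute your archive's own.

```python
import io
import wave
from dataclasses import dataclass

# Hypothetical thresholds for an archive's acceptance policy; substitute your own.
MIN_BIT_DEPTH = 24
MIN_SAMPLE_RATE_HZ = 48_000


@dataclass
class WavInfo:
    channels: int
    bit_depth: int
    sample_rate_hz: int
    frames: int


def inspect_wav(fileobj) -> WavInfo:
    """Read technical parameters from an uncompressed PCM WAV header."""
    with wave.open(fileobj, "rb") as w:
        return WavInfo(w.getnchannels(), w.getsampwidth() * 8, w.getframerate(), w.getnframes())


def archival_problems(info: WavInfo) -> list[str]:
    """List reasons this file falls short of the (hypothetical) policy above."""
    problems = []
    if info.bit_depth < MIN_BIT_DEPTH:
        problems.append(f"bit depth {info.bit_depth} < {MIN_BIT_DEPTH}")
    if info.sample_rate_hz < MIN_SAMPLE_RATE_HZ:
        problems.append(f"sample rate {info.sample_rate_hz} Hz < {MIN_SAMPLE_RATE_HZ} Hz")
    return problems


def silent_wav(channels: int, sample_width_bytes: int, rate_hz: int, frames: int = 100) -> io.BytesIO:
    """Make an in-memory silent WAV file for demonstration."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width_bytes)
        w.setframerate(rate_hz)
        w.writeframes(b"\x00" * frames * channels * sample_width_bytes)
    buf.seek(0)
    return buf


print(archival_problems(inspect_wav(silent_wav(2, 2, 44_100))))  # ['bit depth 16 < 24', 'sample rate 44100 Hz < 48000 Hz']
print(archival_problems(inspect_wav(silent_wav(2, 3, 96_000))))  # []
```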
What is a presentation version? (of an audio file)
A presentation version is a file created with a particular use of the content in mind. This kind of file has several general characteristics:
- It does not retain the whole PCM content.
- It is usually designed for a specific application (e.g., use on a portable device or personal audio player).
- It can be thought of as a derivative product from an original audio or video stream.
There is not just one presentation file format; there are many, because there are many ways to use audio. For instance, there are audio file types optimized for applications such as:
- 3G and WiFi Audio and A/V services
- Internet audio for streaming and download
- Digital Radio
- Digital Satellite and Cable
- Portable players
A brief look at an explanation by Cube-Tec might help to get the gears moving. It is part of the inspiration for this post.
This means there is a long list of potential audio formats for the presentation form.
- AAC (aac)
- AC3 (ac3)
- Amiga IFF/SVX8/SV16 (iff)
- Apple/SGI (aiff/aifc)
- Audio Visual Research (avr)
- Berkeley/IRCAM/CARL (irca)
- CDXA, like Video-CD (dat)
- DTS (dts)
- DVD-Video (ifo)
- Ensoniq PARIS (paf)
- FastTracker2 Extended (xi)
- Flac (flac)
- Matlab (mat)
- Matroska (mkv/mka/mks)
- Midi Sample dump Format (sds)
- Monkey’s Audio (ape/mac)
- Mpeg 1&2 container (mpeg/mpg/vob)
- Mpeg 4 container (mp4)
- Mpeg audio specific (mp2/mp3)
- Mpeg video specific (mpgv/mpv/m1v/m2v)
- Ogg (ogg/ogm)
- Portable Voice format (pvf)
- Quicktime (qt/mov)
- Real (rm/rmvb/ra)
- Riff (avi/wav)
- Sound Designer 2 (sd2)
- Sun/NeXT (au)
- Windows Media (asf/wma/wmv)
Aside from just the file format difference in media files (.wav vs. .mp3) there are three other differences to be aware of:
- Media stream quality variations
- Media container formats
- Possibilities with embedded metadata
Media stream quality variations
Within the same file type, the quality of the audio can vary. For instance, MP3 files can be encoded at a variable bit rate or at a constant bit rate, and a constant bit rate can be high or low. WAV files can also have a high or low bit depth and a high or low sample rate. Some file types can hold more channels than others: AAC files can have up to 48 channels, whereas MP3 files can have at most 5.1 channels.[2]
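Because these quality dimensions multiply together, a bit of arithmetic shows why they matter for storage: uncompressed PCM runs at sample rate × bit depth × channels bits per second. The sketch below computes that for CD-quality audio and for a 96 kHz / 24-bit stereo recording, and compares CD audio with a 128 kbit/s constant-bit-rate MP3. The 128 kbit/s figure is just a common example, not a recommendation.

```python
def pcm_bitrate(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bits per second of uncompressed PCM audio."""
    return sample_rate_hz * bit_depth * channels


def bytes_per_minute(bits_per_second: int) -> int:
    return bits_per_second * 60 // 8


cd = pcm_bitrate(44_100, 16, 2)
hires = pcm_bitrate(96_000, 24, 2)
mp3_cbr = 128_000  # a common constant MP3 bit rate, for comparison only

print(cd, bytes_per_minute(cd))        # 1411200 10584000
print(hires, bytes_per_minute(hires))  # 4608000 34560000
print(round(cd / mp3_cbr, 1))          # 11.0
```

So one minute of CD-quality stereo is about 10.6 MB as PCM, and roughly eleven times larger than the same minute as a 128 kbit/s MP3.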
One argument I have heard, in the interest of saving disk space, is to use lossless compression formats rather than WAV files for archive-quality recordings (and as the archival version). As far as archiving is concerned, these lossless compression formats are still product-oriented file formats. One thing to realize is that not every file format can hold the same kind of audio. Some formats limit the bit depth of the samples they can contain, or the number of audio channels a file can have. This is demonstrated in the table below, taken from Wikipedia.[3] This is where understanding the relationship between a file format, a file extension and a media container format is really important.
Audio compression format | Algorithm | Sample rate | Bits per sample | Latency | Stereo | Multichannel |
---|---|---|---|---|---|---|
ALAC | Lossless | 44.1 kHz to 192 kHz | 16, 24 | ? | Yes | Yes |
FLAC | Lossless | 1 Hz to 655350 Hz | 8, 16, 20, 24, (32) | 4.3 ms to 92 ms (46.4 ms typical) | Yes | Yes: up to 8 channels |
Monkey's Audio | Lossless | 8, 11.025, 12, 16, 22.05, 24, 32, 44.1, 48 kHz | ? | ? | Yes | No |
RealAudio Lossless | Lossless | Varies (see article) | Varies (see article) | Varies | Yes | Yes: up to 6 channels |
True Audio | Lossless | 0 to 4 GHz | 1 to > 64 | ? | Yes | Yes: up to 65535 channels |
WavPack Lossless | Lossless, Hybrid | 1 Hz to 16.777216 MHz | Varies in lossless mode; 2.2 minimum in lossy mode | ? | Yes | Yes: up to 256 channels |
Windows Media Audio Lossless | Lossless | 8, 11.025, 16, 22.05, 32, 44.1, 48, 88.2, 96 kHz | 16, 24 | > 100 ms | Yes | Yes: up to 6 channels |
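Choosing a lossless format therefore means checking the source recording against each format's limits. The sketch below encodes only the two rows whose bit-depth and channel limits the table states completely (FLAC and Windows Media Audio Lossless); rows marked "?" or "Varies" are left out rather than guessed, and the values should be re-checked against current documentation before relying on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LosslessLimits:
    bit_depths: frozenset[int]
    max_channels: int


# Transcribed from the comparison table; FLAC's parenthesized 32-bit entry is left out.
LIMITS = {
    "FLAC": LosslessLimits(frozenset({8, 16, 20, 24}), 8),
    "Windows Media Audio Lossless": LosslessLimits(frozenset({16, 24}), 6),
}


def can_hold(fmt: str, bit_depth: int, channels: int) -> bool:
    limits = LIMITS[fmt]
    return bit_depth in limits.bit_depths and channels <= limits.max_channels


def candidate_formats(bit_depth: int, channels: int) -> list[str]:
    """Formats (from the subset above) able to store the recording without loss."""
    return sorted(fmt for fmt in LIMITS if can_hold(fmt, bit_depth, channels))


print(candidate_formats(24, 2))  # ['FLAC', 'Windows Media Audio Lossless']
print(candidate_formats(24, 8))  # ['FLAC']
print(candidate_formats(32, 2))  # []
```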
Media container formats
Media container formats can look like file types, but they are really containers of file types (think of a folder with an extension). They often allow audio and video streams to be bundled with metadata so that the whole set of data acts like a single file. Wikipedia has a really nice comparison of container formats.
MP4 is one such container format. Apple Lossless data is stored within an MP4 container with the filename extension .m4a; this extension is also used by Apple for AAC audio data in an MP4 container (same container, different audio encoding). However, Apple Lossless is not a variant of AAC (which is a lossy format), but rather a distinct lossless format that uses linear prediction similar to other lossless codecs such as FLAC and Shorten.[4] Files with a .m4a extension generally do not have a video stream, even though MP4 containers can also hold video.
MP4 can contain:
- Video: MPEG-4 Part 10 (H.264) and MPEG-4 Part 2. Other compression formats are less used: MPEG-2 and MPEG-1.
- Audio: Advanced Audio Coding (AAC). Also MPEG-4 Part 3 audio objects, such as Audio Lossless Coding (ALS), Scalable Lossless Coding (SLS), MP3, MPEG-1 Audio Layer II (MP2), MPEG-1 Audio Layer I (MP1), CELP, HVXC (speech), TwinVQ, Text To Speech Interface (TTSI) and Structured Audio Orchestra Language (SAOL). Other compression formats are less used: Apple Lossless.
- Subtitles: MPEG-4 Timed Text (also known as 3GPP Timed Text). Nero Digital uses DVD Video subtitles in MP4 files.[5]
This means that an .mp3 file can be contained inside an .mp4 file. It also means that audio files are not always what they seem to be on the surface. This is why I advocate that an archive which holds digital files for a digital publishing house should also use technical metadata as discovery metadata. File type alone is not enough to know what a file contains.
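One way to act on this is to look inside files rather than trusting their extensions. The sketch below guesses the format from a file's first bytes using well-known signatures (RIFF/WAVE, FORM/AIFF, `fLaC`, `OggS`, the MP4 `ftyp` box, an `ID3` tag, or an MPEG audio frame sync). It is deliberately shallow: for an .m4a it can report the MP4 brand, but telling Apple Lossless from AAC requires parsing the sample description inside the container, which a tool like MediaInfo does and this sketch does not. The function names are hypothetical.

```python
def sniff_audio_format(header: bytes) -> str:
    """Guess a file's format from its first bytes (pass at least 12 bytes)."""
    if header[:4] in (b"RIFF", b"RF64") and header[8:12] == b"WAVE":
        return "WAV"
    if header[:4] == b"FORM" and header[8:12] in (b"AIFF", b"AIFC"):
        return "AIFF"
    if header[:4] == b"fLaC":
        return "FLAC"
    if header[:4] == b"OggS":
        return "Ogg"
    if header[4:8] == b"ftyp":
        # MP4/QuickTime family; the brand hints at the flavour but not at the codec.
        return "MP4 (brand " + header[8:12].decode("ascii", "replace").strip() + ")"
    if header[:3] == b"ID3":
        return "ID3v2 tag (usually followed by MP3 data)"
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "MPEG audio frame (e.g. MP3)"
    return "unknown"


def check_extension(filename: str, header: bytes) -> tuple[str, str]:
    """Return (extension, sniffed format) so mismatches can be flagged for review."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return ext, sniff_audio_format(header)


print(check_extension("interview.mp3", b"RIFF\x24\x00\x00\x00WAVEfmt "))  # ('mp3', 'WAV')
```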
Possibilities with embedded metadata
Audio files also vary greatly in what kinds of embedded metadata and metadata formats they support. MPEG-7, BWF and MP4 all support embedded metadata, but this does not mean that audio players in the consumer or prosumer market respect this embedded metadata. ARSC has an interesting report on support for embedded metadata in audio recording software.[6] Beyond this disregard for embedded metadata, different file types embed different metadata formats. One common format, ID3, is popular with .mp3 files, but even ID3 comes in different versions (ID3v1 and ID3v2, the latter with its own sub-versions).
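To illustrate how different the ID3 versions are structurally: ID3v1 is a fixed 128-byte block at the end of the file beginning with `TAG`, while ID3v2 is a variable-length block at the start beginning with `ID3`, whose 10-byte header gives the version and a "syncsafe" size. A minimal reader for just those two headers; it ignores ID3v2 frames, extended headers and footers entirely, so treat it as a sketch rather than a tag parser.

```python
def read_id3v2_header(data: bytes):
    """Return version and tag size from an ID3v2 header at the start of a file, or None."""
    if len(data) < 10 or data[:3] != b"ID3":
        return None
    major, revision, flags = data[3], data[4], data[5]
    size = 0
    for byte in data[6:10]:
        # "Syncsafe" integer: only the low 7 bits of each byte are used.
        size = (size << 7) | (byte & 0x7F)
    return {"version": f"2.{major}.{revision}", "flags": flags, "tag_size": size}


def read_id3v1_title(data: bytes):
    """Return the title from an ID3v1 tag (last 128 bytes of a file), or None."""
    if len(data) < 128 or data[-128:-125] != b"TAG":
        return None
    # The title is a fixed 30-byte field right after "TAG", padded with NULs or spaces.
    return data[-125:-95].rstrip(b"\x00 ").decode("latin-1")


v2 = read_id3v2_header(b"ID3\x03\x00\x00\x00\x00\x02\x01" + b"\x00" * 257)
print(v2)  # {'version': '2.3.0', 'flags': 0, 'tag_size': 257}

v1_file = b"\x00" * 1000 + b"TAG" + b"Sample title".ljust(30, b"\x00") + b"\x00" * 95
print(read_id3v1_title(v1_file))  # Sample title
```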
In archiving language and culture materials, our complete package often includes audio but is rarely just audio. However, understanding the audio components of the complete package helps us understand what the package needs to look like in the archive. In my experience working with the Language and Culture Archive, most contributors are not aware of the difference between archival and presentation versions of audio formats, and those who think they are aware generally do not know about the differences in the codecs used (sometimes with the same file extension). From the archive's perspective this is a continual point of user/submitter education. This past week I have taken the time to listen to a few presentations by audio archivists from the 2011 ARSC conference. In general, these show that the kinds of issues I have been dealing with in the Language and Culture Archive are not unique to our context.
- Anthony Seeger, Maureen Russell, David Martinelli. Ethnographic Sound Archives. http://www.arsc-audio.org/conference/audio2011/mp3/14.mp3 [Accessed 24 Oct. 2011]
- Wendy Sistrunk, Sandy Rodriguez. The Goldin Transcription Collection at UMKC. http://www.arsc-audio.org/conference/audio2011/mp3/16.mp3 [Accessed 24 Oct. 2011] [PDF visual of presentation]
- Birgitta Johnson. Gospel music in L.A. http://www.arsc-audio.org/conference/audio2011/mp3/39.mp3 [Accessed 24 Oct. 2011]
References
↑1 | Rob Poretti. 2011. Audio Analysis and Processing in Multi-Media File Formats. ARSC 2011. http://www.arsc-audio.org/conference/audio2011/extra/48-Poretti.pptx [Accessed: 24 October 2011] |
---|---|
↑2 | Various Contributors. 21 October 2011 at 21:44. Wikipedia: Advanced Audio Coding, AAC’s improvements over MP3. http://en.wikipedia.org/wiki/Advanced_Audio_Coding#AAC.27s_improvements_over_MP3 |
↑3 | Various Contributors. 21 October 2011 at 10:26. Wikipedia: Comparison of audio formats, Technical Details of Lossless Audio Compression Formats. http://en.wikipedia.org/wiki/Comparison_of_audio_codecs#Technical_Details_of_Lossless_Audio_Compression_Formats |
↑4 | Various Contributors. 6 October 2011 at 03:11. Wikipedia: Apple Lossless. http://en.wikipedia.org/wiki/Apple_Lossless |
↑5 | Various Contributors. 11 October 2011 at 15:00. Wikipedia: MPEG-4 Part 14. http://en.wikipedia.org/wiki/.m4a |
↑6 | Chris Lacinak, Walter Forsberg. 2011. A Study of Embedded Metadata Support in Audio Recording Software: Summary of Findings and Conclusion. ARSC Technical Committee. http://www.arsc-audio.org/pdf/ARSC_TC_MD_Study.pdf |
Working on a Dynamic Left Menu Bar
It seems to me that the logical place for a context-based and role-based menu would be on the left side. Given that assumption, the questions are how to go about it, what it should contain, why it needs to change, and when it should change.
I have been looking at several widgets and custom field plugins. Some of these deserve a deeper look. We should perhaps also take a deeper look at how we are implementing custom fields and our plugin, so that we have an abstraction layer.
Here are some plugin options which seem to be able to handle some of this complexity.
There are really three places where custom fields need to be used, so this post is not just about a dynamic left sidebar. It is about a dynamic left sidebar driven by the values of custom fields in the main post, and so it is about finding the best strategy for approaching custom fields.
- Just Custom Fields for WordPress plugin: This plugin adds custom fields for standard and custom post types in WordPress. After installation you will see a simple settings page which is self-explanatory to use.
I found two Posts about this plugin to be really helpful: http://justcoded.com/just-labs/just-custom-fields-for-wordpress-plugin, and http://justcoded.com/implementation/wordpress-3-vs-drupal-cck/.
- One of the ideas for the left sidebar is to have a listing of related content. Related content could be all the files which belong to a single audio package, or all the digital files belonging to a physical item.
This is where the Related Widgets plugin for WordPress comes in. It introduces multi-use widgets that allow you to list related posts or pages. To use the plugin, browse to Appearance / Widgets, insert a Related Widget where you want it, and configure it as appropriate. You can optionally filter the results by category or section.
- List Related Attachments: List Related Attachments is a sidebar widget that displays a filtered list of attachments related to the current post. This might be useful in one of two ways: listing associated content (depending on how we implement it), or listing the photos on the right-hand sidebar.
- Custom Field Template: This plugin adds the default custom fields on the Write Post/Page.
- Custom Field List Widget: This plugin creates sidebar widgets with lists of the values of custom fields. The listed values can be (hyper-)linked in different ways. One possibility is to create a list of all values of a custom field, which will be grouped by their post (or page) and (hyper-)linked automatically to that post (or page). Another possibility is to create a list of all unique values of a custom field and specify links as you like (or not).
- Get Custom Field Values: Get Custom Field Values allows the admin to use widgets, shortcodes, and/or template tags to easily retrieve and display custom field values for posts or pages.
- Advanced Custom Fields: Advanced Custom Fields is described by its authors as the perfect solution for any WordPress website which needs more flexible data, like other content management systems offer.
- Easy Custom Fields: This is a set of extendable classes to allow easy handling of custom post fields. https://wordpress.org/extend/plugins/easy-custom-fields/
- Advanced Custom Field Widget: The Advanced Custom Field Widget is an extension of the Custom Field Widget by Scott Wallick, and displays values of custom field keys, allowing post- and page-specific meta sidebar content.
- Custom Post Template: Provides a drop-down to select different templates for posts from the post edit screen. The templates replace single.php for the specified post. https://wordpress.org/extend/plugins/custom-post-template/
WordPress Custom Fields, Part I: The Basics : http://perishablepress.com/press/2008/12/17/wordpress-custom-fields-tutorial/
Custom Post Type UI: Admin UI for creating custom post types and custom taxonomies in WordPress.
https://wordpress.org/extend/plugins/custom-post-type-ui/
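Whichever plugins end up being used, the core decision is the same: look at the current post's custom fields and the visitor's role, and decide which sidebar sections to show. The sketch below shows that logic in Python for brevity; the field keys, role names and section titles are all hypothetical, and in WordPress the values would come from post meta (e.g. via `get_post_meta`) inside a PHP widget or template.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    title: str
    custom_fields: dict[str, str] = field(default_factory=dict)


# Hypothetical rules: (custom field key, roles that may see it, sidebar section title).
SIDEBAR_RULES = [
    ("audio_package_id", {"visitor", "archivist"}, "Other files in this audio package"),
    ("physical_item_id", {"visitor", "archivist"}, "Digital files for this physical item"),
    ("work_stage", {"archivist"}, "Workflow status"),
]


def left_sidebar_sections(post: Post, role: str) -> list[str]:
    """A rule fires when the post has a non-empty value for the field and the role is allowed."""
    return [
        section
        for key, roles, section in SIDEBAR_RULES
        if post.custom_fields.get(key) and role in roles
    ]


post = Post("Recording 42", {"audio_package_id": "pkg-7", "work_stage": "review"})
print(left_sidebar_sections(post, "visitor"))    # ['Other files in this audio package']
print(left_sidebar_sections(post, "archivist"))  # ['Other files in this audio package', 'Workflow status']
```

Keeping the rules as data rather than scattering conditionals through templates is one way to get the abstraction layer mentioned above.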
Admin Panel of the Plugin
Needs to be able to:
- Define the metadata values
  - Do they have controlled vocabularies?
  - What kind of input will they use?
- Define the work stages
  - Including sub-work stages
  - What are the metadata values in each stage?
  - What are the help texts for each metadata question?
- What is the Part shown for the Keys?
- Does the plugin create a special Browse page?
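The requirements in this list amount to a small data model: metadata fields with an input type, help text and an optional controlled vocabulary, grouped into work stages that can nest. A minimal sketch of that model in Python; every class, attribute and example value is hypothetical and only illustrates how the admin panel's configuration might be structured, not how any existing plugin stores it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MetadataField:
    key: str
    input_type: str  # e.g. "text", "select", "date"; a hypothetical set of input kinds
    help_text: str = ""
    vocabulary: Optional[list[str]] = None  # None means free text

    def accepts(self, value: str) -> bool:
        """Controlled-vocabulary check; free-text fields accept anything."""
        return self.vocabulary is None or value in self.vocabulary


@dataclass
class WorkStage:
    name: str
    questions: list[MetadataField] = field(default_factory=list)
    substages: list["WorkStage"] = field(default_factory=list)

    def all_fields(self) -> list[MetadataField]:
        """Fields for this stage followed by those of its sub-stages, depth-first."""
        collected = list(self.questions)
        for sub in self.substages:
            collected.extend(sub.all_fields())
        return collected


# Hypothetical configuration an admin might build through the panel.
submission = WorkStage(
    "Submission",
    questions=[MetadataField("title", "text", help_text="Title as given by the contributor")],
    substages=[
        WorkStage(
            "Technical review",
            questions=[MetadataField("audio_format", "select", "Format of the master file",
                                     vocabulary=["WAV", "BWF", "AIFF"])],
        )
    ],
)
print([f.key for f in submission.all_fields()])  # ['title', 'audio_format']
```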