The Future Is Now

Thank You, Ada

Bess Sadler, Andromeda Yelton, Chris Bourg and Mark Matienzo have stepped forward to pledge to match up to $5120 of donations to the Ada Initiative, a non-profit organization that supports the participation of women in open technology and culture. I completely support this very generous act; the Ada Initiative does incredibly important work, and I’m extremely proud of my friends and of the library community for supporting them.

I’ve written before about how I stopped pursuing a career in tech in my late teens. I saw few female (or trans) role models in the tech industry; at a time when my self-image and self-identity were at their most fragile, I pivoted away from something I saw as too masculine, without room for me. The Ada Initiative’s conferences and advocacy work have done a lot to help make the open tech world a more welcoming space.

There are a lot of reasons why women don’t enter, or don’t stay, in the tech industry. The last few weeks, when harassment campaigns have targeted women to drive them out of the video game industry, have made me reflect on how important it is to work to make online communities and conferences safe spaces.

The Ada Initiative’s conference policy advocacy work and their example anti-harassment policy have been instrumental in helping many organizations and projects adopt their own policies. Both the Code4lib anti-harassment policy and the Homebrew code of conduct, for example, were inspired by and partially based on the Ada Initiative’s work. Seeing organizations adopt these policies has done a lot to make me feel comfortable, and given me confidence that both preventing and dealing with these forms of harassment is something that they see as important. My hope is that future generations of women will feel comfortable entering and interacting in these spaces in ways that others may not have in the past.

In just a few years, the Ada Initiative has helped make sure that these policies are becoming the norm and not the exception for conferences and online communities. I’m so grateful we have their advocacy; please consider donating to help them do even more great things.

Mind the Dust

Please excuse the sparseness! I’m in the process of migrating from WordPress to Octopress; I haven’t had time to change the default theme or migrate over my older posts.

-no-cpp-precomp: The Compiler Flag That Time Forgot

I’m often surprised how long software can keep trying to use compatibility features ages after their best-by date.

Now that GCC 4.8 builds on Tiger[1], I’ve been testing as much software with it as I can. When building ncurses using GCC 4.8.1, though, I came across a strange error message:

gcc-4.8: error: unrecognized command line option ‘-no-cpp-precomp’

It built fine with the gcc-4.0 that came with the OS. GCC rarely removes support for flags like this, so I assumed it must be an Apple-only flag. Unfortunately, it wasn’t listed in the manpage at all, and the internet was no help either – the search results were full of users confused about the same build failures, or trying to figure out what it does. All I could find was confirmation that it’s an obsolete Apple-only flag.

Not finding anything, I decided to find out straight from the horse’s mouth and try source-diving. Luckily Apple publishes the source code for all obsolete versions of their tools at their Apple Open Source site.

Recent versions of Apple GCC don’t include the flag anywhere in their source. The only place it’s still referenced is in a few configure scripts and changelogs, such as these:

# The spiffy cpp-precomp chokes on some legitimate constructs in GCC
# sources; use -no-cpp-precomp to get to GNU cpp.
# Apple's GCC has bugs in designated initializer handling, so disable
# that too.
stage1_cflags="-g -no-cpp-precomp -DHAVE_DESIGNATED_INITIALIZERS=0"

In several releases prior to that, for instance gcc-5493, the flag is explicitly mentioned as being retained for compatibility and is a no-op:

/* APPLE LOCAL cpp-precomp compatibility */
%{precomp:%ecpp-precomp not supported}%{no-cpp-precomp:}%{Wno-precomp:}\

The last time it was actually documented was in gcc-1765’s install.texi, shipped as a part of the WWDC 2004 Developer Preview of Xcode, which also provides a hint as to what the flag actually did:

It’s a good idea to use the GNU preprocessor instead of Apple’s @file{cpp-precomp} during the first stage of bootstrapping; this is automatic when doing @samp{make bootstrap}, but to do it from the toplevel objdir you will need to say @samp{make CC=’cc -no-cpp-precomp’ bootstrap}.

So this partially answers our question: Apple shipped an alternate preprocessor, and -no-cpp-precomp triggers the use of the GCC cpp instead. I can only assume this was a leftover that had yet to be excised, because the flag itself was still a no-op at that time. To actually find a version where the flag does something, we have to go all the way back to the December 2002 developer tools, whose gcc-937.2 actually has code that uses the flag. This particular build of GCC is Apple’s version of gcc-2.95, and it appears to be the very last where it had any effect. Interestingly, the #ifdef that guards this particular block of code is “#ifdef NEXT_CPP_PRECOMP” – suggesting that this dates back to NeXT, rather than Apple.

To actually find out what this means, O’Reilly’s Mac OS X for Unix Geeks, from September 2002, has a nice explanation in chapter 5:

Precompiled header files are binary files that have been generated from ordinary C header files and that have been preprocessed and parsed using cpp-precomp. When such a precompiled header is created, both macros and declarations present in the corresponding ordinary header file are sorted, resulting in a faster compile time, a reduced symbol table size, and consequently, faster lookup. Precompiled header files are given a .p extension and are produced from ordinary header files that end with a .h extension.

Chapter 4 also provides a nice explanation of why -no-cpp-precomp was desirable:

cpp-precomp is faster than cpp. However, some code may not compile with cpp-precomp. In that case, you should invoke cpp by instructing cc not to use cpp-precomp.

So there we have it – -no-cpp-precomp became somewhat widely used in Unix software as a compatibility measure to prevent Apple’s cpp-precomp feature from breaking their headers, and has stuck around more than a decade since the last time it’s actually done anything.
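For build scripts that still pass the flag, one workaround is to filter it out before the arguments reach a newer GCC. Here is a minimal sketch in Ruby, assuming a hypothetical helper named strip_obsolete_flags and a flag list that contains only the one flag discussed in this post:

```ruby
# Flags treated as obsolete in this sketch. Only -no-cpp-precomp comes
# from the post; extend the list at your own risk.
OBSOLETE_FLAGS = %w[-no-cpp-precomp].freeze

# Hypothetical helper: return a copy of the compiler arguments with
# the obsolete flags removed, leaving everything else in order.
def strip_obsolete_flags(args)
  args.reject { |arg| OBSOLETE_FLAGS.include?(arg) }
end

# Example using the stage1_cflags value quoted earlier.
cflags = '-g -no-cpp-precomp -DHAVE_DESIGNATED_INITIALIZERS=0'.split
puts strip_obsolete_flags(cflags).join(' ')
```

Since the flag has been a no-op for so long, dropping it should not change what gets built, but that is an assumption worth checking per project.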


  1. More on that in a future blog post.

Software Archaeology: Apple’s Cctools

One of the things I’ve been working on in Tigerbrew is backporting modern Apple build tools. The latest official versions, bundled with Xcode 2.5, are simply too old to be able to build some software. (For example, the latest GCC version available is 4.0.)

In the process, I’ve found some pretty fascinating bits of history littered through the code and makefiles for Apple’s build tools. Here are some findings from Apple’s cctools[1] package:

# MacOS X (the default)
#  RC_OS is set to macos (the top level makefile does this)
#  RC_CFLAGS needs -D__KODIAK__ when RC_RELEASE is Kodiak (Public Beta),
#      to get the Public Beta directory layout.
#  RC_CFLAGS needs -D__GONZO_BUNSEN_BEAKER__ when RC_RELEASE is Gonzo,
#      Bunsen or Beaker to get the old directory layout.
#  The code is #ifdef'ed with __Mach30__ is picked up from <mach/mach.h>
# Rhapsody 
#  RC_OS is set to teflon 
#  RC_CFLAGS needs the additional flag -D__HERA__ 
# Openstep
#  RC_OS is set to nextstep 
#  RC_CFLAGS needs the additional flag -D__OPENSTEP__ 

This comment from near the top of cctools’s Makefile lists some of the valid build targets, which include:

  • Kodiak, which was the Mac OS X public beta from September, 2000
  • Gonzo (Developer Preview 4), Bunsen (Developer Preview 3), and Beaker (PR2)
  • Rhapsody (internal name for the OS X project as a whole), Hera (Mac OS X Server 1.0, released 1999), and teflon (unknown to me)
  • OPENSTEP, NeXT’s implementation of their own OpenStep API

From further down in the same Makefile:

ifeq "macos" "$(RC_OS)"
  TRIE := $(shell if [ "$(RC_MAJOR_RELEASE_TRAIN)" = "Tiger" ] || \
             [ "$(RC_MAJOR_RELEASE_TRAIN)" = "Leopard" ] || \
             [ "$(RC_RELEASE)" = "Puma"      ]  || \
             [ "$(RC_RELEASE)" = "Jaguar"    ]  || \
             [ "$(RC_RELEASE)" = "Panther"   ]  || \
             [ "$(RC_RELEASE)" = "MuonPrime" ]  || \
             [ "$(RC_RELEASE)" = "MuonSeed"  ]  || \
             [ "$(RC_RELEASE)" = "SUPanWheat" ] || \
             [ "$(RC_RELEASE)" = "Tiger" ]      || \
             [ "$(RC_RELEASE)" = "SUTiSoho" ]   || \
             [ "$(RC_RELEASE)" = "Leopard" ]    || \
             [ "$(RC_RELEASE)" = "Vail" ]       || \
             [ "$(RC_RELEASE)" = "SugarBowl" ]  || \
             [ "$(RC_RELEASE)" = "BigBear" ]    || \
             [ "$(RC_RELEASE)" = "Homewood" ]   || \
             [ "$(RC_RELEASE)" = "Kirkwood" ]   || \
             [ "$(RC_RELEASE)" = "Northstar" ]; then \
                echo "" ; \ 

A lot of familiar cats here, along with a couple of early iOS versions (SugarBowl, BigBear) and a lot of names I’m not familiar with. (Please leave a comment if you have any insight!) As far as I know “Vail” was the Mac LC III from 1993 with no NeXT connection, but I’m sure it must be referring to something else.

From elsewhere in the tree, there’s code to support various CPU architectures. Aside from the usual suspects (PPC, i386), there are some other interesting finds:

  • HP/PA, aka PA-RISC, a CPU family from HP; some versions of NeXTSTEP were shipped for this
  • i860, an Intel CPU used in the NeXTdimension graphics board for NeXT’s computers
  • M68000, the classic Motorola CPU family, used in the original NeXT computers
  • M88000, a Motorola CPU family; NeXT considered using this in their original hardware but never shipped a product using it
  • SPARC, a CPU family from Sun; some versions of NeXTSTEP were shipped for this

I find it fascinating that, even now, cctools still carries the (presumably unmaintained) code for all of these architectures Apple no longer uses.


  1. Apple’s equivalent of binutils.

Tiger’s `which` Is Terrible; or, Necessity Is the Mother of Invention

One of the most useful things about running software in unusual configurations is that sometimes it exposes unexpected flaws you never knew you had.

The which utility is one of those commandline niceties you never really think about until it’s not there anymore. While sometimes implemented as a shell builtin[1], it’s also frequently shipped as a standalone utility. Apple’s modern version, which is part of the shell_cmds package and crystallized around Snow Leopard, works like this:

  • If the specified tool is found on the path, prints the path to the first version found (i.e., the one the shell would execute), and exits 0.
  • If the specified tool isn’t found, prints a newline and exits 1.

This version of the tool is really useful in shell scripts to determine a) if a program is present, and b) where it’s located, and until fairly recently Homebrew used it extensively. Unfortunately, early on in my work on Tigerbrew, I discovered that Tiger’s version was… deficient. It works like this:

  • If the specified tool is found on the path, prints the path to the first version found, and exits 0.
  • If the specified tool isn’t found, prints a verbose message to stdout, and exits 0.

The lack of a meaningful exit status and the error message on stdout are both pretty poor behaviour for a CLI app, and broke Homebrew’s assumptions about how it should work.

To work around this, I replaced Homebrew’s wrapper function with a two-line native Ruby method for Tigerbrew, like so:

def which cmd
  dir = ENV['PATH'].split(':').find {|p| File.executable? File.join(p, cmd)}
  Pathname.new(File.join(dir, cmd)) unless dir.nil?
end

As it turns out, not only does it work better on Tiger, but this method is actually faster[2] than shelling out like Homebrew did; process spawning is relatively expensive. As a result, I ended up using the new helper in Homebrew even though it wasn’t strictly necessary.
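Here’s a rough way to try both approaches side by side. In this sketch, which_native is the method above under a hypothetical name; which_shell and compare_which are illustrative helpers, not Homebrew’s actual code. The shell version double-checks that the output really is an executable path, since Tiger’s which can’t be trusted to signal failure through its exit status.

```ruby
require 'pathname'
require 'shellwords'
require 'benchmark'

# The pure-Ruby lookup from above, renamed to avoid clashing with it.
def which_native(cmd)
  dir = ENV['PATH'].split(':').find { |p| File.executable?(File.join(p, cmd)) }
  Pathname.new(File.join(dir, cmd)) unless dir.nil?
end

# Hypothetical shell-out version. It only accepts the output if the
# exit status is 0 AND the printed text is an executable path, which
# guards against a which that prints an error to stdout and exits 0.
def which_shell(cmd)
  out = `which #{Shellwords.escape(cmd)} 2>/dev/null`.chomp
  return nil unless $?.success? && !out.empty? && File.executable?(out)
  Pathname.new(out)
end

# Time both lookups over a number of runs; returns seconds for each.
def compare_which(cmd, runs = 50)
  native = Benchmark.realtime { runs.times { which_native(cmd) } }
  shell  = Benchmark.realtime { runs.times { which_shell(cmd) } }
  { native: native, shell: shell }
end

# Usage: p compare_which('sh')
```

The absolute numbers will vary by machine; the point is only that the shell version pays for a process spawn on every call.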

(And as for the commandline utility, Tigerbrew has a formula for the shell_cmds collection of utilities.)


  1. zsh does; bash doesn’t.

  2. On the millisecond scale, at least.

Adventures With Ruby 1.8.2

Homebrew has always used the version of Ruby which comes with OS X,[1] a design decision I kept for Tigerbrew. Tiger comes with Ruby 1.8.2, built on Christmas Day, 2004, and with a version of Ruby that old I went in steeling myself for the inevitable ton of compatibility issues.

On the whole I was pleasantly surprised. Most of what Homebrew uses is provided in exactly the same form, and while there are differences that range from puzzling[2] to major[3], pretty much everything Just Works.

Except, at first, for Pathname. Ruby’s Pathname class, which is an object-oriented wrapper around the File and Dir classes, is at the heart of Homebrew’s file management. The first time I tried to install something with the newborn Tigerbrew, I was quickly treated to a strange exception with an equally mysterious backtrace: Errno::ENOTDIR: Not a directory.

Curious, I dug in. I soon discovered that the bug occurred while Homebrew was unlinking an existing version of a package before beginning to install an upgrade. (For those not in the know, Homebrew installs software into isolated versioned prefixes. The active version of a given package is symlinked into the standard /usr/local locations.) Most of the files were linked and unlinked just fine, but a few files caused the method Pathname#unlink to throw an exception every time. Eventually I noticed a pattern — every symlink that Pathname choked on represented a directory. Once I noticed that, it clicked.

For those who don’t know, symlinks are actually treated on the filesystem level as special files containing their target as text. For most operations, symlinks transparently act as their targets. However, applications which hit the filesystem directly will see them as files — even when they point to directories. Since Pathname handles files and directories differently, handing its instance methods off to File or Dir as appropriate, the bug happened something like this:

  • The #unlink method is called on a Pathname object representing a symlink to a directory.
  • Pathname examines the object to see if it represents a file or directory, in order to determine whether to call File.unlink or Dir.unlink.
  • In doing so, Pathname follows the symlink to its target and examines the properties of the target.
  • Seeing that the target is a directory, Pathname calls Dir.unlink on the original symlink.
  • Dir.unlink raises Errno::ENOTDIR because, of course, the symlink isn’t a directory.

The overridden version of the method can be found here. The rest of Tigerbrew’s current backports are in Tigerbrew’s file extend/tiger.rb, for the curious.
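To see the failure mode in isolation, here is a small sketch (not Tigerbrew’s actual override). safe_unlink is a hypothetical helper that checks for a symlink before choosing between File.unlink and Dir.unlink, and demo_symlink_unlink builds a throwaway directory and symlink to exercise it. I’d expect Dir.unlink on the symlink to raise Errno::ENOTDIR on typical Unix systems, but treat that as an assumption about the platform.

```ruby
require 'tmpdir'
require 'pathname'

# Hypothetical helper: remove a path, treating symlinks as files no
# matter what they point to. Only real directories go to Dir.unlink.
def safe_unlink(path)
  path = Pathname.new(path)
  if path.symlink? || !path.directory?
    File.unlink(path.to_s)  # symlinks and regular files
  else
    Dir.unlink(path.to_s)   # real (empty) directories
  end
end

# Build a directory plus a symlink to it, then walk through the steps
# described above and report what happened.
def demo_symlink_unlink
  Dir.mktmpdir do |tmp|
    target = File.join(tmp, 'real_dir')
    link   = File.join(tmp, 'link_to_dir')
    Dir.mkdir(target)
    File.symlink(target, link)

    # File.directory? follows the symlink, so it reports a directory.
    follows = File.directory?(link)

    # Dir.unlink operates on the link itself, which is not a directory.
    raised = begin
      Dir.unlink(link)
      false
    rescue Errno::ENOTDIR
      true
    end

    safe_unlink(link)
    { follows_symlink: follows,
      dir_unlink_raised: raised,
      link_gone: !File.symlink?(link),
      target_kept: File.directory?(target) }
  end
end

# Usage: p demo_symlink_unlink
```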


  1. For predictability, and so the user doesn’t have to install Ruby before installing Homebrew.

  2. String’s [] operator always returns the sliced character’s ASCII ordinal, not a string.

  3. File#flock doesn’t exist in any form.

Introducing Tigerbrew

Some of you may know that my other gig is Homebrew, the package manager for Mac OS X. Over the last few months, I’ve been spending some time on a fork of Homebrew that’s starting to become usable enough that I think it’s ready to be announced.

When I was attending the AMIA[1] conference in December, my partner and I were travelling together; while I was at the conference during the day, she worked from various places in Seattle on her laptop. Since it’s practically impossible to attend a modern conference without a laptop, and she uses a desktop at home, I dug out my 2005-era PowerBook G4 to take notes. It may be eight years old, but as soon as I opened it up I remembered why I loved that laptop so much. It’s still in great shape, and it feels like a crime to leave it sitting unused so much of the time.

It’s slow by modern standards, of course, but the thing really keeping it from being usable all the time is software. Apple left PowerPC behind after Mac OS X Leopard[2], and so have nearly all developers at this point. There are still a few developers carrying the torch (shoutouts to TenFourFox), but as a commandline junkie what I really need is an up-to-date shell[3] and CLI software. And as a big Homebrew fan, as well as a developer, MacPorts just wasn’t going to cut it. Tigerbrew was born.

The first version of Tigerbrew was pulled together over an evening at the hotel after the first day of the conference, and I’ve been plugging away at it regularly since. At this point I’m proud to say that a significant number of packages build flawlessly,[4] and thanks to some backports from newer versions of OS X[5] Tigerbrew can supply a much more modern set of essential development tools than Apple provides.

Tigerbrew’s still very much an alpha, and there’s some more work needed until it’s mature, but at this point I consider it ready enough to announce to the world.[6] If you have a PowerPC Mac yearning to be used again, why not give it a go?


  1. Association of Moving Image Archivists

  2. And many hardcore PowerPC users stick with their old Macs for OS 9 compatibility, which was last supported in Tiger.

  3. bash 2.05b doesn’t cut it.

  4. Even complex software with a lot of moving parts, like FFmpeg.

  5. I’m very indebted to the MacPorts developers, whose portfiles served as a reference for the buildsystems for several of these.

  6. Development’s been happening in the public for months, of course, and there are already a few other users out there.

PSA: Homebrew-digipres Repository Now Available!

Outside of archivy, I’m also a collaborator on Homebrew, the awesome, lightweight package manager for OS X. I’ve been building a private repository of niche packages which aren’t available in the core repository for some reason or another, and ended up collecting enough digital preservation tools to create a new digital preservation-focused repository. You can find the new homebrew-digipres here: https://github.com/mistydemeo/homebrew-digipres

I’d welcome any contributions if you want to improve an existing formula, submit updates, or add a new package! Fork away.

File ID Hackathon Debrief: FITS Handles Video Now!

I took part in the 24-hour file ID hackathon November 16th. It was a fantastic event, and between us the 15-ish participants got a lot of practical work done. You can read more about it and what was accomplished at the CURATEcamp wiki.

I spent most of my time working with video content and with FITS, the File Information Tool Set. FITS is a useful tool, but it’s traditionally had some problems that have held it back from being as effective as it could for digital preservation. Aside from its performance, which is an issue that still needs to be addressed, its support for audio-visual material has been pretty poor. I addressed a couple of the more serious items:

Its embedded Exiftool was badly out of date

FITS bundles its own versions of the various tools it uses, rather than use the versions installed elsewhere on the machine. In theory this is a good idea; incompatibilities in the tools it uses could subtly break its output. In practice, however, it means that FITS has missed out on a lot of format identification improvements its tools have made. Before the hackathon FITS included exiftool 7.74, which was released in April, 2009. Back then exiftool had only rudimentary video support, but it’s made enormous strides in the past several years and now has very robust video metadata extraction. The first thing I did in FITS was update the embedded exiftool to the current release. That alone has made a big difference in format detection.

In the future I think it would be best to rethink the policy of embedding tools rather than using external copies, or at least provide the option to use another version the user has installed. exiftool is updated once every week or two and changes rapidly. I doubt FITS will be updated that frequently. A better option might be to recommend specific known-good versions of tools, but allow the user the option of running whichever tool version they prefer.
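As a rough illustration of that last idea, here is a sketch in Ruby. None of this is FITS code: choose_tool, version_warning, the KNOWN_GOOD_VERSIONS table and the example paths are all hypothetical, and the 9.05 entry is simply the exiftool version that appears in the output later in this post.

```ruby
# Hypothetical table of tool versions tested with a given release.
KNOWN_GOOD_VERSIONS = { 'exiftool' => '9.05' }.freeze

# Prefer a user-supplied copy of a tool if one is given; otherwise
# fall back to the bundled copy. Raises if the user's path is unusable.
def choose_tool(name, bundled_path, user_path = nil)
  return bundled_path if user_path.nil?
  unless File.executable?(user_path)
    raise ArgumentError, "#{name}: #{user_path} is not executable"
  end
  user_path
end

# Return a warning string when the running version isn't the one that
# was tested, or nil when it matches (or nothing is recorded for it).
def version_warning(name, reported_version)
  expected = KNOWN_GOOD_VERSIONS[name]
  return nil if expected.nil? || expected == reported_version
  "#{name} #{reported_version} is untested; known-good version is #{expected}"
end
```

The design choice here is to warn rather than refuse: the user keeps control over which version runs, while the output can still record that an untested version was used.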

Its metadata mapping for video formats was primitive

FITS uses XSLT to map metadata fields from their native tag names to its own vocabulary, but the list of tags used for video was very short compared to other formats. As a result, a lot of potentially useful information from exiftool’s output was being discarded. Based on videos in my collection which had extensive embedded metadata, I beefed up FITS’s mapping table to enable it to grab many more common tags.
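FITS does this mapping in XSLT, but the idea is easy to show in a few lines of Ruby. This is only a sketch: the native tag names on the left-hand side and the map_tags helper are illustrative assumptions, while the right-hand names are taken from the output shown at the end of this post.

```ruby
# Hypothetical mapping from a tool's native tag names to FITS-style
# element names. The left-hand names are assumptions for illustration.
VIDEO_TAG_MAP = {
  'Make'        => 'digitalCameraManufacturer',
  'Model'       => 'digitalCameraModelName',
  'Duration'    => 'duration',
  'ImageWidth'  => 'imageWidth',
  'ImageHeight' => 'imageHeight'
}.freeze

# Split a tool's output into mapped fields and the names of tags that
# have no mapping. With a short table, most tags end up in the second
# list, which is exactly the information that gets discarded.
def map_tags(native, table = VIDEO_TAG_MAP)
  mapped = {}
  dropped = []
  native.each do |tag, value|
    if table.key?(tag)
      mapped[table[tag]] = value
    else
      dropped << tag
    end
  end
  [mapped, dropped]
end
```

Returning the dropped tags alongside the mapped ones makes it easy to see which fields a longer table would pick up.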

While this made a good short-term solution, it made me think a bit more about how FITS approaches mapping fields. In particular,

  1. FITS has separate mappings for types such as “image”, “video”, “audio.” In practice, though, many of these formats use the exact same tags to mean the same things; this means either some mapping logic is duplicated, or certain fields are skipped for some files even though they’re mapped for others. After looking at practical examples of how FITS maps images and videos, I’m not convinced that treating them separately is practical.

  2. Beyond that, FITS uses file extension to determine whether a file is an image, video, etc. In practice many container file extensions can represent many kinds of files; extension is a pretty fragile way of determining type. If FITS keeps a distinction between file type mappings, it should move to using something like mimetype instead of extension.
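To illustrate the second point, here is a minimal sketch of content-based type detection in Ruby. It is not how FITS or any of its tools work; sniff_type and the tiny SIGNATURES table are hypothetical, cover only a handful of formats, and a real implementation would lean on an existing identification tool instead.

```ruby
# Illustrative signature table: [mimetype, byte offset, magic bytes].
# The QuickTime entry only matches files that start with an 'ftyp'
# box using the 'qt  ' brand; older files may not have one.
SIGNATURES = [
  ['image/png',       0, "\x89PNG\r\n\x1A\n".b],
  ['image/jpeg',      0, "\xFF\xD8\xFF".b],
  ['application/pdf', 0, '%PDF-'.b],
  ['video/quicktime', 4, 'ftypqt  '.b]
].freeze

# Guess a mimetype from the leading bytes of a file, ignoring its name.
# Falls back to the generic binary type when nothing matches.
def sniff_type(bytes)
  data = bytes.b
  SIGNATURES.each do |mime, offset, magic|
    return mime if data[offset, magic.bytesize] == magic
  end
  'application/octet-stream'
end

# Usage: sniff_type(File.binread('clip.mov', 16))
```

A file named picture.jpg that actually contains PNG data would be reported as image/png here, which is the kind of mismatch an extension check can never catch.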

Aside from my work improving FITS, I also submitted a set of Quicktime videos to the OpenPlanets Format Corpus on GitHub. The 61-video set covers almost every codec Apple ships with Quicktime and Final Cut Pro, and should be useful for anyone who wants to try to identify individual codec/container combinations. They’re available at: https://github.com/openplanets/format-corpus/tree/master/video/Quicktime

I’ll end this off with some eye candy, to show how nicely FITS’s video support has improved.

Before. The video is detected only as “Unknown Binary” (this was sadly common for video), and no meaningful metadata is extracted.

<?xml version="1.0" encoding="UTF-8"?>
<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://hul.harvard.edu/ois/xml/ns/fits/fits_output http://hul.harvard.edu/ois/xml/xsd/fits/fits_output.xsd" version="0.6.1" timestamp="11/17/12 10:18 PM">
<identification status="UNKNOWN">
<identity format="Unknown Binary" mimetype="application/octet-stream" toolname="FITS" toolversion="0.6.1">
<tool toolname="Jhove" toolversion="1.5" />
</identity>
</identification>
<fileinfo>
<filepath toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">/Users/mistydemeo/Downloads/set1/00000.MTS</filepath>
<filename toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">/Users/mistydemeo/Downloads/set1/00000.MTS</filename>
<size toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">6039552</size>
<md5checksum toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">8c7c728334017a3ab4caff6e78b30037</md5checksum>
<fslastmodified toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">1261684470000</fslastmodified>
</fileinfo>
<filestatus />
<metadata />
</fits>

After. Not only is the video format identified, but a good 18 video tags are extracted.

<?xml version="1.0" encoding="UTF-8"?>
<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://hul.harvard.edu/ois/xml/ns/fits/fits_output http://hul.harvard.edu/ois/xml/xsd/fits/fits_output.xsd" version="0.6.1" timestamp="11/17/12 10:20 PM">
<identification status="SINGLE_RESULT">
<identity format="M2TS" mimetype="video/m2ts" toolname="FITS" toolversion="0.6.1">
<tool toolname="Exiftool" toolversion="9.05" />
</identity>
</identification>
<fileinfo>
<lastmodified toolname="Exiftool" toolversion="9.05" status="SINGLE_RESULT">2009:12:24 13:54:36-06:00</lastmodified>
<filepath toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">/Users/mistydemeo/Downloads/set1/00001.MTS</filepath>
<filename toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">/Users/mistydemeo/Downloads/set1/00001.MTS</filename>
<size toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">4552704</size>
<md5checksum toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">770fd667d68ca8e6509670b0ef50e61c</md5checksum>
<fslastmodified toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">1261684476000</fslastmodified>
</fileinfo>
<filestatus />
<metadata>
<video>
<digitalCameraManufacturer toolname="Exiftool" toolversion="9.05" status="SINGLE_RESULT">Sony</digitalCameraManufacturer>
<digitalCameraModelName toolname="Exiftool" toolversion="9.05" status="SINGLE_RESULT">HXR-NX5U</digitalCameraModelName>
<duration toolname="Exiftool" toolversion="9.05" status="SINGLE_RESULT">0.09 s</duration>
<imageWidth toolname="Exiftool" toolversion="9.05" status="SINGLE_RESULT">1920</imageWidth>
<imageHeight toolname="Exiftool" toolversion="9.05" status="SINGLE_RESULT">1080</imageHeight>
<videoStreamType toolname="Exiftool" toolversion="9.05" status="SINGLE_RESULT">DigiCipher II Video</videoStreamType>
<shutterSpeedValue toolname="Exiftool" toolversion="9.05" status="SINGLE_RESULT">1/60</shutterSpeedValue>
<apertureSetting toolname="Exiftool" toolversion="9.05" status="SINGLE_RESULT">Auto</apertureSetting>
<fNumber toolname="Exiftool" toolversion="9.05" status="SINGLE_RESULT">3.7</fNumber>
<gain toolname="Exiftool" toolversion="9.05" status="SINGLE_RESULT">-3 dB</gain>
<exposureTime toolname="Exiftool" toolversion="9.05" status="SINGLE_RESULT">1/60</exposureTime>
<exposureProgram toolname="Exiftool" toolversion="9.05" status="SINGLE_RESULT">Manual</exposureProgram>
<whiteBalance toolname="Exiftool" toolversion="9.05" status="SINGLE_RESULT">Daylight</whiteBalance>
<imageStabilization toolname="Exiftool" toolversion="9.05" status="SINGLE_RESULT">On (0x3f)</imageStabilization>
<focus toolname="Exiftool" toolversion="9.05" status="SINGLE_RESULT">Manual (2.3)</focus>
<gpsVersionID toolname="Exiftool" toolversion="9.05" status="SINGLE_RESULT">2.2.0.0</gpsVersionID>
<gpsStatus toolname="Exiftool" toolversion="9.05" status="SINGLE_RESULT">V</gpsStatus>
<gpsMapDatum toolname="Exiftool" toolversion="9.05" status="SINGLE_RESULT">WGS-84</gpsMapDatum>
</video>
</metadata>
</fits>

Revisiting Archival Description – LOD-LAM Session Idea

Apologies for the brevity of this post; I want to make sure it goes up before LOD-LAM.

So, archival description.

Archival records are hard to find. They’re often in large bodies of records, difficult to browse through and generally less cut-and-dry than materials intended for formal publication or public consumption. Archival finding aids are the researcher’s traditional first point of contact, providing background biographical information on the organization and/or personal creator(s), as well as a description of how the records are arranged and of the various levels of organizational hierarchy. They’re useful!

But they’re also a bit old-fashioned, at least as typically implemented. The finding aid structure imposes a few issues for linked open data applications.

I see two major problems with current archival description:

They’re hierarchical

Most countries’ archival description standards are based on a strict hierarchy from higher levels of description (fonds, etc.) to more precise levels of description (series, sub-series, file, item) with fairly rigidly prescribed relationships between items. The finding aid also assumes a “paper” whole-body approach, rather than a linking approach. This is kind of non-webby, and imposes a stricter order on documents than their creators may have had, in many cases.

(The Australians, of course, are a few steps ahead of the rest of us already.)

Perhaps an even bigger problem, though, is that:

They’re imprecise.

This is the real issue, or at least the most immediate issue. Archival descriptions are designed for human eyes in a paper world, and so they’re often encoded with a level of ambiguity that’s difficult for machines to extract. (LOCAH has been doing a great job of identifying points of concern and trying to route around them.)

Archival descriptions have some inherent ambiguity because interpretation of archival holdings is not always cut and dry, but that doesn’t mean that we have to be ambiguous in how we create those descriptions. We can be precise about the ways in which our collections are ambiguous.

I’d love to get a conversation going about revising descriptive standards to enhance precision in finding aids in order to enhance the ability to use them as computer-readable metadata. I can see a number of areas for improvement:

  • More strongly-typed data fields, rather than “fuzzy” fields that can hold a variety of types of subjectively-defined data
  • More focus on “globally-scoped” names rather than “locally scoped” (as pointed out by Pete@LOCAH here)
  • A stricter, clearer inheritance model rather than ISAD(G)’s rule of non-repetition (Thanks to Pete again)
  • Certainly more, which we can talk about at LOD-LAM!
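To make the first and third points a bit more concrete, here is a small sketch in Ruby of what typed fields and an explicit inheritance rule could look like. It isn’t drawn from any descriptive standard; DescriptionUnit, its field names and the sample records are all hypothetical.

```ruby
# Hypothetical model of one level of archival description. Each unit
# holds its own fields and a reference to its parent level (or nil).
DescriptionUnit = Struct.new(:level, :title, :fields, :parent) do
  # Explicit inheritance: use the local value if present, otherwise
  # walk up the hierarchy until some ancestor supplies one.
  def resolve(name)
    unit = self
    while unit
      return unit.fields[name] if unit.fields.key?(name)
      unit = unit.parent
    end
    nil
  end
end

# Invented sample data. Dates are stored as integer years rather than
# free text, so a machine doesn't have to guess what "ca. 1950s" means.
fonds  = DescriptionUnit.new(:fonds, 'Example fonds',
                             { creator: 'Example Society', language: 'en' }, nil)
series = DescriptionUnit.new(:series, 'Correspondence',
                             { date_range: [1950, 1960] }, fonds)
file   = DescriptionUnit.new(:file, 'Letters, 1955',
                             { language: 'fr' }, series)

puts file.resolve(:creator)   # inherited from the fonds
puts file.resolve(:language)  # overridden at the file level
```

With a rule like this spelled out, a consumer of the data never has to infer whether a missing field means "same as the parent" or "unknown".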

The extent to which all this can be implemented will depend on the organization, of course – retrofitting older archival descriptions for all of this would be time-consuming, if practical at all. But I think there are a lot of benefits to be gained by changing practices going forward, and I see this as an enhancement to current descriptive standards/practices that can benefit more than just linked open data applications.