SILS Independent Student Blog

Ol-Lib, artifactual value and the future of our profession

September 26, 2007

I’m including the text of a message that Karen Coyle posted to Ol-lib, the Open Library project’s discussion list.

[H]ashing a file works if you all have the same file to hash. That’s not the problem here. We have books, most of which are in hard copy, and we have metadata in machine-readable form. However, the metadata for the same book are not identical, so a hash won’t work on them. And we need for people who have the book to be able to connect that book to the identifier.

Note that independently scanned versions of books will not be identical, either to the eye, or bit-wise.

We know, thanks to the extensive work done on de-duping of files of bibliographic records, that the metadata alone is not enough to fully identify a book item. I’m thinking that we may have to go back to the physical book, but perhaps could do it in a way that scanned books could be identified algorithmically. I’ve had in mind something like an “incipit” — a snippet (of text, of sound, of whatever) that can feed into the identifier. For example, the first two words of each chapter. Or the first and last words of each chapter. But this isn’t something you can derive from metadata.

The other option is to see if there’s a way to use something like the title and the pagination, or some other combination of easily identifiable facts about the book that also appear in metadata, that would end up being unique. That’s why I asked to have some output from the creation of the OLNs – to begin to see what patterns appear in large files.
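To make the two proposals concrete, here is a minimal Python sketch. It is not Open Library's actual OLN scheme, and every function name, title, page count, and chapter text in it is hypothetical. It shows three things. First, a plain hash of metadata breaks as soon as two records transcribe the same title differently. Second, a key built from a normalized title plus pagination survives that variation. Third, an "incipit" key taken from the first two words of each chapter can match across two independent scans whose OCR differs only in case, punctuation, or spacing. SHA-1 is used purely for convenience, and the sketch does nothing to address the collision question Coyle raises: whether title plus pagination would actually turn out to be unique across large files.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse runs of whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def sha1_hex(text: str) -> str:
    """Return the SHA-1 hex digest of a UTF-8 string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def incipit_key(chapters: list[str], n_words: int = 2) -> str:
    """Hypothetical 'incipit' identifier: hash the first n_words of every chapter.

    Normalizing first means two independent scans whose OCR differs only in
    case, punctuation, or spacing still produce the same key.
    """
    snippets = [" ".join(normalize(ch).split()[:n_words]) for ch in chapters]
    return sha1_hex("|".join(snippets))


def title_pagination_key(title: str, pages: int) -> str:
    """Hypothetical key built from facts that also appear in catalog metadata."""
    return sha1_hex(f"{normalize(title)}|{pages}")


if __name__ == "__main__":
    # Two catalog records for the same (invented) book, transcribed differently.
    title_a = "A Hypothetical Book: Volume One"
    title_b = "A hypothetical book -- volume one"

    # Hashing the raw metadata fails because the strings are not byte-identical.
    print(sha1_hex(title_a) == sha1_hex(title_b))

    # A normalized title plus pagination agrees across both records.
    print(title_pagination_key(title_a, 320) == title_pagination_key(title_b, 320))

    # Chapter openings from two independent (invented) scans of the same book.
    scan_a = ["It was late. The rain had stopped.", "Morning came slowly."]
    scan_b = ["IT WAS late.  The rain had stopped", "Morning  came slowly"]
    print(incipit_key(scan_a) == incipit_key(scan_b))
```

Note the trade-off: the title-and-pagination key can be computed from existing records alone, while the incipit key requires access to the text itself. That is exactly why Coyle notes it "isn't something you can derive from metadata."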

This particular snippet from the discussion should raise some issues for all of us. Suppose the Open Library really can create the (much longed-for) single point of bibliographic access. If the actual identifier for that point of access derives from a physical object, there may be clear and severe preservation implications.

If you’re not subscribed to Ol-lib, fix that oversight first. This is where everything we have already learned about librarianship is being actively translated into everything we’re going to have to learn about it. I’d submit to you that if you can follow the discussion going on about this project, you’re ready to deal with our profession in the 21st century. Some things to consider as you read and re-read this:

  • What abstractions do we have to introduce into our cataloging models to make a system like Open Library effective? At what level of abstraction from the item-in-hand will records be created?
    • Is the FRBR model sufficient?
    • Does the AACR-defined level of correspondence between the item in hand and its surrogate catalog record serve a genuine need?
  • Given a high level of abstraction for a record, what sub-records or meta-records do we require to manage our own inventories of physical collections?
  • How do we identify the locus of preservation activity in a shared, single record environment? Preserve the item in hand? Ensure the preservation of representative items somewhere?
    • Who has to take responsibility? Will there be a rush to avoid having the last copy standing? In a networked, single-record environment, does preservation become a game of hot potato, with each institution trying to avoid being stuck with the cost of expensive, perpetual, single-item preservation?

–Jacob Nadal

Categories: Preservation · Read this! · Uncategorized
