Our basic idea for entry into the 2012 “Rails Rumble” was simple: build an API that reads and writes ISBNs, creating a basic catalog of associated bibliographic information in the process. There are many sources of ISBNs and bibliographic data out there. Our idea was to poll these sources and offer a simple, streamlined API based mostly on the ISBN rather than on the idea of the book itself. A clean and clear data stream, ambitiously targeting every ISBN in the world.
But as with any simple idea, complexity lurked just below the surface.
Though ISBNs are issued by a central agency, the “meaning” of these numerical strings is not particularly organized. The process begins, at least in the US, with a publisher purchasing a block of ISBNs. The publisher assigns these ISBNs to their products. Though primarily “books,” a publisher’s products might also include associated supplemental material, such as a CD-ROM accompanying a biology textbook or a plastic wand bundled with a Harry Potter book. Books are products first and books second, if at all.
The ISBN encodes a few facts about the product. The 13-digit string checks out as an EAN (European Article Number), an international standard despite its provincial name. The opening string 978 tells us this product comes from “bookland.” 979 also signifies bookland, but no ISBNs have yet been assigned to this expansionary prefix.
A following string identifies a designated country or language region, and the string after that identifies the publisher. Big publishers, such as Random House, Penguin, or Oxford University Press, purchase big blocks at a time. Smaller publishers purchase small blocks of ISBNs, or even a single number, and thus have longer identifying strings. The digits that remain identify the individual product within the publisher’s block. Finally, the check digit at the end of the ISBN can be recalculated from the preceding digits to verify that the EAN is in a valid format.
And that’s the limit of what an ISBN can reveal, more or less. The remainder of the bibliography is paratextual to the ISBN; alien.
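The check-digit step can be sketched in a few lines of Python. This is only an illustration of the standard EAN-13 rule (weights alternating 1 and 3 from the left), not code from our Rumble entry; the function names `isbn13_check_digit` and `is_valid_isbn13` are made up for this example. The sample ISBN is the Nabu Press one cited later in this post.

```python
def isbn13_check_digit(first12: str) -> int:
    """Return the EAN-13 check digit for the first 12 digits of an ISBN."""
    if len(first12) != 12 or not (first12.isascii() and first12.isdigit()):
        raise ValueError("expected exactly 12 ASCII digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    # The check digit brings the weighted sum up to a multiple of 10.
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn: str) -> bool:
    """Check length, the bookland prefix, and the check digit."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    if digits[:3] not in ("978", "979"):
        return False
    return isbn13_check_digit(digits[:12]) == int(digits[12])


print(is_valid_isbn13("9781147194395"))  # the Nabu Press ISBN from this post
```

Note that a passing check only says the number is well formed. It says nothing about whether the ISBN was ever assigned, let alone to what.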
Which brings us back to our 2012 Rails Rumble project. Building records of and about ISBNs is a cataloging task. Every catalog is built with degrees of bias and blindness. The literature of Library Science revolves around catalogs and cataloging. As a discipline, Library Science began as the Computer Science of the predigital information age. When paper was the primary machinery of information, the catalog was (and remains) paper’s database.
Seymour Lubetzky was a metaphysician of data in mid-twentieth-century library science. His essays, though devoted to obsolete technologies such as the card catalog, remain relevant for their ability to get to the essence of information storage, organization, and retrieval. For Lubetzky, the library begins with its catalog; without a catalog, the library is an inaccessible collection of material. The start of cataloging, though, is the opening of prejudice:
The book (i.e. the material record) and the work (i.e. the intellectual product embodied in it) are not coterminous; that, in cataloging, the medium is not to be taken as synonymous with the message; that the book is actually only one representation of a certain work which may be found in a given library or system of libraries in different media (as books, manuscripts, films, phonorecords, punched and magnetic tape, braille), different forms (as editions, translations, versions), and even under different titles.
Lubetzky continues to anticipate and describe the problems of building a library that best serves the user, a library of works rather than objects.
For the Rails Rumble project, we took Lubetzky’s warning as an invitation to simplification. Rather than attempt to build bodies of work from an ISBN’s metadata, we took the ISBN as proof of the object itself. The finished project, ISBN.IO, is just that: a solid known fact, the ISBN, along with placeholders for such trailing incidentals as Title, Author, and Page Count.
The API allows trusted users to write ISBNs and submit paratext. Conflicting paratext is checked against previous entries, and we attempt to establish the most correct information. For example, if two of three writers to the API prefer “William Shakespeare” to “Wm Shakespeare,” we keep the more popular expression.
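As a rough illustration of that “most popular expression” rule, here is a minimal Python sketch. It is not the ISBN.IO implementation; the function name `resolve_field` and the tie-breaking behavior (the earliest submitted value wins) are assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, Optional


def resolve_field(submissions: Iterable[Optional[str]]) -> Optional[str]:
    """Pick the most frequently submitted value for one paratext field.

    Hypothetical helper: blank submissions are ignored, and ties go to
    the value that was submitted first.
    """
    cleaned = [s.strip() for s in submissions if s and s.strip()]
    if not cleaned:
        return None
    # most_common orders equal counts by first appearance (Python 3.7+).
    value, _count = Counter(cleaned).most_common(1)[0]
    return value


authors = ["William Shakespeare", "Wm Shakespeare", "William Shakespeare"]
print(resolve_field(authors))
```

A plain vote like this treats every spelling as a distinct candidate, so popularity stands in for correctness. That is the trade-off the paragraph above describes.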
As a thought exercise, the Rails Rumble forced us to consider a key area of our business, the ISBN, as both an abstract and a material entity. As a practical product, it deserves a prize as the least front-facing entry in this year’s Rumble. But its humble function is open source and public and, given some time, could touch upon every ISBN, the citizens of bookland. And who knows, perhaps it will appear as the backend for an interesting project in Rails Rumble 2013, such as fellow entry “Ideal Copy,” a user of our related Ruby gem, Vacuum.
Synonyms, Discriminated. Nabu Press, 2010. ISBN 9781147194395
Perhaps the strangest book-related artifact of the digital period is the bot book: the spam-listed print-on-demand (POD) edition, one of an infinite number of books that potentially exist but are nearly always unwanted by any reader or buyer of books.
A labyrinth of business identities lies behind these public-domain print-on-demand oddities. Kessinger was an early trailblazer in the lightning print and scanning technology that made copyright-free publishing possible.
But Kessinger’s vast catalog was soon surpassed by a network of businesses connected in a maze of corporate registrations: Biblio Labs, Nabu Press, and Bibliobooks (aka BiblioBazaar) are a few of the names. Together they have produced perhaps a million books; that is to say, potential books, or at the very least ISBNs.
Nabu is perhaps the weirdest of the Biblio offshoots. In their own words:
Nabu Press differs from BiblioLife and our other projects in that the books published by Nabu Press have not been hand curated. Hand curation is a lengthy process. We believe the books published by Nabu Press are culturally important, and should be available in as many formats as soon as possible. As these books go through the laborious process of hand curation and enrichment, they will become part of one or more of our other collections.
What does this mean? In short, it means the books are scraped from the web. Some are Google Book scans, such as the Victorian thesaurus “Synonyms Discriminated” shown here. Others are scraped from Wikipedia or other sources deemed public domain or Creative Commons. (Appropriately, the image of Nabu used on their website is in fact also taken directly from Wikipedia.)
The Wikipedia scrapes differ from the scans; the latter are in fact images and not text at all. Interestingly, all text on the Biblio Labs website is also presented as image files, presumably to protect it from the spidering and scraping they rely upon so much.
The pseudo-existence of these weird books is entirely self-reflexive: they are only available for sale on the internet, especially Amazon. Meanwhile the content is sourced from well-known public websites such as Google and Wikipedia. Leaving out Facebook, Nabu Press is connecting the most obvious internet platforms, and doing damn well too. Synonyms, discriminated.
Synonyms, Discriminated. Henry Holt & Co., 1904.