Tuesday, December 21, 2004

the great library

A terse but well-written editorial in the New York Times today voices some of the same concerns I have about the impending Googlization (Googlification? Googlun?) of several major research libraries. I, too, am concerned about copyright and about "the well-being of the books themselves":


Google has developed a scanning technology that the company claims is not destructive. Clearly, Google will need to work closely with libraries to ensure that no books are damaged. It is an illusion to think that the digital versions of scanned books can replace the books themselves.

A participating library will get a free digital copy of every book scanned in its collection. In other words, each library will essentially get a digital backup of a significant portion of its holdings, but it will be critical to remember that printed books are a stable medium, one that has persisted for hundreds of years.


Hear, hear. I would only have said it more stridently myself. In fact, I will anyway.

There is real danger of damaging books physically during scanning, either through carelessness or because of unforeseen effects of the scanning itself. Goodness knows that even the preservation techniques of earlier ages have sometimes turned out to be destructive. And I for one seem always to be hand-copying something out of a book I am not permitted to photocopy because it is too fragile.

The stability of books as a medium cannot be stressed too much. They are of course vulnerable to fire and water (as are electronic media), but they are remarkably durable. You can, for example, drop them on the floor several times before this abuse has any impact on their legibility.

Furthermore, writing or set type on pages is organized into informational quanta in such a way that damage to part of it does not reduce the readability of the rest. In contrast, many digital, machine-readable formats are so sensitive to corruption that a few 0s or 1s out of place make the entire document unreadable.
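For the programmers in the audience, here is a minimal sketch of that contrast. It uses Python's standard zlib module as a stand-in for "a digital, machine-readable format"; the sample sentence and the helper names (flip_bit, count_changed_bytes, try_decompress) are my own inventions for illustration, and not every digital format is this brittle. The sketch flips a single bit in a plain-text passage and a single bit in a compressed copy of the same passage, then checks what survives.

```python
import zlib


def flip_bit(data: bytes, byte_index: int, bit: int = 0) -> bytes:
    """Return a copy of data with one bit inverted."""
    damaged = bytearray(data)
    damaged[byte_index] ^= 1 << bit
    return bytes(damaged)


def count_changed_bytes(a: bytes, b: bytes) -> int:
    """Count positions at which two equal-length byte strings differ."""
    return sum(x != y for x, y in zip(a, b))


def try_decompress(blob: bytes):
    """Return the decompressed bytes, or None if zlib rejects the stream."""
    try:
        return zlib.decompress(blob)
    except zlib.error:
        return None


# A made-up sample passage, repeated so there is something to compress.
text = b"Whole pages and quires have gone missing from books and manuscripts. " * 10
blob = zlib.compress(text)

# Damage exactly one bit in the middle of each version.
damaged_text = flip_bit(text, len(text) // 2)
damaged_blob = flip_bit(blob, len(blob) // 2)

# The plain text loses one character; everything around it stays legible.
print("plain text, bytes changed:", count_changed_bytes(text, damaged_text))

# The compressed copy is checked as a whole, so one bad bit usually sinks it.
recovered = try_decompress(damaged_blob)
print("compressed copy still readable:", recovered == text)
```

The plain-text copy behaves like a page with one smudged letter. The compressed copy depends on every earlier bit to interpret the later ones and carries a checksum over the whole, so a single flipped bit will typically make the library refuse the document outright.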

Imagine if books worked this way, if the careless underlining of an interesting word or a single dogeared page were to make it impossible for the next potential reader to even open the book. (Note that we've borrowed the vocabulary of opening and closing books into the language of digital documents, and note that it brings exactly the deficiencies of the younger medium versus the older into high relief.) Whole pages and quires have gone missing from books and manuscripts, and though obviously it would be better if we had them, their loss does not equal the loss of the information on the rest of the pages. If this were so, we wouldn't have Beowulf or The Poetic Edda or most anything of any age.

Even if they are accurately scanned (and it would be generations before anyone truly knew whether they had been accurately scanned), the books themselves must not be destroyed. They contain information that will not be captured by any technology of reproduction, never mind one being put into practice by people who cannot possibly be reading all the material they are meant to be copying accurately. And:

Digital technology is only a few years old, and even in that brief time, the digital world has produced dozens of incompatible, and often unreadable, media formats. The Google project will enhance the usefulness of the books it encompasses, but it in no way will render them obsolete.

I still fear greatly that large-scale digitization of the great libraries will make it more difficult for those of us who know this to convince cash-strapped bureaucrats to continue buying, housing, and preserving physical books. I know that this is already a problem in the American legal profession as well as in the matter of newspapers, and it would be an enormous tragedy if it became a bigger problem than it already is in the Academy in general.

I could go on at huge length on this matter, and on the whole concept of searchable text, and I probably will. I leave you with this comparison of the durability of technological versus human methods for decoding information preserved in outdated formats.

The format in question in this first example is magnetic wire, a precursor of magnetic tape. I know of a sizeable collection of recordings preserved on magnetic wire, made in the second decade of the twentieth century when this was the cutting edge of technology for recording sound. There are, to my knowledge, now only two machines in the United States that can play magnetic wire spools. I once met on a plane the one fellow who knows how to repair them when they break down, which they do, apparently, often.

The other format is medieval Icelandic, an admittedly obscure language, one fairly obscure even in the context of 13th-century Europe. Even discounting Icelanders, there are today more people in the United States who can decode this antiquated storage format than there are machines that can play magnetic wire, more by an easy factor of 100. And they can teach others to decode it too.

More ranting on this subject to come, but now I really must trot across the street and look at that manuscript from 1827. I expect the handwriting to be perfectly awful, but at least I will not have to endure the antics of Microsoft Word perpetually crashing and freezing my machine.


1 comment:

Anonymous said...

For a new textbook, at least, I would have to say that having both a physical and an electronic copy is best. Why? The electronic copy may be more portable, and will be more searchable. Free (and I use this term elastically here) with my pathology textbook came access to the same book online: anywhere I have a networked computer I can search the entire text for any term I need help in understanding.
Of course, reading at length from a computer screen is something else again. And making notes in the text to come back to later - well, not so easy.
But the physical copy will last.
So really, I think that we are now recognising that we are in a time of two complementary technologies: one that is highly searchable, the other that is highly durable. Both are valuable, when correctly understood.
Though I personally could do without Microsoft Word.

 