I always love reading Jill Lepore in The New Yorker. Her latest piece, "The Cobweb," is no exception.
Lepore expertly describes what archivists and librarians have long known: the content we post online is at great risk of eventually being lost. Unlike with sturdy ol' print, we have not yet figured out how to preserve the Web with any confidence.
This is not for lack of trying, as Lepore documents. She spends a great deal of time with Brewster Kahle, founder of the Internet Archive. Content online can be overwritten very easily, and without version control, earlier versions of a page effectively disappear. Enter the Internet Archive, which trawls the Web repeatedly and makes cached copies of pages before they vanish. The technology is not perfect: Lepore notes that a cache of CNN from fall 2008 includes content from the 2012 presidential campaign. But Kahle is nonetheless doing yeoman's work to guard against digital obsolescence.
It's not entirely clear whether this yeoman's work is legal. Our antiquated copyright laws (the last major revision to US copyright law occurred in 1976, well before Web browsers existed) have not kept pace with how people conduct research today. Are Kahle's efforts an act of preservation or of appropriation? Does the fact that the Internet Archive is a nonprofit make any difference to this analysis? These are unsettled questions. Or, as Lepore puts it, "Copyright is the elephant in the archive."
I'll stand with Brewster if his aims are ever challenged. In the meantime, the conjoined problems of "link rot" (when a Web page's URL stops working) and "reference rot" (when the version of a Web page you see is not the version you intended to cite) remain stubborn. One promising solution, as Lepore notes, is Perma.cc. According to its website, Perma.cc "helps scholars, journals and courts create permanent links to the online sources cited in their work." Lepore sings its praises: "Perma.cc has already been adopted by law reviews and state courts; it's only a matter of time before it's universally adopted as the standard in legal, scientific and scholarly citation."
Lepore also notes that Perma.cc emerged from the Harvard Law School Library's Innovation Lab. That's right, the library. These issues of permanence and findability, which digital scholars believe are new in our time, are actually extremely old. They have been the concern of librarians and archivists for hundreds if not thousands of years. We're now working in bits rather than on papyrus, but the fundamental imperative to preserve a culture's artifacts has not changed.
Version control is our bread and butter. One reason Kahle's library is so messy is that he is not a librarian.
So go forth bravely, digital librarians and archivists. The world needs you.