Light Reading: Brewster Kahle, director of the Internet Archive, and 'the Scribe,' the scanner used to get public-domain works online.
By Richard Koman
Say you're in school and you have to write a research report on the anti-slavery movement in the United States in the mid-19th century. Where do you go?
Hmmm. How about taking a look at the James Birney Collection of Anti-Slavery Pamphlets--a collection of over a thousand abolitionist books, pamphlets and newspapers housed at the Johns Hopkins University Libraries. Fancy a trip to Baltimore? Right now, that's the only way you'll get to look at them.
But soon enough, the entire collection will be online as high-quality scans. So will the complete personal library of John Adams (housed at the Boston Public Library), the Getty Research Institute's collections on art and architecture, the full archive of publications of the Metropolitan Museum of Art and UC Berkeley's extensive collection of texts from the Gold Rush.
"Many people are turning to the Net as the public library," says Brewster Kahle, director of the Internet Archive in San Francisco. "Unless the works are available on the Internet, they will be unavailable to the next generation. Our role is to make great materials available to our children."
In January, the Sloan Foundation awarded a $1 million grant to the Internet Archive and the Open Content Alliance to scan and put online those classic materials from America's past. The award is a stake in the ground, a flag that says information should not only be online but truly free, truly accessible, no matter what search engine brings you to the content.
"The capability to digitize all recorded human knowledge now exists for the first time, and it is important that we seize this moment and ensure that public works and the public domain at large remain in the hands of the public," says Doron Weber, the program director of Public Understanding of Science and Technology at the New York-based Sloan Foundation.
The Sloan project is fundamentally different from Google Books, an initiative the search giant launched in cooperation with Stanford, Harvard, the University of Michigan and several other major university libraries.
For one thing, Sloan's paltry million bucks is a drop in the bucket compared to the upward of $100 million that Google is spending. Google's footing the bill for all of the scanning, and the universities are giving Google access to millions of books. That's money schools like Stanford are happy not to be spending themselves.
Google is putting essentially one condition on the partner libraries: that the books be indexed only by Google. That means if you use another search engine, you won't have access to these works. Use Google, you get access.
That's a deal breaker for people like Kahle. His first company was based on the open source search system, WAIS, that he developed in the late 1980s. He later sold WAIS Inc. to America Online and a second company, Alexa, to Amazon.com.
Kahle has worked to make information available online for two decades, but the open-content movement goes back even further. It began some 30 years ago when Michael Hart, a professor at the University of Illinois, launched Project Gutenberg, an online collection of public domain books available in text, HTML and XML formats. Hart started by typing in texts like Alice in Wonderland and War and Peace. Today there are 20,000 texts online at Project Gutenberg (www.gutenberg.org), which are also hosted at the Internet Archive (www.archive.org).
The Archive serves as a kind of portal to a number of open content efforts, including Gutenberg. The other projects are not just text renditions of books but full-color scans that can be downloaded as PDFs or in a highly compressed format called DjVu. Among the efforts: the Million Book Project, Microsoft's book search, the scanning of American and Canadian libraries and the Archive's own scanning efforts. There are a total of 100,000 public domain books freely available for download and printing on the Archive site.
"People are deciding to go open," says Kahle. "People are interested in having the public domain stay public domain--and to do high-quality scanning that would be of value to the public and to researchers."
The books that make up our heritage should be available online, but freely available, says Kahle. "We want the books available through Google, Yahoo, Microsoft, libraries. The idea of locking things down doesn't make sense in this Internet age."
When Google Books launched, the company said it would scan books in copyright, as well as public domain books. That made publishers mad. Really mad. In 2005, publishers and authors sued Google, and the company made some changes to accommodate those concerns.
The Archive project will have no such issues; it focuses entirely on works that are in the public domain.
"The first step is public domain works, then orphan works, then out-of-print works, then in-print works," says Kahle. "For in-print works, I think we'll see publishers take a role in distributing their works." But the orphans will be locked up for a while longer.
"Orphan works" are those that would have entered the public domain if it weren't for a 1976 rewrite of the Copyright Act that made copyright registration optional. In 2004, Kahle and ephemeral film collector Rick Prelinger sued the government to try to "free the orphans." But a panel of the Ninth Circuit Court of Appeals ruled last month that the orphan works will stay where they are.
"What is at stake is if libraries of the future can provide access to out-of-print materials after the publishers and authors are gone," says Kahle. "This case had only one purpose: to get the judge to say that the structure of copyright had changed so we can get the law examined, and he did not seem to even answer the question. Very sad. Another opportunity missed by our government. Sometimes, I think some of the more senior judges haven't bothered to understand what is happening to our civic institutions in our digital age."
Perhaps a little copyright history is in order. For almost 200 years--from the adoption of the Constitution in 1789 until the bicentennial in 1976--you had to register a copyright, which lasted for a certain number of years, and then renew it. If your work no longer had commercial value, you wouldn't renew it and it would enter the public domain.
The rules changed in 1976 with a rewrite of the Copyright Act. The intent was to bring the United States into compliance with the Berne Convention, the international accord on copyright that was last revised in 1971, and the new law did away with the registration and renewal requirement. Now work is copyrighted upon creation--you don't even have to publish it or print the "©" symbol.
But there are a number of works that hadn't been renewed and would have entered the public domain if not for the new law. These so-called orphan works have no commercial value and yet are locked up under copyright. They can't be scanned or published online or used in derivative works until their copyright expires. And copyright now lasts a very long time: a 1998 law named after Sonny Bono extended copyright to the lifetime of the author plus 70 years.
So Kahle and Prelinger filed suit, hoping that the courts would order the Copyright Office to remove copyright protection from these works. In rejecting the Archive's request, the Ninth Circuit judges said that Kahle and Prelinger were essentially complaining that copyright was too long--the same argument that had earlier been made and rejected in the U.S. Supreme Court.
Chris Sprigman, the lead lawyer in the Kahle case, wrote on his blog that he was "maddened" by the Appeals Court's refusal to take on a key aspect of the Supreme Court's Eldred decision--that unless changes to the copyright laws "alter the traditional contours" of copyright protection, they don't offend the First Amendment.
Sprigman, Kahle and Prelinger are seeking review of the decision by the full Ninth Circuit Court of Appeals.
Kahle wants the court to clarify that groups like the Internet Archive can make out-of-print works available on the Internet.
"Otherwise we live in a world of just very old works in the public domain and commercially available works. Everything in between effectively will be denied the next generation," he says.
"We could lose the 20th century."