Aug 06, 2009

Brewster Kahle wants to give you digital access to every book, film, video, song, TV show and periodical ever published. If he succeeds, the world will be a different place.

By Amy Van Vechten

Brewster Kahle and his team at the Internet Archive propose a revolution. By scanning, uploading, storing and linking all the world’s media into a massive digital library, they want to make everything that has ever been published accessible to every computer in the world. Think of it as a fire hose with infinite capacity for all intellectual property, as close as your broadband outlet.

“We want to create a full digital library that’s useful to the Wikipedia generation,” Kahle (pronounced “kale”) says.

Kahle is not alone in this endeavor. Beginning with a similar idea in 1971, Project Gutenberg has made some 30,000 books available online.

In just the last four years, Google’s Book Search project—in cooperation with 20,000 publishers and 28 libraries around the world, including Harvard, Oxford, and the University of Michigan—has scanned over a million books, making them searchable online, and in some cases downloadable.

Yet despite such company, Kahle’s Internet Archive makes a plausible claim to be the mother of all digital dreams, encompassing as it does not just print, but also movies, music, TV and the Internet. Making no apology for the audacity of his ambition, Kahle says he wants nothing less than to “one-up the Greeks,” whose ancient Library of Alexandria was the most notable attempt to create a repository of everything.

That library went up in smoke. To make sure his dream doesn’t, Kahle and his archivists are being very good about backing up their files. The original in San Francisco is mirrored, or simultaneously updated, in the new Library of Alexandria.

A Lot of Dimes

Kahle considers the Internet Archive the ultimate product of Silicon Valley idealism. After studying computer science and engineering at MIT, he designed an early search engine server called WAIS, which he sold to AOL for $15 million. This money financed his next business, a web crawler called Alexa Internet, which he in turn sold to Amazon in 1996 for $250 million in Amazon stock.

In part with the proceeds of that sale, Kahle then started the nonprofit Internet Archive, with offices in the Presidio district of San Francisco.

The archive now contains more than one million books—half scanned by Kahle’s team and the other half by digitizing organizations like Project Gutenberg—as well as three million audio recordings, more than 300,000 films, 50,000 hours of video and something the team calls the “Wayback Machine,” which houses everything on the internet from 1996 to the present.

Collecting all that material has been no small feat. Books must be scanned by hand on machines that use carefully engineered lighting and high-quality cameras. When done in volume, the process works out to a cost of 10 cents per page, or $30 for a 300-page book.

The Archive is currently scanning 25,000 books per month, thanks to cooperating libraries at the University of California, Los Angeles, and the University of Toronto, among others.

The project has been supported by donations from the Alfred P. Sloan Foundation, the William and Flora Hewlett Foundation, Kahle’s own Alexa Internet, and the Kahle/Austin Foundation he created with his wife.

And he requires a lot of donations: Kahle would need $800 million to digitize the 26 million volumes in the Library of Congress. His more modest short-term goal is to scan one million out-of-copyright books published before the 21st century, which, even at a dime a page, adds up.

“We have the machines and the people to scan the books,” Kahle says. “We just need the ten centses!”
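
For readers who want to check the dimes, the scanning figures quoted above can be reproduced in a few lines of Python. This is only a back-of-the-envelope sketch: the function name `scan_cost_dollars` is invented for illustration, and it assumes every book averages the 300 pages of the article's example, which real collections will not.

```python
# Back-of-the-envelope check of the scanning costs quoted in the article.
# Assumption (not from the article): every book averages 300 pages.

CENTS_PER_PAGE = 10                      # "10 cents per page"
EXAMPLE_BOOK_PAGES = 300                 # the article's 300-page example book
LIBRARY_OF_CONGRESS_VOLUMES = 26_000_000
SHORT_TERM_GOAL_BOOKS = 1_000_000


def scan_cost_dollars(books, pages_per_book=EXAMPLE_BOOK_PAGES):
    """Return the cost in whole dollars of scanning `books` books."""
    # Work in integer cents to avoid floating-point rounding.
    total_cents = books * pages_per_book * CENTS_PER_PAGE
    return total_cents // 100


if __name__ == "__main__":
    for label, count in [
        ("One 300-page book", 1),
        ("Short-term goal (1 million books)", SHORT_TERM_GOAL_BOOKS),
        ("Library of Congress (26 million volumes)", LIBRARY_OF_CONGRESS_VOLUMES),
    ]:
        print(f"{label}: ${scan_cost_dollars(count):,}")
```

Under that assumption the Library of Congress works out to $780 million, in line with the roughly $800 million Kahle cites, and the one-million-book goal comes to $30 million.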

These scans produce more than simple image files. Each book is transformed into searchable text and a variety of formats, including PDFs and text-only files.

After scanning, the books can be reprinted in a variety of ways. Most often, the Internet Archive uses its Espresso Book Machine, an 800 lb. on-demand printing system that can bind and trim library-quality paperbacks in 3–4 minutes for a penny per page. There are now nine Espresso Book Machines in the world, from the Internet Archive in San Francisco to the new Library of Alexandria in Egypt.

Kahle created a bookmobile and, with funding from the World Bank, sent it to rural spots in Uganda, India and Egypt, where it drew from digital scans to make books on the spot for children who had never owned one before.

“I like the physical book,” Kahle explains. “Books are beautiful. And the archive gets them into the hands of people who wouldn’t otherwise have access to them.”

What It Takes to Hold 150 Library of Congresses

The Archive is stored in “petaboxes,” which were specially designed by Capricorn Technologies. Plans for the device have been made public, and the hardware is open-source in case anyone else wishes to make one, but the cost—$1 million—will discourage amateurs.

Eight feet tall, eight feet deep, and twenty feet wide, the petaboxes are named for their storage capacity, which is one petabyte, or a million gigabytes, or a billion megabytes. That amount could hold 150 times the digital books that now exist in the Library of Congress, so filling it up is quite a challenge.

Recording the history of the Internet, done by the Wayback Machine once a year over the last twelve years, has already taken up two petabytes. The archive is currently growing at a rate of 20 terabytes a month.
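
To put those storage numbers in proportion, here is a rough sketch of the unit arithmetic. It assumes decimal units (1 petabyte = 1,000 terabytes) and a constant growth rate of 20 terabytes a month; both are simplifications, and the helper `months_to_grow` is a name made up for this example.

```python
# Rough unit arithmetic for the storage figures quoted in the article.
# Assumptions (not from the article): decimal units and a constant
# growth rate of 20 terabytes per month.

TB_PER_PB = 1_000                        # terabytes in one petabyte
GB_PER_TB = 1_000                        # gigabytes in one terabyte

PETABOX_CAPACITY_TB = 1 * TB_PER_PB      # one petabox holds one petabyte
WAYBACK_SIZE_TB = 2 * TB_PER_PB          # the Wayback Machine's two petabytes
GROWTH_TB_PER_MONTH = 20


def months_to_grow(terabytes, growth_tb_per_month=GROWTH_TB_PER_MONTH):
    """Months needed to add `terabytes` of data at a constant monthly rate."""
    return terabytes / growth_tb_per_month


if __name__ == "__main__":
    gigabytes = PETABOX_CAPACITY_TB * GB_PER_TB
    print(f"One petabox: {gigabytes:,} gigabytes")
    print(f"Months to fill one petabox: {months_to_grow(PETABOX_CAPACITY_TB):.0f}")
    print(f"Months to add another Wayback Machine: {months_to_grow(WAYBACK_SIZE_TB):.0f}")
```

At the quoted rate, the Archive would fill a fresh petabox in about 50 months, or a little over four years.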

Kahle and his team will continue to record the Internet, host files for free, scan the books in our libraries and raise money for more petaboxes. While their mission may be circumscribed by copyright laws, Kahle remains determined in his goal to enable access to all of the media ever produced.

“Anyone with curiosity should be able to get access to the books, music and video of humankind,” Kahle says. “We need to work with the government to keep the monopoly impulse under control, because this new age should be about diversity of publishers, libraries and authors. We’re trying to make sure we offer that option and make it accessible.”



congratulations on the best use of new technology i’ve come across. this is a terrific piece: superb integration of aural/visual with all info given at a perfect pace. (but, if i may, there’s one no: the “page” using 4 images at once.) as a photographer working for print, i’ve broken my head over how magazines can gracefully slide over. now i know. a big thank you to all you flyp workers. elaine ellman

elaine ellman
Sep 9, 2009