The Evelyn Wood of Digitized Book Scanners
From: NY Times Online - May 12, 2003
By: John Markoff 

Palo Alto, CA, May 10 - Putting the world's most advanced scholarly and
scientific knowledge on the Internet has been a long-held ambition for
Michael Keller, head librarian at Stanford University. But achieving this
goal means digitizing the texts of millions of books, journals and magazines
- a slow process that involves turning each page, flattening it and scanning
the words into a computer database.  

Mr. Keller, however, has added a new tool to his crusade. On a recent
afternoon, he unlocked an unmarked door in the basement of the Stanford
library to demonstrate the newest agent in the march toward digitization.
Inside the room a Swiss-designed robot about the size of a sport utility
vehicle was rapidly turning the pages of an old book and scanning the text.
The machine can turn the pages of small and large books, as well as bound
newspaper volumes, and can scan at speeds of more than 1,000 pages an hour.  

Occasionally the robot will stumble, turning more than a single page. When
that happens, the machine will pause briefly and send out a puff of
compressed air to separate the sticking pages.  

For Mr. Keller, the robot, made by 4DigitalBooks, one of two companies now
introducing the first automated digitization systems, is a boon.  

"Think about the power of bringing our library to little schools in the
middle of Africa," Mr. Keller said. "Would it make a difference for those who
now have their minds closed to the idea of democracy?" 

The first book-scanning robots were introduced this spring by 4DigitalBooks
of St. Aubin, Switzerland, and Kirtas Technologies of Victor, NY. The
machines have already begun to generate interest from libraries and private
and nonprofit groups now working to digitize books.  

Until now, the job has been done mostly by students or armies of low-cost
workers in countries like India and the Philippines. But manual digitization
presents significant logistical problems. Book collections may have to be
moved long distances to digitization centers.  

And in some cases the process of scanning has damaged old books and journals,
making it necessary to rebind them afterward.  

The digitizing machines, by contrast, can be located close to book
collections and offer speed and quality control unattainable by manual
systems.  

Even so, manual processing is still less expensive in many cases than
acquiring a robot. The 4DigitalBooks robot, whose price neither the company
nor Stanford officials would disclose, becomes cost effective on projects
larger than 5.5 million pages, said Ivo Iossiger, the company's chief
technology officer and a co-founder. It seems likely that the vast majority
of digitization over the next several years will be done by hand.  

Mr. Keller admits that his dream of putting the entire Stanford library into a
digital database is unlikely to be realized in the foreseeable future, because
such an undertaking, involving eight million volumes, could cost upward of
$250 million.  

In the meantime, the Stanford librarians have begun digitizing books and
documents that face no thorny copyright barriers and that have important
historical and political significance.  

The newly installed robot is currently finishing two pilot projects, scanning
books published by Stanford's Center for the Study of Language and
Information and works for the Medieval and Modern Thought Text Digitization
Project. It will soon begin work on the 2,500 titles published by the
Stanford University Press.  

Not long ago Stanford helped finance the manual digitization of the
presidential papers of Eduardo Frei, the former president of Chile, who was
concerned that records of his administration could be lost in a coup.  

And beginning in 1999, the Stanford library system sent a team of specialists
and students to Europe, where the university is engaged in a multiyear
project to digitize selected documents produced by the General Agreement on
Tariffs and Trade and its successor organization, the World Trade
Organization in Geneva. The project, which will take five years, will
ultimately scan about 2.2 million pages of information.  

Other ambitious undertakings like Carnegie Mellon University's Million Book
Project will also continue to rely on manual digitization for several more
years. Another project, led by the Internet Archive in San Francisco,
recently shipped 80 tons of old books acquired from the Kansas City Library
to Hyderabad, India, where they will be scanned, according to Michael Lesk, a
former National Science Foundation official and digital library expert who
works with the archive.  

Mr. Lesk said that currently in India or the Philippines it is possible to
scan and digitize a book for $1 to $4. But he acknowledged that there were
significant costs in quality control.  

For Mr. Keller the most vexing challenges are neither labor costs nor
technology. Librarians, he said, must find a way to address the copyright
restrictions that appear to be tightening as a result of new federal laws
like the Digital Millennium Copyright Act of 1998.  

Stanford is struggling to comply with copyright restrictions while making
works that have recently lost their copyright protection available digitally.
Mr. Keller said the library increased the circulation of its collection by 50
percent when it computerized its card catalog. Digitizing out-of-print books
could likewise make them available to a much wider audience, he said. The
payoff from building such a digital collection, he added, is vastly improved
availability of a huge store of knowledge and information for teaching,
learning and research.  

http://www.nytimes.com/2003/05/12/technology/12TURN.html?ex=1053759747&ei=1&en=ee36c9d5c5566647

Links:
http://www.4digitalbooks.com/
http://www.kirtas-tech.com/

Contributed by Joel Turmo
