Stanford EE Computer Systems Colloquium

4:15PM, Wednesday, Feb 15, 2006
HP Auditorium, Gates Computer Science Building B01

Google Book Search -- Making off-line content fully searchable on the web

Daniel Clancy

About the talk:

When Gutenberg invented the printing press, he revolutionized how people accessed information by making printed text reproducible. A similar revolution occurred with the advent of the Internet, which puts billions of pages of information at users' fingertips. However, a great deal of the world's information is still accessible only in printed form, through books and other media. For many Internet users, this content is difficult to find compared with the vast amount of information that can be found through Google and other search engines on the Web.

The Google Book Search project is an ambitious effort to make offline book content fully searchable via the web. Google works with publishers to digitize and index their in-print books, giving users the ability to sample portions of a book to help them select the appropriate book to purchase. In the Library project, Google is working with many of the world's leading libraries to digitize their collections and make them fully searchable. Public domain works are fully accessible via the web, while for in-copyright books users are limited to viewing a few short sentences of the text that are contextually relevant to their search.

In undertaking this project, Google had to address numerous technical challenges, since a digitization effort on this scale had not previously been attempted. In this talk, I will discuss the motivation for the project and some of the technical challenges that Google has faced. I will then focus on some of the research challenges that remain.


Download the visuals for this presentation in PDF format.

About the speaker:

Dr. Daniel J. Clancy is the Engineering Director for the Google Book Search Project. This project is working to bring off-line book content on-line and make it searchable so that books can be discovered. Google is working with both publishers and libraries as part of this project.

Prior to coming to Google in January 2005, Dr. Clancy was the Director of the Exploration Technologies Directorate at NASA Ames Research Center. The Directorate supports over 700 people performing both basic and applied research in a diverse range of technology areas intended to enable both robotic and human exploration missions. Technology areas include Intelligent Systems, High-end Computing, Human-Centered Systems, Bio/Nanotechnology, Entry Systems and others. In this position, Dr. Clancy played numerous roles at the agency level, including participating in the team that developed the agency's plan to return men to the Moon and eventually go on to Mars.

Dr. Clancy received his PhD in artificial intelligence from the University of Texas at Austin. While in school, Dr. Clancy also worked at Trilogy Corporation, the NASA Jet Propulsion Laboratory and the Xerox Webster Research Center. Dr. Clancy received a Bachelor of Arts in computer science and theatre from Duke University in 1985.

Contact information:

Daniel J. Clancy
1600 Amphitheatre Parkway
Mountain View, CA 94043