Title: DataCommons

Speaker: RV Guha (Google)


Publicly available data from open sources are a vital resource for students and researchers in a variety of disciplines. Unfortunately, processing these datasets to make them useful (scraping, cleaning, normalizing, joining) is tedious and error-prone, and has to be repeated by every group. DataCommons attempts to alleviate some of this pain by synthesizing a single knowledge graph from many different data sources. It links references to the same entity (such as a city, county, or organization) across different datasets to a single node in the graph, so that users can access data about a particular entity aggregated from different sources. Like the Web, the DataCommons graph is open: any user can contribute data or build applications powered by the graph. We are jump-starting the graph with data from publicly available sources such as the CDC, Census, BLS, and FBI, and are looking to engage with the academic community to take it further.
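To make the linking step concrete, here is a minimal sketch of how records from several sources might be attached to shared graph nodes. It assumes a hand-written alias table that maps each (source, name-as-written-in-that-source) pair to one canonical node ID. The table, the node IDs, the property names, and the values are all hypothetical and only illustrate the idea; this is not how DataCommons actually performs entity resolution, and it does not use the DataCommons API.

```python
from collections import defaultdict

# Hypothetical alias table: maps a (source, name-as-written-in-that-source)
# pair to one canonical node ID. The IDs are made up for illustration and
# are not real DataCommons identifiers.
ALIASES = {
    ("census", "San Francisco city, California"): "city/SanFrancisco",
    ("fbi", "SAN FRANCISCO"): "city/SanFrancisco",
    ("census", "Oakland city, California"): "city/Oakland",
}


def build_graph(records):
    """Attach (source, local_name, property, value) records to graph nodes.

    Returns ({node_id: {property: [(value, source), ...]}}, unresolved),
    where unresolved lists the references that ALIASES could not map.
    """
    nodes = defaultdict(lambda: defaultdict(list))
    unresolved = []
    for source, local_name, prop, value in records:
        node_id = ALIASES.get((source, local_name))
        if node_id is None:
            # Keep unknown references aside instead of guessing a match.
            unresolved.append((source, local_name))
            continue
        # Record which source each value came from (provenance).
        nodes[node_id][prop].append((value, source))
    # Convert to plain dicts so lookups of missing keys do not create entries.
    graph = {node: dict(props) for node, props in nodes.items()}
    return graph, unresolved


# Placeholder values only; these are not real statistics.
SAMPLE_RECORDS = [
    ("census", "San Francisco city, California", "population", 1000),
    ("fbi", "SAN FRANCISCO", "violent_crime_count", 7),
    ("census", "Oakland city, California", "population", 500),
    ("bls", "San Francisco, CA", "unemployment_rate", 0.05),
]

graph, unresolved = build_graph(SAMPLE_RECORDS)
print(graph["city/SanFrancisco"])
# {'population': [(1000, 'census')], 'violent_crime_count': [(7, 'fbi')]}
print(unresolved)
# [('bls', 'San Francisco, CA')]
```

Because the Census and FBI references resolve to the same node, one lookup returns data from both sources, each value tagged with its origin. The BLS spelling has no alias here, so it is reported rather than silently merged; in a sketch like this, extending the alias table (or replacing it with a real matching step) is where most of the work would go.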

The slides are available here.


Guha is the founder and lead of DataCommons.org, a platform that synthesizes a wide range of datasets into a single knowledge graph for use by students and researchers. He is the creator of widely used web standards such as RSS, RDF, and Schema.org. He is also responsible for products such as Google Custom Search. He was a co-founder of Epinions.com and Alpiri. Earlier, he was the author of CycL, the representation language used in Cyc. He is currently a Google Fellow and a vice president in research at Google. He holds a Ph.D. in computer science from Stanford University and a B.Tech in mechanical engineering from IIT Madras.