Recent Advances in Entity Matching

Speaker: AnHai Doan, University of Wisconsin-Madison


Entity matching (EM) finds data instances that refer to the same real-world entity, such as (D. Smith, UW-Madison) and (Dave Smith, UWM). This problem is ubiquitous in data integration, and plays a fundamental role in knowledge graph construction. In this talk I will describe current advances in EM, focusing on efforts to build open-source EM platforms as well as commercial ones. In particular, I will describe the Magellan project at the University of Wisconsin-Madison, which develops the PyMatcher on-prem EM platform as well as CloudMatcher, a cloud-based EM platform. I will discuss how these EM platforms combine machine learning, big data processing, and effective user interaction to solve EM problems. Magellan has recently been commercialized by the startup GreenBay and pushed into commercial EM platforms at Informatica, the world-leading data integration company. Finally, I will touch upon how Magellan technologies are helping to build knowledge graphs at Informatica.

The slides are available here.


AnHai Doan is a Vilas Distinguished Achievement Professor of Computer Science at the University of Wisconsin-Madison. His interests cover databases, AI, and the Web, with a current focus on data integration, data science, and machine learning. AnHai received the ACM Doctoral Dissertation Award in 2003, a CAREER Award in 2004, and a Sloan Fellowship in 2007. He co-authored "Principles of Data Integration", a textbook published by Morgan Kaufmann in 2012. AnHai has also consulted extensively and been involved in several startups. He was on the Advisory Board of Transformic, a Deep Web startup acquired by Google in 2005, and was Chief Scientist of Kosmix, a social media startup acquired by Walmart in 2011. From 2011 to 2014 he was Chief Scientist of WalmartLabs, a newly formed research and development lab at Walmart devoted to analyzing and integrating data for e-commerce. Since 2019 he has been a co-founder of GreenBay Technologies, a startup commercializing his recent research on entity matching.