Title: Bootleg: Chasing the Tail with Self-Supervised Named Entity Disambiguation

Speaker: Laurel Orr

Abstract

Named Entity Disambiguation (NED) is the task of mapping textual mentions to entities in a database. A key challenge in NED is generalizing to rarely seen entities, termed tail entities. Traditional NED systems use hand-tuned features to improve tail generalization, but these features make the systems challenging to deploy and maintain. In 2018, a subset of the authors built and deployed a self-supervised NED system at a major technology company, which improved performance over its hand-tuned predecessor. Motivated to understand the core reasons for this improvement, we introduce Bootleg, a clean-slate, open-source, self-supervised NED system. In this talk, we'll show how to succeed on the tail by reasoning over structured data of entity types and, importantly, knowledge graph relations. We demonstrate that with this structured knowledge, Bootleg matches or exceeds state-of-the-art performance on three NED benchmarks, and that the learned representations from Bootleg successfully transfer to other non-disambiguation tasks that require entity-based knowledge. We set a new state-of-the-art on the popular TACRED relation extraction task, improving by 1.0 F1 points, and demonstrate up to an 8% performance lift in highly optimized production search and assistant tasks at a major technology company.
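To make the core idea concrete, here is a minimal toy sketch of type- and relation-aware candidate scoring for NED. All names, entity IDs, and data structures below are hypothetical illustrations, not Bootleg's actual API or model: the point is only that structured cues (entity types and knowledge graph relations) extracted from context can break ties among candidate entities for an ambiguous mention.

```python
# Toy illustration of type- and relation-aware candidate scoring for NED.
# All data, IDs, and function names here are made up; this is NOT Bootleg's API.

from dataclasses import dataclass, field

@dataclass
class Candidate:
    entity_id: str
    types: set = field(default_factory=set)       # e.g. {"person", "politician"}
    relations: set = field(default_factory=set)   # KG relations, e.g. {"president_of"}

# Hypothetical candidate map: mention string -> possible entities.
CANDIDATES = {
    "lincoln": [
        Candidate("Q91",     types={"person", "politician"},  relations={"president_of"}),
        Candidate("Q28531",  types={"city"},                  relations={"located_in"}),
        Candidate("Q216796", types={"automobile_brand"},      relations={"manufactured_by"}),
    ]
}

def score(candidate: Candidate, context_cues: set) -> int:
    """Score a candidate by how many structured cues (types or relations)
    it shares with cues extracted from the mention's context."""
    return len(candidate.types & context_cues) + len(candidate.relations & context_cues)

def disambiguate(mention: str, context_cues: set) -> str:
    """Return the entity id of the best-scoring candidate for a mention."""
    candidates = CANDIDATES.get(mention.lower(), [])
    if not candidates:
        return "NIL"  # unknown mention
    return max(candidates, key=lambda c: score(c, context_cues)).entity_id

# "Lincoln signed the Emancipation Proclamation" -> context suggests a politician.
print(disambiguate("Lincoln", {"person", "politician", "president_of"}))  # -> Q91
```

In Bootleg itself these structured signals are learned self-supervised rather than matched by hand as in this sketch, but the intuition is the same: tail entities with few textual examples can still be disambiguated through their types and relations.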


Bio

I am currently a postdoc at Stanford working with Christopher Ré as part of the Hazy Research lab. My research interests center on the data management challenges associated with building, monitoring, and maintaining self-supervised systems. In August 2019, I graduated with a PhD from the Paul G. Allen School of Computer Science & Engineering at the University of Washington in Seattle, where I was part of the Database Group and advised by Dan Suciu and Magdalena Balazinska.