Title: Graph Processing at Scale with Apache Spark

Speaker: Matei Zaharia, Stanford University


Many interesting applications involve large graphs with automatically generated data such as sensor readings and events. These graphs can easily have billions of vertices and hundreds of billions of edges. I will discuss how some of these applications are handled in Apache Spark, a general-purpose distributed computing engine that supports graph computation through packages such as GraphFrames. Apache Spark also seamlessly integrates graph computation with more traditional SQL and machine learning computations, so that users can combine graph datasets and algorithms with other processing tools to develop their data applications.


Matei Zaharia is an Assistant Professor of Computer Science at Stanford University and Chief Technologist at Databricks. He started the Apache Spark project during his PhD at UC Berkeley in 2009 and has worked broadly on other cluster computing and analytics software, including Apache Mesos, Apache Hadoop, and MLflow. Today, Matei is a PI in the Stanford DAWN Lab, doing research on infrastructure for machine learning, and continues to work on data analytics systems at Databricks. His research has been recognized with the 2014 ACM Doctoral Dissertation Award, an NSF CAREER Award, and the US Presidential Early Career Award for Scientists and Engineers (PECASE).