MMDS 2012. Workshop on Algorithms for Modern Massive Data Sets

Stanford University
July 10–13, 2012


The Workshop on Algorithms for Modern Massive Data Sets (MMDS 2012) addressed algorithmic and statistical challenges in modern large-scale data analysis. The goals of this series of workshops are to explore novel techniques for modeling and analyzing massive, high-dimensional, and nonlinearly structured scientific and internet data sets, and to bring together computer scientists, statisticians, mathematicians, and data analysis practitioners to promote the cross-fertilization of ideas.


MMDS 2012 Wrap-up: We kindly thank all participants, poster presenters and speakers for attending the workshop.

  • Talk videos: Video recordings of the MMDS 2012 talks are now available.
  • Talk slides: The majority of the slides from the workshop have been posted below. Outstanding slides will be added as they become available. The full workshop program can be found here.
  • Event location: All talks will be held in Cubberley Auditorium in the School of Education Building. Directions to the venue can be found here.
  • Preliminary schedule: The schedule of the talks is now available below.
  • Logistics: Information on lodging, transportation and parking for MMDS 2012 may be found here.
  • Sponsors: If your organization is interested in sponsoring MMDS 2012, please let us know.
  • GraphLab: A related workshop on GraphLab is being held in San Francisco on Monday just prior to MMDS 2012.
  • Poster Session: A poster session will be held on the second day of the workshop, immediately following the talks. The conference registration fee is waived for students presenting a poster.
  • Contact: If you have questions, email us at mmds-organizers at math dot stanford dot edu.

  • MMDS 2012 Preliminary Schedule

    Tuesday, July 10, 2012. Theme: Data Analysis and Statistical Data Analysis

    Time Talk
    8:00 - 10:00 Breakfast and Registration -- outside Cubberley Auditorium (at the Stanford School of Education, just off the Main Quad)
    9:45 - 10:00 Welcome and Opening Remarks -- in Cubberley Auditorium
    10:00 - 11:00 Tutorial: Jiawei Han
    A Meta Path-Based Approach for Similarity Search and Mining of Heterogeneous Information Networks
    11:00 - 11:30 Alexander Gray
    Faster Learning for Massive Datasets
    11:30 - 12:00 Christopher Re
    Hazy: Making Data-driven Statistical Applications Easier to Build and Maintain
    2:00 - 3:00 Tutorial: Peter Bartlett
    Model Selection and Recent Results for Large Scale Problems
    3:00 - 3:30 Noureddine El Karoui
    On Robust Regression Estimators in High-dimension
    3:30 - 4:00 Jure Leskovec
    Affiliation Network Models for Densely Overlapping Communities in Networks
    4:30 - 5:00 Haesun Park
    Nonnegative Matrix Factorizations for Clustering
    5:00 - 5:30 Fan Chung Graham
    Vectorized Laplacians for Dealing with High-dimensional Data Sets
    5:30 - 6:00 Joydeep Ghosh
    Actionable Mining of Large, Multi-relational Data using Localized Predictive Models

    Wednesday, July 11, 2012. Theme: Industrial and Scientific Applications

    Time Talk
    9:00 - 10:00 Tutorial: DJ Patil
    When Algorithms Go Wrong: How Product Design Can Save Algorithmic Limitations
    Book PDFs: Building Data Science Teams, Data Jujitsu
    10:00 - 10:30 Sean Fahey
    Big Data and Analytics for National Security
    11:00 - 11:30 Petros Drineas
    Leverage Scores, the Column Subset Selection Problem, and Least-squares Problems
    11:30 - 12:00 David Woodruff
    Low Rank Approximation and Regression in Input Sparsity Time
    12:00 - 12:30 Michael W. Mahoney
    Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments
    2:30 - 3:30 Tutorial: Rick Stevens
    The Biological, Algorithmic and Computational Challenges of Systems Biology
    3:30 - 4:00 Tiankai Tu
    Fault-Tolerant Parallel Analysis of Millisecond-Scale Molecular Dynamics Trajectories
    4:30 - 5:00 Alexander Szalay
    Current Statistical Challenges in Large Astronomical Surveys
    5:00 - 5:30 Joseph Richards
    Astronomical Time Series Analysis for the Synoptic Survey Era
    5:30 - 6:00 Tony Cass
    Data Handling for LHC: Plans and Reality

    Thursday, July 12, 2012. Theme: Novel Algorithmic Approaches

    Time Talk
    9:00 - 10:00 Tutorial: Michael Mitzenmacher
    Peeling Arguments: Invertible Bloom Lookup Tables and Biff Codes
    10:00 - 10:30 Frederic Chazal
    Detection and Approximation of Linear Structures in Metric Spaces
    11:00 - 11:30 Ping Li
    Probabilistic Hashing for Efficient Search and Learning on Massive Data
    11:30 - 12:00 Ashish Goel
    Real Time Social Search and Related Problems
    12:00 - 12:30 Andrew Goldberg
    Hub Labels in Databases: Shortest Paths for the Masses
    2:30 - 3:00 Theodore Johnson
    Data Stream Warehousing
    3:00 - 3:30 Josh Wills
    Experimenting at Scale
    3:30 - 4:00 Hang Li
    Large Scale Machine Learning for Query Document Matching in Web Search
    4:30 - 4:50 Blair Sullivan
    Branching Out: Quantifying Tree-like Structure in Complex Networks
    4:50 - 5:10 Mahdi Soltanolkotabi
    A Geometric Analysis of Subspace Clustering with Outliers
    5:10 - 5:30 Bahman Bahmani
    Scalable K-Means++
    5:30 - 6:00 Steve Bartel
    Analytics at Dropbox

    Friday, July 13, 2012. Theme: Novel Matrix and Graph Methods

    Time Talk
    9:00 - 10:00 Tutorial: Yi Ma
    The Pursuit of Low-dimensional Structures in High-dimensional Data
    10:00 - 10:30 Edoardo Airoldi
    Graphlets Decomposition of a Weighted Network
    11:00 - 11:30 Yiannis Koutis
    SDD Solvers: Bridging the Gap Between Theory and Practice
    11:30 - 12:00 Art Owen
    Bootstrapping r-fold Tensor Data
    12:00 - 12:30 Kamesh Madduri
    Algorithms and Tools for Scalable Graph Analytics
    2:30 - 3:00 Shaowei Lin
    Studying Model Asymptotics with Singular Learning Theory
    3:00 - 3:30 David Bindel
    Communities, Spectral Clustering, and Random Walks
    3:30 - 4:00 Ali Pinar
    The Block Two-Level Erdős-Rényi (BTER) Graph Model
    4:30 - 5:00 Xiao-Li Meng (presented by Alexander Blocker)
    Preprocessing, Multiphase Inference, and Massive Data in Theory and Practice
    5:00 - 5:30 Alfred Hero
    Hub Discovery in Large Correlation Networks
    5:30 - 6:00 Dan Feldman
    Google Your Life: Learning Sensors Data

    MMDS 2012 Organizers

    Organizing Committee:
    Michael Mahoney (chair), Alex Shkolnik, Gunnar Carlsson, Petros Drineas

    MMDS 2012 Sponsors

    The MMDS 2012 Organizers and the MMDS Foundation would like to thank the following institutional sponsors for their generous support:

    eBay, Dropbox, AFOSR, LBNL, Stanford

    Past MMDS events

    MMDS 2010: Workshop on Algorithms for Modern Massive Data Sets, Stanford, CA, June 15–18, 2010.

    MMDS 2008: Workshop on Algorithms for Modern Massive Data Sets, Stanford, CA, June 25–28, 2008.

    MMDS 2006: Workshop on Algorithms for Modern Massive Data Sets, Stanford, CA, June 21–24, 2006.