CS369G: Algorithmic Techniques for Big Data

Spring Quarter 2016, Stanford University

Course description

Designing algorithms for efficient processing of large data sets poses unique challenges. This course will discuss algorithmic paradigms that have been developed to efficiently process data sets that are much larger than available memory. We will cover streaming algorithms and sketching methods that produce compact data structures, dimension reduction methods that preserve geometric structure, efficient algorithms for numerical linear algebra, graph sparsification methods, as well as impossibility results for these techniques.

Prerequisites: students will be expected to have a strong background in algorithms and probability.

Grading: there will be three problem sets (60%), a research project (30%) and a scribing requirement (10%).


Professor Moses Charikar and teaching assistant Paris Syminelakis.

Tentative Syllabus

  1. Sketching and Streaming algorithms for basic statistics: Distinct elements, heavy hitters, frequency moments, p-stable sketches

  2. Dimension Reduction: Johnson Lindenstrauss lemma, lower bounds and impossibility results

  3. Graph stream algorithms: connectivity, cut/spectral sparsifiers, spanners, matching, graph sketching

  4. Lower bounds for Sketching and Streaming: communication complexity: Equality, Index and Set-Disjointness

  5. Locality Sensitive Hashing: similarity estimation, approximate nearest neighbor search, data dependent hashing

  6. Fast Approximate Numerical Linear Algebra: matrix multiplication, low-rank approximation, subspace embeddings, least squares regression


  • Lectures take place TuThu 1:30-2:50 at Bldg 200 - Room 030.

  • Register on Piazza using this link.

  • To sign up for scribing a lecture send an email to the staff list with your name and SUID. You can see which lectures are available for scribing in this document.

  • The list of project topics has been posted!

  • Homework 3 is out and due May 31 before class!

  • Due to Memorial Day, homework 3 due date is moved to Wednesday June 1st!

  • The project deadline has been moved to Wednesday, June 8, 1:29pm!