CS 168: The Modern Algorithmic Toolbox
- 4/19: Here is a link to some notes on Gradient Descent, which might be helpful for mini-project #3 if you haven't seen gradient descent before. Gradient Descent Notes.
It is due Tuesday, April 24th at 11:59pm.
- 4/17: Mini-project #3 is available.
It is due Tuesday, April 24th at 11:59pm.
- 4/13: There was some confusion on Problem 3 (c); I re-worded that part of the problem to make it more clear. (Though any good answer to the original wording is also a valid answer to the re-worded question. Sorry for the confusion, and please download the latest version.
- 4/10: Fixed some minor typos in Miniproject 2 at 11pm. Please download the latest version.
- 4/9: Mini-project #2 is available.
It is due Tuesday, April 17th at 11:59pm. The dataset is available here.
- Here is some starter code for drawing
heatmaps in Python (and/or check Stack Overflow).
- 4/9: Here
is a review (courtesy of EE263) of the most basic aspects of vectors and matrices (e.g., how
to multiply them). Most relevant for CS168 are pages 1--9 (up to "Block
matrices and submatrices") and the "Linear functions" and "Linear
equations" sections on pages 10--11.
- 4/4: We updated the miniproject submission instructions, and set up the Gradescope submission page (entry code 986PDG).
- 4/2: Mini-project #1 is available.
It is due Tuesday, April 10th (at 11:59pm).
- Here is some starter code for drawing
histograms in Python (and/or check Stack Overflow).
- 4/2: First class, 3-4:20pm in 320-105. Welcome to CS168!
Gregory Valiant (Office hours: Mon 4:30-5:30pm, Gates 470. Email: last name at stanford.edu).
(Office hours: Mon 9-11am, Huang Basement,
Email: mkatzman at stanford.edu).
(Office hours: Mon 1-3pm, Gates 459 (starting 4/16)
Email: mussmann at stanford).
(Office hours: Tues 4:15-6:15pm, Gates 460,
Email: first initial last name at stanford.edu).
(Office hours: Fri 3-5pm, Gates 498,
Email: jungs at stanford).
Time/location: 3:00 - 4:20pm Mon/Wed in 320-105.
Piazza site: Here.
Prerequisites: CS107 and CS161, or permission from the instructor.
This course will provide a rigorous and hands-on
introduction to the central ideas and algorithms that constitute the
core of the modern algorithms toolkit. Emphasis will be on
understanding the high-level theoretical intuitions and principles
underlying the algorithms we discuss, as well as developing a concrete
understanding of when and how to implement and apply the algorithms.
The course will be structured as a sequence of one-week
investigations; each week will introduce one algorithmic idea, and
discuss the motivation, theoretical underpinning, and practical
applications of that algorithmic idea. Each topic will be accompanied
by a mini-project in which students will be guided through a practical
application of the ideas of the week. Topics include modern techniques
in hashing, dimension reduction, linear and convex programming,
gradient descent and regression, sampling and estimation, compressive
sensing, and linear-algebraic techniques (principal components
analysis, singular value decomposition, spectral techniques).
Week 1: Modern Hashing
- Lecture 1 (Mon 4/2): Course introduction. Consistent hashing.
- Lecture 2 (Wed 4/4):
Property-preserving lossy compression.
From majority elements to approximate heavy hitters.
From bloom filters to the count-min sketch.
Week 2: Data with Distances (Similarity Search, Nearest Neighbor,
Dimension Reduction, LSH)
- Lecture 3 (Mon 4/9):
(Dis)similarity metrics: Jaccard, Euclidean, Lp.
Efficient algorithm for finding similar elements in small/medium (ie. <20)
dimensions using k-d-trees.
- Lecture 4 (Wed 4/12):
Curse of Dimensionality, kissing number.
Estimating Jaccard similarity using MinHash.
JL dimensionality reduction.
- A nice survey of "kissing number", and some other strange phenomena from high dimensional spaces:
Kissing Numbers, Sphere Packings, and some Unexpected Proofs (from 2000).
- Origins of MinHash at Alta Vista:
and Filtering Near-Duplicate
Documents (from 2000).
- Ailon/Chazelle, Faster
Dimension Reduction, CACM '10.
Hashing Algorithms for Approximate Nearest Neighbor in High
Dimensions, CACM '08.
- For much more on LSH, see this chapter of
the CS246 textbook (by Leskovec, Rajaraman, and Ullman).
Week 3: Generalization and Regularization
- Lecture 5 (Mon 4/16):
Generalization (or, how much data is enough?).
Learning an unknown function from samples from an unknown distribution.
Training error vs. test error. PAC guarantees for linear classifiers. Empirical risk minimization.
- Lecture 6 (Wed 4/18): Regularization. The polynomial embedding and random projection, L2 regularization, and L1 regularization as a computationally tractable surrogate for L0 regularization.
- A nice/short technical description of recovering sparse signals via L1 regularization: Candes/Wakin'08.
- Very recent paper on generalization and implicit regularization in deep learning, which raises more questions than it answers: Understanding deep learning requires rethinking generalization, ICLR'17.
Week 4: Linear-Algebraic Techniques:
Understanding Principal Components Analysis
Week 5: Linear-Algebraic Techniques: Understanding the Singular Value Decomposition
Week 6: Spectral Graph Theory
Week 7: Sampling and Estimation
Week 8: The Fourier Perspective (and other bases)
Week 9: Sparse Vector/Matrix Recovery (Compressive Sensing)
Bonus Lecture: Privacy-Preserving Computation
- Assignments (75%): There will be 9 weekly mini-projects centered around the topics covered that week. Each mini-project contains both written and programming parts. The projects can be done individually or in pairs. If you work in a pair, only one member should submit all of the relevant files.
For the written part, you are encouraged to use LaTeX to typeset your homeworks;
we've provided a template for your
convenience. We will be using the GradeScope online submission system. You should have received an email saying that you've been enrolled in CS168 on Gradescope. If not, create an account on Gradescope using your Stanford ID and join CS168 using entry code 986PDG.
For the programming part, you are encouraged to use matlab (tutorial), Numpy and Pyplot in Python (Python tutorial, Numpy tutorial, Pyplot tutorial), or some other scientific computing tool (with plotting). Here is a comprehensive python tutorial using IPython Notebook. IPython Notebook is an interactive computational environment, especially useful for scientific computing (tutorial on how to set up). For easy reference, you can also view the notebook here.
To turn in your programming part:
- Compress all your files into a .zip file, such
- See the instructions on the mini-project for details about what files to
submit. For example, we generally want your code in addition
to your answers.
- Copy the zip file to the cardinal machine (don't miss the colon in the end):
scp <zip file> <your SUNetID>@cardinal.stanford.edu:
- Log onto the cardinal machine:
ssh <your SUNetID>cardinal.stanford.edu
- Run the submit script (run the script without any argument to see its usage):
Note: your file must have the proper name (such as
/usr/class/cs168/WWW/submit.py <project ID> .
p1.zip) for the submit script to process it.
You can submit multiple times; each submission will just replace the previous one.
Assignments are released on Mondays, and are due at 11:59pm on Tuesdays the following week (both the written and the programming parts). No late assignments will be accepted, but we will drop your lowest assignment grade when calculating your final grade.
- Exam (25%):
Date: Friday, June 8th, 3:30 - 6:30 pm.
Except where otherwise noted, you may refer to your course notes, the
textbooks and research papers listed on the course Web
page only. You cannot refer to textbooks, handouts, or research
papers that are not listed on the course home page. If you do use any
approved sources, make you sure you cite them appropriately, and make
sure that all your words are your own.
You are also permitted to use general resources for whatever programming
language you choose to use.
You can discuss the problems verbally at a high level with other groups. And of course, you are encouraged to contact the course staff (via Piazza or office hours) for additional help.
Please follow the honor code.