Announcements

Presentation Schedule Posted
May 27, 2014

We have just posted a schedule of final project presentations with dates, times, and room numbers. You're welcome to stop by any presentations you'd like, as long as they're not for the same data structure that you're covering.

We're looking forward to seeing your presentations! Feel free to email us if you have any questions in the meantime.

Final Project Requirements
May 22, 2014

We have just posted the requirements for the final project to the course website. In particular, please note that your project writeup will be due 24 hours before you give your presentation.

Midterm Logistics
May 17, 2014

We have some updates to the midterm logistics. I've sent out an email containing this information, but wanted to reprint it here in case anyone missed it.

The midterm will be held on Wednesday, May 21 from 7PM - 10PM. Although initially the plan was for the exam to be closed-book and closed-note, we have revised this policy. You may now have one double-sided sheet of notes with you on 8.5" × 11" paper.

The midterm location has changed. If your last name is A - S, you should go to Meyer 124 in the Meyer Library. If your last name is T - Z, you should go to Meyer 147.

We have released a practice midterm exam that you can use to prepare for the upcoming exam. We'll release solutions in hardcopy on Monday.

Keith will be holding a review session on Monday, May 19 from 7:30PM - 9:30PM in Gates 104. Please feel free to stop by if you have any questions!

Keith's OH Split This Week
May 5, 2014

Keith will be holding two sets of office hours this week: Monday, 3:30PM - 4:45PM in Gates 178, and Tuesday, 1:30 - 2:30PM in Gates 178.

Office Hours Time Update
April 23, 2014

We've finalized our office hours times! Keith will be holding office hours from 3:30PM - 5:30PM on Mondays in Gates 178, and the TAs will be holding office hours from 7:30PM - 9:30PM on Thursdays in room 160-318.

Correction to Problem Set 3
April 16, 2014

There's an error in Q2.iii of Problem Set 3 that makes the problem impossible to solve. I've posted an updated version of the problem set handout here on the course website. Sorry about that!

Office Hours Locations
April 7, 2014

I will be tentatively be holding my Monday office hours from 3:30PM - 5:30PM in Hewlett 201 (our lecture hall). Feel free to stop by with questions!

Welcome to CS166!
March 5, 2014

Welcome to CS166, a new course in the design, analysis, and implementation of data structures. We've got an exciting quarter ahead of us - the data structures we'll investigate are some of the most beautiful constructs I've ever come across - and I hope you're able to join us.

CS166 has two prerequisites - CS107 and CS161. From CS107, we'll assume that you're comfortable working from the command-line; designing, testing, and debugging nontrivial programs; manipulating pointers and arrays; using bitwise operators; and reasoning about the memory hierarchy. From CS161, we'll assume you're comfortable designing and analyzing nontrivial algorithms; using O, o, Θ, Ω, and ω notation; solving recurrences; working through standard graph and sequence algorithms; and structuring proofs of correctness.

We'll update this site with more information as we get closer to the start of the quarter. In the meantime, feel free to email me at htiek@cs.stanford.edu if you have any questions about the class!

 

Schedule and Readings

This syllabus is still under construction and is subject to change as we fine-tune the course. Stay tuned for more information and updates!

Monday Wednesday
Range Minimum Queries
March 31

The range minimum query problem is the following: given an array, preprocess it so that you can efficiently determine the smallest value in a variety of subranges. This problem has applications all over CS and there are many strategies to solve it. What are they? How do they work?

Readings:

Cartesian Trees and RMQ
April 2

There is a close connection between a special data structure called a Cartesian tree and the RMQ problem. By exploiting this similarity and using a technique called the Method of Four Russians, we can get both theoretical and practical speedups in RMQ computations.

Readings:

Handout 01: Mathematical Terms and Identities

Handout 02: Problem Set Policies

Handout 03: CS166 and the Honor Code

Handout 04: Problem Set One

Balanced Search Trees I
April 7

Balanced search trees are among the most versatile and flexible data structures. They're used extensively in theory and in practice. What sorts of balanced trees exist? How would you design them? And what can you do with them?

Readings:

Balanced Search Trees II
April 9

Balanced search trees support many operations besides insertions and lookups. Additionally, they can be augmented to support more advanced types of queries. These techniques, when combined, make balanced search trees powerful building blocks for later data structures.

Readings:

Handout 05: Problem Set Two

Euler Tour Trees
April 14

Euler tour trees are a data structure for maintaining information about collections of trees in a graph. They're also used as building blocks for more powerful data structures like dynamic graphs and as subroutines in many algorithms. Understanding how Euler tour trees work will give you a much better feel for some of the techniques that go into advanced data structures.

Readings:

Intro to Amortization
April 16

In many cases we only care about the total time required to process a set of data. In those cases, we can design data structures that make some operations more expensive in order to lower the total cost of all aggregate operations. How do you analyze these structures?

Readings:

Handout 06: Problem Set 3

Binomial Heaps
April 21

Binomial heaps are a simple and flexible priority queue structure that supports efficient melding of priority queues. The intuition behind binomial heaps is particularly elegant, and they'll serve as a building block toward the more complex Fibonacci heap data structure that we'll talk about on Wednesday.

Readings:

Fibonacci Heaps
April 23

Fibonacci heaps are a type of priority queue that efficiently supports decrease-key, an operation used as a subroutine in many graph algorithms (Dijkstra's algorithm, Prim's algorithm, the Stoer-Wagner min cut algorithm, etc.) They're formed by a clever transformation on a lazy binomial heap. Although Fibonacci heaps have a reputation for being ferociously complicated, they're a lot less scary than they might seem!

Readings:

Handout 07: Problem Set 4

Splay Trees
April 28

A perfectly balanced BST might not be the best BST for a particular data set if the accesses aren't uniform. What happens if we try to optimize a BST by moving frequently-accessed elements toward the top?

Readings:

Aho-Corasick String Matching
April 30

How can you search a huge text for a bunch of shorter strings? Using a combination of tries and finite automata, you can do so about as asymptotically efficiently as possible.

Readings:

Handout 08: Assignment 5

Suffix Trees
May 5

Suffix trees are an incredibly versatile data structure for solving problems on strings. What is a suffix tree? Where does it come from? And why is it so useful?

Readings:

Suffix Arrays
May 7

Suffix arrays are a space-efficient alternative to suffix trees. They're closely related to suffix trees: given a suffix array and an auxiliary data structure called an LCP array, we can construct a suffix trees in linear time. This lecture explores suffix arrays, LCP arrays, and how to construct them.

Readings:

Handout 10: Problem Set 6

Handout 11: Final Project Proposal

Count-Min and Count Sketches
May 12

How can Google keep track of frequent search queries without storing all the queries it gets in memory? How can you estimate frequently- occurring tweets without storing every tweet in RAM? As long as you're willing to trade off accuracy for space, you get get excellent approximations.

Readings:

Cuckoo Hashing
May 14

Most hash tables give expected O(1) lookups. Can we make hash tables with no collisions at all, and if so, can we do it efficiently? Amazingly, the answer is yes. There are many schemes for achieving this, one of which, Cuckoo hashing, is surprisingly simple to implement, though the mathematical analysis is decidedly nontrivial.

Readings:

Handout 12: Problem Set 7

Handout 13: Practice Midterm

van Emde Boas Trees
May 19

Binary search trees assume that the keys being stored are totally ordered, but otherwise makes no assumptions about them. If you know for a fact that the keys will be integers from a fixed range, you can harness properties of numbers to exponentially speed up BST operations. Doing so requires some tricks reminiscent of those we used in RMQ and will lead to a surprisingly elegant and efficient data structure.

Readings:

x-Fast and y-Fast Tries
May 21

van Emde Boas trees have excellent runtimes for each operation, yet use a huge amount of space. The y-fast trie matches the runtime bounds of van Emde Boas trees, but use only linear space. More importantly, though, y-fast tries bring together all of the topics from the course so far - blocking, balanced BSTs, amortization, tries, and randomization. My hope is that they're a fitting pre-midterm send-off!

Readings:

Handout 15: Optional Problems 1

Memorial Day
May 26

No class.

Disjoint-Set Forests
May 28

When we covered Euler tour trees earlier in the quarter, we saw how to solve the fully dynamic connectivity problem in forests with cost O(log n) per operation. In this lecture, we'll explore a data structure called the disjoint-set forest for solving the incremental connectivity problem - in which edges are successively added to a graph - in general graphs. The resulting data structure is blindly fast and has a very clever analysis.

Readings:

Dynamic Graphs
June 2

In real-world networks, links come online and offline all the time. Euler-tour trees can maintain connectivity in such networks as long as those networks are graphs. In this lecture, we'll consider two other cases: decremental connectivity in trees, and fully dynamic connectivity in undirected graphs.

Readings:

Where to Go from Here
June 4

There are so many types of data structures out there - geometric data structures, persistent data structures, parallel data structures, etc. What else is out there? How are the skills from CS166 useful in a broader context?

Readings: