Announcements

May 27, 2014

We have just posted a schedule of final project presentations with dates, times, and room numbers. You're welcome to stop by any presentations you'd like, as long as they're not for the same data structure that you're covering.

We're looking forward to seeing your presentations! Feel free to email us if you have any questions in the meantime.

May 22, 2014

We have just posted the requirements for the final project to the course website. In particular, please note that your project writeup will be due 24 hours before you give your presentation.

May 17, 2014

We have some updates to the midterm logistics. I've sent out an email containing this information, but wanted to reprint it here in case anyone missed it.

The midterm will be held on Wednesday, May 21 from 7PM - 10PM. Although initially the plan was for the exam to be closed-book and closed-note, we have revised this policy. You may now have one double-sided sheet of notes with you on 8.5" × 11" paper.

The midterm location has changed. If your last name is A - S, you should go to Meyer 124 in the Meyer Library. If your last name is T - Z, you should go to Meyer 147.

We have released a practice midterm exam that you can use to prepare for the upcoming exam. We'll release solutions in hardcopy on Monday.

Keith will be holding a review session on Monday, May 19 from 7:30PM - 9:30PM in Gates 104. Please feel free to stop by if you have any questions!

May 5, 2014

Keith will be holding two sets of office hours this week: Monday, 3:30PM - 4:45PM in Gates 178, and Tuesday, 1:30 - 2:30PM in Gates 178.

April 23, 2014

We've finalized our office hours times! Keith will be holding office hours from 3:30PM - 5:30PM on Mondays in Gates 178, and the TAs will be holding office hours from 7:30PM - 9:30PM on Thursdays in room 160-318.

April 16, 2014

There's an error in Q2.iii of Problem Set 3 that makes the problem impossible to solve. I've posted an updated version of the problem set handout here on the course website. Sorry about that!

April 7, 2014

I will be tentatively be holding my Monday office hours from 3:30PM - 5:30PM in Hewlett 201 (our lecture hall). Feel free to stop by with questions!

March 5, 2014

Welcome to CS166, a new course in the design, analysis, and implementation of data structures. We've got an exciting quarter ahead of us - the data structures we'll investigate are some of the most beautiful constructs I've ever come across - and I hope you're able to join us.

CS166 has two prerequisites - CS107 and CS161. From CS107, we'll assume that you're comfortable working from the command-line; designing, testing, and debugging nontrivial programs; manipulating pointers and arrays; using bitwise operators; and reasoning about the memory hierarchy. From CS161, we'll assume you're comfortable designing and analyzing nontrivial algorithms; using O, o, Θ, Ω, and ω notation; solving recurrences; working through standard graph and sequence algorithms; and structuring proofs of correctness.

We'll update this site with more information as we get closer to the start of the quarter. In the meantime, feel free to email me at htiek@cs.stanford.edu if you have any questions about the class!

Schedule and Readings

This syllabus is still under construction and is subject to change as we fine-tune the course. Stay tuned for more information and updates!

Monday	Wednesday
Range Minimum Queries March 31 The range minimum query problem is the following: given an array, preprocess it so that you can efficiently determine the smallest value in a variety of subranges. This problem has applications all over CS and there are many strategies to solve it. What are they? How do they work? Readings: TopCoder introduction to RMQ. Handout 00: Course Information Lecture Slides \| (Condensed) Extra Slides: Segment Trees	Cartesian Trees and RMQ April 2 There is a close connection between a special data structure called a Cartesian tree and the RMQ problem. By exploiting this similarity and using a technique called the Method of Four Russians, we can get both theoretical and practical speedups in RMQ computations. Readings: Wikipedia entry for Cartesian trees. Fischer, Johannes and Heun, Volker. Theoretical and Practical Improvements on the RMQ-Problem, with Applications to LCA and LCE. Lecture Slides \| (Condensed) Handout 01: Mathematical Terms and Identities Handout 02: Problem Set Policies Handout 03: CS166 and the Honor Code Handout 04: Problem Set One
Balanced Search Trees I April 7 Balanced search trees are among the most versatile and flexible data structures. They're used extensively in theory and in practice. What sorts of balanced trees exist? How would you design them? And what can you do with them? Readings: CLRS, Chapter 13 CLRS, Chapter 18 Google Open Source Blog: "C++ containers that save time and memory." Lecture Slides \| (Condensed)	Balanced Search Trees II April 9 Balanced search trees support many operations besides insertions and lookups. Additionally, they can be augmented to support more advanced types of queries. These techniques, when combined, make balanced search trees powerful building blocks for later data structures. Readings: CLRS, Chapter 14 Lecture Slides \| (Condensed) Handout 05: Problem Set Two
Euler Tour Trees April 14 Euler tour trees are a data structure for maintaining information about collections of trees in a graph. They're also used as building blocks for more powerful data structures like dynamic graphs and as subroutines in many algorithms. Understanding how Euler tour trees work will give you a much better feel for some of the techniques that go into advanced data structures. Readings: Tarjan, Robert. Dynamic Trees as Search Trees via Euler Tours, Applied to the Network Simplex Algorithm. Lecture Slides \| (Condensed)	Intro to Amortization April 16 In many cases we only care about the total time required to process a set of data. In those cases, we can design data structures that make some operations more expensive in order to lower the total cost of all aggregate operations. How do you analyze these structures? Readings: CLRS: Chapter 17 Lecture Slides \| (Condensed) Handout 06: Problem Set 3
Binomial Heaps April 21 Binomial heaps are a simple and flexible priority queue structure that supports efficient melding of priority queues. The intuition behind binomial heaps is particularly elegant, and they'll serve as a building block toward the more complex Fibonacci heap data structure that we'll talk about on Wednesday. Readings: Vuillemin, Jean. A Data Structure for Manipulating Priority Queues Lecture Slides \| (Condensed)	Fibonacci Heaps April 23 Fibonacci heaps are a type of priority queue that efficiently supports decrease-key, an operation used as a subroutine in many graph algorithms (Dijkstra's algorithm, Prim's algorithm, the Stoer-Wagner min cut algorithm, etc.) They're formed by a clever transformation on a lazy binomial heap. Although Fibonacci heaps have a reputation for being ferociously complicated, they're a lot less scary than they might seem! Readings: CLRS: Chapter 19 Fredman, Michael and Tarjan, Robert. Fibonacci Heaps and Their Uses in Improved Network Optimization Algorithms My implementation of a Fibonacci heap. Lecture Slides \| (Condensed) Handout 07: Problem Set 4
Splay Trees April 28 A perfectly balanced BST might not be the best BST for a particular data set if the accesses aren't uniform. What happens if we try to optimize a BST by moving frequently-accessed elements toward the top? Readings: Sleator, Daniel and Tarjan, Robert. Self-Adjusting Binary Search Trees. Lecture Slides \| (Condensed)	Aho-Corasick String Matching April 30 How can you search a huge text for a bunch of shorter strings? Using a combination of tries and finite automata, you can do so about as asymptotically efficiently as possible. Readings: Aho, Alfred and Corasick, Margaret. Efficient String Matching: An Aid to Bibliographic Search. Lecture Slides \| (Condensed) Handout 08: Assignment 5
Suffix Trees May 5 Suffix trees are an incredibly versatile data structure for solving problems on strings. What is a suffix tree? Where does it come from? And why is it so useful? Readings: Lecture Slides \| (Condensed) Handout 09: Suggested Final Project Topics	Suffix Arrays May 7 Suffix arrays are a space-efficient alternative to suffix trees. They're closely related to suffix trees: given a suffix array and an auxiliary data structure called an LCP array, we can construct a suffix trees in linear time. This lecture explores suffix arrays, LCP arrays, and how to construct them. Readings: Kärkkäinen et al. Linear Work Suffix Array Construction. Kasai et al. Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications. Lecture Slides \| (Condensed) Handout 10: Problem Set 6 Handout 11: Final Project Proposal
Count-Min and Count Sketches May 12 How can Google keep track of frequent search queries without storing all the queries it gets in memory? How can you estimate frequently- occurring tweets without storing every tweet in RAM? As long as you're willing to trade off accuracy for space, you get get excellent approximations. Readings: Charikar et al. Finding Frequent Items in Data Streams. Cormode, Graham and Muthukrishnan, C. An Improved Data Stream Summary: The Count-Min Sketch and its Applications. Lecture Slides \| (Condensed)	Cuckoo Hashing May 14 Most hash tables give expected O(1) lookups. Can we make hash tables with no collisions at all, and if so, can we do it efficiently? Amazingly, the answer is yes. There are many schemes for achieving this, one of which, Cuckoo hashing, is surprisingly simple to implement, though the mathematical analysis is decidedly nontrivial. Readings: Pagh, Rasmus and Rodler, Flemming. Cuckoo Hashing Devroye, Luc and Morin, Pat. Cuckoo Hashing: Further Analysis Lecture Slides \| (Condensed) Handout 12: Problem Set 7 Handout 13: Practice Midterm
van Emde Boas Trees May 19 Binary search trees assume that the keys being stored are totally ordered, but otherwise makes no assumptions about them. If you know for a fact that the keys will be integers from a fixed range, you can harness properties of numbers to exponentially speed up BST operations. Doing so requires some tricks reminiscent of those we used in RMQ and will lead to a surprisingly elegant and efficient data structure. Readings: CLRS: Chapter 20 Lecture Slides \| (Condensed)	x-Fast and y-Fast Tries May 21 van Emde Boas trees have excellent runtimes for each operation, yet use a huge amount of space. The y-fast trie matches the runtime bounds of van Emde Boas trees, but use only linear space. More importantly, though, y-fast tries bring together all of the topics from the course so far - blocking, balanced BSTs, amortization, tries, and randomization. My hope is that they're a fitting pre-midterm send-off! Readings: Willard, Dan. Log-Logarithmic Worst-Case Range Queries are Possible in Space Θ(N) Lecture Slides \| (Condensed) Handout 15: Optional Problems 1
Memorial Day May 26 No class.	Disjoint-Set Forests May 28 When we covered Euler tour trees earlier in the quarter, we saw how to solve the fully dynamic connectivity problem in forests with cost O(log n) per operation. In this lecture, we'll explore a data structure called the disjoint-set forest for solving the incremental connectivity problem - in which edges are successively added to a graph - in general graphs. The resulting data structure is blindly fast and has a very clever analysis. Readings: CLRS, Chapter 21 Seidel, Raimund and Sharir, Micha. Top-Down Analysis of Path Compression Lecture Slides \| (Condensed)
Dynamic Graphs June 2 In real-world networks, links come online and offline all the time. Euler-tour trees can maintain connectivity in such networks as long as those networks are graphs. In this lecture, we'll consider two other cases: decremental connectivity in trees, and fully dynamic connectivity in undirected graphs. Readings: Holm, Jacob et al. Poly-logarithmic deterministic fully-dynamic algorithms for connectivity, minimum spanning tree, 2-edge, and biconnectivity Lecture Slides \| (Condensed)	Where to Go from Here June 4 There are so many types of data structures out there - geometric data structures, persistent data structures, parallel data structures, etc. What else is out there? How are the skills from CS166 useful in a broader context? Readings: Lecture Slides

Monday	Wednesday
Range Minimum Queries March 31 The range minimum query problem is the following: given an array, preprocess it so that you can efficiently determine the smallest value in a variety of subranges. This problem has applications all over CS and there are many strategies to solve it. What are they? How do they work? Readings: TopCoder introduction to RMQ. Handout 00: Course Information Lecture Slides \| (Condensed) Extra Slides: Segment Trees	Cartesian Trees and RMQ April 2 There is a close connection between a special data structure called a Cartesian tree and the RMQ problem. By exploiting this similarity and using a technique called the Method of Four Russians, we can get both theoretical and practical speedups in RMQ computations. Readings: Wikipedia entry for Cartesian trees. Fischer, Johannes and Heun, Volker. Theoretical and Practical Improvements on the RMQ-Problem, with Applications to LCA and LCE. Lecture Slides \| (Condensed) Handout 01: Mathematical Terms and Identities Handout 02: Problem Set Policies Handout 03: CS166 and the Honor Code Handout 04: Problem Set One
Balanced Search Trees I April 7 Balanced search trees are among the most versatile and flexible data structures. They're used extensively in theory and in practice. What sorts of balanced trees exist? How would you design them? And what can you do with them? Readings: CLRS, Chapter 13 CLRS, Chapter 18 Google Open Source Blog: "C++ containers that save time and memory." Lecture Slides \| (Condensed)	Balanced Search Trees II April 9 Balanced search trees support many operations besides insertions and lookups. Additionally, they can be augmented to support more advanced types of queries. These techniques, when combined, make balanced search trees powerful building blocks for later data structures. Readings: CLRS, Chapter 14 Lecture Slides \| (Condensed) Handout 05: Problem Set Two
Euler Tour Trees April 14 Euler tour trees are a data structure for maintaining information about collections of trees in a graph. They're also used as building blocks for more powerful data structures like dynamic graphs and as subroutines in many algorithms. Understanding how Euler tour trees work will give you a much better feel for some of the techniques that go into advanced data structures. Readings: Tarjan, Robert. Dynamic Trees as Search Trees via Euler Tours, Applied to the Network Simplex Algorithm. Lecture Slides \| (Condensed)	Intro to Amortization April 16 In many cases we only care about the total time required to process a set of data. In those cases, we can design data structures that make some operations more expensive in order to lower the total cost of all aggregate operations. How do you analyze these structures? Readings: CLRS: Chapter 17 Lecture Slides \| (Condensed) Handout 06: Problem Set 3
Binomial Heaps April 21 Binomial heaps are a simple and flexible priority queue structure that supports efficient melding of priority queues. The intuition behind binomial heaps is particularly elegant, and they'll serve as a building block toward the more complex Fibonacci heap data structure that we'll talk about on Wednesday. Readings: Vuillemin, Jean. A Data Structure for Manipulating Priority Queues Lecture Slides \| (Condensed)	Fibonacci Heaps April 23 Fibonacci heaps are a type of priority queue that efficiently supports decrease-key, an operation used as a subroutine in many graph algorithms (Dijkstra's algorithm, Prim's algorithm, the Stoer-Wagner min cut algorithm, etc.) They're formed by a clever transformation on a lazy binomial heap. Although Fibonacci heaps have a reputation for being ferociously complicated, they're a lot less scary than they might seem! Readings: CLRS: Chapter 19 Fredman, Michael and Tarjan, Robert. Fibonacci Heaps and Their Uses in Improved Network Optimization Algorithms My implementation of a Fibonacci heap. Lecture Slides \| (Condensed) Handout 07: Problem Set 4
Splay Trees April 28 A perfectly balanced BST might not be the best BST for a particular data set if the accesses aren't uniform. What happens if we try to optimize a BST by moving frequently-accessed elements toward the top? Readings: Sleator, Daniel and Tarjan, Robert. Self-Adjusting Binary Search Trees. Lecture Slides \| (Condensed)	Aho-Corasick String Matching April 30 How can you search a huge text for a bunch of shorter strings? Using a combination of tries and finite automata, you can do so about as asymptotically efficiently as possible. Readings: Aho, Alfred and Corasick, Margaret. Efficient String Matching: An Aid to Bibliographic Search. Lecture Slides \| (Condensed) Handout 08: Assignment 5
Suffix Trees May 5 Suffix trees are an incredibly versatile data structure for solving problems on strings. What is a suffix tree? Where does it come from? And why is it so useful? Readings: Lecture Slides \| (Condensed) Handout 09: Suggested Final Project Topics	Suffix Arrays May 7 Suffix arrays are a space-efficient alternative to suffix trees. They're closely related to suffix trees: given a suffix array and an auxiliary data structure called an LCP array, we can construct a suffix trees in linear time. This lecture explores suffix arrays, LCP arrays, and how to construct them. Readings: Kärkkäinen et al. Linear Work Suffix Array Construction. Kasai et al. Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications. Lecture Slides \| (Condensed) Handout 10: Problem Set 6 Handout 11: Final Project Proposal
Count-Min and Count Sketches May 12 How can Google keep track of frequent search queries without storing all the queries it gets in memory? How can you estimate frequently- occurring tweets without storing every tweet in RAM? As long as you're willing to trade off accuracy for space, you get get excellent approximations. Readings: Charikar et al. Finding Frequent Items in Data Streams. Cormode, Graham and Muthukrishnan, C. An Improved Data Stream Summary: The Count-Min Sketch and its Applications. Lecture Slides \| (Condensed)	Cuckoo Hashing May 14 Most hash tables give expected O(1) lookups. Can we make hash tables with no collisions at all, and if so, can we do it efficiently? Amazingly, the answer is yes. There are many schemes for achieving this, one of which, Cuckoo hashing, is surprisingly simple to implement, though the mathematical analysis is decidedly nontrivial. Readings: Pagh, Rasmus and Rodler, Flemming. Cuckoo Hashing Devroye, Luc and Morin, Pat. Cuckoo Hashing: Further Analysis Lecture Slides \| (Condensed) Handout 12: Problem Set 7 Handout 13: Practice Midterm
van Emde Boas Trees May 19 Binary search trees assume that the keys being stored are totally ordered, but otherwise makes no assumptions about them. If you know for a fact that the keys will be integers from a fixed range, you can harness properties of numbers to exponentially speed up BST operations. Doing so requires some tricks reminiscent of those we used in RMQ and will lead to a surprisingly elegant and efficient data structure. Readings: CLRS: Chapter 20 Lecture Slides \| (Condensed)	x-Fast and y-Fast Tries May 21 van Emde Boas trees have excellent runtimes for each operation, yet use a huge amount of space. The y-fast trie matches the runtime bounds of van Emde Boas trees, but use only linear space. More importantly, though, y-fast tries bring together all of the topics from the course so far - blocking, balanced BSTs, amortization, tries, and randomization. My hope is that they're a fitting pre-midterm send-off! Readings: Willard, Dan. Log-Logarithmic Worst-Case Range Queries are Possible in Space Θ(N) Lecture Slides \| (Condensed) Handout 15: Optional Problems 1
Memorial Day May 26 No class.	Disjoint-Set Forests May 28 When we covered Euler tour trees earlier in the quarter, we saw how to solve the fully dynamic connectivity problem in forests with cost O(log n) per operation. In this lecture, we'll explore a data structure called the disjoint-set forest for solving the incremental connectivity problem - in which edges are successively added to a graph - in general graphs. The resulting data structure is blindly fast and has a very clever analysis. Readings: CLRS, Chapter 21 Seidel, Raimund and Sharir, Micha. Top-Down Analysis of Path Compression Lecture Slides \| (Condensed)
Dynamic Graphs June 2 In real-world networks, links come online and offline all the time. Euler-tour trees can maintain connectivity in such networks as long as those networks are graphs. In this lecture, we'll consider two other cases: decremental connectivity in trees, and fully dynamic connectivity in undirected graphs. Readings: Holm, Jacob et al. Poly-logarithmic deterministic fully-dynamic algorithms for connectivity, minimum spanning tree, 2-edge, and biconnectivity Lecture Slides \| (Condensed)	Where to Go from Here June 4 There are so many types of data structures out there - geometric data structures, persistent data structures, parallel data structures, etc. What else is out there? How are the skills from CS166 useful in a broader context? Readings: Lecture Slides