Announcements

Final Project Topics
May 12, 2016

We've gone and run our matchmaking algorithm and have finished assigning final project topics. The list of teams and project topics is now available online. Best of luck with your projects - we're looking forward to seeing what you come up with!

Final Project Logistics
April 28, 2016

A number of you have started asking us questions about the final project for CS166, so we've just released a handout detailing final project requirements. You'll be required to submit a list of topics that you'd be interested in exploring for your final project by Tuesday, May 10, at which point we'll run a matchmaking algorithm to assign final topics.

We've put together a list of suggested final project topics with around sixty suggestions. Feel free to take something off of this list if you're interested, but you're also welcome to propose a final project on any topic you'd like - some of the best projects from last quarter were on topics we hadn't heard of before!

Problem Set One Out
March 31, 2016

Problem Set One goes out today. It's due on Thursday of next week right before the start of class. Please submit your written answers through Gradescope and the programming problems using our submitter script; details are in the Problem Set Policies handout.

Good luck!

Welcome to CS166!
March 28, 2016

Welcome to CS166, a course in the design, analysis, and implementation of data structures. We've got an exciting quarter ahead of us - the data structures we'll investigate are some of the most beautiful constructs I've ever come across - and I hope you're able to join us.

CS166 has two prerequisites - CS107 and CS161. From CS107, we'll assume that you're comfortable working from the command line; designing, testing, and debugging nontrivial programs; manipulating pointers and arrays; using bitwise operators; and reasoning about the memory hierarchy. From CS161, we'll assume you're comfortable designing and analyzing nontrivial algorithms; using O, o, Θ, Ω, and ω notation; solving recurrences; working through standard graph and sequence algorithms; and structuring proofs of correctness.

We'll update this site with more information as we get closer to the start of the quarter. In the meantime, feel free to email me at htiek@cs.stanford.edu if you have any questions about the class!

 

Schedule and Readings

This syllabus is still under construction and is subject to change as we fine-tune the course. Stay tuned for more information and updates!

Where to Go from Here
May 31

We've covered quite a lot of topics this quarter, and it's time to ride off into the sunset. What's next in theoryland? What lessons have we learned this quarter? And what else do you want to know? This lecture hopefully answers those questions.

Readings:

Disjoint-Set Forests
May 24

In CS161, you learned how to use DFS or BFS to determine connectivity in a graph. But what happens if you want to intermix connectivity queries with modifications to the underlying graph? This problem, the dynamic connectivity problem, is significantly more challenging to solve efficiently but gives rise to some beautiful data structures. One such data structure, the disjoint-set forest, solves the incremental connectivity problem in a time bound so impressive that it's almost constant. This lecture explores how disjoint-set forests work and a beautiful proof of the time bound that introduces a whole new class of rapidly- and slowly-growing functions.
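
If you'd like to see the structure in miniature, here's a C++ sketch of a disjoint-set forest with the two heuristics behind that bound - union by rank and path compression. The names are just illustrative; lecture will cover the real analysis.

```cpp
#include <iostream>
#include <utility>
#include <vector>

// A minimal disjoint-set forest: union by rank keeps trees shallow,
// and path compression flattens them further on every find.
class DisjointSetForest {
public:
    explicit DisjointSetForest(int n) : parent(n), rank_(n, 0) {
        for (int i = 0; i < n; i++) parent[i] = i;   // everyone starts alone
    }

    // Find the representative of x's set, compressing the path as we go.
    int find(int x) {
        if (parent[x] != x) parent[x] = find(parent[x]);
        return parent[x];
    }

    // Merge the sets containing x and y, hanging the shallower tree
    // beneath the deeper one (union by rank).
    void unite(int x, int y) {
        int rx = find(x), ry = find(y);
        if (rx == ry) return;
        if (rank_[rx] < rank_[ry]) std::swap(rx, ry);
        parent[ry] = rx;
        if (rank_[rx] == rank_[ry]) rank_[rx]++;
    }

    bool connected(int x, int y) { return find(x) == find(y); }

private:
    std::vector<int> parent, rank_;
};

int main() {
    DisjointSetForest dsf(5);
    dsf.unite(0, 1);
    dsf.unite(3, 4);
    std::cout << dsf.connected(0, 1) << " " << dsf.connected(1, 3) << "\n";  // 1 0
}
```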

Readings:

Dynamic Graphs
May 26

In real-world networks, links come online and go offline all the time. Euler-tour trees can maintain connectivity in such networks as long as those networks are forests, and with additional machinery layered on top of them, it's possible to handle arbitrary network topologies.

Readings:

van Emde Boas Trees
May 17

Binary search trees assume that the keys being stored are totally ordered, but otherwise make no assumptions about them. If you know for a fact that the keys will be integers from a fixed range, you can harness properties of numbers to exponentially speed up BST operations. Doing so requires some tricks reminiscent of those we used in RMQ and will lead to a surprisingly elegant and efficient data structure.
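
Here's a rough preview of the key trick - very much a sketch, not the full data structure. Each node splits a w-bit key into a high half (which cluster?) and a low half (where in that cluster?) and recurses. This version assumes w is a power of two, stores clusters sparsely in a hash map for brevity, and supports only insert and member; the min/max bookkeeping that makes successor queries fast is deferred to lecture.

```cpp
#include <iostream>
#include <memory>
#include <unordered_map>

// A pared-down sketch of the van Emde Boas recursion on w-bit keys.
struct VEBSketch {
    int w;                            // bit width of keys at this level
    bool bits[2] = {false, false};    // at w == 1, just two bits
    std::unordered_map<int, std::unique_ptr<VEBSketch>> clusters;

    explicit VEBSketch(int bitsWide) : w(bitsWide) {}

    int high(int x) const { return x >> (w / 2); }             // cluster index
    int low(int x) const { return x & ((1 << (w / 2)) - 1); }  // offset in cluster

    void insert(int x) {
        if (w == 1) { bits[x] = true; return; }
        auto& c = clusters[high(x)];
        if (!c) c = std::make_unique<VEBSketch>(w / 2);
        c->insert(low(x));
    }

    bool member(int x) const {
        if (w == 1) return bits[x];
        auto it = clusters.find(high(x));
        return it != clusters.end() && it->second->member(low(x));
    }
};

int main() {
    VEBSketch v(16);                 // 16-bit keys
    v.insert(42);
    v.insert(1000);
    std::cout << v.member(42) << " " << v.member(43) << "\n";  // 1 0
}
```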

Readings:

Handouts:

x-Fast and y-Fast Tries
May 19

van Emde Boas trees have excellent runtimes for each operation, yet use a huge amount of space. The y-fast trie matches the runtime bounds of van Emde Boas trees but uses only linear space. More importantly, though, y-fast tries bring together all of the topics from the course so far - blocking, balanced BSTs, amortization, tries, and randomization. My hope is that they're a fitting pre-midterm send-off!

Readings:

Linear Probing
May 10

Linear probing is one of the oldest and simplest strategies for building a hash table. It's easy to implement and, for a number of reasons, is extremely fast in practice. The original analysis of linear probing dates back to Knuth and assumed truly random hash functions. But what happens if we weaken this assumption and say that the hash functions aren't truly random but are, instead, just k-independent for some fixed k? Then the analysis gets a lot more challenging, but also a lot more interesting. In exploring how the math works out, we'll get a richer understanding of how to analyze randomized data structures.
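
To ground the discussion, here's a minimal C++ sketch of a linear probing table: hash the key, then scan forward to the next free slot. The hash function and table size are placeholders, and deletion (which needs tombstones or backward shifting) is left out.

```cpp
#include <iostream>
#include <optional>
#include <utility>
#include <vector>

// A bare-bones linear probing hash table from int keys to int values.
// For simplicity, this sketch assumes the table never completely fills.
class LinearProbingTable {
public:
    explicit LinearProbingTable(size_t capacity) : slots(capacity) {}

    void insert(int key, int value) {
        size_t i = hash(key);
        while (slots[i] && slots[i]->first != key)
            i = (i + 1) % slots.size();          // probe the next slot
        slots[i] = std::make_pair(key, value);
    }

    std::optional<int> lookup(int key) const {
        size_t i = hash(key);
        while (slots[i]) {                        // stop at the first empty slot
            if (slots[i]->first == key) return slots[i]->second;
            i = (i + 1) % slots.size();
        }
        return std::nullopt;
    }

private:
    size_t hash(int key) const { return static_cast<size_t>(key) % slots.size(); }
    std::vector<std::optional<std::pair<int, int>>> slots;
};

int main() {
    LinearProbingTable t(16);
    t.insert(3, 30);
    t.insert(19, 190);                    // 3 and 19 collide mod 16
    std::cout << *t.lookup(19) << "\n";   // 190
}
```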

Readings:

Cuckoo Hashing
May 12

Most hash tables give expected O(1) lookups. Can we make hash tables with no collisions at all, and if so, can we do it efficiently? Amazingly, the answer is yes. There are many schemes for achieving this, one of which, cuckoo hashing, is surprisingly simple to implement.
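
Here's an illustrative C++ sketch of that scheme: every key lives in one of exactly two candidate slots, so lookups probe at most two places, and inserts evict whatever is in the way. The hash functions below are arbitrary stand-ins; a real implementation would draw them from a suitable random family and rehash when an insertion loops too long.

```cpp
#include <iostream>
#include <optional>
#include <utility>
#include <vector>

// A bare-bones cuckoo hash table over int keys, with two tables.
class CuckooTable {
public:
    explicit CuckooTable(size_t capacity) : t1(capacity), t2(capacity) {}

    // Worst-case O(1): check the only two slots the key could occupy.
    bool contains(int key) const {
        return t1[h1(key)] == key || t2[h2(key)] == key;
    }

    // Insert by repeatedly evicting occupants. Returns false if the
    // eviction chain runs too long - the signal to rehash in a real table.
    bool insert(int key) {
        if (contains(key)) return true;
        std::optional<int> cur = key;
        for (size_t tries = 0; tries < t1.size(); tries++) {
            std::swap(cur, t1[h1(*cur)]);   // claim a slot in table 1
            if (!cur) return true;          // nobody was evicted: done
            std::swap(cur, t2[h2(*cur)]);   // the evicted key tries table 2
            if (!cur) return true;
        }
        return false;                       // likely a cycle: rehash needed
    }

private:
    size_t h1(int key) const { return static_cast<size_t>(key) % t1.size(); }
    size_t h2(int key) const { return (static_cast<size_t>(key) / 7 + 3) % t2.size(); }
    std::vector<std::optional<int>> t1, t2;
};

int main() {
    CuckooTable table(11);
    for (int k : {1, 12, 23, 5}) table.insert(k);
    std::cout << table.contains(23) << " " << table.contains(99) << "\n";  // 1 0
}
```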

Readings:

Handouts:

Splay Trees
May 3

Balanced binary search trees give worst-case O(log n) times on each tree operation. If we're trying to guarantee worst-case efficiency, this is as good as it gets. But what if we're trying to guarantee average-case efficiency? In that case, it's possible to improve upon the bounds given by balanced BSTs. In this lecture, we'll explore statically optimal BSTs and an amazing data structure called the splay tree that very well may be the best possible binary search tree.

Readings:

Handouts:

Count-Min and Count Sketches
May 5

How can Google keep track of frequent search queries without storing all the queries it gets in memory? How can you identify frequently-occurring tweets without storing every tweet in RAM? As long as you're willing to trade off accuracy for space, you can get excellent approximations.
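
Here's a small C++ sketch of one such structure, a count-min sketch: several rows of counters, each indexed by its own hash function, where an item's estimated frequency is the minimum counter across rows (an overestimate, never an underestimate). The multiplicative hashes below are informal stand-ins for the pairwise-independent families used in the real analysis.

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <random>
#include <vector>

// A count-min sketch with a fixed width (counters per row) and depth (rows).
class CountMinSketch {
public:
    CountMinSketch(size_t width, size_t depth, uint64_t seed = 166)
        : w(width), counts(depth, std::vector<uint64_t>(width, 0)) {
        std::mt19937_64 gen(seed);
        for (size_t r = 0; r < depth; r++) salts.push_back(gen() | 1);
    }

    void add(uint64_t item, uint64_t count = 1) {
        for (size_t r = 0; r < counts.size(); r++)
            counts[r][index(r, item)] += count;   // one counter per row
    }

    // Estimated frequency: collisions only inflate counters, so the
    // minimum across rows is the tightest available bound.
    uint64_t estimate(uint64_t item) const {
        uint64_t best = UINT64_MAX;
        for (size_t r = 0; r < counts.size(); r++)
            best = std::min(best, counts[r][index(r, item)]);
        return best;
    }

private:
    size_t index(size_t row, uint64_t item) const {
        return ((item * salts[row]) >> 17) % w;   // cheap multiplicative hash
    }
    size_t w;
    std::vector<uint64_t> salts;
    std::vector<std::vector<uint64_t>> counts;
};

int main() {
    CountMinSketch sketch(1024, 4);
    for (int i = 0; i < 1000; i++) sketch.add(42);
    sketch.add(7);
    std::cout << sketch.estimate(42) << " " << sketch.estimate(7) << "\n";  // >= 1000, >= 1
}
```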

Readings:

Binomial Heaps
April 26

Binomial heaps are a simple and flexible priority queue structure that supports efficient melding of priority queues. The intuition behind binomial heaps is particularly elegant, and they'll serve as a building block toward the more complex Fibonacci heap data structure that we'll talk about on Thursday.
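
To preview that intuition, here's a tiny C++ sketch of the primitive everything else is built from: two heap-ordered binomial trees of the same order can be linked into one tree of the next order in O(1). A heap keeps one tree per 1 bit in the binary representation of its size, so melding two heaps mirrors binary addition.

```cpp
#include <iostream>
#include <utility>
#include <vector>

// A heap-ordered binomial tree; an order-k tree has children of
// orders k-1, k-2, ..., 0.
struct BinomialTree {
    int key;
    int order = 0;
    std::vector<BinomialTree*> children;
};

// Link two trees of equal order in O(1): the larger root becomes a
// child of the smaller, preserving the heap property.
BinomialTree* link(BinomialTree* a, BinomialTree* b) {
    if (b->key < a->key) std::swap(a, b);   // keep the smaller key on top
    a->children.push_back(b);
    a->order++;
    return a;
}

int main() {
    BinomialTree x{3}, y{7};          // two order-0 trees
    BinomialTree* t = link(&x, &y);   // one order-1 tree rooted at 3
    std::cout << t->key << " (order " << t->order << ")\n";
}
```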

Readings:

Fibonacci Heaps
April 28

Fibonacci heaps are a type of priority queue that efficiently supports decrease-key, an operation used as a subroutine in many graph algorithms (Dijkstra's algorithm, Prim's algorithm, the Stoer-Wagner min cut algorithm, etc.). They're formed by a clever transformation on a lazy binomial heap. Although Fibonacci heaps have a reputation for being ferociously complicated, they're a lot less scary than they might seem!

Readings:

Balanced Trees, Part II
April 19

We've spent a lot of time trying to figure out how to build nice balanced trees. Now that we've got them, what else can we do with them? In this lecture, we'll see two more advanced techniques - tree augmentation and the split/join operations - that will make it possible to build significantly more complex data structures in the future.
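
To give a flavor of augmentation ahead of time, here's a C++ sketch of an order-statistic tree: a BST in which each node caches its subtree size, which makes "find the kth-smallest key" run in time proportional to the height. The tree here is unbalanced for brevity; the same augmentation works in any balanced tree whose rotations keep the cached sizes up to date.

```cpp
#include <iostream>
#include <memory>

// A BST node augmented with the size of its subtree.
struct Node {
    int key = 0;
    int size = 1;
    std::unique_ptr<Node> left, right;
};

int sizeOf(const Node* n) { return n ? n->size : 0; }

void insert(std::unique_ptr<Node>& root, int key) {
    if (!root) {
        root = std::make_unique<Node>();
        root->key = key;
        return;
    }
    root->size++;   // the new key lands somewhere in this subtree
    insert(key < root->key ? root->left : root->right, key);
}

// Return the kth-smallest key (0-indexed), steering by subtree sizes.
int select(const Node* root, int k) {
    int leftSize = sizeOf(root->left.get());
    if (k < leftSize) return select(root->left.get(), k);
    if (k == leftSize) return root->key;
    return select(root->right.get(), k - leftSize - 1);
}

int main() {
    std::unique_ptr<Node> root;
    for (int k : {5, 2, 8, 1, 9}) insert(root, k);
    std::cout << select(root.get(), 0) << " " << select(root.get(), 3) << "\n";  // 1 8
}
```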

Readings:

Handouts:

Amortized Analysis
April 21

In many cases, we only care about the total time required to process a sequence of operations. In those cases, we can design data structures that make some individual operations more expensive in order to lower the aggregate cost of all the operations. How do you analyze these structures?
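
The classic warm-up is a dynamic array that doubles its capacity when full: one push may copy everything, but any sequence of n pushes does O(n) total work, so each push is O(1) amortized. Here's a minimal C++ sketch of that behavior.

```cpp
#include <cstring>
#include <iostream>

// A dynamic array that doubles when full; the occasional expensive
// grow() is "paid for" by the many cheap pushes before it.
class DynamicArray {
public:
    void push(int value) {
        if (count == capacity) grow();
        data[count++] = value;
    }
    int size() const { return count; }
    ~DynamicArray() { delete[] data; }

private:
    void grow() {
        int newCapacity = capacity == 0 ? 1 : 2 * capacity;
        int* bigger = new int[newCapacity];
        if (count > 0) std::memcpy(bigger, data, count * sizeof(int));  // the expensive step
        delete[] data;
        data = bigger;
        capacity = newCapacity;
    }
    int* data = nullptr;
    int count = 0, capacity = 0;
};

int main() {
    DynamicArray a;
    for (int i = 0; i < 100; i++) a.push(i);   // only ~7 doublings happen
    std::cout << a.size() << "\n";             // 100
}
```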

Readings:

Suffix Arrays
April 12

Suffix trees have all sorts of magical properties, but they use a tremendous amount of space. The suffix array was designed as an alternative to suffix trees that uses significantly less space while still supporting most of the same operations. So what exactly is a suffix array? How does it compare to a suffix tree? And how do you build one?
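
As a baseline before lecture, here's the naive construction in C++: sort the suffix start positions by comparing suffixes directly. That's O(n² log n) in the worst case - the algorithms we'll study do far better - but it makes the definition concrete.

```cpp
#include <algorithm>
#include <iostream>
#include <numeric>
#include <string>
#include <vector>

// Build a suffix array the slow, obvious way: sort suffix start
// positions by direct string comparison.
std::vector<int> buildSuffixArray(const std::string& text) {
    std::vector<int> sa(text.size());
    std::iota(sa.begin(), sa.end(), 0);   // 0, 1, ..., n-1
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
    });
    return sa;
}

int main() {
    for (int start : buildSuffixArray("banana"))
        std::cout << start << " ";        // 5 3 1 0 4 2
    std::cout << "\n";
}
```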

Readings:

Handouts:

Balanced Trees, Part I
April 14

Balanced search trees are among the most versatile and flexible data structures. They're used extensively in theory and in practice. What sorts of balanced trees exist? How would you design them? And what can you do with them?

Readings:

Aho-Corasick Automata
April 5

Suppose you have a fixed set of strings and you'd like to search for all copies of those strings in a larger text corpus. How could you do so efficiently? By using a combination of tries and techniques from finite automata, it's possible to solve this problem in linear time.
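
Here's a compact C++ sketch of that combination (not necessarily the exact construction we'll see in lecture): insert the patterns into a trie, compute failure links with a BFS, then scan the text once, following failure links on mismatches.

```cpp
#include <iostream>
#include <map>
#include <queue>
#include <string>
#include <vector>

struct AhoCorasick {
    struct Node {
        std::map<char, int> next;   // trie edges
        int fail = 0;               // longest proper suffix that's also in the trie
        std::vector<int> matches;   // patterns ending at this state
    };
    std::vector<Node> nodes{1};     // node 0 is the root
    std::vector<std::string> patterns;

    void addPattern(const std::string& p) {
        int cur = 0;
        for (char c : p) {
            if (!nodes[cur].next.count(c)) {
                nodes[cur].next[c] = (int)nodes.size();
                nodes.emplace_back();
            }
            cur = nodes[cur].next[c];
        }
        nodes[cur].matches.push_back((int)patterns.size());
        patterns.push_back(p);
    }

    // BFS over the trie, filling in failure links level by level.
    void build() {
        std::queue<int> q;
        for (const auto& kv : nodes[0].next) q.push(kv.second);
        while (!q.empty()) {
            int u = q.front(); q.pop();
            for (const auto& [c, v] : nodes[u].next) {
                int f = nodes[u].fail;
                while (f != 0 && !nodes[f].next.count(c)) f = nodes[f].fail;
                auto it = nodes[f].next.find(c);
                nodes[v].fail = (it != nodes[f].next.end()) ? it->second : 0;
                const auto& inherited = nodes[nodes[v].fail].matches;
                nodes[v].matches.insert(nodes[v].matches.end(),
                                        inherited.begin(), inherited.end());
                q.push(v);
            }
        }
    }

    // One pass over the text, reporting every pattern occurrence.
    void scan(const std::string& text) const {
        int state = 0;
        for (size_t i = 0; i < text.size(); i++) {
            char c = text[i];
            while (state != 0 && !nodes[state].next.count(c)) state = nodes[state].fail;
            auto it = nodes[state].next.find(c);
            state = (it != nodes[state].next.end()) ? it->second : 0;
            for (int m : nodes[state].matches)
                std::cout << patterns[m] << " ends at index " << i << "\n";
        }
    }
};

int main() {
    AhoCorasick ac;
    for (std::string p : {"he", "she", "his", "hers"}) ac.addPattern(p);
    ac.build();
    ac.scan("ushers");   // she and he end at 3; hers ends at 5
}
```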

Readings:

Suffix Trees
April 7

The suffix tree is probably the most powerful and versatile data structure for string processing that's ever been invented. What exactly is a suffix tree? Why is it so useful? And what techniques can be used on it?

Readings:

Range Minimum Queries, Part One
March 29

The range minimum query problem is the following: given an array, preprocess it so that you can efficiently determine the smallest value in a variety of subranges. RMQ has tons of applications throughout computer science and is an excellent proving ground for a number of advanced algorithmic techniques.
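
One classic point on that tradeoff curve is the sparse table: precompute the minimum of every range whose length is a power of two, using O(n log n) preprocessing, and answer any query in O(1) by overlapping two such ranges. Here's a C++ sketch (the log2 computation uses a GCC/Clang builtin).

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

// Sparse table for RMQ: table[k][i] holds the min of a[i .. i + 2^k - 1].
class SparseTable {
public:
    explicit SparseTable(const std::vector<int>& a) {
        int n = a.size();
        table.push_back(a);   // level 0: ranges of length 1
        for (int k = 1; (1 << k) <= n; k++) {
            std::vector<int> level(n - (1 << k) + 1);
            for (int i = 0; i + (1 << k) <= n; i++)
                level[i] = std::min(table[k - 1][i],
                                    table[k - 1][i + (1 << (k - 1))]);
            table.push_back(level);
        }
    }

    // Minimum of a[lo .. hi], inclusive: cover the range with two
    // (possibly overlapping) power-of-two blocks.
    int query(int lo, int hi) const {
        int k = 31 - __builtin_clz(hi - lo + 1);   // floor(log2(length))
        return std::min(table[k][lo], table[k][hi - (1 << k) + 1]);
    }

private:
    std::vector<std::vector<int>> table;
};

int main() {
    SparseTable rmq({31, 41, 59, 26, 53, 58, 97});
    std::cout << rmq.query(1, 4) << " " << rmq.query(0, 6) << "\n";  // 26 26
}
```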

Readings:

Range Minimum Queries, Part Two
March 31

Our last lecture took us very, very close to a ⟨O(n), O(1)⟩ solution to RMQ. Using a new data structure called a Cartesian tree in conjunction with a technique called the Method of Four Russians, we can adapt our approach to end up with a linear-preprocessing-time, constant-query-time solution to RMQ. In doing so, we'll see a number of clever techniques that will appear time and time again in data structure design.
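
As a first taste of the machinery, here's a C++ sketch of the standard stack-based Cartesian tree construction; each index is pushed and popped at most once, so it runs in O(n).

```cpp
#include <iostream>
#include <vector>

// Build a Cartesian tree (min-heap by value, in-order matches the
// array) and return each index's parent, with -1 marking the root.
std::vector<int> buildCartesianTree(const std::vector<int>& a) {
    std::vector<int> parent(a.size(), -1);
    std::vector<int> spine;   // indices on the tree's rightmost spine
    for (int i = 0; i < (int)a.size(); i++) {
        int lastPopped = -1;
        while (!spine.empty() && a[spine.back()] > a[i]) {
            lastPopped = spine.back();   // i displaces this run of nodes
            spine.pop_back();
        }
        if (lastPopped != -1) parent[lastPopped] = i;   // displaced run hangs to i's left
        if (!spine.empty()) parent[i] = spine.back();   // i is a right child
        spine.push_back(i);
    }
    return parent;
}

int main() {
    for (int p : buildCartesianTree({93, 84, 33, 64, 62, 83}))
        std::cout << p << " ";   // 1 2 -1 4 2 4
    std::cout << "\n";
}
```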

Readings:

Handouts: