Announcements

Final Project Topics
May 12, 2016

We've gone and run our matchmaking algorithm and have finished assigning final project topics. The list of teams and project topics is now available online. Best of luck with your projects - we're looking forward to seeing what you come up with!

Final Project Logistics
April 28, 2016

A number of you have started asking us questions about the final project for CS166, so we've just released a handout detailing final project requirements. You'll be required to submit a list of topics that you'd be interested in exploring for your final project by Tuesday, May 10, at which point we'll run a matchmaking algorithm to assign final topics.

We've put together a list of suggested final project topics with around sixty suggestions. Feel free to take something off of this list if you're interested, but you're also welcome to propose a final project on any topic you'd like - some of the best projects from last quarter were on topics we hadn't heard of before!

Problem Set One Out
March 31, 2016

Problem Set One goes out today. It's due on Thursday of next week right before the start of class. Please submit your written answers through Gradescope and the programming problems using our submitter script; details are in the Problem Set Policies handout.

Good luck!

Welcome to CS166!
March 28, 2016

Welcome to CS166, a course in the design, analysis, and implementation of data structures. We've got an exciting quarter ahead of us - the data structures we'll investigate are some of the most beautiful constructs I've ever come across - and I hope you're able to join us.

CS166 has two prerequisites - CS107 and CS161. From CS107, we'll assume that you're comfortable working from the command line; designing, testing, and debugging nontrivial programs; manipulating pointers and arrays; using bitwise operators; and reasoning about the memory hierarchy. From CS161, we'll assume you're comfortable designing and analyzing nontrivial algorithms; using O, o, Θ, Ω, and ω notation; solving recurrences; working through standard graph and sequence algorithms; and structuring proofs of correctness.

We'll update this site with more information as we get closer to the start of the quarter. In the meantime, feel free to email me at htiek@cs.stanford.edu if you have any questions about the class!

 

Schedule and Readings

This syllabus is still under construction and is subject to change as we fine-tune the course. Stay tuned for more information and updates!

Where to Go from Here
May 31

We've covered quite a lot of topics this quarter, and it's time to ride off into the sunset. What's next in theoryland? What lessons have we learned this quarter? And what else do you want to know? This lecture hopefully answers those questions.

Readings:

Disjoint-Set Forests
May 24

In CS161, you learned how to use DFS or BFS to determine connectivity in a graph. But what happens if you want to intermix connectivity queries with modifications to the underlying graph? This problem, the dynamic connectivity problem, is significantly more challenging to solve efficiently but gives rise to some beautiful data structures. One such data structure, the disjoint-set forest, solves the incremental connectivity problem in a time bound so impressive that it's almost constant. This lecture explores how disjoint-set forests work and a beautiful proof of the time bound that introduces a whole new class of rapidly- and slowly-growing functions.
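
If you'd like to see the structure in miniature, here's a C++ sketch of a disjoint-set forest with the two heuristics behind that bound - union by rank and path compression. The names are just illustrative; lecture will cover the real analysis.

```cpp
#include <iostream>
#include <utility>
#include <vector>

// A minimal disjoint-set forest: union by rank keeps trees shallow,
// and path compression flattens them further on every find.
class DisjointSetForest {
public:
    explicit DisjointSetForest(int n) : parent(n), rank_(n, 0) {
        for (int i = 0; i < n; i++) parent[i] = i;   // everyone starts alone
    }

    // Find the representative of x's set, compressing the path as we go.
    int find(int x) {
        if (parent[x] != x) parent[x] = find(parent[x]);
        return parent[x];
    }

    // Merge the sets containing x and y, hanging the shallower tree
    // beneath the deeper one (union by rank).
    void unite(int x, int y) {
        int rx = find(x), ry = find(y);
        if (rx == ry) return;
        if (rank_[rx] < rank_[ry]) std::swap(rx, ry);
        parent[ry] = rx;
        if (rank_[rx] == rank_[ry]) rank_[rx]++;
    }

    bool connected(int x, int y) { return find(x) == find(y); }

private:
    std::vector<int> parent, rank_;
};

int main() {
    DisjointSetForest dsf(5);
    dsf.unite(0, 1);
    dsf.unite(3, 4);
    std::cout << dsf.connected(0, 1) << " " << dsf.connected(1, 3) << "\n";  // 1 0
}
```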

Readings:

Dynamic Graphs
May 26

In real-world networks, links come online and go offline all the time. Euler-tour trees can maintain connectivity in such networks as long as those networks are forests, and with additional machinery layered on top of them, it's possible to handle arbitrary network topologies.

Readings:

van Emde Boas Trees
May 17

Binary search trees assume that the keys being stored are totally ordered, but otherwise make no assumptions about them. If you know for a fact that the keys will be integers from a fixed range, you can harness properties of numbers to exponentially speed up BST operations. Doing so requires some tricks reminiscent of those we used in RMQ and will lead to a surprisingly elegant and efficient data structure.
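
Here's a rough preview of the key trick - very much a sketch, not the full data structure. Each node splits a w-bit key into a high half (which cluster?) and a low half (where in that cluster?) and recurses. This version assumes w is a power of two, stores clusters sparsely in a hash map for brevity, and supports only insert and member; the min/max bookkeeping that makes successor queries fast is deferred to lecture.

```cpp
#include <iostream>
#include <memory>
#include <unordered_map>

// A pared-down sketch of the van Emde Boas recursion on w-bit keys.
struct VEBSketch {
    int w;                            // bit width of keys at this level
    bool bits[2] = {false, false};    // at w == 1, just two bits
    std::unordered_map<int, std::unique_ptr<VEBSketch>> clusters;

    explicit VEBSketch(int bitsWide) : w(bitsWide) {}

    int high(int x) const { return x >> (w / 2); }             // cluster index
    int low(int x) const { return x & ((1 << (w / 2)) - 1); }  // offset in cluster

    void insert(int x) {
        if (w == 1) { bits[x] = true; return; }
        auto& c = clusters[high(x)];
        if (!c) c = std::make_unique<VEBSketch>(w / 2);
        c->insert(low(x));
    }

    bool member(int x) const {
        if (w == 1) return bits[x];
        auto it = clusters.find(high(x));
        return it != clusters.end() && it->second->member(low(x));
    }
};

int main() {
    VEBSketch v(16);                 // 16-bit keys
    v.insert(42);
    v.insert(1000);
    std::cout << v.member(42) << " " << v.member(43) << "\n";  // 1 0
}
```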

Readings:

Handouts:

x-Fast and y-Fast Tries
May 19

van Emde Boas trees have excellent runtimes for each operation, yet use a huge amount of space. The y-fast trie matches the runtime bounds of van Emde Boas trees but uses only linear space. More importantly, though, y-fast tries bring together all of the topics from the course so far - blocking, balanced BSTs, amortization, tries, and randomization. My hope is that they're a fitting pre-midterm send-off!

Readings:

Linear Probing
May 10

Linear probing is one of the oldest and simplest strategies for building a hash table. It's easy to implement and, for a number of reasons, is extremely fast in practice. The original analysis of linear probing dates back to Knuth and assumed truly random hash functions. But what happens if we weaken this assumption and say that the hash functions aren't truly random but are, instead, just k-independent for some fixed k? Then the analysis gets a lot more challenging, but also a lot more interesting. In exploring how the math works out, we'll get a richer understanding of how to analyze randomized data structures.
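
To ground the discussion, here's a minimal C++ sketch of a linear probing table: hash the key, then scan forward to the next free slot. The hash function and table size are placeholders, and deletion (which needs tombstones or backward shifting) is left out.

```cpp
#include <iostream>
#include <optional>
#include <utility>
#include <vector>

// A bare-bones linear probing hash table from int keys to int values.
// For simplicity, this sketch assumes the table never completely fills.
class LinearProbingTable {
public:
    explicit LinearProbingTable(size_t capacity) : slots(capacity) {}

    void insert(int key, int value) {
        size_t i = hash(key);
        while (slots[i] && slots[i]->first != key)
            i = (i + 1) % slots.size();          // probe the next slot
        slots[i] = std::make_pair(key, value);
    }

    std::optional<int> lookup(int key) const {
        size_t i = hash(key);
        while (slots[i]) {                        // stop at the first empty slot
            if (slots[i]->first == key) return slots[i]->second;
            i = (i + 1) % slots.size();
        }
        return std::nullopt;
    }

private:
    size_t hash(int key) const { return static_cast<size_t>(key) % slots.size(); }
    std::vector<std::optional<std::pair<int, int>>> slots;
};

int main() {
    LinearProbingTable t(16);
    t.insert(3, 30);
    t.insert(19, 190);                    // 3 and 19 collide mod 16
    std::cout << *t.lookup(19) << "\n";   // 190
}
```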

Readings:

Cuckoo Hashing
May 12

Most hash tables give expected O(1) lookups. Can we make hash tables with no collisions at all, and if so, can we do it efficiently? Amazingly, the answer is yes. There are many schemes for achieving this, one of which, cuckoo hashing, is surprisingly simple to implement.
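
Here's an illustrative C++ sketch of that scheme: every key lives in one of exactly two candidate slots, so lookups probe at most two places, and inserts evict whatever is in the way. The hash functions below are arbitrary stand-ins; a real implementation would draw them from a suitable random family and rehash when an insertion loops too long.

```cpp
#include <iostream>
#include <optional>
#include <utility>
#include <vector>

// A bare-bones cuckoo hash table over int keys, with two tables.
class CuckooTable {
public:
    explicit CuckooTable(size_t capacity) : t1(capacity), t2(capacity) {}

    // Worst-case O(1): check the only two slots the key could occupy.
    bool contains(int key) const {
        return t1[h1(key)] == key || t2[h2(key)] == key;
    }

    // Insert by repeatedly evicting occupants. Returns false if the
    // eviction chain runs too long - the signal to rehash in a real table.
    bool insert(int key) {
        if (contains(key)) return true;
        std::optional<int> cur = key;
        for (size_t tries = 0; tries < t1.size(); tries++) {
            std::swap(cur, t1[h1(*cur)]);   // claim a slot in table 1
            if (!cur) return true;          // nobody was evicted: done
            std::swap(cur, t2[h2(*cur)]);   // the evicted key tries table 2
            if (!cur) return true;
        }
        return false;                       // likely a cycle: rehash needed
    }

private:
    size_t h1(int key) const { return static_cast<size_t>(key) % t1.size(); }
    size_t h2(int key) const { return (static_cast<size_t>(key) / 7 + 3) % t2.size(); }
    std::vector<std::optional<int>> t1, t2;
};

int main() {
    CuckooTable table(11);
    for (int k : {1, 12, 23, 5}) table.insert(k);
    std::cout << table.contains(23) << " " << table.contains(99) << "\n";  // 1 0
}
```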

Readings:

Handouts:

Splay Trees
May 3

Balanced binary search trees give worst-case O(log n) times on each tree operation. If we're trying to guarantee worst-case efficiency, this is as good as it gets. But what if we're trying to guarantee average-case efficiency? In that case, it's possible to improve upon the bounds given by balanced BSTs. In this lecture, we'll explore statically optimal BSTs and an amazing data structure called the splay tree that very well may be the best possible binary search tree.

Readings:

Handouts:

Count-Min and Count Sketches
May 5

How can Google keep track of frequent search queries without storing all the queries it gets in memory? How can you identify frequently-occurring tweets without storing every tweet in RAM? As long as you're willing to trade off accuracy for space, you can get excellent approximations.
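
Here's a small C++ sketch of one such structure, a count-min sketch: several rows of counters, each indexed by its own hash function, where an item's estimated frequency is the minimum counter across rows (an overestimate, never an underestimate). The multiplicative hashes below are informal stand-ins for the pairwise-independent families used in the real analysis.

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <random>
#include <vector>

// A count-min sketch with a fixed width (counters per row) and depth (rows).
class CountMinSketch {
public:
    CountMinSketch(size_t width, size_t depth, uint64_t seed = 166)
        : w(width), counts(depth, std::vector<uint64_t>(width, 0)) {
        std::mt19937_64 gen(seed);
        for (size_t r = 0; r < depth; r++) salts.push_back(gen() | 1);
    }

    void add(uint64_t item, uint64_t count = 1) {
        for (size_t r = 0; r < counts.size(); r++)
            counts[r][index(r, item)] += count;   // one counter per row
    }

    // Estimated frequency: collisions only inflate counters, so the
    // minimum across rows is the tightest available bound.
    uint64_t estimate(uint64_t item) const {
        uint64_t best = UINT64_MAX;
        for (size_t r = 0; r < counts.size(); r++)
            best = std::min(best, counts[r][index(r, item)]);
        return best;
    }

private:
    size_t index(size_t row, uint64_t item) const {
        return ((item * salts[row]) >> 17) % w;   // cheap multiplicative hash
    }
    size_t w;
    std::vector<uint64_t> salts;
    std::vector<std::vector<uint64_t>> counts;
};

int main() {
    CountMinSketch sketch(1024, 4);
    for (int i = 0; i < 1000; i++) sketch.add(42);
    sketch.add(7);
    std::cout << sketch.estimate(42) << " " << sketch.estimate(7) << "\n";  // >= 1000, >= 1
}
```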

Readings:

Binomial Heaps
April 26

Binomial heaps are a simple and flexible priority queue structure that supports efficient melding of priority queues. The intuition behind binomial heaps is particularly elegant, and they'll serve as a building block toward the more complex Fibonacci heap data structure that we'll talk about on Thursday.
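
To preview that intuition, here's a tiny C++ sketch of the primitive everything else is built from: two heap-ordered binomial trees of the same order can be linked into one tree of the next order in O(1). A heap keeps one tree per 1 bit in the binary representation of its size, so melding two heaps mirrors binary addition.

```cpp
#include <iostream>
#include <utility>
#include <vector>

// A heap-ordered binomial tree; an order-k tree has children of
// orders k-1, k-2, ..., 0.
struct BinomialTree {
    int key;
    int order = 0;
    std::vector<BinomialTree*> children;
};

// Link two trees of equal order in O(1): the larger root becomes a
// child of the smaller, preserving the heap property.
BinomialTree* link(BinomialTree* a, BinomialTree* b) {
    if (b->key < a->key) std::swap(a, b);   // keep the smaller key on top
    a->children.push_back(b);
    a->order++;
    return a;
}

int main() {
    BinomialTree x{3}, y{7};          // two order-0 trees
    BinomialTree* t = link(&x, &y);   // one order-1 tree rooted at 3
    std::cout << t->key << " (order " << t->order << ")\n";
}
```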

Readings:

Fibonacci Heaps
April 28

Fibonacci heaps are a type of priority queue that efficiently supports decrease-key, an operation used as a subroutine in many graph algorithms (Dijkstra's algorithm, Prim's algorithm, the Stoer-Wagner min cut algorithm, etc.). They're formed by a clever transformation on a lazy binomial heap. Although Fibonacci heaps have a reputation for being ferociously complicated, they're a lot less scary than they might seem!

Readings:

Balanced Trees, Part II
April 19

We've spent a lot of time trying to figure out how to build nice balanced trees. Now that we've got them, what else can we do with them? In this lecture, we'll see two more advanced techniques - tree augmentation and the split/join operations - that will make it possible to build significantly more complex data structures in the future.
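
To give a flavor of augmentation ahead of time, here's a C++ sketch of an order-statistic tree: a BST in which each node caches its subtree size, which makes "find the kth-smallest key" run in time proportional to the height. The tree here is unbalanced for brevity; the same augmentation works in any balanced tree whose rotations keep the cached sizes up to date.

```cpp
#include <iostream>
#include <memory>

// A BST node augmented with the size of its subtree.
struct Node {
    int key = 0;
    int size = 1;
    std::unique_ptr<Node> left, right;
};

int sizeOf(const Node* n) { return n ? n->size : 0; }

void insert(std::unique_ptr<Node>& root, int key) {
    if (!root) {
        root = std::make_unique<Node>();
        root->key = key;
        return;
    }
    root->size++;   // the new key lands somewhere in this subtree
    insert(key < root->key ? root->left : root->right, key);
}

// Return the kth-smallest key (0-indexed), steering by subtree sizes.
int select(const Node* root, int k) {
    int leftSize = sizeOf(root->left.get());
    if (k < leftSize) return select(root->left.get(), k);
    if (k == leftSize) return root->key;
    return select(root->right.get(), k - leftSize - 1);
}

int main() {
    std::unique_ptr<Node> root;
    for (int k : {5, 2, 8, 1, 9}) insert(root, k);
    std::cout << select(root.get(), 0) << " " << select(root.get(), 3) << "\n";  // 1 8
}
```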

Readings:

Handouts:

Amortized Analysis
April 21

In many cases, we only care about the total time required to process a sequence of operations. In those cases, we can design data structures that make some individual operations more expensive in order to lower the aggregate cost of all the operations. How do you analyze these structures?
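
The classic warm-up is a dynamic array that doubles its capacity when full: one push may copy everything, but any sequence of n pushes does O(n) total work, so each push is O(1) amortized. Here's a minimal C++ sketch of that behavior.

```cpp
#include <cstring>
#include <iostream>

// A dynamic array that doubles when full; the occasional expensive
// grow() is "paid for" by the many cheap pushes before it.
class DynamicArray {
public:
    void push(int value) {
        if (count == capacity) grow();
        data[count++] = value;
    }
    int size() const { return count; }
    ~DynamicArray() { delete[] data; }

private:
    void grow() {
        int newCapacity = capacity == 0 ? 1 : 2 * capacity;
        int* bigger = new int[newCapacity];
        if (count > 0) std::memcpy(bigger, data, count * sizeof(int));  // the expensive step
        delete[] data;
        data = bigger;
        capacity = newCapacity;
    }
    int* data = nullptr;
    int count = 0, capacity = 0;
};

int main() {
    DynamicArray a;
    for (int i = 0; i < 100; i++) a.push(i);   // only ~7 doublings happen
    std::cout << a.size() << "\n";             // 100
}
```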

Readings:

Suffix Arrays
April 12

Suffix trees have all sorts of magical properties, but they use a tremendous amount of space. The suffix array was designed as an alternative to suffix trees that uses significantly less space while still supporting most of the same operations. So what exactly is a suffix array? How does it compare to a suffix tree? And how do you build one?
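
As a baseline before lecture, here's the naive construction in C++: sort the suffix start positions by comparing suffixes directly. That's O(n² log n) in the worst case - the algorithms we'll study do far better - but it makes the definition concrete.

```cpp
#include <algorithm>
#include <iostream>
#include <numeric>
#include <string>
#include <vector>

// Build a suffix array the slow, obvious way: sort suffix start
// positions by direct string comparison.
std::vector<int> buildSuffixArray(const std::string& text) {
    std::vector<int> sa(text.size());
    std::iota(sa.begin(), sa.end(), 0);   // 0, 1, ..., n-1
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
    });
    return sa;
}

int main() {
    for (int start : buildSuffixArray("banana"))
        std::cout << start << " ";        // 5 3 1 0 4 2
    std::cout << "\n";
}
```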

Readings:

Handouts:

Balanced Trees, Part I
April 14

Balanced search trees are among the most versatile and flexible data structures. They're used extensively in theory and in practice. What sorts of balanced trees exist? How would you design them? And what can you do with them?

Readings:

Aho-Corasick Automata
April 5

Suppose you have a fixed set of strings and you'd like to search for all copies of those strings in a larger text corpus. How could you do so efficiently? By using a combination of tries and techniques from finite automata, it's possible to solve this problem in linear time.
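
Here's a compact C++ sketch of that combination (not necessarily the exact construction we'll see in lecture): insert the patterns into a trie, compute failure links with a BFS, then scan the text once, following failure links on mismatches.

```cpp
#include <iostream>
#include <map>
#include <queue>
#include <string>
#include <vector>

struct AhoCorasick {
    struct Node {
        std::map<char, int> next;   // trie edges
        int fail = 0;               // longest proper suffix that's also in the trie
        std::vector<int> matches;   // patterns ending at this state
    };
    std::vector<Node> nodes{1};     // node 0 is the root
    std::vector<std::string> patterns;

    void addPattern(const std::string& p) {
        int cur = 0;
        for (char c : p) {
            if (!nodes[cur].next.count(c)) {
                nodes[cur].next[c] = (int)nodes.size();
                nodes.emplace_back();
            }
            cur = nodes[cur].next[c];
        }
        nodes[cur].matches.push_back((int)patterns.size());
        patterns.push_back(p);
    }

    // BFS over the trie, filling in failure links level by level.
    void build() {
        std::queue<int> q;
        for (const auto& kv : nodes[0].next) q.push(kv.second);
        while (!q.empty()) {
            int u = q.front(); q.pop();
            for (const auto& [c, v] : nodes[u].next) {
                int f = nodes[u].fail;
                while (f != 0 && !nodes[f].next.count(c)) f = nodes[f].fail;
                auto it = nodes[f].next.find(c);
                nodes[v].fail = (it != nodes[f].next.end()) ? it->second : 0;
                const auto& inherited = nodes[nodes[v].fail].matches;
                nodes[v].matches.insert(nodes[v].matches.end(),
                                        inherited.begin(), inherited.end());
                q.push(v);
            }
        }
    }

    // One pass over the text, reporting every pattern occurrence.
    void scan(const std::string& text) const {
        int state = 0;
        for (size_t i = 0; i < text.size(); i++) {
            char c = text[i];
            while (state != 0 && !nodes[state].next.count(c)) state = nodes[state].fail;
            auto it = nodes[state].next.find(c);
            state = (it != nodes[state].next.end()) ? it->second : 0;
            for (int m : nodes[state].matches)
                std::cout << patterns[m] << " ends at index " << i << "\n";
        }
    }
};

int main() {
    AhoCorasick ac;
    for (std::string p : {"he", "she", "his", "hers"}) ac.addPattern(p);
    ac.build();
    ac.scan("ushers");   // she and he end at 3; hers ends at 5
}
```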

Readings:

Suffix Trees
April 7

The suffix tree is probably the most powerful and versatile data structure for string processing that's ever been invented. What exactly is a suffix tree? Why is it so useful? And what techniques can be used on it?

Readings:

Range Minimum Queries, Part One
March 29

The range minimum query problem is the following: given an array, preprocess it so that you can efficiently determine the smallest value in a variety of subranges. RMQ has tons of applications throughout computer science and is an excellent proving ground for a number of advanced algorithmic techniques.
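
One classic point on that tradeoff curve is the sparse table: precompute the minimum of every range whose length is a power of two, using O(n log n) preprocessing, and answer any query in O(1) by overlapping two such ranges. Here's a C++ sketch (the log2 computation uses a GCC/Clang builtin).

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

// Sparse table for RMQ: table[k][i] holds the min of a[i .. i + 2^k - 1].
class SparseTable {
public:
    explicit SparseTable(const std::vector<int>& a) {
        int n = a.size();
        table.push_back(a);   // level 0: ranges of length 1
        for (int k = 1; (1 << k) <= n; k++) {
            std::vector<int> level(n - (1 << k) + 1);
            for (int i = 0; i + (1 << k) <= n; i++)
                level[i] = std::min(table[k - 1][i],
                                    table[k - 1][i + (1 << (k - 1))]);
            table.push_back(level);
        }
    }

    // Minimum of a[lo .. hi], inclusive: cover the range with two
    // (possibly overlapping) power-of-two blocks.
    int query(int lo, int hi) const {
        int k = 31 - __builtin_clz(hi - lo + 1);   // floor(log2(length))
        return std::min(table[k][lo], table[k][hi - (1 << k) + 1]);
    }

private:
    std::vector<std::vector<int>> table;
};

int main() {
    SparseTable rmq({31, 41, 59, 26, 53, 58, 97});
    std::cout << rmq.query(1, 4) << " " << rmq.query(0, 6) << "\n";  // 26 26
}
```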

Readings:

Range Minimum Queries, Part Two
March 31

Our last lecture took us very, very close to a ⟨O(n), O(1)⟩ solution to RMQ. Using a new data structure called a Cartesian tree in conjunction with a technique called the Method of Four Russians, we can adapt our approach to end up with a linear-preprocessing-time, constant-query-time solution to RMQ. In doing so, we'll see a number of clever techniques that will appear time and time again in data structure design.
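
As a first taste of the machinery, here's a C++ sketch of the standard stack-based Cartesian tree construction; each index is pushed and popped at most once, so it runs in O(n).

```cpp
#include <iostream>
#include <vector>

// Build a Cartesian tree (min-heap by value, in-order matches the
// array) and return each index's parent, with -1 marking the root.
std::vector<int> buildCartesianTree(const std::vector<int>& a) {
    std::vector<int> parent(a.size(), -1);
    std::vector<int> spine;   // indices on the tree's rightmost spine
    for (int i = 0; i < (int)a.size(); i++) {
        int lastPopped = -1;
        while (!spine.empty() && a[spine.back()] > a[i]) {
            lastPopped = spine.back();   // i displaces this run of nodes
            spine.pop_back();
        }
        if (lastPopped != -1) parent[lastPopped] = i;   // displaced run hangs to i's left
        if (!spine.empty()) parent[i] = spine.back();   // i is a right child
        spine.push_back(i);
    }
    return parent;
}

int main() {
    for (int p : buildCartesianTree({93, 84, 33, 64, 62, 83}))
        std::cout << p << " ";   // 1 2 -1 4 2 4
    std::cout << "\n";
}
```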

Readings:

Handouts: