9. Fast Fourier Transform
Thu Nov 12
Our goal for this week will be to take the code for a naive Fourier transform and team up to optimize it through recursive divide-and-conquer, borrowing the key insight from the Cooley-Tukey Fast Fourier Transform algorithm. Here is my derivation of the math behind the very clever Cooley-Tukey decomposition; this math forms the basis for how we will rework the code.
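To make the divide-and-conquer concrete, here is a minimal sketch of the recursive radix-2 decomposition in C++ (the name `fft` and the structure are my own, not from our starter code, and it assumes the input length is a power of two):

```cpp
#include <cmath>
#include <complex>
#include <vector>

// Recursive radix-2 Cooley-Tukey FFT: split the input into even- and
// odd-indexed halves, transform each recursively, then combine the two
// half-size transforms with twiddle factors.
std::vector<std::complex<double>> fft(const std::vector<std::complex<double>>& a) {
    size_t n = a.size();
    if (n == 1) return a;              // base case: DFT of one sample is itself
    std::vector<std::complex<double>> even(n / 2), odd(n / 2);
    for (size_t i = 0; i < n / 2; i++) {
        even[i] = a[2 * i];
        odd[i] = a[2 * i + 1];
    }
    auto fe = fft(even);               // transform of even-indexed samples
    auto fo = fft(odd);                // transform of odd-indexed samples
    const double PI = std::acos(-1.0);
    std::vector<std::complex<double>> out(n);
    for (size_t k = 0; k < n / 2; k++) {
        // twiddle factor w = e^(-2*pi*i*k/n)
        std::complex<double> w = std::polar(1.0, -2.0 * PI * k / n);
        out[k] = fe[k] + w * fo[k];
        out[k + n / 2] = fe[k] - w * fo[k];
    }
    return out;
}
```

Each level does O(n) combine work across O(log n) levels, which is where the O(n log n) total comes from, versus O(n^2) for the naive transform.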
I'd like to ask each of you to put in a little prep work so you come in with a basic understanding of the Fourier transform. I've been hunting down introductory resources and can highly recommend these two:
- A superb visual explanation from 3Blue1Brown https://www.youtube.com/watch?v=spUNpyF58BY
- Grant's ability to communicate deep concepts with illuminating visualization is unsurpassed. This video is well worth 20 minutes of your time!
- A fun interactive guide from Better Explained https://betterexplained.com/articles/an-interactive-guide-to-the-fourier-transform/
- Helpful use of analogy, motivating context, and great use of examples and interactive tools to build understanding through intuition and exploration. I had great fun playing the visualization of time <-> frequency. Try it out!
Visualizing the Fourier transform through periodic motion is a popular approach in YouTube videos. I am not sure these work as an introduction all on their own, but after getting the gist of things from the sites above, these animations can reinforce your understanding and are mesmerizing to watch:
- Harmonic circles https://www.youtube.com/watch?v=LznjC4Lo7lE
- Square wave https://www.youtube.com/watch?v=k8FXF1KjzY0
- Smarter Every Day https://www.youtube.com/watch?v=ds0cmAV-Yek
- The famous Fourier of Homer Simpson https://youtu.be/QVuU2YCwHjw
I found many good analytic resources, but most dive straight into the math and rarely surface for air. If you're intrigued to learn more, exploring these heftier resources could be a project for winter break.
- FFTW Fastest Fourier Transform in the West http://www.fftw.org/
- Hyper-optimized, full-featured C library implementation developed by Frigo/Johnson at MIT
- Julius Smith III, "Mathematics of the Discrete Fourier Transform" https://ccrma.stanford.edu/~jos/log/
- The definitive reference on the DFT, written by a professor at Stanford CCRMA (and former colleague of mine at NeXT!)
- Lovely Digital Signal Processing primer from Jack Schaedler https://jackschaedler.github.io/circles-sines-signals/
- Brad Osgood teaches an entire Stanford course, EE 261, on the Fourier transform and its applications.
- Norwood Russell Hanson, "The Mathematical Power of Epicyclic Astronomy", 1960. A volume of sheer insight packed into 10 pages, wow!
- Visual Microphone https://www.youtube.com/watch?v=FKXOucXB4a8
- Research using Fourier analysis to reconstruct sound from video, amazing (and a little creepy)
- Abe Davis TED Talk https://www.youtube.com/watch?v=npNYP2vzaPo
8. Data compression
Thu Nov 05
For today's meeting, I'm planning to give a quick introduction to the Lempel-Ziv-Welch compression algorithm and then quickly move us into breakouts so we can get coding. If we hunker down and apply the power of teamwork, we may be able to pull off a complete LZW encoder-decoder, which would be a cool accomplishment! We can then compare/contrast Huffman to LZW in terms of effectiveness, performance, ease of implementation, and so on.
- Wikipedia https://en.wikipedia.org/wiki/LZW
- Note there are a number of different variants in the family. Abraham Lempel and Jacob Ziv published the original LZ77 and LZ78 algorithms in 1977 and 1978. We are going to look specifically at LZW, an improved variant from Welch in 1984.
- Eric Roberts taught a very popular Sophomore College seminar "The Intellectual Excitement of Computer Science". Here are his resources on data compression and LZW https://cs.stanford.edu/people/eroberts/courses/soco/projects/data-compression/lossless/lz78/index.htm
- Algorithm walkthrough with examples http://michael.dipperstein.com/lzw/
- Note that the decoder runs one step behind the encoder in building the dictionary, making it possible for the decoder to encounter a "surprise" code (which needs special-case handling)
- I remember reading Mark Nelson's LZW article in Dr Dobbs in 1989 (I'm old!) https://marknelson.us/posts/1989/10/01/lzw-data-compression.html
- Mark's page links to a more recent update with code for implementation in C++ https://marknelson.us/posts/2011/11/08/lzw-revisited.htm
- (Let's plan to not refer to this when writing our own, save for later review/reference)
- Duke University CS Prof Owen Astrachan is a giant in the field (and a long-time friend!). Owen is always up to something cool. Check out this Duke project to develop an assignment where students explore different compression algorithms. https://www2.cs.duke.edu/csed/curious/compression/ Should we do something like this in our CS106B?
- A video by our very own Niveditha on the Shannon limit https://www.youtube.com/watch?v=r7GK4pMODmI
- Claude Shannon, "A Mathematical Theory of Communication", 1948
- This 55-page 1948 paper is the seminal work of information theory. (Scientific American dubbed it the "Magna Carta of the Information Age")
- "The Joy of Data", Mathematician Hannah Fry explains information theory to a lay audience in this documentary film.
- The Burrows-Wheeler transform is another cool and accessible algorithm for data compression – check it out! https://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform
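To have something to check our work against after the breakouts, here is one minimal sketch of an LZW encoder/decoder pair in C++ (the function names and the std::map-based dictionary are my own choices, not a canonical implementation; it includes the one-step-behind "surprise" case noted earlier):

```cpp
#include <map>
#include <string>
#include <vector>

// LZW encoder: start with a dictionary of all single bytes (codes 0-255)
// and add a new code for each longest-match-plus-next-char encountered.
std::vector<int> lzwEncode(const std::string& text) {
    std::map<std::string, int> dict;
    for (int i = 0; i < 256; i++) dict[std::string(1, char(i))] = i;
    int nextCode = 256;
    std::vector<int> out;
    std::string cur;
    for (char c : text) {
        std::string extended = cur + c;
        if (dict.count(extended)) {
            cur = extended;               // keep growing the match
        } else {
            out.push_back(dict[cur]);     // emit code for longest match
            dict[extended] = nextCode++;  // learn the extended string
            cur = std::string(1, c);
        }
    }
    if (!cur.empty()) out.push_back(dict[cur]);
    return out;
}

// LZW decoder: rebuilds the same dictionary one step behind the encoder.
// The "surprise" (a code not yet in the dictionary) can only be the entry
// the encoder just added, which must be prev + prev[0].
std::string lzwDecode(const std::vector<int>& codes) {
    std::map<int, std::string> dict;
    for (int i = 0; i < 256; i++) dict[i] = std::string(1, char(i));
    int nextCode = 256;
    std::string out, prev;
    for (int code : codes) {
        std::string entry;
        if (dict.count(code)) entry = dict[code];
        else entry = prev + prev[0];      // special case: code defined "just now"
        out += entry;
        if (!prev.empty()) dict[nextCode++] = prev + entry[0];
        prev = entry;
    }
    return out;
}
```

Note how the decoder can only ever be surprised by the code the encoder defined on the very step it emitted, which is why prev + prev[0] is exactly the right reconstruction.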
7. Simulation and Visualization
Thu Oct 29
We'll explore how to generate results via simulation and consider how data visualization can aid our understanding of a simulated process. The simulation algorithms we will look at belong to the family of "Monte Carlo" methods that use repeated random trials or statistical sampling. We will use the GWindow class of Stanford's CS106B library for simple drawing.
- Wikipedia https://en.wikipedia.org/wiki/Monte_Carlo_method
- A classic example to estimate pi by throwing darts
- Concise writeup with visualization video https://felixdmr.com/2020/09/20/pi-from-monte-carlo/
- If you distrust computational pseudo-randomness, get true randomness from nature, raindrops! https://www.youtube.com/watch?v=I-BC_vI4CAE
- Random walk on roulette wheel
- Configuration of colors on a roulette wheel http://www.fouroulette.com/roulette-wheel.htm
- Quorans weigh in on the "best red-black betting system" https://www.quora.com/What-is-the-best-way-to-win-roulette-in-red-and-black?
- Try proposed strategies in simulation – do they perform any better than random bet?
- Application of Monte Carlo to fractals and stock market https://www.youtube.com/watch?v=x3b_DAEmZnU
- What is the probability that a random chord of a circle has length longer than the radius?
- Bertrand's Paradox https://en.wikipedia.org/wiki/Bertrand_paradox_(probability)
- Arguments can be made for 3/4 and "confirmed" by a Monte Carlo simulation. How is that possible? (Afterwards, an argument can be made for another answer and "confirmed" by simulation, too!)
- How can visualization help illuminate what is different about the random sampling?
- Extend idea to sequence of events to get Markov Chain Monte Carlo https://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo
- Generating random text https://blog.demofox.org/2019/05/11/markov-chain-text-generation/
- Image synthesis http://graphics.cs.cmu.edu/people/efros/research/EfrosLeung.html
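As a warm-up for Thursday, the dart-throwing estimate of pi fits in a few lines of C++ (a sketch with my own names and a fixed seed for reproducibility):

```cpp
#include <random>

// Monte Carlo estimate of pi: throw random darts at the unit square and
// count the fraction landing inside the quarter circle of radius 1.
// Since (area of quarter circle) / (area of square) = pi/4, the hit
// fraction times 4 approximates pi.
double estimatePi(int trials, unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> coord(0.0, 1.0);
    int hits = 0;
    for (int i = 0; i < trials; i++) {
        double x = coord(gen), y = coord(gen);
        if (x * x + y * y <= 1.0) hits++;   // dart landed inside the arc
    }
    return 4.0 * hits / trials;
}
```

The error shrinks like 1/sqrt(trials), so each additional digit of pi costs roughly 100x more darts – a good thing to confirm (and visualize!) in our session.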
6. Sorting
Thu Oct 22
We'll get a head start on CS106B by doing our own exploration of sorting in advance of Friday's lecture. The competition to be "the" standard sort for a language or library is stiff. What does it take to be "The One"? Let's take a deep dive into Quicksort, one of the most celebrated of all sorting algorithms.
- Wikipedia https://en.wikipedia.org/wiki/Quicksort
- Robert Sedgewick's dissertation on Quicksort
- Bob's was the first CS PhD thesis I ever read cover to cover. You couldn't check theses out of the library, and I remember sitting on the library floor until my legs fell asleep. It really was that riveting!
- Jon Bentley & Doug McIlroy, "Engineering a Sort Function", Software Practice and Experience, Nov 1993
- This highly-engineered sort function was the gold standard of sorting for nearly 20 years.
- I think everything about this paper is awesome – packed with algorithmic know-how, beautiful math, performance tuning, testing strategies and solid engineering practice.
- In 2009, the "Dual Pivot Quicksort" displaced the previous king and became the standard sort in the JDK (v7 and later)
- Video with algorithm animation and helpful voiceover https://www.youtube.com/watch?v=yRJ1Zb4pCM4
- Java code https://algs4.cs.princeton.edu/23quicksort/QuickDualPivot.java.html
- Doug McIlroy, "A Killer Adversary for QuickSort", Software Practice and Experience, Dec 1999
- Not so practical, no, but a devilish sort of fun – any quicksort can be "broken"
- Timsort https://en.wikipedia.org/wiki/Timsort
- Standard sort for python (merge/insertion hybrid, high-performance, stable)
- The post from Tim Peters himself with motivation for timsort https://mail.python.org/pipermail/python-dev/2002-July/026837.html
- Algorithm explained https://hackernoon.com/timsort-the-fastest-sorting-algorithm-youve-never-heard-of-36b28417f399
- Sorting diversions
- Bubble sort won't win any prizes, but does have entertainment value
- Owen Astrachan, "Bubble Sort: An Archaeological Algorithm Analysis", SIGCSE 2003
- Barack Obama cracks the coding interview! https://www.youtube.com/watch?v=k4RRi_ntQc8
- (and Larry Schwimmer, who asked the question, was one of my undergrad advisees!)
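For anyone who wants to experiment before Friday's lecture, here is a bare-bones recursive quicksort in C++ (a sketch using Lomuto partitioning with the last element as pivot; production sorts like Bentley/McIlroy's layer on median-of-three pivot selection, fat partitions for equal keys, insertion sort for small ranges, and more):

```cpp
#include <algorithm>
#include <vector>

// Classic recursive quicksort on a[low..high] (inclusive bounds).
// Lomuto partition: sweep j across the range, growing a "< pivot"
// region at the front, then drop the pivot between the two regions.
void quicksort(std::vector<int>& a, int low, int high) {
    if (low >= high) return;           // 0 or 1 elements: already sorted
    int pivot = a[high];               // naive pivot choice: last element
    int i = low;                       // boundary of the "< pivot" region
    for (int j = low; j < high; j++) {
        if (a[j] < pivot) std::swap(a[i++], a[j]);
    }
    std::swap(a[i], a[high]);          // place pivot in its final spot
    quicksort(a, low, i - 1);          // recursively sort each side
    quicksort(a, i + 1, high);
}
```

A fun exercise: feed this version already-sorted input and watch it degrade to O(n^2), then read McIlroy's "Killer Adversary" paper to see how any fixed pivot strategy can be broken.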
5. Super-charging recursion
Thu Oct 15
Recursion is a powerful and elegant problem-solving technique, but the exhaustive nature of certain recursive algorithms can come with a high cost in runtime performance. Let's look at techniques for paring down those costs by trading space for time and accepting approximation in place of exactness.
- Memoization is a simple technique: save the computed result of an expensive operation to avoid re-computing it in the future. It belongs to the broader category of caching, a technique used in hardware, operating systems, networks, browsers, filesystems, database lookups, and more.
- Dynamic programming
- If the problem to solve is constructed of overlapping subproblems, dynamic programming may offer phenomenal improvement. e.g. bringing an exponential-time recursive algorithm down to polynomial
- Optimal solution to 0/1 knapsack is a classic example for DP https://medium.com/@fabianterh/how-to-solve-the-knapsack-problem-with-dynamic-programming-eb88c706d3cf
- What is pseudo-polynomial time?
- A DP algorithm we will explore runs in pseudo-polynomial time, a new category in our Big O ranking.
- Good explanation on StackOverflow https://stackoverflow.com/questions/19647658/what-is-pseudopolynomial-time-how-does-it-differ-from-polynomial-time
- I have it on good authority that the author of this post is none other than the fabulous Keith Schwarz!
- Greedy approximation/heuristic
- If we can relax the goal of finding the globally optimal solution, a greedy approach that makes the locally optimal choice may work well as a fast heuristic.
- Wikipedia on greedy algorithms https://en.wikipedia.org/wiki/Greedy_algorithm
- Several greedy approximations for the knapsack problem http://www.brainkart.com/article/Approximation-Algorithms-for-the-Knapsack-Problem_8066/
- P != NP? A proof either way will earn fortune and fame beyond your wildest dreams.
- Millennium Prize https://www.claymath.org/millennium-problems/p-vs-np-problem
- Understanding Minesweeper can make you a millionaire https://www.claymath.org/sites/default/files/minesweeper.pdf
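To tie the ideas above together, here is a sketch of the classic 0/1 knapsack DP in C++ (a standard one-dimensional formulation, with my own names):

```cpp
#include <algorithm>
#include <vector>

// 0/1 knapsack via dynamic programming. best[c] holds the maximum value
// achievable with capacity c using only the items processed so far.
// Runs in O(n * capacity) time and O(capacity) space.
int knapsack(const std::vector<int>& weight,
             const std::vector<int>& value, int capacity) {
    std::vector<int> best(capacity + 1, 0);
    for (size_t i = 0; i < weight.size(); i++) {
        // sweep capacity downward so each item is taken at most once
        for (int c = capacity; c >= weight[i]; c--) {
            best[c] = std::max(best[c],                       // skip item i
                               best[c - weight[i]] + value[i]); // take item i
        }
    }
    return best[capacity];
}
```

The O(n * capacity) bound is pseudo-polynomial because capacity is a numeric value rather than a measure of input size: adding one digit to the capacity multiplies the work by 10, even though the input barely grew.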
4. Fractals
Thu Oct 08
Students who followed up with me after last week were clamoring for "more code!", so I plan to move in the direction of more hands-on activities where we can. For this week, we'll be playing with code that riffs on the classic Sierpinski fractal and exposes intriguing and surprising connections from fractals to other parts of mathematics, including chaos and dynamic systems, iterated function systems, cellular automata, number theory, binary/ternary number representation, space-filling curves, and more. It is all quite magical!
Here are some wonderful background resources. If you have some time to check into these in advance, please do!
- Grant Sanderson of 3Blue1Brown produced two delightful videos that tease out the connections between counting in binary/ternary, the Towers of Hanoi and the Sierpinski curve. The fabulous Keith Schwarz makes a guest appearance, too.
- Michael Barnsley's Chaos game, related to iterated function systems
- What a surprising result!
- Ben Sparks of Numberphile has a great video explanation: https://www.youtube.com/watch?v=kbKtFN71Lfs
- Great site on IFS http://larryriddle.agnesscott.org/ifs/ifs.htm
- Elementary cellular automata
- Good introduction at Wolfram Mathworld https://mathworld.wolfram.com/ElementaryCellularAutomaton.html
- Longer treatment in Nature of Code book https://natureofcode.com/book/chapter-7-cellular-automata/
- Coloring Pascal's triangle by divisibility
- http://larryriddle.agnesscott.org/ifs/siertri/Pascal.htm, includes proof of Lucas' theorem, binomial coefficients mod p, neat number theoretic result, relationship to binary expansion – wow
- Lindenmayer string-rewriting systems and the Sierpinski curve
- Wikipedia on L-systems https://en.wikipedia.org/wiki/L-system
- Why Sierpinski is the fractal of Halloween https://blogs.scientificamerican.com/roots-of-unity/a-few-of-my-favorite-spaces-the-sierpinski-triangle/
- The mother of all Sierpinski resources: 270 pages of non-stop action that explores every nook and cranny relating to Sierpinski
- Just for fun, a fractal poem http://larryriddle.agnesscott.org/ifs/poem.htm
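If you'd like to tinker in advance, the chaos game itself is only a few lines of C++; this sketch (names mine) just generates the points, which you could then plot with GWindow or any drawing tool:

```cpp
#include <random>
#include <utility>
#include <vector>

// Barnsley's chaos game for the Sierpinski triangle: start anywhere,
// then repeatedly jump halfway toward a randomly chosen vertex of the
// triangle. The visited points converge onto the Sierpinski gasket.
std::vector<std::pair<double, double>> chaosGame(int steps, unsigned seed = 1) {
    const double vx[3] = {0.0, 1.0, 0.5};   // triangle vertex x-coordinates
    const double vy[3] = {0.0, 0.0, 1.0};   // triangle vertex y-coordinates
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> pick(0, 2);
    double x = 0.25, y = 0.25;              // arbitrary starting point
    std::vector<std::pair<double, double>> points;
    for (int i = 0; i < steps; i++) {
        int v = pick(gen);
        x = (x + vx[v]) / 2;                // jump halfway to chosen vertex
        y = (y + vy[v]) / 2;
        points.push_back({x, y});
    }
    return points;
}
```

Plot a few thousand points and the gasket appears out of pure randomness – the surprising result Ben Sparks's video explains.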
I'll have some sample code prepped that will get us started and you'll have a chance to work more closely on the code in small groups in breakouts. If you are able to, please update your Zoom to version >= 5.3. ("Check for Updates" in Zoom menu). The 5.3 version adds a few convenience features, one of which is the ability to self-select a breakout room.
3. Search engines, Google's Page Rank algorithm
Thu Oct 01
Here are some resources to check out in advance of this Thursday's meeting. I'm not asking you to read all of these, but please do try to get the gist of the basic ideas underpinning Page Rank (the Wikipedia page is a good start). If you're eager to go deeper, the original epic Brin/Page papers are quite accessible. To be an informed citizen and consumer of search, it behooves us to know some of the ways those results can be distorted; I picked a few press articles to give a taste.
- Tim Bray (inventor of XML) has a nice series of blog posts on search engine architecture https://www.tbray.org/ongoing/When/200x/2003/07/30/OnSearchTOC
- Sergey Brin and Larry Page's landmark paper introducing Google at the WWW7 Conference in 1998. "The Anatomy of a Large-Scale Hypertextual Web Search Engine" http://infolab.stanford.edu/pub/papers/google.pdf
- The Wikipedia page on PageRank is a good overview (although it gets a little math-y) https://en.wikipedia.org/wiki/PageRank
- The paper that started it all: Larry Page's "The PageRank Citation Ranking: Bringing Order to the Web" http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf
- Some short articles from popular press on ethical issues raised by search, result filtering/ranking:
- "Google is finally admitting it has a filter-bubble problem" https://qz.com/1194566/google-is-finally-admitting-it-has-a-filter-bubble-problem/
- "The Trouble With the Echo Chamber Online" https://www.nytimes.com/2011/05/29/technology/29stream.html
- "The Dirty Little Secrets of Search" https://www.nytimes.com/2011/02/13/business/13search.html
- Latest update on antitrust case against Google https://www.nytimes.com/2020/09/22/technology/justice-dept-case-google-search-dominance.html
- Wikipedia on Google bombing https://en.wikipedia.org/wiki/Google_bombing
If you know of other good resources, please share on Ed!
2. Binary search
Thu Sep 24
- We will be using Ed Workspaces for collaborative coding; be sure you are signed up for the 106M Ed forum.
- Familiarize yourself with the basics of the Stanford Vector class
- Review the CS106B Guide to Testing
For this week's meeting, I'm asking you to not read up on binary search in advance. Just come with a "clean mind" and we'll explore together. I will post reference links after the meeting for follow up.
Here are references on binary search. Some of these are excerpts from books available in the Stanford Library digital collection. Click the link, authenticate with your Stanford SUNet ID, and find the link to the full book in the section labeled "Available online".
- Jon Bentley, "Writing Correct Programs", Column 4 of Programming Pearls
- Tim Bray, On the Goodness of Binary Search
- Mike Taylor, Are you one of the 10% of programmers who can write a binary search?
- Albert Savoia, "Beautiful Tests", Chapter 7 of Beautiful Code
- Josh Bloch, Nearly All Binary Searches are Broken
We got to just the tip of the iceberg this week. Let's keep the discussion going! Below are some further exercises and questions to consider. If you get a chance to try some out, please share your results on Ed and we can all learn from each other and offer feedback.
- Refine algorithm into its cleanest, clearest form
- should low/high be inclusive or exclusive?
- how should low/high change each iteration? what invariant is being maintained?
- is it necessary to ensure low/high always stay within bounds?
To ponder: if low and high are in bounds at the start of the loop, then mid is in bounds. Updating the bound low=mid or high=mid thus guarantees low/high stay in bounds. But including mid in the remainder to search is not quite right, and as we saw in class, when low was already equal to mid, this does not advance and gets stuck in an infinite loop. Updating the bound to mid+1 or mid-1 seems correct, but in some cases this can cause low/high to go out of bounds. When does this happen? Would this cause a problem? Why or why not?
- what is correct stopping condition?
- some argue that the cleanest form is for the loop to test for a match and return early, but always run the loop down to a range of a single element. If that final element is the target, return its index; otherwise, the target is not present. Try rewriting into that form. Do you agree that it is cleaner? Why or why not?
- how should duplicate elements be handled?
- as-is returns index of a match, but not necessarily first/last.
- if specification is to return index of first match, how could you change to do so?
- does this version still run in logarithmic time? if not, how could you make it so?
- what if requirement were to return the count of matching elements?
- Confirm the correctness of your algorithm
- could be proof via pre/post conditions, invariants
- and/or rigorous test suite
- what mix of whitebox/blackbox/fuzz constitutes "rigorous" coverage?
- Performance studies
- confirm that binary search is operating in logarithmic time
- determine cost of a recursive implementation versus iterative
- compute "payback horizon"
- how many searches to hit break even where savings of binary vs linear search offsets costs of one-time sort?
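To ground the refinement questions above, here is one careful iterative form in C++ (a sketch with inclusive low/high bounds; mid is computed as low + (high - low) / 2 to sidestep the overflow bug Bloch describes; it returns some matching index, not necessarily the first):

```cpp
#include <vector>

// Iterative binary search over a sorted vector, with inclusive bounds.
// Invariant: if target is in v, its index lies in [low, high].
// Returns the index of a match, or -1 if the target is not present.
int binarySearch(const std::vector<int>& v, int target) {
    int low = 0, high = (int)v.size() - 1;
    while (low <= high) {
        // low + (high - low) / 2 avoids the (low + high) overflow bug
        int mid = low + (high - low) / 2;
        if (v[mid] == target) return mid;
        else if (v[mid] < target) low = mid + 1;   // discard left half
        else high = mid - 1;                       // discard right half
    }
    return -1;                                     // range emptied: not found
}
```

Try adapting this to return the first matching index in logarithmic time, per the duplicate-handling questions above.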
1. Meet & Greet
Thu Sep 17
Nothing to prep, just bring yourself and your enthusiasm. Zoom details. We will use our first meeting to introduce ourselves and brainstorm our plans for the seminar. Look over the list of possible topics.
Everyone please join the Ed forum and weigh in with your topic interests.
Thank you all for coming today and sharing your stories! We had two dozen attend, almost all first year or transfer students. An awesome journey awaits you at Stanford, I'm happy to be a part of welcoming you!