Lecture 27 Zoom Q&A

Q: can we put our personal projects on github?

A1: So please no assignment solutions on public sites, but I think personal projects should be fine, thanks for asking

Q: what would you say is the current frontier of computer science?

A1: a “frontier” is actually a fractal coastline, the closer you look, the more detail you see!

Q: do skip lists have to be sorted then?

A1: live answered

Q: Does that mean that |L1| should be about sqrt(|L2|)?

A1: live answered

A2: wait for it….

Q: can we add more "express" tracks with even sparser stops, so it has a sort of cascading effect? Or does this just add complexity?

A1: wait for it…!

Q: How can we differentiate a function that has support over integers? Shouldn't it be discoutinous at different values of L1?

Q: Wait did we come up with k = log n by taking the derivative of k * n^(1/k)?

A1: Yes

Q: if it acts like a binary tree, then why just use binary tree

A1: live answered

Q: If our input stream was pre-sorted, would it be most efficient to bring up all even-indexed elements to the higher list for every list?

A1: live answered

Q: So is this kind of like a linked grid, but not fully filled in?

A1: live answered

Q: to clarify then, this is an entire linked list? no use of any other structures?

A1: yup, just a collection of a couple linked lists

A2: Yes, but each node has 4 links (pointers)

Q: could you also think of this as a graph?

A1: It shares some properties with graphs but is much more constrained (only 4 neighbors, layers of neighbors, etc)

Q: Why don’t we use skip lists in practice, they seem like they would be super useful?

A1: They are often used by libraries to provide high-performance sorted data struture. I think Chris’s point was more than you as a programmer will not likely implement one yourself (similar to how you would be unlikely to implement an AVL or red-black tree, but would use one provided by library)

Q: Should we always use two hash functions?

A1: You can tune the number of hash functions + number of bits to achieve different false positive/negative rate — stay tuned

Q: ^ why do we have to k values for each number?

A1: K is the number of hash functions — stay tuned for more info on how to choose the value of k

Q: can you calculate the probablility that a number is present, or are we stuck with just "probably"?

A1: Yes, you can compute the probability of false positive/negative based on k, number of bits

Q: could we keep track of how many times we put a 1 into 5? that way if we only put a 1 in 5 once we could not have auto?

A1: That would refine case where k1 == k2, but won’t broadly eliminate other false positives

Q: What are the benefits of Bloom Filters compared to Hashsets?

A1: A bloom filter doesn’t store the element values at all — only N bits, super-compact but operation support is only contains/not (probabilistically)

Q: Is this similiar to fake news detection?

A1: live answered

Q: Couldn't you store ints instead of bits? So you decrement when you delete?

A1: Yes, but that eats into the cost savings you had in mind (i.e. 1 bit per entry -> 32 bits)

Q: How many hash functions would be used in an application like chrome? Like a ballpark

A1: live answered

Q: what's a typical number of hash functions for Bloom filters?

A1: live answered

Q: Are bloom filters very space efficient?

A1: super space efficient!

Q: Is the application for 107E over already?

A1: live answered

A2: Nope, we haven’t even opened it, stay tuned!

Q: wait where do we look for the 107e application? Does it just go through Axess, or is it separate?

A1: It is separate