Lecture 23 Zoom Q&A


Q: Do we have office hours today?

A1: live answered

A2: yes, although I would double-check with the OH calendar


Q: Will we have lectures from the 16th–20th?

A1: live answered

A2: yup


Q: What OS do polling machines run on? COBOL?

A1: live answered


Q: I was having trouble finding the code from the linked list lectures where there was coding in qt. Are they on the website?

A1: Is it the linked code on this page? https://web.stanford.edu/class/cs106b/lectures/20-lists2/

A2: The other LL code is here, not sure exactly which ones you were referring to https://web.stanford.edu/class/cs106b/lectures/19-lists1/


Q: Ah, I’m sorry, I see it now. Election brain

A1: No worries – I feel that!


Q: why is 23 not balanced?

A1: live answered


Q: Is the AVL property equivalent to this: there is no node that lacks a sibling but has only one child?

A1: live answered


Q: Will we discuss how the Stanford Library’s sets and maps make use of balanced binary search?

A1: Not in detail, no.


Q: How are these levels indexed using pointers? Just add another data spot to the node struct?

A1: Yes, each node is annotated with its balance factor
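To make that concrete, here is a minimal sketch (hypothetical names, not the Stanford library's actual code) of a BST node and the balance bookkeeping. A real AVL implementation would store the balance factor in the node and update it on insert rather than recomputing heights from scratch:

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical BST node; an AVL implementation would add one extra
// field caching the balance factor computed below.
struct Node {
    std::string key;
    Node* left = nullptr;
    Node* right = nullptr;
};

// Height of a subtree (empty tree has height -1 by this convention).
int height(Node* n) {
    if (n == nullptr) return -1;
    return 1 + std::max(height(n->left), height(n->right));
}

// Balance factor = height(right) - height(left);
// AVL keeps this in {-1, 0, +1} at every node.
int balanceFactor(Node* n) {
    return height(n->right) - height(n->left);
}
```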


Q: Why did it have to look at the root for the double rotation?

A1: live answered


Q: is the rotation process recursive?

A1: The imbalance can be fixed with either a single or a double rotation (the idea being that we never let things get very much out of whack before we intervene)


Q: Does this mean that in the Node struct, we have to keep track of the left and right levels?

A1: Yes, each node is annotated with its balance factor in some form


Q: What makes a node black vs red? I’m not really getting the difference between AVL and red black trees

A1: live answered

A2: Good place to learn more about red-black trees: https://www.cs.auckland.ac.nz/software/AlgAnim/red_black.html


Q: what compression algorithm does zoom use?

A1: live answered

A2: H.264?


Q: How do you determine the binary code of a letter?

A1: live answered

A2: It depends on the encoding system that you’re using. For ASCII, each letter is associated with a numerical value, which can then be converted to binary
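For instance, a rough sketch in C++ (using std::bitset purely to format the number as a fixed-width 8-bit string; asciiToBinary is a made-up helper name):

```cpp
#include <bitset>
#include <cassert>
#include <string>

// ASCII maps each character to a number 0-127; std::bitset<8>
// renders that number as an 8-bit binary string, padding with
// leading zeros as needed.
std::string asciiToBinary(char c) {
    return std::bitset<8>(static_cast<unsigned char>(c)).to_string();
}
```

Note that 'A' is 65 in ASCII, so it comes out as 01000001 with the leading 0 making it a full byte.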


Q: Do we assume the first digit is 0 to make them 8 bits?

A1: Yes

A2: Yup, for ASCII we can assume that


Q: Are prefix codes related to posets and Sperner’s theorem?

A1: Not sure — come to office hours and we can try to explore further!


Q: So is the frequency redetermined for every document? (Is the frequency custom to each compression, or is it based on letter frequency in the English language?)

A1: To be optimal, it is specific to the document being encoded.


Q: If a letter matched the average frequency would it be counted as a 0 or a 1

A1: We pick a splitting point that is close to the middle, so everything either goes to the left or to the right
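A sketch of that splitting step, assuming the character frequencies arrive sorted by count (splitIndex is a made-up helper name, not the course's code):

```cpp
#include <cassert>
#include <cstdlib>
#include <numeric>
#include <vector>

// Shannon-Fano splitting sketch: find the index that divides the
// total frequency as evenly as possible. Everything before the
// split gets one bit value, everything after gets the other
// (which side is 0 vs 1 is an arbitrary convention).
int splitIndex(const std::vector<int>& freqs) {
    int total = std::accumulate(freqs.begin(), freqs.end(), 0);
    int running = 0, best = 1, bestDiff = total;
    for (int i = 1; i < (int)freqs.size(); i++) {
        running += freqs[i - 1];
        // How lopsided would the split be at this index?
        int diff = std::abs(running - (total - running));
        if (diff < bestDiff) { bestDiff = diff; best = i; }
    }
    return best;  // first index of the right-hand group
}
```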


Q: why is smaller on the right, not the left?

A1: It is arbitrary


Q: Did they prove that it was possible to have a more efficient algorithm than Shannon-Fano before Huffman existed?

A1: Not sure


Q: If we’re supposed to divide the characters close to the frequency midpoint, why are there 4 characters on the left and 6 on the right?

A1: If you sum all the frequencies, the first 4 have about half the total frequency


Q: In Shannon-Fano, all characters are at the end of the tree, right?

A1: They are all leaves, correct

A2: Yes, all the nodes containing characters are leaf nodes, good observation! This is the important property for making sure we have a prefix code scheme


Q: What's the 'value' (not priority) of the combined node? How does the insertion work?

A1: value is the aggregated occurrence count for this subtree

A2: Each node has two values – a numerical count and a character. Only leaf nodes have valid character values. The character value of internal (non-leaf) nodes should be ignored.
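A minimal sketch of that combining step (hypothetical struct and field names, not the assignment's actual types):

```cpp
#include <cassert>

// Sketch of a Huffman tree node: leaves carry a real character;
// internal nodes carry only the combined count, and their 'ch'
// field is meaningless.
struct HuffNode {
    int count;
    char ch;                   // only meaningful for leaves
    HuffNode* zero = nullptr;  // child reached by a 0 bit
    HuffNode* one = nullptr;   // child reached by a 1 bit
};

// Combining two subtrees: the new internal node's count is the
// sum of its children's counts; its character value is ignored.
HuffNode* combine(HuffNode* a, HuffNode* b) {
    return new HuffNode{a->count + b->count, '\0', a, b};
}
```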


Q: isn’t that tree unbalanced, or does that not matter for Huffman coding?

A1: It is relatively balanced, but an unbalanced Huffman tree can actually have interesting properties that are good – I would encourage thinking about why!


Q: are these codes shorter than the previous codes for each letter?

A1: ASCII uses a fixed 8-bit code for each letter

A2: It depends on what you mean by the previous codes. But yes, we do have average encoding length < 8, which is better than ASCII here


Q: what is the big O of the Huffman coding algorithm? O(n^2)?

A1: You do N enqueues/dequeues on the priority queue. If each operation takes O(log N) (as with a heap), the total is O(N log N); if it were a sorted array, it would be O(N^2)
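A sketch of that build loop using std::priority_queue as a min-heap; huffmanCost is a made-up helper that tracks only the subtree weights rather than building the actual tree:

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Sketch of the Huffman build loop. Each of the ~N merges does
// O(log N) heap work, giving O(N log N) total. The sum of all
// merged weights equals the total encoded length in bits.
int huffmanCost(const std::vector<int>& freqs) {
    // min-heap of subtree weights
    std::priority_queue<int, std::vector<int>, std::greater<int>> pq(
        freqs.begin(), freqs.end());
    int totalCost = 0;
    while (pq.size() > 1) {
        int a = pq.top(); pq.pop();  // two smallest weights
        int b = pq.top(); pq.pop();
        totalCost += a + b;          // weight of the new internal node
        pq.push(a + b);
    }
    return totalCost;
}
```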


Q: Isn’t the 26234 node unbalanced?

A1: yes, the difference in height is greater than 1


Q: Is this done iteratively or recursively?

A1: The tree building part is done iteratively


Q: In this specific case, isn’t Huffman exactly as good as Shannon?

A1: live answered


Q: How does the receiving end know what the codes are?

A1: Good observation! You have to send along some encoding of the tree, otherwise the person on the other end wouldn’t be able to decode (you will explore this in the assignment)


Q: is there some kind of universal frequency (like for each letter in all of English as a whole) or does it depend on how many times each character appears in the data set you are trying to compress

A1: For optimal encoding you would be using the frequencies in the document you’re trying to encode


Q: how does adding prefixes to ASCII characters compress a data stream?

A1: We’re not adding prefixes to ASCII characters – we’re totally replacing the encoding scheme to use fewer bits per character than ASCII would use


Q: if you send it to somebody how will they know what encoding you used?

A1: Great question! You would need to also send the encoding with the file


Q: So is Huffman coding basically an alternative to the ASCII coding system?

A1: Correct


Q: So the big picture is we apply this algorithm to compress data/a file?

A1: Correct!


Q: For clarification, I meant to ask: are the codes from the Huffman tree shorter than the codes from the Shannon-Fano algorithm for each letter?

A1: live answered


Q: Followup: Does compression encrypt the data?

A1: No


Q: Why is going bottom up logically better than the Shannon method?

A1: live answered


Q: @Julie, is it ever inefficient to send the encoding in a file, since that might take up some amount of data?

A1: Yes! It’s possible that the tree encoding + encoded data is longer than the original data

A2: Yes, it can, but typically if the file is that short to begin with, then there was not much to be gained by compressing it. The encoding tree can be transmitted in 9 bits per letter, which is pretty compact


Q: what's the best-case compression ratio with Huffman encoding?

A1: live answered


Q: When we were converting the letter J into base-2, why did we add a 0 at the beginning? Shouldn’t it only have 7 bits?

A1: We generally consider 8 bits = 1 byte so the encodings get padded to be 8 bits


Q: If we compress data before sending it somewhere, would we also need to send some sort of "decompression key" along with the compressed data so that the receiver can get the original data back?

A1: live answered

A2: Yes, great observation. You would need to send along the Huffman tree in some format so the person on the other end can decompress the message

A3: It depends on whether the compression scheme is universal or document-specific; for Huffman, it is the latter


Q: how does the frequency of videos work, by pixels?

A1: live answered

A2: Video/image compression tends to be either area-based (averaged around pixels) and/or based on the number of pixels changing from frame to frame, so it’s less about individual color frequencies


Q: How do bits relate to a byte? What does megabyte refer to?

A1: live answered

A2: 8 bits = 1 byte; 1 megabyte = 1 million bytes


Q: just curious–how did you make the encoding animations?

A1: Eric wrote this really neat lecture animation tool of his own


Q: Does this relate to Principal Component Analysis?

A1: not sure!


Q: When we were converting J into base-2, we padded encodings to 8 bits so that it would be 1 byte, but why does it matter if it’s 1 byte? In our Huffman coding system, characters could have any variable number of bits, so why don’t we pad there?

A1: live answered