Today: ethics: privacy, program style: readability and decomposition, bits and bytes
By "privacy" here we're referring to individuals vs. the government. (Thanks to Ethics Fellow Wanheng Hu for feedback.)
1. SMS is a traditional setup. Alice and Bob have a key, but Verizon also has the key. The message is encrypted in transit, but Verizon has a copy. (The keys may be per-hop, but the essential feature is that Verizon has a copy.)
2. E2E. Alice and Bob both have a key, and Verizon does not. Thus Verizon only sees the ciphertext. Essential point: if the government asks for the plaintext, Verizon does not have it.
The files on phones are typically encrypted, only unlocked by the owner's PIN or fingerprint/face-id unlock. Likewise, an external hard drive can be encrypted with a user's password. Such encryption is effective, even in the face of law-enforcement efforts.
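To make the encryption idea concrete, here is a minimal sketch using the third-party Python "cryptography" package (an illustration only; real messaging apps use more elaborate key-exchange schemes). Only holders of the key can read the message; anyone relaying the ciphertext sees just scrambled bytes.

from cryptography.fernet import Fernet

key = Fernet.generate_key()               # shared secretly by Alice and Bob
f = Fernet(key)
ciphertext = f.encrypt(b'meet at noon')   # all the carrier ever sees
plaintext = f.decrypt(ciphertext)         # only possible with the key
print(plaintext)                          # b'meet at noon'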
Respect some privacy for yourself and others. Allowing people some privacy is good for society.
Short answer: tolerance. Giving people some privacy helps give them some individual freedom, even in the face of intolerance. In computer terms, privacy could be described as a "hack" which helps get a sort of tolerance.
Privacy is not a black-and-white issue. You do not want 0% or 100% privacy. This gets back to the dual-use pattern — we have both sympathetic and unsympathetic users of privacy, so we end up with a compromise of "some", but not 100%, privacy.
The terror group ISIS was very unfriendly to gay people. That such a person can keep their phone and messages encrypted, away from ISIS, seems good. Note also the de-facto tolerance angle. Or consider a dissident smuggling their memoirs out of an authoritarian regime.
If we just look at these examples, privacy looks great. But unfortunately there are just as many unsympathetic examples.
Criminals are highly aware of using encryption for chats, data, etc. In one case, an alleged pedophile refused to unlock his encrypted hard drive. The courts kept him in jail for years, and eventually he was released. The legality of this situation is currently debated in the US. Does the 5th Amendment right against self-incrimination apply to one's phone?
The Nth Room case of blackmail and cybersex trafficking on (encrypted) Telegram.
There was a privacy-focused phone, marketed to criminals. It turned out to be an FBI front, which used the information for convictions. For an entertaining hour, check out the Search Engine podcast episode: Best Phone For Crimes. The evidence was primarily not used against US citizens, I suspect because its collection violated the limited-government requirement for a warrant - perhaps an example of the system working as intended.
So we end up with a compromise where individuals have some, but not absolute privacy.
US history - limited government. This includes limitations on the government spying on citizens. Compromise: the government needs a warrant, with probable cause, to get the info.
Edward Snowden's PRISM revelations showed that the US was spying on citizens to some degree.
In contrast, in the crime-phone story above, they did not pursue US citizens with the info. Here the limited-government rules seemed to be followed.
Note that the technology of end-to-end encryption short-circuits the warrant system. Verizon does not have the data to give.
Law enforcement has lobbied for a "back door" to be added to encryption, where trusted parts of the government can, say, decrypt anyone's phone. Apple/Google argue convincingly that any such backdoor will then be used by ISIS, Russia etc. etc. The current state in the US is that there is no back door.
This shows the E2E vs. backdoor issue is very live!
Democracy was increasing 1945-2000, but now authoritarianism seems to be on the rise. I suspect this is temporary, and democracy will again increase. But who knows, perhaps this is my own wishful thinking? This will be an interesting arc of history that coincides with your adult life; see what happens.
Note that China, North Korea, Iran ... WhatsApp is illegal in all of these countries. Authoritarian governments do not like to extend privacy to their citizens. I think citizens flourish more in democracies, and that's where I want to live.
We'll start at the highest level, seeing the truisms that guide software building. There's software in everything, so you should know the lay of the land.
The main thing we want from code. If code produces the wrong answer, do we really care how fast it runs?
"Broken" is the natural state of code. It's easy to type in some code, and have it not work. We need a plan to work in this environment. Code can work so nicely, we should keep in mind that even more easily it can fail to work.
Can you judge code correctness by looking at it? The surprising answer is no. To really judge, you need to simulate what the loops and if-statements will do with various inputs. In effect, you need to run the code to see what it does.
We need to run the code against a few inputs, checking the output for each case. If the code works against a few cases, that suggests it is probably correct. It is not a 100% proof, which is surprisingly difficult or impossible to obtain, but tests are very good in practice.
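As a minimal sketch of this idea, here is a small hypothetical function checked against a few cases with assert:

def total(nums):
    """Given a list of numbers, return their sum."""
    result = 0
    for n in nums:
        result += n
    return result

# Run the code against a few inputs, checking the output for each case.
assert total([]) == 0
assert total([1, 2, 3]) == 6
assert total([-1, 1]) == 0
print('all cases passed')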
Code that the computer has never run over likely has bugs in it.
This can happen if an if-test is always false in a program. This happened with the AT&T phone network, where there was some code in the phone-switching system like this.
if rare_error_condition:
    # code to route around the error condition
    ...   # un-noticed bug here
The error-handling code within the if-statement had a simple bug in it, but those lines had never run, so nobody noticed. Until one day the if-statement was true and the code ran (for the first time) and crashed, taking out part of the US phone system for a while.
Code tests can help with this. There are modern "code coverage" tools that look at all the tests, making sure that every line has been run in some test or other.
Clean code with good style. This helps reduce bugs in the first place, and it's easier to fix and add features to code that is already clean. Stanford has always put an emphasis on writing clean code with good style.
If the code works correctly and looks good, we might also want to tune it to run fast or use less memory. For some bits of code, speed is crucial. However, the best strategy is generally to get the code working first before messing with it for maximum performance.
Now we'll look through rules for writing clean code, from the very simple, to the zoomed-out architecture.
This has to do with the simplest issues of spaces and words.
Python Guide: PEP8 Tactics (mostly covered in an earlier lecture)
We prefer code that is "readable" - looking at the code, what it does is apparent. Readable code has fewer bugs, and bugs (and the time they chew up) are a big problem in finishing code.
Python Guide: Readable Code - key points copied to these notes.
Good function names are the first step in readable code. Function names often use verbs indicating what calling the function will accomplish. Look at how the function names below make the surrounding code read nicely.
delete_files(files)

if is_url_sketchy(url):
    display_alert('That url looks sketchy!')
else:
    html = download_url(url)

s = remove_digits(s)
count = count_duplicates(coordinates)
canvas.draw_line(0, 0, 10, 10)
If a function returns a boolean value, starting its name with is_ or has_ can be a good choice. Think about how the function call will read when used in an if or while:
if is_weak(password):
    ...

is_url_sketchy(url)   # does what?
The Principle of Least Surprise is a convention for function names. When designing a function, e.g. is_url_sketchy(url), imagine that another programmer is writing code to call this function. Assume that all the other programmer knows is its name, since they don't bother to read the documentation. Therefore, the function should only take actions that one might expect given its name. So is_url_sketchy() should not, say, delete a bunch of files.
The code in a function is a story, a narrative, and the variable and function names help you keep the parts of the story clear in your mind. A variable name provides a short label for a bit of data in the story.
Bugs - mix up two values. Many bugs result from the programmer mixing up two data values just in the two minutes they are working on those lines, resulting in a round of debugging.
Previous lecture example - "left" is a fine variable name in there, labelling and distinguishing that value within the function. "x" or "i" would not be good choices.
def brackets(s):
    left = s.find('[')
    if left == -1:
        return ''
    right = s.find(']')
    return s[left + 1: right]
Here are some other possible names for left, exploring how long or short a variable name could be.
left                       # fine
left_index                 # fine
int_index_of_left_paren    # too long
index_of_left_paren        # too long - don't need to spell out
                           #   every detail in the name
a                          # meaningless
li                         # cryptic
l                          # too short, and don't use "l"
Suppose the algorithm stored both the index and the character at that index - two values it would be very easy to mix up in the code. In that case, the variable names need added words to keep the two values straight:
left_index   # index of left char
left_ch      # char at that index
From the Sand homework, the x_from and x_to variables are good variable name examples. That code was difficult, but at least each variable was labeled as what it was. The code would have been more difficult if the four x/y variables were named a, b, c, d.
x_from x_to
Here is a version of brackets() with bad, meaningless names - a, b, c:
def brackets(a):
    c = a.find('[')
    if c == -1:
        return ''
    b = a.find(']')
    return a[b + 1:c]   # compare below
Looking at the last lines of the good and bad versions demonstrates the role of good variable names. Look at the last line of the bad names version below. Is that line correct?
# Bad names version
return a[b + 1:c]          # buggy?

# Good names version
return s[left + 1:right]
With a bad variable name, you have to look upwards in the code to remind yourself what value it holds. That's the sign of bad variable naming! The name of the variable should tell the story right there, without scrolling up to remind yourself what it holds. Save yourself some time and give the variable a sensible name.
There are some circumstances that are so common and idiomatic, that there are standard, idiomatic short variable names tuned for that situation.
s - idiomatic generic string
ch or char - character from a string
i, j, k - idiomatic index loop: 0, 1, 2, ... max-1
x, y - idiomatic x, y 2-d coordinates
for x in range(image.width):
n - idiomatic generic int value
f - idiomatic opened file
lst - idiomatic list variable
'l' should be avoided
d - idiomatic dict variable
Never name a variable lowercase L or O - these look too much like the digits 1 and 0.
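Here is a small sketch (hypothetical code) putting a few of these idiomatic names to work:

def count_vowels(s):
    """Given a string s, return the number of vowels in it."""
    n = 0
    for ch in s:
        if ch.lower() in 'aeiou':
            n += 1
    return n

lst = ['apple', 'pear', 'plum']
for i in range(len(lst)):
    print(i, count_vowels(lst[i]))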
Why are CS106A programs structured the way they are - the many functions, the main(), the Doctests? These structures are not an accident. There is a reason it's all done a particular way, and here it is.
As a rough rule of thumb, the difficulty of completing a body of code of N lines seems to be proportional to N². This applies if the lines depend on each other directly, not if they are split into separate functions.
Say we are solving a 500-line problem. The naive approach would be to write all the code as one 500-line main() function. This is a terrible strategy, getting the worst of the N² curve - too much code all in one piece.
The central CS technique to break the N² trap is dividing the program into a series of relatively small functions. This is known as "modularity" in the program.
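For a sense of scale, assuming the N² rule of thumb holds: one 500-line main() costs roughly 500² = 250,000 units of difficulty, while the same program split into ten independent 50-line functions costs roughly 10 × 50² = 25,000 - about ten times easier.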
We can divide the program into functions, but how do the functions work with each other? The black-box model helps here: the input and output data of each function are its only contact points with the other functions, and we connect the output of one function to the input of the next. The functions are all separate, and yet they work together to solve the whole problem, with their input/output interactions kept as narrow and simple as possible.
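As a sketch of this pipeline idea (hypothetical functions; 'essay.txt' is a stand-in filename), the output of each function feeds the input of the next:

def read_lines(filename):
    """Given a filename, return a list of its lines."""
    with open(filename) as f:
        return f.readlines()

def count_words(lines):
    """Given a list of lines, return the total number of words."""
    total = 0
    for line in lines:
        total += len(line.split())
    return total

def report(count):
    """Given a word count, print a one-line summary."""
    print('word count:', count)

# the end-to-end pipeline - each function is a separate black box
report(count_words(read_lines('essay.txt')))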
Most of the time on the homeworks, you are zoomed in on just one function, which is the right way to do it. It is harder to see the end-to-end pipeline the functions make once they are all run together.
You are already familiar with the input/output framing of a function. Here we'll add on CS terms "abstraction" and "implementation". These ideas are crucial for computer systems, but you may find them handy for many parts of life.
The abstraction of a function is what it accomplishes - what it requires as input and what it will produce as output. We can think of this as the "contract" for the function: what is required to go in, and what the function promises to provide. The abstraction contract is also basically what is written in the triple-quoted """Docstring""" at the top of a function.
The implementation detail of a function is all the code and complexity within the function that does the actual work. The word "detail" is sometimes used as a blanket term for all the implementation features hidden inside a function.
Usually, the abstraction for a function is relatively simple compared to its implementation.
What do you need to know to call a function correctly? Just the abstraction. The implementation can be hidden inside the function. Our strategy is to hide "implementation detail" inside the function so the rest of the program does not need to know or depend on it. This is how we fight the N² curve.
Example: datetime.now()

from datetime import datetime
now = datetime.now()
The library function datetime.now() returns a date-time value that represents the current date and time - suitable for printing, or recording in a file or something. That is its abstraction, which is simple.
What is its implementation? What chip on the computer does it query to get this info - we don't know. That's an implementation detail. It just promises to work when we call it; a nice simple abstraction.
It's very common that calling a function is relatively easy, while all sorts of detail and effort is hidden in the function's implementation.
Suppose you are writing a big program, and now it's time to work on function1():
def function1(s):
    """Given string s ..."""
    # .. lots of detail ..
    # .. in here ..
Work on function1. At this time, your mind is focused on the function1 abstraction, and you are wrestling with the detail and bugs and whatnot of its implementation. Eventually you get it working perfectly.
Now it's time to work on function2() which calls function1() as a helper. Look at the key line below.
def function2(s):
    """..."""
    ....
    part = function1(s)   # the key line
    ...
What is your state of mind writing the key line? The abstraction of function1. Do not think about the implementation details of function1, though you were just working on it.
What the N² trap tells us is that keeping all of the function implementations in mind at one time is not a good strategy. Here we only think about one implementation at a time. Once a function is done, we work only in terms of its abstraction. Work on one thing at a time.
We have sectioned off some of the program complexity inside function1. When it's time to call function1() we could think about how it is implemented. Instead, we embrace not knowing what's going on in there. Just call it, and it should meet its contract. Leverage the abstraction to only know what's needed as we go.
This is why Python and other languages have the """Docstring""" documentation, allowing the contract to be written out and easily accessible on the fly, so programmers can access just the abstraction they need, not looking at the implementation details.
Think about the abstraction for a function you are writing in Python. Choose a good function name, summarizing what it does. The parameters list its inputs. The """Docstring""" at the top of a function summarizes its abstraction in words. What does it require as input? What does it promise to return as output? We generally use the word "given" in here to refer to the parameters, like "Given values x and y, returns something something."
You can delete the ":param s: " stuff PyCharm puts in. That syntax is seldom used at this time. You can summarize the abstraction with the "Given x ..." Docstring.
The Doctests are another way to summarize the abstraction - not with words, but with a series of input/output examples. They also have the benefit of helping you debug your code.
def del_chars(s, target):
    """
    Given string s and a "target" string,
    return a version of s with all chars that
    appear in target removed, e.g. s 'abc'
    with target 'bx', returns 'ac'.
    (Not case sensitive)
    >>> del_chars('abC', 'acx')
    'b'
    >>> del_chars('ABc', 'aCx')
    'B'
    >>> del_chars('', 'a')
    ''
    """
    result = ''
    target = target.lower()
    for ch in s:
        if ch.lower() not in target:
            result += ch
    return result
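One standard way to actually run the Doctests, as a sketch: put these lines at the bottom of the file and run the file directly.

if __name__ == '__main__':
    import doctest
    doctest.testmod()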
Extra topic for fun if we have time.
At the smallest scale in the computer, information is stored as bits and bytes. In this section, we'll look at how that works.
How many different patterns can be made with 1, 2, or 3 bits?
| Number of bits | Different Patterns |
|---|---|
| 1 | 0 1 |
| 2 | 00 01 10 11 |
| 3 | 000 001 010 011 100 101 110 111 |
| Number of bits | Number of Patterns |
|---|---|
| 1 | 2 |
| 2 | 4 |
| 3 | 8 |
| 4 | 16 |
| 5 | 32 |
| 6 | 64 |
| 7 | 128 |
| 8 | 256 |
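The pattern in the table is that each added bit doubles the number of patterns, so n bits give 2 ** n patterns. A quick sketch to reproduce the table:

for n in range(1, 9):
    print(n, 'bits:', 2 ** n, 'patterns')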