Today: part-1: coding style and design. part-2: string foreach, lists

Large Code Projects - Deceptively Difficult

Large code projects are often difficult
Often organizations have a body of code that never works right
It keeps absorbing programmer hours but is never debugged
Good style and a disciplined approach are needed to create reliable working code
CS106A has this in mind from the start

Readable

"Readable" high level goal of good code
Eye sees the code text, what it does is apparent
The code "reads"
Read each line
Follow the narrative idea
Helped by good function names, e.g. find() (verb)
Helped by good var names, e.g. left (noun)
Why do we care?
Fewer bugs!
What is a bug?
The code does something different from our intent
i.e. looked at code, did not see what it actually did
Techniques: good variable names, good function names, decomposition, spacing, comments

Readable 1.0 - Good Function Names

Good function name - what action does this function take?
Does not need to spell out everything
A few words is the sweet spot
Enough words so the fn-call "reads" in context
Think about how function name will look when called...

if is_url_sketchy(url):
  ...


delete_files(files)


if distance(loc1, loc2) < 1.0:
  ...


# Is "compute_distance" a better name?
# In this case the one word "distance" reads fine IMHO.
if compute_distance(loc1, loc2) < 1.0:
  ...

Readable 2.0 - Good Variable Names

Variable name = what value does this hold?
The code is a story
Variable names label the values progressing through the story
The payoff of readable code is right now
e.g. left and right in the example below
Tension: shorter var name, less space, easy to type
Longer var names: better spell out what is in the var
Do not: spell out every true thing about the value
Do: label concept sufficiently to distinguish from others in this function

Variable Names Pay Off Right Now

You are writing a 10 line function. You have data that flows through and changes from line to line. You need to keep track of these values in your own mind as you go from line to line to get this function written. Good variable and function names are a big help here.

brackets() Code - Good Var Names

Previous lecture example - "left" is a fine variable name in there. "x" or "i" would not be good choices.

brackets(s): Look for a pair of brackets '[...]' within s, and return the text between the brackets, so the string 'cat[dog]bird' returns 'dog'. If there are no brackets, return the empty string. If the brackets are present, there will be only one of each, and the right bracket will come after the left bracket.

def brackets(s):
    left = s.find('[')
    if left == -1:
        return ''
    right = s.find(']')
    return s[left + 1: right]

brackets() with Bad Var Names

Here is a buggy version of brackets() with bad variables. Look at the last line. Is that line correct? For each var, you have to look up to remind yourself what value it is. That's a bad sign! Better that the name of the variable just tells the story right there.

def brackets(x):
    z = x.find('[')
    if z == -1:
        return ''
    y = x.find(']')
    return x[y + 1: z]  # buggy?
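
One quick way to settle the "buggy?" question is to run both versions on the example from the spec. A minimal sketch: the two functions are copied from above and renamed brackets_good and brackets_bad (hypothetical names, just so both fit in one file).

```python
def brackets_good(s):
    left = s.find('[')
    if left == -1:
        return ''
    right = s.find(']')
    return s[left + 1: right]


def brackets_bad(x):
    z = x.find('[')
    if z == -1:
        return ''
    y = x.find(']')
    return x[y + 1: z]  # z is the left index, y is the right: roles swapped


print(brackets_good('cat[dog]bird'))       # dog
print(repr(brackets_bad('cat[dog]bird')))  # ''
```

The slice in the bad version starts after the right bracket and ends at the left bracket, so it comes out empty. With the names left and right, that swap would be visible on the line itself.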

Variable Name Choices for "left"

Identify the noun/role within this function
Distinguish from the other nouns here
Do not need to include every true thing about it
The variable name is just a handle

int_index_of_left_bracket   # Too long.
                            # Do not spell out
                            # every true thing.
index_of_left_bracket       # Too long.

left_index            # fine
left                  # fine
li                    # too short/cryptic
l                     # too short, and don't use "l"

Exceptions: Idiomatic 1 Letter / Short Var Names

"idiomatic" - a common practice by many programmers, so it becomes a readable, recognizable shorthand.

There are a few idiomatic 1 letter names
s - idiomatic generic string
ch - idiomatic for single char in string
i, j, k - idiomatic index loop: 0, 1, 2, ... max-1
n - idiomatic generic int value
x, y - idiomatic x, y 2-d coordinates
f - idiomatic opened file
lst - idiomatic list variable (soon)
d - idiomatic dict variable (soon)
Never name a variable lowercase L or O - look like digits 1 0
Notice that the 1-letter name "s" is fine for brackets()
There is nothing semantic about s we are trying to keep track of

Decomp By Var Strategy

You have something complicated to compute
Could write it as one big line
Instead, break it into separate lines
Store partial results in variables as you go
This is a form of divide and conquer!
Use variables to take on the problem piece by piece
Breaking a long horizontal line into vertical steps
Lecture examples very frequently decomp by var like this

Decomp By Var Example Problem 'x3412y'

This is a classic make-a-drawing index problem. Getting this perfect is not so easy.

Function: Given a string s of even length, if the string length is 2 or less, return it unchanged. Otherwise take off the first and last chars. Consider the remaining middle piece. Split the middle into front and back halves. Swap the order of these two halves, and return the whole thing with the first and last chars restored. So 'x1234y' returns 'x3412y'.

Decomp By Var Solution

The variable names here help us keep the various parts clear through the narrative, even at the moment we are working out each line. The variable names are naturally similar to those in the specification.

def foo(s):
    if len(s) <= 2:
        return s
    first = s[0]
    last = s[len(s) - 1]
    mid = s[1:len(s) - 1]
    halfway = len(mid) // 2
    return first + mid[halfway:] + mid[:halfway] + last

The variable names don't have to be super detailed. Just enough to label the concepts through this narrative. Note that the one letter "s" is fine - there is nothing semantic about s that we need to keep track of beyond it's a string. In contrast, "first" "last" etc. have specific roles in the algorithm.

Point here: writing this function with a blank screen. Use good variable names to pick off and name parts of the problem as you work ahead.

The variables are sort of divide-and-conquer within the function - separate out and name individual steps of the algorithm vs. doing it in 1 big jump.

Bad Solution - No Decomp By Var

Here is the above function written without any good variables. Just because something is 1 line does not make it better. I believe it's correct, but it's hard to tell!

This is a good example of not readable.

def foo(s):
    if len(s) <= 2:
        return s
    return (s[0] + s[1:len(s) - 1][(len(s) - 2) // 2:] +
            s[1:len(s) - 1][:(len(s) - 2) // 2] + s[len(s) - 1])

The bad code also repeats computations, like (len(s) - 2) // 2. The good solution computes that value once and stores it in the variable halfway for use by later lines.
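
Since it is hard to tell by eye whether the one-line version is correct, one option is to check it against the decomp-by-var version on a few inputs. A sketch only: foo_decomp and foo_one_line are hypothetical names for the two versions above, and the test strings are arbitrary even-length picks.

```python
def foo_decomp(s):
    if len(s) <= 2:
        return s
    first = s[0]
    last = s[len(s) - 1]
    mid = s[1:len(s) - 1]
    halfway = len(mid) // 2
    return first + mid[halfway:] + mid[:halfway] + last


def foo_one_line(s):
    if len(s) <= 2:
        return s
    return (s[0] + s[1:len(s) - 1][(len(s) - 2) // 2:] +
            s[1:len(s) - 1][:(len(s) - 2) // 2] + s[len(s) - 1])


# Compare the two versions on several even-length strings
for s in ['', 'ab', 'abcd', 'x1234y', 'a123456b']:
    assert foo_decomp(s) == foo_one_line(s)

print(foo_decomp('x1234y'))  # x3412y
```

Agreement on a handful of inputs is evidence, not proof. The readable version is still the one to keep.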

Trick: If You Cannot Get A Line Working

If you have a line that you just cannot get working
Break it into separate steps with decomp-by-var
Going in smaller steps can help you spot the bug

Avoid Needless Computation in Loop - Store in Var

Suppose we have this loop - n copies of the lowercase form of s. This code is fine, we will just point out a slight improvement.

def n_copies(s, n):
    result = ''
    for i in range(n):
        result += s.lower()
    return result

Notice that s.lower() computes the lowercase form of s in the loop. The readability is fine, but the code computes that lowercase form again and again and again. The lowercase of 'Hello' is the same 'hello' every time through the loop. This is a little wasteful. Could compute it once, store in a variable, use the variable in the loop:

def n_copies(s, n):
    result = ''
    low = s.lower()
    for i in range(n):
        result += low
    return result

This is a slight improvement. It would be especially important if the s.lower() computation was costly. This issue appears in HW4. The first job is calling the helper function to get the right data in hand. A lesser question is - does this value need to be computed every time through the loop, or can we just compute it once?
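
To make the difference visible, here is a sketch that counts how many times the helper runs. slow_lower() is a hypothetical stand-in for a costly helper (it is not from the homework), and the log list parameter exists only to record calls.

```python
def slow_lower(s, log):
    """Hypothetical costly helper: records each call in log, returns s lowercased."""
    log.append(s)
    return s.lower()


def n_copies_in_loop(s, n, log):
    result = ''
    for i in range(n):
        result += slow_lower(s, log)  # recomputed every time through the loop
    return result


def n_copies_once(s, n, log):
    result = ''
    low = slow_lower(s, log)  # computed one time, before the loop
    for i in range(n):
        result += low
    return result


log_a = []
log_b = []
print(n_copies_in_loop('Hi', 3, log_a), len(log_a))  # hihihi 3
print(n_copies_once('Hi', 3, log_b), len(log_b))     # hihihi 1
```

Both versions return the same string. The first calls the helper n times, the second just once.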

Big Picture Software Costs - N²

A software project might be planned to take 2 months
And 2 years later, it still doesn't really work
How is that possible?
It all comes down to N-squared

N Squared Trap

The central insight that drives program design
Decomposition is fact
Question: how much work is 500 line program vs. a 1000 line program?
How many hours does it take as the number of lines goes up?
Goes up linearly - the intuitive but wrong answer
CS experience: it's much worse than that
Difficulty goes up as the square of the number of lines
It's a concave-up curve

[Figure: hours to finish is proportional to the number of lines squared]

Decomposition - Escape N-Squared Trap

Do not write a 1000 line program
Write a series of 20 line functions
Decomposition is about getting to the left on the n-squared curve
A series of functions, each with just a few lines
Never have all the lines in your head at one time
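
A rough worked example of the n-squared claim. This assumes effort is exactly proportional to lines squared and ignores the cost of fitting the functions together. Both are simplifications for illustration, not measured data. relative_effort() is a hypothetical helper just for this arithmetic.

```python
def relative_effort(line_counts):
    """Given a list of line counts, one per separately written piece,
    return the sum of each count squared (assumed n-squared model)."""
    total = 0
    for n in line_counts:
        total += n * n
    return total


one_big = relative_effort([1000])         # one 1000 line program
fifty_small = relative_effort([20] * 50)  # fifty 20 line functions

print(one_big)                 # 1000000
print(fifty_small)             # 20000
print(one_big // fifty_small)  # 50
```

Same 1000 lines in total, but under this model the decomposed version is about 50 times less work.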

Black Box Model - 1. Abstraction

(add 2 CS terms to the black box model)
1. "Abstraction"
External contract: what this function does
What goes in? (the params)
What comes out? (the return value)
What does it compute given its params?
"""what is in the triple-quote string"""

Black Box - 2. Implementation

2. "Implementation" Details
All the code inside the function, complicated
The word "detail" is associated with implementation
Q: Does the caller need to know the internal details of the function?
A: No!
Our strategy is to hide "implementation detail" inside the function
Calling a function, just need to know what it accomplishes
Calling a function is simple relative to its internal details

Ride To Airport Abstraction vs. Implementation

We use abstraction all the time in life
Ride to airport abstraction:
Pick up time and place
Drop off time and place
Ride shared with others
Ride to airport implementation details, don't care about:
Car has LED headlights?
Color of the seats?
Is the driver wearing a hat?
Is the gas tank more than 1/2 full?
... we care about drop off, which covers the detail about having enough gas to get there
The point: abstraction is much simpler than implementation
Calling a function - just the abstraction

How To Write a Program - Avoiding n² Trap

N line program
Avoid having all N lines in your head at once
1. Work on function1()
Look at function1 abstraction (contract)
Work on function1 implementation to return result
Have function1 implementation details in head now
2. Work on function2()
Look at function2 abstraction
Work on function2 implementation
Call function1, think only of its abstraction
Do not think about function1 implementation
With each function, concentrate on just its implementation
Build on other function abstractions
This is our technique to build something big
This is the central CS engineering trick for big projects
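
The steps above can be sketched with two small functions. is_vowel() and count_vowels() are hypothetical examples, not from the course materials: is_vowel plays the role of function1, count_vowels the role of function2.

```python
def is_vowel(ch):
    """
    Given a single char ch, return True if it
    is a vowel (a e i o u, either case), False otherwise.
    """
    return ch.lower() in 'aeiou'


def count_vowels(s):
    """
    Given string s, return the number of
    vowel chars in it.
    """
    count = 0
    for i in range(len(s)):
        # Calling is_vowel(): think only of its contract,
        # not the lines inside it.
        if is_vowel(s[i]):
            count += 1
    return count


print(count_vowels('Abstraction'))  # 4
```

While writing count_vowels(), only the one-sentence contract of is_vowel() needs to be in your head.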

Abstraction in CS

Working bigger problems
You will constantly call some function you did not write
Depend on its abstraction, not worrying about its implementation
It is hard to overstate how much we depend on this pattern to build computer systems

import os
from datetime import datetime

# Get list of filenames in the named directory
filenames = os.listdir('Downloads')

# Get the current date and time
now = datetime.now()

Mechanics: Fn name, PyDoc, Doctests

Applying these ideas in Python syntax
1. Have a good verb function name
2. List of params with good names - the inputs
We use the word "given" for these often
3. Abstraction - contract
Given these inputs, computes and returns what?
Summarize the contract within Pydoc """triple quotes"""
Given params X Y Z
Returns xxx
We've seen this many times
Can delete the ":param s: " stuff PyCharm puts in, not needed at this level
The Doctests are another way to express the contract, also help debugging

def del_chars(s, target):
    """
    Given string s and a "target" string,
    return a version of s with all chars that
    appear in target removed, e.g. s 'abc'
    with target 'bx', returns 'ac'.
    (Not case sensitive)
    >>> del_chars('abC', 'acx')
    'b'
    >>> del_chars('ABc', 'aCx')
    'B'
    >>> del_chars('', 'a')
    ''
    """
    result = ''
    target = target.lower()
    for i in range(len(s)):
        if s[i].lower() not in target:
            result += s[i]
    # could use "for char in s" form, since not using index
    return result
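
As one possible way to run the Doctests outside of an editor, the standard doctest module can check every >>> example in a file. A sketch, assuming the file is run directly as a script. It also uses the "for ch in s" form mentioned in the comment above.

```python
import doctest


def del_chars(s, target):
    """
    Given string s and a "target" string,
    return a version of s with all chars that
    appear in target removed.
    (Not case sensitive)
    >>> del_chars('abC', 'acx')
    'b'
    >>> del_chars('ABc', 'aCx')
    'B'
    >>> del_chars('', 'a')
    ''
    """
    result = ''
    target = target.lower()
    for ch in s:  # foreach form, no index numbers needed
        if ch.lower() not in target:
            result += ch
    return result


if __name__ == '__main__':
    # Runs each >>> example above and compares the actual output
    # to the expected output. Prints nothing if all of them pass.
    doctest.testmod()
```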

How Not To Write a Program

First type in all the code
Have huge functions that each do many things
Only when it's all typed in, try running it
Try to debug all the functions concurrently

How To Write a Program

Decompose the program into separate functions
Work on one function at a time
Each function has well defined input and output
Try to test each function independently
Doctests are a great feature for this
Then move on to the next function
Don't have it all in your head at once
Subtle benefit:
Can run + get feedback on ideas quite soon
Long before all functions are done

string foreach

We have used for/i/range to index into string
This highlighted the role of index numbers in string algorithms
Which I wanted to do!
There is a simpler way to loop over the chars in a string
for ch in s:
Loops over all the chars in s, left to right
You do not get the index number here, just the char
No square brackets [ ] in this form
The variable name ch or char is idiomatic for one character
Use this form if you do not need access to index numbers
Use the for/i/range form if you need access to index numbers

String Foreach Examples

> String Foreach examples

double_char2() example with foreach

def double_char2(s):
    result = ''
    for ch in s:
        result = result + ch + ch
    return result
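
To show the choice between the two loop forms, here is a sketch with one function of each kind. count_digits() and first_digit_index() are hypothetical examples, not from the examples page.

```python
def count_digits(s):
    """Given string s, return the number of digit chars in it."""
    count = 0
    for ch in s:  # only the chars matter here, so foreach
        if ch.isdigit():
            count += 1
    return count


def first_digit_index(s):
    """Given string s, return the index of its first digit, or -1 if none."""
    for i in range(len(s)):  # the index is the answer, so for/i/range
        if s[i].isdigit():
            return i
    return -1


print(count_digits('ab12c3'))       # 3
print(first_digit_index('ab12c3'))  # 2
print(first_digit_index('abc'))     # -1
```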

Python Lists

See guide: Python List for more details about lists

"list" type stores a linear collection of any type of python value
Use list to store many of something
e.g. a thousand urls - a list of url strings
e.g. a million temperature readings - a list of float values
Things in a list called "elements"
Theme: python tries to be uniform:
len(), square brackets .. list works the same as string
"lst" is a generic list variable name

1. List Literal: `[1, 2, 3]`

Use square brackets [..] to write a list in code (a "literal" list value), separating elements with commas

>>> lst = ['a', 'b', 'c']

"empty list" is just 2 square brackets with nothing within: []

2. Length of list: `len(lst)`

Use len() function, just like string

>>> len(lst)
3

3. Square Brackets to access element

Use square brackets to access an element in a list, like string again (bad index err possible). Valid index numbers are 0..len-1.

>>> lst[0]
'a'
>>> lst[2]
'c'
>>> lst[9]
IndexError: list index out of range

List Mutable

The big difference from strings is that lists are mutable - lists can be changed. Elements can be added, removed, changed over time.

1. List append()

Lists can contain any type (today int, str)
lst.append('something') - adds an elem to end of list
Modifies the list, returns nothing
Common list-build pattern:

# 1. make empty list, then call .append() on it
>>> lst = []         
>>> lst.append('a')
>>> lst.append('b')
>>> lst.append('c')
>>> 
>>> lst
['a', 'b', 'c']
>>> len(lst)
3
>>> lst[0]
'a'
>>> lst[2]
'c'
>>>
# 2. Similar, using loop/range to call .append()
>>> lst = []
>>> for i in range(6):
...     lst.append(i * 10)
... 
>>> lst
[0, 10, 20, 30, 40, 50]
>>> len(lst)
6
>>> lst[5]
50

2. List "in" / "not in" Tests

How to tell if a value is in a list?
hint: like string!
The in operator tests if a value is in a list
not in works too, reads nicely
Style preference
x not in lst - preferred form
not x in lst - equivalent, but not preferred (applies to string too)

>>> lst = ['a', 'b', 'c']
>>> 'c' in lst
True
>>> 'x' in lst
False
>>> 'x' not in lst  # preferred form to check not-in
True
>>> not 'x' in lst  # not preferred equivalent
True

3. Foreach On List

The for "foreach" loop works on a list too, looping over its elements
This is a common code pattern, since many algorithms want to look at all the elements
No change: do not change the list - add/remove/change - during iteration
Kind of a reasonable rule: how would iteration work if elements left and appeared in the midst of iteration?

>>> lst = ['a', 'b', 'c']
>>> for s in lst:
...   # use s in here
...   print(s)
... 
a
b
c
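
One way to respect the no-change rule when you want a modified list: loop over the original and append to a separate result list. A sketch, with remove_empty() as a hypothetical example function.

```python
def remove_empty(strs):
    """
    Given a list of strings, return a new list
    with the empty strings left out.
    The original list is not modified.
    """
    result = []
    for s in strs:  # iterate over strs, never change it here
        if s != '':
            result.append(s)  # changes go to the separate result list
    return result


words = ['a', '', 'b', '', 'c']
print(remove_empty(words))  # ['a', 'b', 'c']
print(words)                # ['a', '', 'b', '', 'c']
```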

4. list.index(target) - Find Index of Target

Similar to str.find(), but with one big difference
list.index(target) - returns int index of target if found
Flaw: only works if target is in the list
Code should check with in first, only call lst.index() if in is True
This design is annoying
It would be easier if lst.index() just returned -1, but it doesn't
Variant: list.index(target, start_index) - begin search at start_index instead of 0

>>> lst = ['a', 'b', 'c']
>>> lst.index('c')
2
>>> lst.index('d')
ValueError: 'd' is not in list
>>> 'd' in lst
False
>>> 'c' in lst
True
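
The check-with-in-first pattern can be wrapped up in a small helper. A sketch: index_or_minus_one() is a hypothetical name, chosen here to mimic how str.find() reports a miss.

```python
def index_or_minus_one(lst, target):
    """
    Given a list and a target value, return the index
    of the first occurrence of target, or -1 if target
    is not in the list.
    """
    if target in lst:  # check with "in" first
        return lst.index(target)  # safe: target is known to be present
    return -1


lst = ['a', 'b', 'c', 'a']
print(index_or_minus_one(lst, 'c'))  # 2
print(index_or_minus_one(lst, 'd'))  # -1
print(lst.index('a', 1))             # 3, search begins at index 1
```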

List Code Examples

> list1 examples

list_n() - create list [1, 2, 3, ..n] - use range() and append()
donut_index() - use "in" and index()
list_censor() - use everything
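
A possible sketch of list_n(), going only by the one-line description above. The actual exercise on the examples page may state its contract differently, e.g. for n of 0 or less.

```python
def list_n(n):
    """
    Given int n, return the list [1, 2, 3, ... n].
    Assumption: for n of 0 or less, returns the empty list.
    """
    nums = []
    for i in range(n):  # i goes 0, 1, ... n-1
        nums.append(i + 1)  # so append i + 1 to get 1 ... n
    return nums


print(list_n(4))  # [1, 2, 3, 4]
print(list_n(0))  # []
```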

Constants in Python

STATES = ['CA', 'NY', 'NV', 'KY', 'OK']

Simple form name=value at far left, not within a def
This is a type of "global" variable
A variable not inside a function
In this case it's in effect a constant
Functions can just refer to STATES to get its value
Convention: upper case means it's a de-facto constant
Best style: a read-only value, don't modify
Python does not enforce this for us
Modified global variables are iffy style, we don't do it
Can have "global" declaration
We'll never do this, enables read/write that we do not do

e.g. HW4 Crypto

# provided ALPHABET constant - list of the regular alphabet
# in lowercase. Refer to this simply as ALPHABET in your code.
# This list should not be modified.
ALPHABET = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

...

def foo():
    for ch in ALPHABET:  # this works
        print(ch)

main() - Monday

Need to show you how to write a main()
Uses lists
Last bit of Crypto - you write the main()
Can go look at main() of crazycat example - simple example main()