Today: part-1: coding style and design. part-2: string foreach, lists
Large Code Projects - Deceptively Difficult
- Large code projects are often difficult
- Often organizations have a body of code that never works right
It keeps absorbing programmer hours but is never debugged
- Good style and a disciplined approach are needed to create reliable working code
- CS106A has this in mind from the start
Readable
- "Readable" high level goal of good code
- Eye sees the code text, what it does is apparent
- The code "reads"
Read each line
Follow the narrative idea
Helped by good function names find() (verb)
Helped by good var names, e.g. left (noun)
- Why do we care?
- Fewer bugs!
- What is a bug?
- The code does something different from our intent
- i.e. looked at code, did not see what it actually did
- Techniques: good variable names, good function names, decomposition, spacing, comments
Readable 1.0 - Good Function Names
- Good function name - what action does this function take?
- Does not need to spell out everything
- A few words is the sweet spot
- Enough words so the fn-call "reads" in context
- Think about how function name will look when called...
if is_url_sketchy(url):
    ...

delete_files(files)

if distance(loc1, loc2) < 1.0:
    ...

# Is "compute_distance" a better name?
# In this case the one word reads fine IMHO,
if compute_distance(loc1, loc2) < 1.0:
    ...
Readable 2.0 - Good Variable Names
- Variable name = what value does this hold?
- The code is a story
- Variable names label the values progressing through the story
- The payoff of readable code is right now
- e.g. left and right in the example below
- Tension: shorter var name, less space, easy to type
- Longer var names: better spell out what is in the var
- Do not: spell out every true thing about the value
- Do: label concept sufficiently to distinguish from others in this function
Variable Names Pay Off Right Now
You are writing a 10 line function. You have data that flows through and changes from line to line. You need to keep track of these values in your own mind as you go from line to line to get this function written. Good variable and function names are a big help here.
brackets() Code - Good Var Names
Previous lecture example - "left" is a fine variable name in there. "x" or "i" would not be good choices.
brackets(s): Look for a pair of brackets '[...]' within s, and return the text between the brackets, so the string 'cat[dog]bird' returns 'dog'. If there are no brackets, return the empty string. If the brackets are present, there will be only one of each, and the right bracket will come after the left bracket.
def brackets(s):
    left = s.find('[')
    if left == -1:
        return ''
    right = s.find(']')
    return s[left + 1: right]
brackets() with Bad Var Names
Here is a buggy version of brackets() with bad variables. Look at the last line. Is that line correct? For each var, you have to look up to remind yourself what value it is. That's a bad sign! Better that the name of the variable just tells the story right there.
def brackets(x):
    z = x.find('[')
    if z == -1:
        return ''
    y = x.find(']')
    return x[y + 1: z]  # buggy?
Variable Name Choices for "left"
- Identify the noun/role within this function
- Distinguish from the other nouns here
- Do not need to include every true thing about it
- The variable name is just a handle
int_index_of_left_paren # Too long.
# Do not spell out
# every true thing.
index_of_left_paren # Too long.
left_index # fine
left # fine
li # too short/cryptic
l # too short, and don't use "l"
Exceptions: Idiomatic 1 Letter / Short Var Names
"idiomatic" - a common practice by many programmers, so it becomes a readable, recognizable shorthand.
- There are a few idiomatic 1 letter names
s - idiomatic generic string
ch - idiomatic for single char in string
i, j, k - idiomatic index loop: 0, 1, 2, ... max-1
n - idiomatic generic int value
x, y - idiomatic x, y 2-d coordinates
f - idiomatic opened file
lst - idiomatic list variable (soon)
d - idiomatic dict variable (soon)
- Never name a variable lowercase L or O - they look like the digits 1 and 0
- Notice that the 1-letter name "s" is fine for brackets()
There is nothing semantic about s we are trying to keep track of
Decomp By Var Strategy
- You have something complicated to compute
- Could write it as one big line
- Instead, break it into separate lines
- Store partial results in variables as you go
- This is a form of divide and conquer!
- Use variables to take on the problem piece by piece
- Breaking a long horizontal line into vertical steps
- Lecture examples very frequently decomp by var like this
Decomp By Var Example Problem 'x3412y'
This is a classic make-a-drawing index problem. Getting this perfect is not so easy.
Function: Given a string s of even length, if the string length is 2 or less,
return it unchanged.
Otherwise take off the first and last chars.
Consider the remaining middle piece.
Split the middle into front and back halves.
Swap the order of these two halves, and return
the whole thing with the first and last chars
restored.
So 'x1234y' returns 'x3412y'.
Decomp By Var Solution
The variable names here help us keep the various parts clear through the narrative, even at the moment we are working out each line. The variable
names are naturally similar to those in the specification.
def foo(s):
    if len(s) <= 2:
        return s
    first = s[0]
    last = s[len(s) - 1]
    mid = s[1:len(s) - 1]
    halfway = len(mid) // 2
    return first + mid[halfway:] + mid[:halfway] + last
The variable names don't have to be super detailed. Just enough to label the concepts through this narrative. Note that the one letter "s" is fine - there is nothing semantic about s that we need to keep track of beyond it's a string. In contrast, "first" "last" etc. have specific roles in the algorithm.
Point here: writing this function with a blank screen. Use good variable names to pick off and name parts of the problem as you work ahead.
The variables are sort of divide-and-conquer within the function - separate out and name individual steps of the algorithm vs. doing it in 1 big jump.
Bad Solution - No Decomp By Var
Here is the above function written without any good variables. Just because something is 1 line does not make it better. I believe it's correct, but it's hard to tell!
This is a good example of not readable.
def foo(s):
    if len(s) <= 2:
        return s
    return (s[0] + s[1:len(s) - 1][(len(s) - 2) // 2:] +
            s[1:len(s) - 1][:(len(s) - 2) // 2] + s[len(s) - 1])
The bad code also repeats computations, like (len(s) - 2) // 2. The good solution computes that value once and stores it in the variable halfway for use by later lines.
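One way to settle the "I believe it's correct" doubt is to run both versions on the same inputs and compare. This is a minimal sketch: the names swap_decomp() and swap_one_line() are made up here so the two versions of foo() can sit side by side, and the test strings are just a few even-length samples, not a proof.

```python
def swap_decomp(s):
    """Decomp-by-var version, same logic as the good solution."""
    if len(s) <= 2:
        return s
    first = s[0]
    last = s[len(s) - 1]
    mid = s[1:len(s) - 1]
    halfway = len(mid) // 2
    return first + mid[halfway:] + mid[:halfway] + last


def swap_one_line(s):
    """One-big-expression version, same logic as the bad solution."""
    if len(s) <= 2:
        return s
    return (s[0] + s[1:len(s) - 1][(len(s) - 2) // 2:] +
            s[1:len(s) - 1][:(len(s) - 2) // 2] + s[len(s) - 1])


# Even-length inputs, per the problem statement.
cases = ['', 'xy', 'x12y', 'x1234y', 'abcdefgh']
for case in cases:
    # Stops with an AssertionError if the two versions ever disagree.
    assert swap_decomp(case) == swap_one_line(case)

print(swap_decomp('x1234y'))   # x3412y
```

The two versions agree on these inputs, but only the first one can be checked by reading it.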
Trick: If You Cannot Get A Line Working
- If you have a line that you just cannot get working
- Break it into separate steps with decomp-by-var
- Going in smaller steps can help you spot the bug
Avoid Needless Computation in Loop - Store in Var
Suppose we have this loop - n copies of the lowercase form of s. This code is fine, we will just point out a slight improvement.
def n_copies(s, n):
    result = ''
    for i in range(n):
        result += s.lower()
    return result
Notice that s.lower() computes the lowercase form of s in the loop. The readability is fine, but the code computes that lowercase form again and again and again. The lowercase of 'Hello' is the same 'hello' every time through the loop. This is a little wasteful. Could compute it once, store in a variable, use the variable in the loop:
def n_copies(s, n):
    result = ''
    low = s.lower()
    for i in range(n):
        result += low
    return result
This is a slight improvement. It would be especially important if the s.lower() computation was costly. This issue appears in HW4. The first job is calling the helper function to get the right data in hand. A lesser question is - does this value need to be computed every time through the loop, or can we just compute it once?
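To see the difference in numbers, here is a rough sketch that counts how many times the helper runs in each version. lower_logged() is a hypothetical stand-in for a costly helper, and the "log" list parameter exists only to count calls (lists are covered later in these notes).

```python
def lower_logged(s, log):
    """Hypothetical helper: lowercase s, and note the call in the log list."""
    log.append(s)
    return s.lower()


def n_copies_in_loop(s, n, log):
    result = ''
    for i in range(n):
        result += lower_logged(s, log)   # computed every time through
    return result


def n_copies_once(s, n, log):
    result = ''
    low = lower_logged(s, log)           # computed once, before the loop
    for i in range(n):
        result += low
    return result


log_a = []
log_b = []
text_a = n_copies_in_loop('Hello', 3, log_a)
text_b = n_copies_once('Hello', 3, log_b)
print(text_a == text_b)          # True - same answer either way
print(len(log_a), len(log_b))    # 3 1
```

Same result, but the loop version calls the helper n times and the store-in-var version calls it once.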
Big Picture Software Costs - N-Squared
- A software project might be planned to take 2 months
- And 2 years later, it still doesn't really work
- How is that possible?
- It all comes down to N-squared
N Squared Trap
- The central insight that drives program design
- Decomposition is fact
- Question: how much work is a 500 line program vs. a 1000 line program?
- How many hours does it take as the number of lines goes up?
- Goes up linearly - the intuitive but wrong answer
- CS experience: it's much worse than that
- Difficulty goes up as the square of the number of lines
- It's a concave-up curve
Decomposition - Escape N-Squared Trap
- Do not write a 1000 line program
- Write a series of 20 line functions
- Decomposition is about getting to the left on the n-squared curve
- A series of functions, each with just a few lines
- Never have all the lines in your head at one time
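The arithmetic behind the n-squared claim can be sketched in a few lines. This is a toy model only: it assumes difficulty is exactly lines squared, in made-up units, and it ignores the work of connecting the functions together. cost() is a name invented for this sketch.

```python
def cost(lines):
    """Assumed toy model: difficulty grows as the square of the line count."""
    return lines ** 2


# 1000 lines vs. 500 lines: twice the lines, four times the difficulty.
print(cost(1000) / cost(500))    # 4.0

# One 1000 line program vs. fifty 20 line functions (same 1000 lines total).
one_big = cost(1000)             # 1,000,000
fifty_small = 50 * cost(20)      # 50 * 400 = 20,000
print(one_big // fifty_small)    # 50
```

Under this model the decomposed version is 50 times less work, which is the sense in which short functions sit to the left on the curve.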
Black Box Model - 1. Abstraction
- (add 2 CS terms to the black box model)
- 1. "Abstraction"
- External contract what this function does
- What goes in? (the params)
- What comes out? (the return value)
- What does it compute given its params?
- """what is in the triple-quote string"""
Black Box - 2. Implementation
- 2. "Implementation" Details
All the code inside the function, complicated
The word "detail" is associated with implementation
- Q: Does the caller need to know the internal details of the function?
- A: No!
- Our strategy is to hide "implementation detail" inside the function
- Calling a function, just need to know what it accomplishes
- Calling a function is simple relative to its internal details
Ride To Airport Abstraction vs. Implementation
- We use abstraction all the time in life
- Ride to airport abstraction:
Pick up time and place
Drop off time and place
Ride shared with others
- Ride to airport implementation details, don't care about:
Car has LED headlights?
Color of the seats?
Is the driver wearing a hat?
Is the gas tank more than 1/2 full?
.. we care about drop off, which covers the detail about having enough gas to get there
- The point: abstraction is much simpler than implementation
- Calling a function - just the abstraction
How To Write a Program - Avoiding N-Squared Trap
- N line program
- Avoid having all N lines in your head at once
- 1. Work on function1()
Look at function1 abstraction (contract)
Work on function1 implementation to return result
Have function1 implementation details in head now
- 2. Work on function2()
Look at function2 abstraction
Work on function2 implementation
Call function1, think only of its abstraction
Do not think about function1 implementation
- With each function, concentrate on just its implementation
- Build on other function abstractions
- This is our technique to build something big
- This is the central CS engineering trick for big projects
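The function1 / function2 steps above can be sketched with a small pair of functions. is_vowel() and count_vowels() are made-up examples, not course code. The point is that count_vowels() is written against the is_vowel() contract only.

```python
def is_vowel(ch):
    """
    Given a single char ch, return True if it is a vowel
    (a, e, i, o, u), ignoring case. Otherwise return False.
    >>> is_vowel('E')
    True
    >>> is_vowel('x')
    False
    """
    return ch.lower() in 'aeiou'


def count_vowels(s):
    """
    Given string s, return the number of vowel chars in it.
    >>> count_vowels('Banana Oil')
    5
    """
    count = 0
    for i in range(len(s)):
        if is_vowel(s[i]):    # rely on the contract, not the internals
            count += 1
    return count


print(count_vowels('Banana Oil'))   # 5
```

While writing count_vowels(), the only thing to hold in mind about is_vowel() is "char in, True/False out". How it decides is an implementation detail.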
Abstraction in CS
- Working bigger problems
- You will constantly call some function you did not write
- Depend on its abstraction, not worrying about its implementation
- It is hard to overstate how much we depend on this pattern to build computer systems
import os
from datetime import datetime

# Get list of filenames in named directory
filenames = os.listdir('Downloads')
# Get the current date and time
now = datetime.now()
Mechanics: Fn name, PyDoc, Doctests
- Applying these ideas in Python syntax
- 1. Have a good verb function name
- 2. List of params with good names - the inputs
We use the word "given" for these often
- 3. Abstraction - contract
Given these inputs, computes and returns what?
- Summarize the contract within Pydoc """triple quotes"""
Given params X Y Z
Returns xxx
- We've seen this many times
- Can delete the ":param s: " stuff PyCharm puts in, not needed at this level
- The Doctests are another way to express the contract, also help debugging
def del_chars(s, target):
    """
    Given string s and a "target" string,
    return a version of s with all chars that
    appear in target removed, e.g. s 'abc'
    with target 'bx', returns 'ac'.
    (Not case sensitive)
    >>> del_chars('abC', 'acx')
    'b'
    >>> del_chars('ABc', 'aCx')
    'B'
    >>> del_chars('', 'a')
    ''
    """
    result = ''
    target = target.lower()
    for i in range(len(s)):
        if s[i].lower() not in target:
            result += s[i]
    # could use "for char in s" form, since not using index
    return result
How Not To Write a Program
- First type in all the code
- Have huge functions that each do many things
- Only when it's all typed in, try running it
- Try to debug all the functions concurrently
How To Write a Program
- Decompose the program into separate functions
- Work on one function at a time
- Each function has well defined input and output
- Try to test each function independently
Doctests a great feature for this
- Then move on to the next function
- Don't have it all in your head at once
- Subtle benefit:
Can run + get feedback on ideas quite soon
Long before all functions are done
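A minimal sketch of testing one function on its own, using Python's standard doctest module. first_last() is a made-up example function, not from the homework, and running the file as a script is just one possible way to run the Doctests.

```python
import doctest


def first_last(s):
    """
    Given a non-empty string s, return its first
    and last chars together.
    >>> first_last('hello')
    'ho'
    >>> first_last('x')
    'xx'
    """
    return s[0] + s[len(s) - 1]


if __name__ == '__main__':
    # Runs every >>> example in this file and reports any that fail.
    results = doctest.testmod()
    print(results)
```

This function can be written, run, and checked before any other function in the program exists.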
string foreach
- We have used for/i/range to index into string
- This highlighted the role of index numbers in string algorithms
Which I wanted to do!
- There is a simpler way to loop over the chars in a string
for ch in s:
Loops over all the chars in s, left to right
You do not get the index number here, just the char
No square brackets [ ] in this form
- The variable name ch or char is idiomatic for one character
- Use this form if you do not need access to index numbers
- Use the for/i/range form if you need access to index numbers
String Foreach Examples
String Foreach examples
double_char2() example with foreach
def double_char2(s):
    result = ''
    for ch in s:
        result = result + ch + ch
    return result
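The choice between the two loop forms comes down to whether the code needs index numbers. Here is a sketch with one function of each kind. count_digits() and first_digit_index() are made-up examples for illustration.

```python
def count_digits(s):
    """Return the count of digit chars in s. No index needed, so foreach fits."""
    count = 0
    for ch in s:
        if ch.isdigit():
            count += 1
    return count


def first_digit_index(s):
    """
    Return the index of the first digit char in s, or -1 if none.
    The answer is an index number, so use the for/i/range form.
    """
    for i in range(len(s)):
        if s[i].isdigit():
            return i
    return -1


print(count_digits('a1b22'))        # 3
print(first_digit_index('a1b22'))   # 1
print(first_digit_index('abc'))     # -1
```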
Python Lists
See guide: Python List for more details about lists
- "list" type stores a linear collection of any type of python value
- Use list to store many of something
e.g. a thousand urls - a list of url strings
e.g. a million temperature readings - a list of float values
- Things in a list called "elements"
- Theme: python tries to be uniform:
len(), square brackets .. list works the same as string
- "lst" is a generic list variable name
1. List Literal: [1, 2, 3]
Use square brackets [..] to write a list in code (a "literal" list value), separating elements with commas
>>> lst = ['a', 'b', 'c']
"empty list" is just 2 square brackets with nothing within: []
2. Length of list: len(lst)
Use len() function, just like string
>>> len(lst)
3
3. Square Brackets to access element
Use square brackets to access an element in a list, like string again (bad index err possible). Valid index numbers are 0..len-1.
>>> lst[0]
'a'
>>> lst[2]
'c'
>>> lst[9]
IndexError: list index out of range
List Mutable
The big difference from strings is that lists are mutable - lists can be changed. Elements can be added, removed, changed over time.
1. List append()
- Lists can contain any type (today int, str)
lst.append('something') - adds an elem to end of list
- Modifies the list, returns nothing
- Common list-build pattern:
# 1. make empty list, then call .append() on it
>>> lst = []
>>> lst.append('a')
>>> lst.append('b')
>>> lst.append('c')
>>>
>>> lst
['a', 'b', 'c']
>>> len(lst)
3
>>> lst[0]
'a'
>>> lst[2]
'c'
>>>
# 2. Similar, using loop/range to call .append()
>>> lst = []
>>> for i in range(6):
...     lst.append(i * 10)
...
>>> lst
[0, 10, 20, 30, 40, 50]
>>> len(lst)
6
>>> lst[5]
50
2. List "in" / "not in" Tests
- How to tell if a value is in a list?
hint: like string!
- The in operator tests if a value is in a list
- not in works too, reads nicely
- Style preference
x not in lst - preferred form
not x in lst - equivalent, but not preferred (applies to string too)
>>> lst = ['a', 'b', 'c']
>>> 'c' in lst
True
>>> 'x' in lst
False
>>> 'x' not in lst # preferred form to check not-in
True
>>> not 'x' in lst # not preferred equivalent
True
3. Foreach On List
- for "foreach" loop works to loop over elements in a list
- This is a common code pattern, since many algorithms want to look at all the elements
- No Change: do not change the list - add/remove/change - during iteration
Kind of reasonable rule: how would iteration work if elements left and appeared in the midst of iteration?
>>> lst = ['a', 'b', 'c']
>>> for s in lst:
...     # use s in here
...     print(s)
...
a
b
c
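One common way to respect the "no change during iteration" rule is to loop over the original list and append the elements you want to a separate result list. A minimal sketch, where without_short() and its "length at least 3" test are made up for illustration:

```python
def without_short(words):
    """Return a new list of the words with len >= 3, leaving words unchanged."""
    result = []
    for word in words:            # only read words during the loop
        if len(word) >= 3:
            result.append(word)   # changes go to a different list
    return result


words = ['a', 'cat', 'to', 'bird']
print(without_short(words))   # ['cat', 'bird']
print(words)                  # ['a', 'cat', 'to', 'bird']
```

The list being looped over is never modified, so the iteration is not disturbed.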
4. list.index(target) - Find Index of Target
- Similar to str.find(), but with one big difference
list.index(target) - returns int index of target if found
- Flaw: only works if target is in the list
- Code should check with "in" first, only call lst.index() if "in" is True
- This design is annoying
It would be easier if lst.index() just returned -1, but it doesn't
- Variant:
list.index(target, start_index) - begin search at start_index instead of 0
>>> lst = ['a', 'b', 'c']
>>> lst.index('c')
2
>>> lst.index('d')
ValueError: 'd' is not in list
>>> 'd' in lst
False
>>> 'c' in lst
True
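The check-with-in-first pattern can be wrapped in a small helper. index_or_minus1() is a hypothetical name, not a built-in. It is a sketch of how to get str.find() style behavior out of list.index().

```python
def index_or_minus1(lst, target):
    """Hypothetical helper: index of target in lst, or -1 if not present."""
    if target in lst:
        return lst.index(target)   # safe: we know target is in there
    return -1


lst = ['a', 'b', 'c', 'b']
print(index_or_minus1(lst, 'c'))   # 2
print(index_or_minus1(lst, 'd'))   # -1

# The start_index variant: begin the search at index 2 instead of 0.
print(lst.index('b'))              # 1
print(lst.index('b', 2))           # 3
```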
List Code Examples
list1 examples
- list_n() - create list [1, 2, 3, ..n] - use range() and append()
- donut_index() - use "in" and index()
- list_censor() - use everything
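Here is one possible sketch of list_n(), going only by the one-line description above (range() and append() to build [1, 2, 3, ..n]). The docstring wording and the n=0 case are assumptions, and this is not necessarily the course's own solution.

```python
def list_n(n):
    """
    Given non-negative int n, return the list [1, 2, 3, ..n].
    >>> list_n(4)
    [1, 2, 3, 4]
    >>> list_n(0)
    []
    """
    result = []
    for i in range(n):
        result.append(i + 1)   # i runs 0..n-1, so i + 1 runs 1..n
    return result


print(list_n(4))   # [1, 2, 3, 4]
```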
Constants in Python
STATES = ['CA', 'NY', 'NV', 'KY', 'OK']
- Simple form name=value at far left, not within a def
- This is a type of "global" variable
- A variable not inside a function
- In this case it's in effect a constant
- Functions can just refer to STATES to get its value
- Convention: upper case means it's a de-facto constant
- Best style: a read-only value, don't modify
- Python does not enforce this for us
- Modified global variables are iffy style, we don't do it
- Can have "global" declaration
We'll never do this, enables read/write that we do not do
e.g. HW4 Crypto
# provided ALPHABET constant - list of the regular alphabet
# in lowercase. Refer to this simply as ALPHABET in your code.
# This list should not be modified.
ALPHABET = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
...
def foo():
    for ch in ALPHABET:  # this works
        print(ch)
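A short sketch of the same read-only pattern with the STATES constant. is_state() is a made-up function for illustration. It only reads STATES and never modifies it.

```python
STATES = ['CA', 'NY', 'NV', 'KY', 'OK']


def is_state(code):
    """Hypothetical helper: True if code (any case) is in STATES."""
    return code.upper() in STATES   # read the constant, no "global" needed


print(is_state('ny'))   # True
print(is_state('TX'))   # False
```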
main() - Monday
- Need to show you how to write a main()
- Uses lists
- Last bit of Crypto - you write the main()
- Can go look at main() of crazycat example - simple example main()