L23

Today: whole program example - pylibs, floating point, Ghost demo, mimic demo

Writing a Whole Program

Start with big problem
Decompose out functions for sub-problems
Solve them individually
Black box: parameters in, return value out
Doctests per function
Fit the functions together, call from main()
Profit!
Top down decomposition
Start with main()
Think up helpers that would be useful
Then go write them

Pylibs Exercise

Today we'll do a whole program in class to walk through the whole process.

Download pylibs.zip to get started. We'll work through this together.

Do a decomposition starting with nothing
Starter file is mostly blank
I'll code some, you code some
"Top Down" Decomposition
Start with main()
At each step - imagine what a useful function would be

Pylibs Problem

First, look at what problem we want to solve - like Madlibs.

Say we have two files. It's handy to invent terminology for the parts of your abstract problem, to then use in your docs, your var names, etc. Here we'll define "terms" and "template":

1. Terms file

The "terms" file provides a list of words for each "noun" type category. One category per line, separated by commas, like this:

noun,cat,donut,velociraptor
verb,nap,run

2. Template file

The "template" file has text, and within it are markers like "[noun]" where a random substitution should be done.

I had a [noun]
and it liked to [verb] all day

Command Line

We want to run this program, giving it the terms and template files, and get output like this:

$ python3 pylibs.py test-terms.txt test-template.txt 
I had a velociraptor 
and it liked to nap all day

Let's do it. Look at pylibs.py

1. main() - Useful Helper #1?

Have terms and template files
What would be a useful helper here?
Helper idea: read_terms()
in: terms filename
out: terms dict

2. Write code: read_terms(filename)

Read terms file, build and return dict
First word on each line is like 'noun'
Use split(',')
Look at inputs and outputs to work out the code

Input line from terms file like this:

noun,cat,rabbit,velociraptor

Use line = line.strip() to remove newline. Use parts = line.split(',') to separate on the commas.

Create entry in terms dict like:

'noun': ['cat', 'rabbit', 'velociraptor']
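
As a quick check of those two steps, here is a minimal sketch that turns one terms line into a dict entry. The sample line is the one shown above and the variable names follow the notes; it works on a string rather than a real file, just to show the strip/split mechanics.

```python
# One line as it would come out of the terms file, newline included.
line = 'noun,cat,rabbit,velociraptor\n'

line = line.strip()        # remove the trailing '\n'
parts = line.split(',')    # ['noun', 'cat', 'rabbit', 'velociraptor']

terms = {}
terms[parts[0]] = parts[1:]   # first word is the key, the rest are its substitutes
print(terms)
```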

File 'test-terms.txt' - write a Doctest

noun,cat,donut,velociraptor
verb,nap,run

Write a Doctest for the file 'test-terms.txt', so we know this code is working before proceeding.

read_terms() Solution

Here is our solution complete with docs and doctest - in class, anything that works is doing pretty well.

def read_terms(filename):
    """
    Given the filename of the terms file, read
    it into a dict with each 'noun' word as a key
    and its value is its list of subs ['apple', 'donut', 'unicorn'].
    Return the terms dict.
    >>> read_terms('test-terms.txt')
    {'noun': ['cat', 'donut', 'velociraptor'], 'verb': ['nap', 'run']}
    """
    terms = {}
    with open(filename) as f:
        for line in f:
            # line is: noun,apple,rabbit,velociraptor,balloon
            line = line.strip()  # remove \n
            parts = line.split(',')
            term = parts[0]
            words = parts[1:]
            terms[term] = words
    return terms

3. main() Again

Call: terms = read_terms(args[0])
What is next helper to call from here?
How about: process_file(terms-dict, filename)
note: process_template() maybe a better name
Reads through filename, prints out text with substitutions

main() - calls two helpers, just need to write them

    # command line: terms-file template-file
    if len(args) == 2:
        terms = read_terms(args[0])
        process_file(terms, args[1])
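
For context, here is a sketch of how that fragment might sit inside a complete main(). It assumes the common pattern where args is the command line minus the script name (sys.argv[1:]); the starter file may set this up differently. The two helpers are stand-in stubs, not the real solutions.

```python
import sys


def read_terms(filename):
    """Stub for illustration: the real version reads the file into a dict."""
    return {}


def process_file(terms, filename):
    """Stub for illustration: the real version prints the filled-in template."""
    print('would process', filename, 'with', len(terms), 'term categories')


def main():
    # Assumption: args holds the command line words after the script name.
    args = sys.argv[1:]

    # command line: terms-file template-file
    if len(args) == 2:
        terms = read_terms(args[0])
        process_file(terms, args[1])


if __name__ == '__main__':
    main()
```

With stubs in place, main() can be run right away to check the wiring before either helper is written.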

4. Write code: process_file(terms, filename)

The beginning of this function is pretty standard, read through lines of filename. Here is the code for the start:

def process_file(terms, filename):
    """
    Given terms dict and filename of template.
    Process the template file, printing out its lines
    with the substitution done.
    """
    with open(filename) as f:
        for line in f:
            words = line.split()  # ['had', '[noun]']
            # print each word with substitution done

Key trick - use line.split() to get the list of words that make up each line. This also takes care of the \n at the end.

line.split() -> ['I', 'had', 'a', '[noun]']

Q: What is the useful helper we want here?

A: A function that does the substitution for one word would be handy here, so that calling it with '[noun]' returns something like 'apple'. Decompose that out.

5. Write code: substitute(terms, word)

If word is of the form '[noun]' return a random substitute for it from the terms dict. Otherwise return the word unchanged.

Note 1: s.startswith() / s.endswith() very handy here to look for square brackets

Note 2: random.choice(lst) returns a random element from list.
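
A small sketch of the two notes in action, separate from the solution. The word and the list are made-up sample values.

```python
import random

word = '[noun]'

# Square-bracket check: True only if the word starts with '[' and ends with ']'.
is_marker = word.startswith('[') and word.endswith(']')
print(is_marker)

# A plain word fails the check.
print('donut'.startswith('['))

# random.choice() picks one element of the list; which one varies per call.
subs = ['cat', 'donut', 'velociraptor']
pick = random.choice(subs)
print(pick)
```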

Here our solution has all the Doctests added, but for in-class anything that works is fine.

This is a nice example of a helper function: (1) it isolates some complexity within this function, where we can solve and test it. (2) It also makes its caller function more tractable.

substitute() Solution

import random  # needed for random.choice(), goes at the top of pylibs.py

def substitute(terms, word):
    """
    Given terms dict and a word from the template.
    Return the substituted form of that word.
    If it is of the form '[noun]' return a random
    word from the terms dict. Otherwise
    return the word unchanged.
    >>> substitute({'noun': ['apple']}, '[noun]')
    'apple'
    >>> substitute({'noun': ['apple']}, 'donut')
    'donut'
    """
    if word.startswith('[') and word.endswith(']'):
        key = word[1:len(word) - 1]  # trim off [ ]
        if key in terms:
            subs = terms[key]  # list of ['apple', 'donut', ..]
            return random.choice(subs)
    return word  # not a known marker: return it unchanged

6. Complete process_file(), calling substitute()

Note: print a word followed by one space and no newline:
print(word + ' ', end='')

            ...
            words = line.split()
            for word in words:
                sub = substitute(terms, word)
                print(sub + ' ', end='')
            print()
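
One side effect of print(sub + ' ', end='') is a trailing space on each output line. If that matters, a possible variation is to collect the substituted words in a list and join them. This sketch is an alternative, not the in-class solution: render_line() and first_sub() are hypothetical names, and the substitute function is passed in as a parameter so the sketch stands alone and is easy to test without randomness.

```python
def render_line(terms, line, substitute):
    """
    Return line with each word run through substitute(),
    words separated by single spaces, no trailing space.
    """
    subs = []
    for word in line.split():
        subs.append(substitute(terms, word))
    return ' '.join(subs)


def first_sub(terms, word):
    """Deterministic stand-in for substitute(): always picks the first term."""
    key = word[1:-1]
    if word.startswith('[') and word.endswith(']') and key in terms:
        return terms[key][0]
    return word


print(render_line({'noun': ['cat']}, 'I had a [noun]\n', first_sub))
```

Returning a string instead of printing also makes the function easy to cover with a Doctest.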

7. Run from main()

We have main(), process_file(), and substitute() wired together. Try it from the command line, with the files 'terms.txt' and 'template.txt':

$ cat terms.txt 
noun,velociraptor,donut,ray of sunshine
verb,run,nap,eat the bad guy
adjective,blue,happy,flat,shiny
$
$ cat template.txt 
I had a [noun] and
it was very [adjective]
when it would [verb]
$ 
$ python3 pylibs.py terms.txt template.txt 
I had a ray of sunshine and 
it was very shiny 
when it would nap
$
$ python3 pylibs.py terms.txt template.txt 
I had a velociraptor and 
it was very shiny 
when it would eat the bad guy 
$
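
The two runs above differ because random.choice() picks differently each time. When testing, it can help to make the picks repeatable. A minimal sketch using Python's random.seed(); the seed value 106 is arbitrary, three_picks() is a hypothetical name, and seeding is not part of the in-class solution.

```python
import random

subs = ['velociraptor', 'donut', 'ray of sunshine']


def three_picks(seed):
    """Seed the generator, then return three random choices from subs."""
    random.seed(seed)
    return [random.choice(subs) for _ in range(3)]


first = three_picks(106)
second = three_picks(106)
print(first == second)   # True: same seed, same sequence of picks
```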

Two Math Systems, "int" and "float" (Recall)

Two Systems
int and float are two different worlds
"float" .. floating decimal point, moves around
Float and int - each have their own area on the chip
Look similar, but distinct
6 - the int six
6.0 - the float six

# int
3  100  -2

# float, has a "."
3.14  -26.2  6.022e23

Math Works

Math works: + - * / min() max() all work fine for both int and float
i.e. mostly you don't have to think about it
Need to use int for indexing - [ ], grid.get(x, y)
Foreshadow:
Float mostly works easily
BUT Float has one crazy flaw .. revealed below
Clickbait: you will not believe what float does

Mixed Case: int + float = float "promotion"

Mixed case: int + float
Combine int and float .. yields float
Any float value "promotes" the computation to float
Note how output below is all float, some int inputs

>>> 1 + 1 + 1
3
>>> 1 + 1 + 1.0  # float promotion
3.0
>>> 3.14 * 2
6.28
>>> 3.0 * 3
9.0
>>> 3.14 * 2 + 1
7.28
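
To see the promotion directly, type() reports which system a result landed in. The sketch also shows one Python 3 detail not covered above: the / operator always produces a float, even with two int inputs, while // keeps ints as int.

```python
all_int = 1 + 1 + 1
mixed = 1 + 1 + 1.0

print(type(all_int))   # <class 'int'>
print(type(mixed))     # <class 'float'>

# / always yields a float; // on two ints yields an int.
print(6 / 2)     # 3.0
print(7 // 2)    # 3
```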

float() int() Conversions

Use float() to convert a str to a float value, similar to int()

>>> float('3.14')   # str -> float
3.14
>>> int(3.14)       # float -> int, truncation
3
>>> int('16')
16
>>> int('3.14')
ValueError: invalid literal for int() with base 10: '3.14'
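
A few more cases, as a sketch of what "truncation" means in practice. int() drops the fraction rather than rounding, and a str like '3.14' can reach int by going through float() first.

```python
print(int(3.99))     # 3, not 4: int() drops the fraction
print(int(-3.99))    # -3: truncation goes toward zero
print(round(3.99))   # 4: round() goes to the nearest int

# A str like '3.14' needs two steps to become an int.
n = int(float('3.14'))
print(n)             # 3
```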

Float - One Crazy Flaw - 1/10

Note: do not panic! We can work with this. But it is shocking.
Float arithmetic is a little imprecise
Off at about the 17th digit .. there are erroneous "garbage" digits
1. Idea of 1/10th, mathematically pure
2. In Python code: looks like this 0.1
3. In the computer memory, actually about: 0.1000000000000000055
There are some garbage digits way off to the right
The Math Will Not Come Out Exactly Right
This is a deep feature of computer floats, applies to all languages
The print routine hides a few digits, so often the garbage is hidden
But in the computation, the garbage is there
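
To look at the stored value directly, one option is the standard decimal module: Decimal(x) converts a float to its exact stored digits. This is a sketch for curiosity, not something needed for normal float work.

```python
from decimal import Decimal

# Decimal(float) converts the stored binary value exactly, digit for digit.
exact = Decimal(0.1)
print(exact)

# The stored value is close to, but not equal to, the pure decimal 0.1.
print(exact == Decimal('0.1'))

# Asking for 20 digits after the decimal point also exposes the tail.
print(format(0.1, '.20f'))
```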

Crazy Flaw Demo - Adding 1/10th

Garbage digits are very often part of a float value
Printing omits a few stored digits at right
So often do not see the garbage
But eventually the garbage gets big enough to print...

>>> 0.1
0.1
>>> 0.1 + 0.1
0.2
>>> 0.1 + 0.1 + 0.1    # this is why we can't have nice things
0.30000000000000004
>>> 
>>> 0.1 + 0.1 + 0.1 + 0.1
0.4
>>> 0.1 + 0.1 + 0.1 + 0.1 + 0.1
0.5
>>> 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1
0.6
>>> 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1
0.7
>>> 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1
0.7999999999999999     # here the garbage is negative

Another example with 3.14

>>> 3.14 * 3
9.42
>>> 3.14 * 4
12.56
>>> 3.14 * 5
15.700000000000001   # d'oh

Summary: float math is slightly wrong

Why Must We Have This Garbage?

The short answer is that a float is stored in a fixed number of bytes in memory, so some numbers unavoidably pick up these garbage digits on the far right. It is similar to the way 1/3 cannot be written out precisely as a decimal number: 0.3333... has to stop somewhere.
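
A little more detail, as a sketch: on typical hardware floats are stored in binary, so fractions built from powers of two (1/2, 1/4, ..) are stored exactly, while 1/10 can only be approximated. The standard fractions module can show the exact ratio a float holds.

```python
from fractions import Fraction

# Fraction(float) shows the exact ratio the float stores.
print(Fraction(0.5))    # 1/2: a power-of-two fraction, stored exactly
print(Fraction(0.1) == Fraction(1, 10))   # False: 0.1 is only approximated

# The stored 0.1 has a power of two as its denominator.
stored = Fraction(0.1)
print(stored.denominator == 2 ** 55)      # True

# Sums of power-of-two fractions come out exact.
print(0.5 + 0.25 == 0.75)   # True
```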

Crazy, But Not Actually A Problem

Everyone needs to remember:
float arithmetic always comes out a tiny bit wrong
(int arithmetic, comes out perfect)
The error is typically far less than 1-trillionth part
But the error is not zero
Most computations can handle an error of 1-trillionth part
Actually not a problem
Consider: how many digits of accuracy do the inputs have? Perhaps 6.

Must Avoid One Thing: no ==

There is one concrete coding rule
Do not use == with float

>>> a = 3.14 * 5
>>> b = 3.14 * 6 - 3.14
>>> a == b   # Observe == not working right
False
>>> b
15.7
>>> a
15.700000000000001

abs(x) - the absolute value function
Instead of ==, look at abs(a-b)

>>> abs(a-b) < 0.00001
True
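
The standard library also has math.isclose(), which packages up the same idea. A sketch comparing the approaches; note that isclose() uses a relative tolerance by default, so comparisons against 0.0 need an explicit abs_tol.

```python
import math

a = 3.14 * 5
b = 3.14 * 6 - 3.14

print(abs(a - b) < 0.00001)    # True, the abs() test from above
print(math.isclose(a, b))      # True, default relative tolerance is 1e-09

# Near zero a relative tolerance is not useful; pass abs_tol instead.
print(math.isclose(1e-12, 0.0))                  # False
print(math.isclose(1e-12, 0.0, abs_tol=1e-9))    # True
```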

Exception: 0.0 is reliable for ==
Any ordinary (finite) float value * 0.0 will be exactly 0.0

Float Conclusions

1. Two systems, int 67 and float 3.14
2. Math works for both int/float seamlessly
3. Float has tiny error many digits to the right, don't use ==

Demo HW7 Ghost

Look at image series - think about outlier
clock tower
monster
Algorithmic Look At Pixels