L24

Today: advanced lambda sort, int div, whole program example - pylibs

Advanced Sorting Exercises

More advanced uses of sorted/lambda.

Demo/Exercise: midpointy()

[9, 4, 6, 5, 1, 7]
# midpoint is (9 + 1) / 2 -> 5.0
# result ->  [4, 5, 6]

Given a list of 3 or more int numbers. We'll say the midpoint is the float average of the min and max numbers in the list. Return a list of the 3 numbers closest to the midpoint, sorted into increasing order. Use sorted/lambda.

Idea: Compute the midpoint using the list min() and max() functions.

mid = (min(nums) + max(nums)) / 2

Sort the numbers by their distance from the midpoint with abs().

The lambda code can refer to variables defined within midpointy(), e.g. mid, since the lambda is inside midpointy(). Python and almost all computer languages use "lexical scoping" where a variable in a function is visible to code in that function, and not visible to code in other functions.

midpointy() Solution

This is some dense/powerful code.

def midpointy(nums):
    # (1) Compute mid. (2) Sort by distance from mid.
    # (3) Slice.
    mid = (min(nums) + max(nums)) / 2
    close = sorted(nums, key=lambda num: abs(mid - num))
    return sorted(close[:3])
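
To see what each step produces, it can help to print the intermediate values. The version below is only a sketch for tracing: the two print() calls are added for illustration and are not part of the solution.

```python
def midpointy(nums):
    mid = (min(nums) + max(nums)) / 2
    # The lambda can read mid because it is defined inside midpointy().
    close = sorted(nums, key=lambda num: abs(mid - num))
    print('mid:', mid)             # mid: 5.0
    print('by distance:', close)   # by distance: [5, 4, 6, 7, 9, 1]
    return sorted(close[:3])


print(midpointy([9, 4, 6, 5, 1, 7]))   # [4, 5, 6]
```

Note that 4 and 6 are both distance 1.0 from the midpoint. sorted() is stable, so ties keep their original relative order, which is why 4 appears before 6 in the intermediate list.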

Right Half of String - float vs. int

Suppose I want to extract the right half of a string.

>>> s = 'Python'

We'll say the right half begins at the index equal to half the string's length, rounding down if needed. So if the length is 6, the right half begins at index 3. The obvious approach is something like this, which has some problems:

>>> s = 'Python'
>>> 
>>> right = len(s) / 2
>>> right
3.0
>>>

alt: right half of 'Python' starts at index 3

In the code above, "right" comes out as a float, 3.0, since the division operator / always returns a float value.

Unfortunately, every attempt to index or slice or use range() with the float fails. These only work with int values:

>>> s[right]
TypeError: string indices must be integers
>>> s[right:]
TypeError: slice indices must be integers or None or have an __index__ method
>>> range(right)
TypeError: 'float' object cannot be interpreted as an integer

Solution: int div `//`

Python has a separate "int division" operator //. It does division and discards any remainder, rounding the result down to the next integer.

>>> 7 // 2
3
>>> 8 // 2
4
>>> 99 // 25
3
>>> 100 // 25
4
>>> 102 // 25
4
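
A few more cases, as a small sketch of what "rounding down" means in practice. The negative-number case is not needed for string indexing, but it shows that // rounds down rather than toward zero.

```python
# // rounds down (toward negative infinity), not toward zero.
print(7 // 2)     # 3
print(-7 // 2)    # -4

# With int operands, // gives an int, while / always gives a float.
print(type(6 // 2), type(6 / 2))   # <class 'int'> <class 'float'>

# The % operator gives the remainder that // discards.
print(102 // 25, 102 % 25)         # 4 2
```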

Right Half of String

Use int div // to compute the right index of the string, and we are all set since it produces an int.

>>> s = 'Python'
>>> right = len(s) // 2
>>> right
3              # note: int
>>> 
>>> s[right:]    # int works!
'hon'
>>>

The int div rounds down, so length 6 and 7 will both treat 3 as the start of the right half, essentially putting the extra char for length 7 in the right half. If the string is odd length, we need to accept that one or the other "half" will have an extra character. Because int-div rounds down, problem specifications will commonly choose round-down to deal with the extra char to keep things simple.
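
Here is a minimal sketch of the even and odd cases. The function body is our own guess at a solution based on the description above; the actual exercise may specify it differently.

```python
def right_half(s):
    """Return the right half of s, starting at index len(s) // 2."""
    right = len(s) // 2   # int, rounds down for odd lengths
    return s[right:]


print(right_half('Python'))    # hon   (length 6, starts at index 3)
print(right_half('Pythons'))   # hons  (length 7, also starts at index 3)
```

For length 7 the left part has 3 chars and the right part has 4, so the extra char lands in the right half.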

(optional) Exercise right_half()

> right_half()

Big Picture Strategy
Writing a Whole Program

Our big strategy is dividing the big program up into separate, testable functions, and we've gotten a lot of mileage out of this strategy.

But where do the functions come from?

alt: divide program into functions

Divide and Conquer Strategy
aka Decomposition
Divide the program into smaller functions
Solve / test functions individually
Black box functions: parameters in, return value out
Fit the functions together, call from main()
But how to know what the functions should be?

Bottom Up Decomposition

Write the simplest helper function first
Then write bigger functions that use the helper
Write main() last
Many CS106A homework project handouts follow this order - it's very effective
But what if you have a blank page .. what are the functions?

Top Down Decomposition

Another way to think about it, starting with a blank page
Start with main()
Think of a helper that would be useful
Like if the Code Genie could make a function appear magically - aspirational
Go write the helper, even though the other pieces are missing
As you go along, may think of other helpers that would be useful
Gradually write and mesh all the helpers together

Pylibs Exercise

Today we'll work through the whole top-down process for Pylibs in class.

Download pylibs.zip to get started. We'll work through this together.

Starter file is mostly blank
Start with main()
Think up helper functions as we go
At each step - imagine what a useful helper function would be

Pylibs Problem

First, look at what problem we want to solve - like Madlibs.

Say we have two files, a "terms" file and a "template" file. (It's handy to have terminology for the parts of your abstract problem to then use in your docs, var names, etc.):

1. Terms file (have)

The "terms" file defines categories like 'noun' and 'verb' and example words for each category. Each line in the file has the category word first followed by examples of that category all separated by commas, like this:

noun,cat,donut,velociraptor
verb,nap,run

2. Template file (have)

The "template" file has lines of text, and within it are markers like "[noun]" where a random substitution should be done.

I had a [noun]
and it liked to [verb] all day

3. Pylibs Output (want)

We want to run this program, giving it the terms and template files, and get output like this:

$ python3 pylibs.py test-terms.txt test-template.txt 
I had a velociraptor 
and it liked to nap all day

Let's do it. Write code in pylibs.py

Here we will follow a top-down order. At each step - think up what would be a useful helper to have, and then go write that helper. We still end at our traditional structure - helper functions to solve smaller sub-problems.

1. Look at main() - Think of Useful Helper

Have terms and template filenames
What would be a useful helper here?
Helper idea: read_terms()
in: terms filename
out: terms dict

Thought process: I have X and want Y. Write a function that takes X as input and returns Y, or perhaps the function returns something halfway to Y.

2. Write code: read_terms(filename)

Read terms file, build and return dict
First word on each line is like 'noun'
Use split(',')
Look at inputs and outputs below to get started

Looking at the input and desired output data is a nice way to get started on the code. Sometimes I will paste an example line into the source code, where I'm writing the parse code for that sort of data. An input line from the terms file looks like this:

noun,cat,donut,velociraptor

Here's our standard file-read code:

    with open(filename) as f:
        for line in f:
            line = line.strip()

As usual, line = line.strip() removes the trailing newline. Then parts = line.split(',') separates the line into a list of the words between the commas.

For each line, create an entry in terms dict like:

'noun': ['cat', 'donut', 'velociraptor']
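
The steps for one line can be tried on their own before writing the loop. This is a small sketch using a hard-coded example line in place of a line read from the file.

```python
line = 'noun,cat,donut,velociraptor\n'   # one line, as read from the file
line = line.strip()         # remove the trailing newline
parts = line.split(',')     # ['noun', 'cat', 'donut', 'velociraptor']
term = parts[0]             # 'noun'
words = parts[1:]           # ['cat', 'donut', 'velociraptor']

terms = {}
terms[term] = words
print(terms)   # {'noun': ['cat', 'donut', 'velociraptor']}
```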

File 'test-terms.txt' - write a Doctest

noun,cat,donut,velociraptor
verb,nap,run

Write a Doctest so we know this code is working before proceeding: read_terms('test-terms.txt')

Doctest trick: could just run the Doctest, look at what it returns, paste that into the Doctest as the desired output if it looks right. We are not the first programmers to have thought of this little shortcut.

read_terms() Solution

Here is our solution complete with docs and doctest - in lecture, anything that works is doing pretty well.

def read_terms(filename):
    """
    Given the filename of the terms file, read
    it into a dict with each 'noun' word as a key
    and its value is its list of substitutions
    like ['apple', 'donut', 'unicorn'].
    Return the terms dict.
    >>> read_terms('test-terms.txt')
    {'noun': ['cat', 'donut', 'velociraptor'], 'verb': ['nap', 'run']}
    """
    terms = {}
    with open(filename) as f:
        for line in f:
            line = line.strip()
            # line like: noun,apple,rabbit,velociraptor,balloon
            parts = line.split(',')
            term = parts[0]    # 'noun'
            words = parts[1:]  # ['apple', 'rabbit' ..]
            terms[term] = words
    return terms
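
If test-terms.txt is not handy, the function can also be tried against a temporary file. This is just a sketch: it repeats a condensed copy of read_terms() so it runs on its own, and the tempfile setup is our addition, not part of pylibs.py.

```python
import os
import tempfile


def read_terms(filename):
    # Condensed copy of the solution above.
    terms = {}
    with open(filename) as f:
        for line in f:
            parts = line.strip().split(',')
            terms[parts[0]] = parts[1:]
    return terms


# Write a small terms file into a temporary directory, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test-terms.txt')
    with open(path, 'w') as f:
        f.write('noun,cat,donut,velociraptor\n')
        f.write('verb,nap,run\n')
    result = read_terms(path)

print(result)
# {'noun': ['cat', 'donut', 'velociraptor'], 'verb': ['nap', 'run']}
```

One thing this makes easy to experiment with: a blank line in the file gives parts == [''], so the dict would pick up an empty-string key. For lecture purposes we assume the terms file has no blank lines.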

3. main() Again

Call: terms = read_terms(args[0])
What is next helper to call from here?
How about: process_template(terms-dict, filename)
Reads through template file, prints out text with substitutions
Call that helper here, then we need to go write it

main() - calls two helpers, just need to write them

    # command line: terms-file template-file
    if len(args) == 2:
        terms = read_terms(args[0])
        process_template(terms, args[1])
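
In top-down style, main() can be written and run before the helpers are finished by using placeholder stubs. The sketch below assumes args comes from sys.argv[1:], and both helper bodies are stand-ins to be replaced by the real code.

```python
import sys


def read_terms(filename):
    # Placeholder stub: the real version reads the file into a dict.
    return {}


def process_template(terms, filename):
    # Placeholder stub: the real version prints the substituted text.
    print('would process', filename, 'with', len(terms), 'terms')


def main():
    # sys.argv[0] is the script name; the rest are the command line args.
    args = sys.argv[1:]
    # command line: terms-file template-file
    if len(args) == 2:
        terms = read_terms(args[0])
        process_template(terms, args[1])


if __name__ == '__main__':
    main()
```

With the stubs in place the program runs end to end, and each stub can then be replaced with working code one at a time.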

4. Write code: process_template(terms, filename)

Here is the beginning code for process_template() which starts with the standard file for/line/f loop.

Handy trick - use line.split() (no parameters) to get the list of words that make up each line. This also takes care of the \n at the end.

line.split() -> ['I', 'had', 'a', '[noun]']

You can paste this in to get started.

def process_template(terms, filename):
    with open(filename) as f:
        for line in f:
            words = line.split()  # ['I', 'had', 'a', '[noun]']
            # Print each word with substitution done

Q: What would be a useful helper to have here?

A: A function that did the substitution for one word, e.g. a helper function where we pass in '[noun]' and it returns 'apple'.

5. Write code: substitute(terms, word)

If the word is of the form '[noun]' return a random substitute for it from the terms dict. Otherwise return the word unchanged.

Note 1: s.startswith() / s.endswith() very handy here to look for square brackets

Note 2: random.choice(lst) returns a random element from a list.
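
Both notes can be tried out in isolation first. This is a small sketch; the variable names category and choices are ours, picked for illustration.

```python
import random

word = '[noun]'
if word.startswith('[') and word.endswith(']'):
    category = word[1:len(word) - 1]   # trim off [ ] -> 'noun'
    print(category)

print('pie'.startswith('['))   # False - an ordinary word

choices = ['cat', 'donut', 'velociraptor']
pick = random.choice(choices)  # one element, chosen at random
print(pick)                    # varies from run to run
```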

Here our solution has all the Doctests added, but for in-class anything that works is fine.

substitute() Solution

import random  # for random.choice(); belongs at the top of pylibs.py

def substitute(terms, word):
    """
    Given terms dict and a word from the template.
    Return the substituted form of that word.
    If it is of the form '[noun]' return a random
    word from the terms dict. Otherwise
    return the word unchanged.
    >>> substitute({'noun': ['apple']}, '[noun]')
    'apple'
    >>> substitute({'noun': ['apple']}, 'pie')
    'pie'
    """
    if word.startswith('[') and word.endswith(']'):
        term = word[1:len(word) - 1]  # trim off [ ]
        if term in terms:
            words = terms[term]  # list of ['apple', 'donut', ..]
            return random.choice(words)
    return word  # unchanged, including a '[term]' not in the dict

6. Complete process_template(), calling substitute()

Note: ultimately, the inner loop prints each word with the substitution done, followed by one space and no newline:
print(word + ' ', end='')

This printing is not difficult but is not obvious. The details are explained below if we have time.

            ...
            words = line.split()
            # Print each word with substitution done
            for word in words:
                sub = substitute(terms, word)
                print(sub + ' ', end='')
            print()

Observe: Nice Helper Function Example

Decomposing out substitute() is a nice example of a helper function: (1) separate out a sub-problem to solve and test in the helper independently. (2) Decomposing the helper function also makes the caller code in process_template() more clear. With the helper, the process_template() code is fairly simple, so it's not hard to imagine writing it correctly the first time.

(optional) Why print() This Way

We could write the inner loop with a simple print() as below, which would be a reasonable first guess at the code:

                sub = substitute(terms, word)
                print(sub)

It's perhaps easiest to understand this bug by running the code to see what it prints. The above version prints each word on a line by itself, since that's what print() does by default. Then add the end='' option, which turns off the ending '\n' in print(), and see what that prints. Then add the space following each word. The print() outside the loop prints a single newline to end each line of output words.
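
The three stages described above can be run side by side. This sketch uses a fixed list of words in place of the substituted words from a template line.

```python
words = ['I', 'had', 'a', 'cat']

# Version 1: plain print() ends each call with '\n' - one word per line.
for word in words:
    print(word)

# Version 2: end='' turns off the newline - the words run together.
for word in words:
    print(word, end='')
print()

# Version 3: add a space after each word, then one print() to end the line.
for word in words:
    print(word + ' ', end='')
print()
```

Version 3 leaves one trailing space at the end of each output line, which is harmless here. An alternative, not the approach used in lecture, is print(' '.join(words)), which puts spaces only between the words.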

7. Run from main()

Run the finished code from the command line, with the files 'terms.txt' and 'template.txt'

$ cat terms.txt 
noun,velociraptor,donut,ray of sunshine
verb,run,nap,eat the bad guy
adjective,blue,happy,flat,shiny
$
$ cat template.txt 
I had a [noun] and
it was very [adjective]
when it would [verb]
$ 
$ python3 pylibs.py terms.txt template.txt 
I had a ray of sunshine and 
it was very shiny 
when it would nap
$
$ python3 pylibs.py terms.txt template.txt 
I had a velociraptor and 
it was very shiny 
when it would eat the bad guy 
$

The code in main() calls read_terms() and process_template() - see how the data flows between the separate functions. The terms dict is returned by read_terms() and stored briefly in main(). Then main() passes the terms dict in to process_template().

In the end we have a well-decomposed program — we have helper functions to solve sub problems, and each helper can be written and tested independently. Then the helpers are knitted together to solve the whole program.

Demo HW7a Ghost

Handout out now, don't need to start right away
Very algorithmic project
Leverage sorted()/lambda
Look at image series - think about outlier
clock tower
monster

One More Thing

World War II - England 1940, by itself
A hinge-of-history moment
If someone less anti-Hitler than Churchill were in charge?
England makes peace, Hitler dominates Europe?
German Enigma machine cryptography
Encrypts char by char, like our Cryptography homework
English cryptanalysts, breaking the Enigma code
By hand - looking for patterns
Scan for cipher text of 'ein'
Looking char by char, like your code
Alan Turing et al - created "bombe" machines to try char combinations
Bombe article
You can see early CS coming together at this moment
Like the invention of the loop, going through chars

Today's puzzle:

puzzle-crypt.txt