Today: advanced lambda sort, int div, whole program example - pylibs
More advanced uses of sorted/lambda.
Given a list of 3 or more int numbers, return the 3 numbers closest to 10, sorted into increasing order. For example, the input [1, 10, 2, 9, 3, 12] returns [9, 10, 12]. Note: abs(n) is the absolute value function. Use sorted/lambda.
[1, 10, 2, 9, 3, 12] -> [9, 10, 12]
[100, 5, 200, 15, 8, 4] -> [5, 8, 15]
Idea: For each number, n, consider the distance between n and 10. Basically this amounts to subtraction: n - 10. Wrap the subtraction in the absolute value function abs() so a negative difference is changed to positive, e.g. abs(9 - 10) is 1.
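Here is a minimal sketch of that idea. The function name closest_10 is made up for this example, not something from the problem statement. It sorts by distance from 10, slices off the first 3, and then sorts those 3 into increasing order.

```python
def closest_10(nums):
    # Order the numbers by their distance from 10, closest first
    close = sorted(nums, key=lambda n: abs(n - 10))
    # Keep the 3 closest, then put those into increasing order
    return sorted(close[:3])
```

Note that two sorts are needed: the first uses the lambda to rank by distance, the second is a plain sorted() to get increasing order for the result.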
[9, 4, 6, 5, 1, 7]
# midpoint is (9 + 1) / 2 -> 5.0
# result -> [4, 5, 6]
Given a list of 3 or more int numbers, we'll say the midpoint is the float average of the min and max numbers in the list. Return a list of the 3 numbers closest to the midpoint, sorted into increasing order. Use sorted/lambda.
Idea: Compute the midpoint using the list min() and max() functions
mid = (min(nums) + max(nums)) / 2
Sort the numbers by their distance from the midpoint with abs(). Detail: the lambda code can refer to variables defined within midpointy(), e.g. mid, since the lambda is inside midpointy().
CS terminology: This is "lexical scoping" where a variable in a function is visible to code in that function, and not visible to code in other functions. Each function is a variable "scope" and they are kept separate.
This is some dense/powerful code.
def midpointy(nums):
    mid = (min(nums) + max(nums)) / 2
    # Order with closest to mid first
    close = sorted(nums, key=lambda num: abs(mid - num))
    # Slice to grab the 3 closest
    return sorted(close[:3])
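To see the lexical scoping point in isolation, here is a small sketch. The names closest_to, target and elsewhere are made up for this example. The lambda can use target because the lambda is written inside closest_to(), while code in a different function cannot see it.

```python
def closest_to(nums, target):
    # 'target' is a variable of closest_to(). The lambda is written
    # inside closest_to(), so the lambda body can refer to 'target'.
    return sorted(nums, key=lambda n: abs(n - target))


def elsewhere():
    # A different function is a different scope: 'target' is not
    # visible here, so trying to use it raises a NameError.
    try:
        return target
    except NameError:
        return 'target not visible'
```

Calling closest_to([1, 10, 2, 9], 9) orders the numbers by distance from 9, and elsewhere() ends up in the except branch.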
Suppose I want to extract the right half of a string.
>>> s = 'Python'
We'll say the right half begins at the index equal to half the string's length, rounding down if needed. So if the length is 6, the right half begins at index 3. The obvious approach is something like this, which has some problems:
>>> s = 'Python'
>>>
>>> right = len(s) / 2
>>> right
3.0
>>>
In the code above, "right" comes out as a float, 3.0, since the division operator / always returns a float value.
Unfortunately, every attempt to index or slice or use range() with the float fails. These only work with int values:
>>> s[right]
TypeError: string indices must be integers
>>> s[right:]
TypeError: slice indices must be integers or None or have an __index__ method
>>> range(right)
TypeError: 'float' object cannot be interpreted as an integer
Python has a separate "int division" operator //. It does division and discards any remainder, rounding the result down to an integer.
>>> 7 // 2
3
>>> 8 // 2
4
>>> 59 // 10
5
>>> 60 // 10
6
>>> 100 // 25
4
>>> 102 // 25
4
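A few extra checks on what "rounding down" means, as a sketch. The variable names are just for illustration. "Down" is toward negative infinity, which only looks different from chopping off the fraction when the result is negative. The % operator gives the remainder that // discards.

```python
# // rounds down (toward negative infinity), not toward zero
a = 7 // 2        # 3
b = -7 // 2       # -4, since -3.5 rounds down to -4

# % gives the remainder that // discarded
r = 7 % 2         # 1

# // on two ints produces an int, while / always produces a float
is_int = type(7 // 2) is int       # True
is_float = type(8 / 2) is float    # True, 8 / 2 is 4.0
```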
Use int div // to compute the right index of the string, and we are all set since it produces an int.
>>> s = 'Python'
>>> mid = len(s) // 2
>>> mid
3            # note: int
>>>
>>> s[mid:]  # int works!
'hon'
>>>
The int div rounds down, so length 6 and 7 will both treat 3 as the start of the right half, essentially putting the extra char for length 7 in the right half. If the string is odd length, we need to accept that one or the other "half" will have an extra character. Because int-div rounds down, problem specifications will commonly choose round-down to deal with the extra char to keep things simple.
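Here is a short sketch of the even/odd behavior. The function names right_half and left_half are made up for this example. With length 7, the index is still 3, so the right half gets the extra char.

```python
def right_half(s):
    # Int division: length 6 and length 7 both give index 3
    mid = len(s) // 2
    return s[mid:]


def left_half(s):
    # Same index, so the two halves always fit together exactly
    mid = len(s) // 2
    return s[:mid]
```

For example, right_half('Python') is 'hon' and right_half('Pythons') is 'hons', while left_half() is 'Pyt' for both.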
Today we'll do a whole program in class to walk through the whole process.
Download pylibs.zip to get started. We'll work through this together.
First, look at what problem we want to solve - like Madlibs.
Say we have two files, a "terms" file and a "template" file. (It's handy to have terminology for the parts of your abstract problem to then use in your docs, var names, etc.):
The "terms" file defines categories like 'noun' and 'verb' and example words for each category. Each line in the file has the category word first followed by examples of that category all separated by commas, like this:
noun,cat,donut,velociraptor
verb,nap,run
The "template" file has lines of text, and within it are markers like "[noun]" where a random substitution should be done.
I had a [noun] and it liked to [verb] all day
We want to run this program giving it the terms and template files, and get output like this:
$ python3 pylibs.py test-terms.txt test-template.txt
I had a velociraptor and it liked to nap all day
Let's do it. Write code in pylibs.py
Here we will follow a top-down order, thinking up what a useful helper would be as we go, and then writing the helper. We still end at our traditional structure - helper functions to solve smaller sub-problems.
Thought process: I have X and want Y. Write a function that takes X as input and returns Y, or perhaps the function returns something halfway to Y.
Looking at the input and desired output data is a nice way to get started on the code. Input line from terms file like this:
noun,cat,donut,velociraptor
Have the standard line = line.strip() to remove newline. Use parts = line.split(',') to separate on the commas.
Create entry in terms dict like:
'noun': ['cat', 'donut', 'velociraptor']
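The steps for one line can be tried out by themselves before writing the file loop. This is just a sketch, and the helper name parse_line is made up for the example. The slice parts[1:] grabs everything after the category word.

```python
def parse_line(line):
    # Remove the trailing newline, then separate on the commas
    line = line.strip()
    parts = line.split(',')   # ['noun', 'cat', 'donut', 'velociraptor']
    # First element is the category, the rest are its words
    return parts[0], parts[1:]


terms = {}
term, words = parse_line('noun,cat,donut,velociraptor\n')
terms[term] = words   # {'noun': ['cat', 'donut', 'velociraptor']}
```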
File 'test-terms.txt' - write a Doctest
noun,cat,donut,velociraptor
verb,nap,run
Write a Doctest so we know this code is working before proceeding: read_terms('test-terms.txt')
Doctest trick: could just run the Doctest, look at what it returns, paste that into the Doctest as the desired output if it looks right. We are not the first programmers to have thought of this little shortcut.
Here is our solution complete with docs and doctest - live in lecture, anything that works is doing pretty well.
def read_terms(filename):
    """
    Given the filename of the terms file, read
    it into a dict with each 'noun' word as a key
    and its value is its list of substitutions
    like ['apple', 'donut', 'unicorn'].
    Return the terms dict.
    >>> read_terms('test-terms.txt')
    {'noun': ['cat', 'donut', 'velociraptor'], 'verb': ['nap', 'run']}
    """
    terms = {}
    with open(filename) as f:
        for line in f:
            line = line.strip()
            # line like: noun,apple,rabbit,velociraptor,balloon
            parts = line.split(',')
            term = parts[0]    # 'noun'
            words = parts[1:]  # ['apple', 'rabbit' ..]
            terms[term] = words
    return terms
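One detail the lecture version does not need to handle for its test file: a blank line in the terms file. After strip() a blank line is '', and ''.split(',') is [''], so the dict picks up an entry with key '' and an empty list. Below is a sketch of one possible variation that skips blank lines. The name read_terms_skip_blank and the temporary file are only for this example.

```python
import os
import tempfile


def read_terms_skip_blank(filename):
    terms = {}
    with open(filename) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines so they do not create a '' key
            if line == '':
                continue
            parts = line.split(',')
            terms[parts[0]] = parts[1:]
    return terms


# Try it on a temporary file with a blank line in the middle
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('noun,cat,donut\n\nverb,nap,run\n')
result = read_terms_skip_blank(tmp.name)
os.remove(tmp.name)
```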
main() - calls two helpers, just need to write them
    # command line: terms-file template-file
    if len(args) == 2:
        terms = read_terms(args[0])
        process_template(terms, args[1])
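For context, here is a sketch of how that snippet might sit in a complete main(). The lecture shows only the if-block, so the args = sys.argv[1:] line, the usage message, and the __main__ check at the bottom are assumptions about the rest of the file. The two helpers here are placeholders so the sketch runs by itself.

```python
import sys


def read_terms(filename):
    # Placeholder so this sketch runs by itself; the real helper reads the file
    print('would read terms from', filename)
    return {}


def process_template(terms, filename):
    # Placeholder so this sketch runs by itself
    print('would process template', filename)


def main():
    # sys.argv[0] is the program name, so the arguments begin at index 1
    args = sys.argv[1:]

    # command line: terms-file template-file
    if len(args) == 2:
        terms = read_terms(args[0])
        process_template(terms, args[1])
    else:
        # Assumption: print a usage message for the wrong number of arguments
        print('usage: python3 pylibs.py terms-file template-file')


if __name__ == '__main__':
    main()
```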
Here is the beginning code for process_template() which starts with the standard file for/line/f loop.
Handy trick - use line.split() (no parameters) to get the list of words that make up each line. This also takes care of the \n at the end.
line.split() -> ['I', 'had', 'a', '[noun]']
You can paste this in to get started.
def process_template(terms, filename):
    with open(filename) as f:
        for line in f:
            words = line.split()  # ['I', 'had', 'a', '[noun]']
            # Print each word with substitution done
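A quick sketch of why the no-parameter form of split() is the handy one here. The sample line, with its doubled space, is made up for the example. With no parameters, split() separates on any run of whitespace, including the '\n', while split(' ') separates on each single space and leaves the '\n' in place.

```python
line = 'I had  a [noun]\n'   # note the two spaces after 'had'

words = line.split()         # ['I', 'had', 'a', '[noun]']
by_space = line.split(' ')   # ['I', 'had', '', 'a', '[noun]\n']
```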
Q: What would be a useful helper to have here?
A: A function that does the substitution for one word would be handy here, so calling it with '[noun]' returns, say, 'apple'. Decompose that out.
If word is of the form '[noun]' return a random substitute for it from the terms dict. Otherwise return the word unchanged.
Note 1: s.startswith() / s.endswith() very handy here to look for square brackets
Note 2: random.choice(lst) returns a random element from list.
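Here is a small sketch of the two notes, using made-up variable names. Since random.choice() picks a different element from run to run, the comment on the last line describes the possibilities rather than a single answer.

```python
import random

word = '[noun]'
has_brackets = word.startswith('[') and word.endswith(']')   # True
inner = word[1:len(word) - 1]                                 # 'noun'

plain = 'pie'
plain_has = plain.startswith('[') and plain.endswith(']')    # False

# One of the three words, varies from run to run
pick = random.choice(['cat', 'donut', 'velociraptor'])
```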
Here our solution has all the Doctests added, but for in-class anything that works is fine.
import random  # needed for random.choice()

def substitute(terms, word):
    """
    Given terms dict and a word from the template.
    Return the substituted form of that word.
    If it is of the form '[noun]' return a random
    word from the terms dict. Otherwise
    return the word unchanged.
    >>> substitute({'noun': ['apple']}, '[noun]')
    'apple'
    >>> substitute({'noun': ['apple']}, 'pie')
    'pie'
    """
    if word.startswith('[') and word.endswith(']'):
        # Use a separate variable for the trimmed form, so a word
        # like '[xyz]' that is not in terms is returned unchanged
        term = word[1:len(word) - 1]  # trim off [ ]
        if term in terms:
            words = terms[term]  # list of ['apple', 'donut', ..]
            return random.choice(words)
    return word
Note: print a word followed by one space and no newline:
print(word + ' ', end='')
The end='' option for print() suppresses the printed newline. Then have a single print() after the loop to print one newline.
...
            words = line.split()
            # Print each word with substitution done
            for word in words:
                sub = substitute(terms, word)
                print(sub + ' ', end='')
            print()
Decomposing out substitute() is a nice example of a helper function: (1) separate out a sub-problem to solve and test in the helper independently. (2) Decomposing the helper function also makes the caller code in process_template() more clear.
Could write the inner loop this way at first, which is a reasonable first guess:
sub = substitute(terms, word)
print(sub)
Run this version to see how it's not quite right. It prints each word on a line by itself, since that's what print() does by default. Then add the end='' option, which turns off the ending '\n' in print(). Then add the space following each word.
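To see exactly which characters the final version of the loop produces, here is a sketch that prints into a string instead of the screen. The function name print_words is made up, and the file= parameter of print() is used here only to capture the output for inspection.

```python
import io


def print_words(words, out):
    for word in words:
        # One space after each word, and no newline
        print(word + ' ', end='', file=out)
    # A single newline after the whole line of words
    print(file=out)


buf = io.StringIO()
print_words(['I', 'had', 'a', 'cat'], buf)
text = buf.getvalue()   # 'I had a cat \n'
```

Note the space after the last word, just before the newline. That is a consequence of printing a space after every word, and it is harmless for output on the screen.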
The code in main() calls read_terms() and process_template() - see how the terms dict is returned by read_terms(), and then is passed as a parameter in to process_template(). Try it from the command line, with the files 'terms.txt' and 'template.txt'
$ cat terms.txt
noun,velociraptor,donut,ray of sunshine
verb,run,nap,eat the bad guy
adjective,blue,happy,flat,shiny
$
$ cat template.txt
I had a [noun] and it was very [adjective] when it would [verb]
$
$ python3 pylibs.py terms.txt template.txt
I had a ray of sunshine and it was very shiny when it would nap
$
$ python3 pylibs.py terms.txt template.txt
I had a velociraptor and it was very shiny when it would eat the bad guy
$
Today's puzzle: