Today: lambda output, lambda/def, custom sorting with lambda, wordcount sorting, introduction to modules

Lambda - Power Feature

Lambda is a powerful feature, letting you express a lot of computation in very little space. As a result, it looks weird at first, but when it clicks, you should feel like a Power Hacker when you wield it.

Lambda - Code as a Parameter

Countless times, you have called a function and passed in some data for it to use. The function name is the verb, and the parameters are extra nouns to guide the computation:

# e.g. "range" is the verb, up to 10
range(10)

# e.g. "draw_line" is the verb, with these int coords
canvas.draw_line(0, 0, 100, 50, color='red')

With lambda, we open up a new category, passing in code as the parameter for the function to use, e.g. with map():

map(lambda s: s.upper() + '!', ['pass', 'code']) ->
 ['PASS!', 'CODE!']

Having an easy way to pass code between functions can be very handy.
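
A minimal sketch of the map() call above, assuming plain Python 3 rather than the experimental server. In standard Python 3, map() returns a lazy iterator, so list() is wrapped around it here to see the values. The variable names are just for illustration.

```python
words = ['pass', 'code']

# map() calls the lambda once per element. In standard Python 3 the result
# is a lazy iterator, so list() collects the values into a list.
shouted = list(map(lambda s: s.upper() + '!', words))
print(shouted)  # ['PASS!', 'CODE!']
```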

Map With Type-Change

Look at the "lambda1" section on the experimental server again.

> lambda1

The output list does not need to have the same element type as the input list. The lambda can output any type it likes, and that will make the output list. See examples: super_tuple() and lens()

lens(strs): Given a list of strings, return a list of their int lengths.

Solution

def lens(strs):
    return map(lambda s: len(s), strs)

Lambda vs. Def

Lambda and def are similar:

def double(n):
    return n * 2

Equivalent lambda

lambda n: n * 2
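
To see that the two forms really are interchangeable, here is a small sketch that passes each one to map(). The list() wrapper and the nums variable are assumptions for running in plain Python 3.

```python
def double(n):
    return n * 2


nums = [1, 2, 3]

# Pass the def by name (no parentheses), or pass the equivalent lambda.
by_def = list(map(double, nums))
by_lambda = list(map(lambda n: n * 2, nums))

print(by_def)     # [2, 4, 6]
print(by_lambda)  # [2, 4, 6]
```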

Def Features

def introduces a name for the code
Def has room for real code features:
Multiple lines
If statements
Variables
Loops
Inline comments
Lambda: best without any of that, just short, 1-line

Def vs. Lambda

Shown map() with lambda many times - that's the sweet spot
What to do if the computation does not fit in 1 line?
Just write a def
map() can use the def

map/def Example - map_parens()

In lambda1, see the map_parens() problem.

['xx(hi)xx', 'abc(there)xyz', 'fish'] -> ['hi', 'there', 'fish']

Solution Code

def map_parens(strs):
    return map(parens, strs)


def parens(s):
    left = s.find('(')
    right = s.find(')', left)
    
    if left == -1 or right == -1:
        return s
    return s[left + 1:right]
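
A quick way to try the solution in plain Python 3 might look like the sketch below. The two defs are repeated so the block runs on its own, and list() is an assumption added because standard map() returns a lazy iterator.

```python
def parens(s):
    # Find the first '(' and then the first ')' at or after it.
    left = s.find('(')
    right = s.find(')', left)

    # No complete pair of parens: return the string unchanged.
    if left == -1 or right == -1:
        return s
    return s[left + 1:right]


def map_parens(strs):
    # The def is passed to map() by name, just like a lambda would be.
    return map(parens, strs)


result = list(map_parens(['xx(hi)xx', 'abc(there)xyz', 'fish']))
print(result)  # ['hi', 'there', 'fish']
```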

Lambda-2 Examples (Optional)

> lambda-2 examples

More filled-out uses of lambda, more realistic. Some of these use a list of numbers, some a list of (x, y) tuples.

xy_sum()

Given a list of len-2 (x, y) tuples. Return a list of the sums of each tuple. Shows that the result-list does not need to hold the same type as the input list. Solve with a map/lambda.

[(4, 2), (1, 2), (2, 3)]  ->  [6, 3, 5]

This is a list of points. Q: What type is the param to the lambda? A: one point

Solution

def xy_sum(points):
    return map(lambda pt: pt[0] + pt[1], points)

xs()

Given a list of len-2 (x, y) tuples. Return a list of just the x value of each tuple. Solve with a map/lambda.

Solution

def xs(points):
    return map(lambda pt: pt[0], points)

min_x()

Given a non-empty list of len-2 (x, y) tuples. What is the leftmost x among the tuples? Return the smallest x value among all the tuples, e.g. [(4, 2), (1, 2), (2, 3)] returns the value 1. Solve with a map/lambda and the builtin min(). Recall: min([4, 1, 2]) returns 1

[(4, 2), (1, 2), (2, 3)]  -> 1

Solution

def min_x(points):
    # Use map/lambda to form a list of
    # just the x coords. Feed that into min()
    return min(map(lambda pt: pt[0], points))

Custom Sort - Power Feature

Python sorting has a lot of power in it
Use lambda to guide the sorting
This code feels powerful and dense
More examples in section!

Python Custom Sort - Food Examples

Lamest food I could think of - Radish
Suppose I have len-3 food tuples, each with 3 parts:
food = (name, tasty 1-10, healthy 1-10)
food[0] = its name
food[1] = how tasty it is 1-10
food[2] = how healthy it is 1-10
how sorted() works on list of tuples
By default compares [0] first, then [1], ...

>>> foods = [('radish', 2, 8), ('donut', 10, 1), ('apple', 7, 9), ('broccoli', 6, 10)]
>>> 
>>> # By default, sorts food tuples by [0]
>>> sorted(foods)
[('apple', 7, 9), ('broccoli', 6, 10), ('donut', 10, 1), ('radish', 2, 8)]
>>>

Sort By Tastiness

Say I want to sort by tastiness
e.g. the radish vs. donut dimension
Control how sorted() looks at the data
Like drawing a circle around tasty values - sort by these!
How can we get the code to do this?

Project Out Sort-By Values

How to code sort-by-tasty
For each element in list
"Project out" a sort-by value to be used in sorting comparisons
Here, for each food, project out its tasty int
aka "Proxy" strategy
Each element, proxy value is used for sorting comparisons

Project Out With Lambda

Q: how to project out these sort-by proxy values?
A: lambda

Custom Sort Lambda - Plan

1. Call sorted() as usual
2. provide key=lambda to control sorting
Lambda here takes one parameter - an elem from the list
The lambda projects out the sort-by value to use for comparisons
e.g. sort by tasty
lambda food: food[1]
e.g. sort by healthy
lambda food: food[2]

Q: What is the parameter to the lambda?

A: One elem from the list (similar to map() function)
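
One way to picture the proxy strategy is to spell it out by hand. The helper below, sort_by_proxy(), is a hypothetical name invented for this sketch. It is not how sorted() is actually implemented, just an illustration that gives the same ordering.

```python
def sort_by_proxy(lst, key_fn):
    """Hypothetical helper: mimic sorted(lst, key=key_fn) to show the proxy idea."""
    # 1. Project out a proxy value for each elem, remembering its position.
    proxies = [(key_fn(elem), i) for i, elem in enumerate(lst)]
    # 2. Sort the (proxy, position) pairs. Comparisons look at the proxy first;
    #    the position breaks ties so equal proxies keep their original order.
    proxies.sort()
    # 3. Use the positions to list the original elems in proxy order.
    return [lst[i] for proxy, i in proxies]


foods = [('radish', 2, 8), ('donut', 10, 1), ('apple', 7, 9), ('broccoli', 6, 10)]
by_tasty = sort_by_proxy(foods, lambda food: food[1])
print(by_tasty)
# [('radish', 2, 8), ('broccoli', 6, 10), ('apple', 7, 9), ('donut', 10, 1)]
```

In real code, just write sorted(foods, key=lambda food: food[1]) and let Python handle the proxy values.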

Food Sort Examples

sorted(lst) - default sort, increasing order
sorted(lst, reverse=True) - reverse option, decreasing order
examples below
sort by tasty (reverse .. highest value first)
sort by healthy
sort by tasty * healthy

Sort By Tasty

>>> foods = [('radish', 2, 8), ('donut', 10, 1), ('apple', 7, 9), ('broccoli', 6, 10)]
>>> 
>>> sorted(foods, key=lambda food: food[1])
[('radish', 2, 8), ('broccoli', 6, 10), ('apple', 7, 9), ('donut', 10, 1)]

Most Tasty (reverse=True)

>>> sorted(foods, key=lambda food: food[1], reverse=True)  # most tasty
[('donut', 10, 1), ('apple', 7, 9), ('broccoli', 6, 10), ('radish', 2, 8)]

Most Healthy

>>> sorted(foods, key=lambda food: food[2], reverse=True)  # most healthy
[('broccoli', 6, 10), ('apple', 7, 9), ('radish', 2, 8), ('donut', 10, 1)]

Most tasty * healthy

Not limited to just projecting out existing values. We can project out a computed value. Here we compute tasty * healthy and sort on that. So apple is first, 7 * 9 = 63, broccoli is second with 6 * 10 = 60. Donut is last :(

>>> sorted(foods, key=lambda food: food[1] * food[2], reverse=True)
[('apple', 7, 9), ('broccoli', 6, 10), ('radish', 2, 8), ('donut', 10, 1)]
>>>

Sorted vs. Min Max

What code gives us the most tasty food?
Or the least tasty?
Sorting n things is kind of expensive
Could sort, take the last item - overly expensive approach
Use max(), max takes a key=lambda just like sorted()
e.g. pull out most or least tasty food - change "sorted" to "max" or "min"

>>> foods = [('radish', 2, 8), ('donut', 10, 1), ('apple', 7, 9), ('broccoli', 6, 10)]
>>> max(foods)     # uses [0] by default - tragic!
('radish', 2, 8)
>>> 
>>> sorted(foods, key=lambda food: food[1])
[('radish', 2, 8), ('broccoli', 6, 10), ('apple', 7, 9), ('donut', 10, 1)]
>>> 
>>> max(foods, key=lambda food: food[1])  # most tasty
('donut', 10, 1)
>>> min(foods, key=lambda food: food[1])  # least tasty
('radish', 2, 8)

Key performance point: computing one max/min element is much faster than sorting all n elements.
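
As a rough check, the sketch below gets the same answers both ways. It assumes the tasty scores are all distinct, as they are in this list. Sorting does on the order of n log n comparisons, while max() and min() make a single pass over the n elements.

```python
foods = [('radish', 2, 8), ('donut', 10, 1), ('apple', 7, 9), ('broccoli', 6, 10)]

# Expensive way: sort everything, then take one end of the list.
by_tasty = sorted(foods, key=lambda food: food[1])
most_by_sort = by_tasty[-1]
least_by_sort = by_tasty[0]

# Cheaper way: one pass over the list with the same key lambda.
most_by_max = max(foods, key=lambda food: food[1])
least_by_min = min(foods, key=lambda food: food[1])

print(most_by_sort == most_by_max)    # True
print(least_by_sort == least_by_min)  # True
```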

Python Custom Sort String Examples

Default sorted() uses "<"
With strings, < places uppercase before lowercase, rarely what we want

>>> # The default sorting is not good with upper/lower case
>>> strs = ['coffee', 'Donut', 'Zebra', 'apple', 'Banana']
>>> sorted(strs)
['Banana', 'Donut', 'Zebra', 'apple', 'coffee']

String Sort Lambda

Fix: project out lowercase version of string as sort-by
The lambda takes in one elem from list - in this case 1 string
e.g. lambda s: s.lower()
Examples: sort not case-sensitive, sort by last char

>>> strs = ['coffee', 'Donut', 'Zebra', 'apple', 'Banana']
>>> 
>>> sorted(strs, key=lambda s: s.lower())    # not case sensitive
['apple', 'Banana', 'coffee', 'Donut', 'Zebra']
>>> 
>>> sorted(strs, key=lambda s: s[len(s)-1])  # by last char
['Zebra', 'Banana', 'coffee', 'apple', 'Donut']
>>>
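
A small variation on the last-char sort: Python's negative indexing lets s[-1] stand in for s[len(s)-1]. This sketch assumes no string in the list is empty, since either form raises an IndexError on an empty string.

```python
strs = ['coffee', 'Donut', 'Zebra', 'apple', 'Banana']

# s[-1] is the last char, the same as s[len(s) - 1]
by_last = sorted(strs, key=lambda s: s[-1])
print(by_last)  # ['Zebra', 'Banana', 'coffee', 'apple', 'Donut']
```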

Put It All Together - WordCount + Sorted

Look at wordcount project, apply custom sorting to the output stage.

Sorted vs. Dict Count Items

Wordcount has a "counts" dict, key is a word, value is its count
Use counts.items()
Gives us an "items" list of pairs: (char, count)
I'll use "items" as the var name here, echoing the "d.items()" function name

>>> items = [('z', 1), ('a', 3), ('e', 11), ('b', 3), ('c', 2)]

Copy that items list into interpreter, try these code challenges
Questions we could ask of the pairs - demo or you-try-it
1. How to sort items in increasing order by char (easy!)
2: How to sort items in increasing order by count?
3. How to sort items in decreasing order by count?
4: How to access the pair with the largest count?

>>> items = [('z', 1), ('a', 3), ('e', 11), ('b', 3), ('c', 2)]
>>> 
>>> # sort by [0]=word is the default
>>> sorted(items)
[('a', 3), ('b', 3), ('c', 2), ('e', 11), ('z', 1)]
>>> 
>>> sorted(items, key=lambda pair: pair[1])   # sort by count
[('z', 1), ('c', 2), ('a', 3), ('b', 3), ('e', 11)]
>>> 
>>> sorted(items, key=lambda pair: pair[1], reverse=True)
[('e', 11), ('a', 3), ('b', 3), ('c', 2), ('z', 1)]
>>> 
>>> max(items, key=lambda pair: pair[1])      # largest count
('e', 11)
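
To connect this back to a dict, here is a sketch that starts from a small counts dict and goes through .items(). The dict contents are made up to match the items list above.

```python
counts = {'z': 1, 'a': 3, 'e': 11, 'b': 3, 'c': 2}

# .items() gives the (key, value) pairs; list() just makes them printable.
items = list(counts.items())
print(items)  # [('z', 1), ('a', 3), ('e', 11), ('b', 3), ('c', 2)]

# sorted() accepts the items directly, no need to build the list first.
by_count = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
print(by_count)  # [('e', 11), ('a', 3), ('b', 3), ('c', 2), ('z', 1)]

largest = max(counts.items(), key=lambda pair: pair[1])
print(largest)  # ('e', 11)
```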

Wordcount - Top-Count - Lambda

Here is the WordCount project we had before. This time look at the print_counts() and print_top() functions.

> wordcount.zip

print_counts() - Alphabetic Output

Here is the output of the regular print_counts() function, which prints out in alphabetic order. Output looks like:

$ python3 wordcount.py poem.txt 
are 2
blue 2
red 2
roses 1
violets 1
$

print_counts() Solution

This is the standard dict-output sorted loop.

def print_counts(counts):
    """
    Given counts dict, print out each word and count
    one per line in alphabetical order, like this
    aardvark 1
    apple 13
    ...
    """
    for word in sorted(counts.keys()):
        print(word, counts[word])
    # Alternately use .items() to access all the key/value data
    # for key, value in sorted(counts.items()):
    #    print(key, value)

print_top()

The print_top(counts, n) function - print the n most common words in decreasing order by count.

$ python3 wordcount-solution.py -top 10 alice-book.txt 
the 1639
and 866
to 725
a 631
she 541
it 530
of 511
said 462
i 410
alice 386

Look at print_top() function
Recall: dict.items() - word/count pairs, not sorted in any useful order
[('sister', 12), ('rabbit', 5), ...]
Need to order the pairs in decreasing order by count
Use sorted/lambda
This code is incredibly short and powerful

print_top() Solution

def print_top(counts, n):
    """
    Given counts dict and int N, print the N most common words
    in decreasing order of count
    the 1045
    a 672
    ...
    """
    items = counts.items()
    # Could print the items in raw form, just to see what we have
    # print(items)
    # Your code - my solution is 3 lines long, but it's dense!
    # Sort the items with a lambda so the most common words are first.
    # Then print just the first N word,count pairs with a slice
    items = sorted(items, key=lambda pair: pair[1], reverse=True) # 1. Sort largest count first
    for word, count in items[:n]:                                 # 2. Slice to grab first N
        print(word, count)
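
For experimenting in the interpreter, a variant that returns the top pairs instead of printing them can be easier to check. top_items() is a hypothetical name for this sketch and is not part of the wordcount project. The counts dict is made up to match the poem output shown earlier.

```python
def top_items(counts, n):
    """Hypothetical variant of print_top(): return the n most common
    (word, count) pairs, largest count first."""
    items = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
    return items[:n]


counts = {'roses': 1, 'are': 2, 'red': 2, 'violets': 1, 'blue': 2}
top = top_items(counts, 3)
for word, count in top:
    print(word, count)
```

If n is larger than the number of words, the slice simply returns all the pairs.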

Modules and Modern Coding

"module" - unit of code to use, aka "library"
Every module has a name, e.g. "math"
Module contains lots of functions, solving common problems
Modern coding:
1. Writing custom code
2. Calling built-in functions e.g. sorted(), str.lower()
3. Calling module code, e.g. math - today

Standard Modules - import math

Python comes with "standard" modules
Standard modules are installed when Python is installed, so no extra step required
These are preferred
e.g. the "math" module - mathematics
e.g. the "sys" module - interface with operating system
import module by name
Use dot to refer to functions etc. in the module
math.sqrt(2) - function call
math.pi - pi constant within math
Shown here in interpreter, but works in .py too
Common error: forgetting to do the import
Aside: there are other import forms, but this name/dot form is the most important

>>> import math
>>> math.sqrt(2)  # call sqrt() fn
1.4142135623730951
>>> math.sqrt
<built-in function sqrt>
>>> 
>>> math.log(10)
2.302585092994046
>>> math.pi       # constants in module too
3.141592653589793

Quit and restart the interpreter without the import, see common error:

>>> # quit and restart interpreter
>>> math.sqrt(2)  # OOPS forgot the import
Traceback (most recent call last):
NameError: name 'math' is not defined
>>>
>>> import math
>>> math.sqrt(2)  # now it works
1.4142135623730951
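
The same import works in a .py file. Below is a minimal sketch of what that might look like. The functions circle_area() and hypotenuse() are made-up examples, not part of any course project.

```python
import math  # the import goes at the top of the .py file


def circle_area(radius):
    # math.pi is a constant inside the math module
    return math.pi * radius ** 2


def hypotenuse(a, b):
    # math.sqrt() is a function inside the math module
    return math.sqrt(a * a + b * b)


print(circle_area(1))    # 3.141592653589793
print(hypotenuse(3, 4))  # 5.0
```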

Hacker: Use dir() and help() (optional)

Feel like a hacker, use dir() and help() on module
In the interpreter >>>
dir(module) - shows a list of all the defs in the module
help(module.fn) - shows some help text for that function
The """Pydoc""" we write to describe each function
That Pydoc is the text help() displays (demo later)

>>> import math
>>> dir(math)
['__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos', 'cosh', 'degrees', 'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial', 'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'gcd', 'hypot', 'inf', 'isclose', 'isfinite', 'isinf', 'isnan', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'log2', 'modf', 'nan', 'pi', 'pow', 'radians', 'remainder', 'sin', 'sinh', 'sqrt', 'tan', 'tanh', 'tau', 'trunc']
>>>
>>> help(math.sqrt)
Help on built-in function sqrt in module math:

sqrt(x, /)
    Return the square root of x.
>>>
>>> help(math.cos)
Help on built-in function cos in module math:

cos(x, /)
    Return the cosine of x (measured in radians).
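
The link between the triple-quoted Pydoc and help() can be seen with a tiny def of our own. This is a sketch. double() is just an example function, and the __doc__ attribute is where Python stores the text that help() displays.

```python
def double(n):
    """Given a number n, return n times 2."""
    return n * 2


# The Pydoc string is stored on the function as __doc__
print(double.__doc__)  # Given a number n, return n times 2.

# In the interpreter, help(double) displays this same text.
```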

Module Recap

What is a module?
Has a name
Holds lots of functions
i.e. "library" of functions
Use import math to bring it in
Use math.sqrt(5) syntax to call a function in a module

wordcount.py Is a Module

The file wordcount.py

Forms a module named wordcount

Suppose you have built some useful functions
Someone else in your lab wants to use them....
Having them paste in their own copy is not ideal
What does a module contain?
We have wordcount.py
python3 wordcount.py - runs main()
wordcount.py is also a module named just "wordcount"
Think of all the defs in wordcount: read_counts(), clean(), print_counts(), ...
import works on wordcount (in the same directory)
Access functions as module.xxx just like usual
Run python interpreter in wordcount directory to try this
Try importing wordcount, calling the read_counts() function
Call wordcount.clean()

>>> # Run interpreter in wordcount directory
>>> import wordcount
>>>
>>> wordcount.read_counts('test1.txt')
{'a': 2, 'b': 2}

A module/file contains many defs
Can import a module/file, call its defs:
module.fn_name()
Style: for a function to be usable from another module...
it should take in data as parameters and return a value
i.e. black box style
we've done this all along, see now the bigger picture
Babygraphics project:
treats babynames.py as a module
import babynames
calls babynames.read_files()
This is how part (b) calls the part (a) code across files

dir() and help() work on wordcount Too

Look at wordcount.py, look at the functions
dir() and help() work here too
See where the """Pydoc""" goes!

>>> dir(wordcount)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'clean', 'main', 'print_counts', 'print_top', 'read_counts', 'sys']
>>> 
>>> help(wordcount.read_counts)

Help on function read_counts in module wordcount:

read_counts(filename)
    Given filename, reads its text, splits it into words.
    Returns a "counts" dict where each word
    ...

Module Summary

You have written many foo.py files with defs in them. Now we see, another program can import foo, and then call foo.bar() in your module to access your functions (in the same directory). It helps that your functions have well-defined input/output, so they can be used and re-mixed by other programs.

Optional: that __main__ thing

Two Ways of wordcount.py

Two ways this file can be used
1. python3 wordcount.py poem.txt
load up all the wordcount.py code
run main() ... it runs all the functions
2. Some code does import wordcount
In this case:
Load all the wordcount.py code
do not run main
Module sits there
Waiting for a call like wordcount.read_counts() to happen
How are these two cases distinguished?
(1) is common BTW, (2) is rare but should work

How Does Python Know To Run main()?

There is a special variable
__name__
Python aside: when a name begins with underbars
Signifies an internal detail that normal code should not mess with
When __name__ has the special string value '__main__'
It means we have case (1), we are supposed to run main()
This is why there's that crazy if-statement at the bottom of the file!

Typical last 2 lines of .py file:

if __name__ == '__main__':
    main()

Experiment: Put these lines at end of wordcount.py. Then try running wordcount from command line, and loading it in interpreter.

if __name__ == '__main__':
    print("I feel like running main()")
    main()
else:
    print("Not running main()")
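
To see the other side of this, you can look at __name__ on a module that has been imported. In this small sketch, the standard math module reports its own name, not '__main__'.

```python
import math

# An imported module's __name__ is just the module's name.
print(math.__name__)  # math

# Inside a file run directly, e.g. python3 wordcount.py,
# __name__ is the string '__main__' instead.
```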

Design Idea: Easy Things Should be Easy

Design idea: doing the ordinary thing should be easy, not require thought. Doing something hard should be possible, but may require work.

This __name__ business seems a weak point in Python's design. It is not great that every vanilla python program has to carry around these 2 obscure looking lines. There should be a less obscure way of getting the default behavior that 99% of python programs want.
