L27

Today: loose ends - comprehension-if, truthy logic, modules, float flaws

Comprehensions - Recall 1-2-3

Recall - comprehension re-uses syntax of other constructs
1. type in a pair of outer brackets [ ]
2. inside write a foreach "for n in nums" - choose var name "n" or "s" ..
e.g. typical Python: choose var name to keep your ideas straight
3. then the result expression "n * n" goes on the left

>>> nums = [1, 2, 3, 4, 5, 6]
>>> [n * n for n in nums]
[1, 4, 9, 16, 25, 36]

Comprehension + If

Can add "if" filter on the right hand side
add at right: if n > 3
Mnemonic: re-use syntax again
Left hand side can be just "n" to pass value through unchanged

>>> nums = [1, 2, 3, 4, 5, 6]
>>> [n for n in nums if n > 3]
[4, 5, 6]
>>> [n * n for n in nums if n > 3]
[16, 25, 36]
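
As a cross-check, a comprehension with an if is shorthand for an ordinary loop with append. A minimal sketch of the two forms side by side; the names squares_loop and squares_comp are made up for this illustration:

```python
nums = [1, 2, 3, 4, 5, 6]

# Long form: loop, filter with if, append the result expression
squares_loop = []
for n in nums:
    if n > 3:
        squares_loop.append(n * n)

# Comprehension form: same three pieces, rearranged onto one line
squares_comp = [n * n for n in nums if n > 3]

print(squares_loop)   # [16, 25, 36]
print(squares_comp)   # [16, 25, 36]
```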

Example/Exercises Comprehensions

These are all 1-liner solutions with comprehensions.

Syntax reminder - e.g. make a list of nums doubled where n > 3

[2 * n for n in nums if n > 3]

Section on server: Comprehensions

> up_only (has if)

> even_ten (has if)

Comprehensions Replace map()

Comprehensions are easier to write than map(), so you can use them instead. Why did we learn map() then? Because map() is the ideal way to see how lambda works. At this point, you can use comprehensions instead of map(). Exam problems will give full credit to either form, your choice.
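
To make the comparison concrete, here is a small sketch computing the same doubled list both ways. The variable names doubled_map and doubled_comp are just for illustration:

```python
nums = [1, 2, 3, 4, 5, 6]

# map() form: the lambda is called once per element
doubled_map = list(map(lambda n: 2 * n, nums))

# Comprehension form: the expression is written out directly
doubled_comp = [2 * n for n in nums]

print(doubled_map)                   # [2, 4, 6, 8, 10, 12]
print(doubled_map == doubled_comp)   # True
```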

Comprehension Fever - 1 Line Is Ideal

Programmers can get into Comprehension Fever - trying to write your whole program as nested comprehensions. Probably 1-line is the sweet spot.

Using regular functions, loops, variables etc. for longer phrases is fine.

Pre-Truthy Example

Say we want to print a string if it is non-empty. This code works fine and you can write it this way, but there is a shorter way to do it, shown below.

if s != '':
    print(s)

Truthy True/False

The if and while are a little more flexible than we have shown thus far. They use the "truthy" system to distinguish True/False.

You never need to use this in CS106A, just mentioning it in case you see it in the future.

For more detail see "truthy" section in the if-chapter Python If - Truthy

Truthy `False`

Truthy logic says that "empty" values count as False. The following values, such as the empty string and the number 0, all count as False in an if-test:

# Count as False:
''
0
0.0
None
[]
{}

Truthy `True`

Any other value counts as True. Anything that is not one of the above False values:

# Count as True:
6
3.14
'Hello'
[1, 2]
{1: 'b'}

How To Use Truthy

With truthy-logic, you can use a string or list or whatever as an if-test directly. This makes it easy to test, for example, for an empty string like the following. Testing for "empty" data is such a common case that truthy logic is a shorthand for it. For CS106A, you don't ever need to use this shorthand, but it's there if you want to use it. Many other computer languages use this truthy system too, so we don't want you to be too surprised when you see it.

# pre-truthy way:
if s != '':
    print(s)


# truthy equivalent:
if s:
    print(s)

Truthy Example - nums

We have a list of numbers, nums. Print all the non-zero numbers, one per line.

nums = [1, 17, 0, 13, 0]

# pre-truthy
for n in nums:
    if n != 0:
        print(n)


# truthy equivalent:
for n in nums:
    if n:
        print(n)

Easily skipping over empty-string or 0 or None .. common use of truthy-logic.

Just an optional shortcut, not something you need to use. You will see it in other computer languages as well.
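
Truthy logic also combines with the comprehension-if from earlier. A small sketch, assuming a mixed list made up for this illustration, that skips over 0, empty-string and None in one pass:

```python
values = [3, 0, 'a', '', None, 17]

# "if v" keeps only the values that count as True
kept = [v for v in values if v]

print(kept)   # [3, 'a', 17]
```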

(optional) Truthy Example/Exercise

> no_zero

> not_empty

Explain Glossed Over Lines - see top of
wordcount.py / pylibs.py

There are lines of Python code we have glossed over. Piece by piece, we will fill these in.

Look at Glossed Over Lines 3x

e.g. in file pylibs.py

1. #! thing at the top

2. import sys near the top - a whole topic

3. if __name__ .. at the bottom

#!/usr/bin/env python3

"""
Stanford CS106A Pylibs Example
Nick Parlante
"""

import sys
import random

def read_terms(filename):
...
... lots of Python code ...
...

if __name__ == '__main__':
    main()

1. `#!/usr/bin/env python3`

#!/usr/bin/env python3
This should be the very first line of your python file
This indicates that it's a python-3 file
Older form, marking a Python-2 file:
#!/usr/bin/python
Not a requirement, but a good practice
This is an ancient Unix syntax for specifying the type of a file
Unix is ancient and influential
Most modern Operating Systems follow Unix conventions: Mac OS, Linux, iOS, Android
Windows is the exception
But windows software can still use that line

Python-2 vs. Python-3

There are not huge differences between Python version-2 and version-3. You could easily write Python-2 code if you needed to, but Python-3 is strongly preferred for all new work. That said, many orgs may have old Python-2 programs lying around, and it's easiest if they just use them and don't update or edit them. The first line #!/usr/bin/env python3 is a de-facto way of marking which version the file is for.

2. End With Boilerplate If-Statement

You do not need to remember all those details. Just remember this: have that if-statement at the bottom of your file as a couple boilerplate lines. It calls the main() function when this file is run on the command line.

#!/usr/bin/env python3

...

if __name__ == '__main__':
    main()

So if we run like this..

$ python3 pylibs.py

Python will load the pylibs.py file, and then call its main() function. That's what the if-statement does. It's a historical quirk that Python does not simply call main() automatically, but it doesn't, so we have this if-statement at the bottom of the file.
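
To see the boilerplate in a complete file, here is a minimal sketch. The file name demo.py and the greet() function are hypothetical, not part of pylibs.py:

```python
#!/usr/bin/env python3

def greet(name):
    """Given a name string, returns a greeting string."""
    return 'Hello ' + name


def main():
    print(greet('world'))


# __name__ is '__main__' only when this file is run directly,
# e.g. "python3 demo.py". When another file does "import demo",
# __name__ is 'demo' instead, so main() is not called.
if __name__ == '__main__':
    main()
```

So the if-statement lets one file work both ways: run from the command line it calls main(), and imported by another file it just provides its defs.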

Modules - `import sys`

What about the import lines..

#!/usr/bin/env python3


import sys
import random

Import a module named "sys"
Later lines can call the functions defined in sys
Let's talk about modules...

Module/Library - Modern Coding

Modules hold code for common problems, ready for your code to use. Also commonly known as "libraries" of code. We say that you build your code "on top of" the module. It is very common with modern coding that part of your coding is custom, and part is building on top of module code.

[Figure: your code built on top of modules like sys]

Great Deal - ♥ Modules

Question while coding:
Is part of this solved in a module already?
Using module code you didn't have to write is very attractive
This is kind of a no-brainer case to make!
Somebody else wrote it, you can just use it
It's well tested
It has real documentation - aka docs
Your teammates may already be familiar with it
CS106A we see this "module" theme a little
CS106A needs to cover fundamentals
loops, strs, dicts, files ..
Courses beyond CS106A, likely use modules more

Module = Name + Code + Docs

A module has a name, e.g. "math"
A module contains functions, solving common problems
A module also has documentation "docs" explaining the use of its functions
e.g. "math" module contains math functions
e.g. "random" module contains functions for pseudo-random numbers
e.g. "urllib" module contains functions for urls and web requests

Step 1: `import math`

To use a module, include an import math line
Import the module by its name
Typically these are grouped near the top of your file

Step 2: `math.sqrt(2)`

On later lines, refer to functions in the module with a dot
e.g. math.sqrt(2)
e.g. random.randrange(10)
Readable: in this way, it's clear when calling a function..
that it is coming from that module
There are other ways of doing import
This form is simplest: import math .. math.sqrt()

>>> import math
>>> math.sqrt(2)  # call sqrt() fn
1.4142135623730951
>>> math.sqrt
<built-in function sqrt>
>>>
>>> math.log(10)
2.302585092994046
>>> math.pi       # constants in module too
3.141592653589793
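
For reference, here is a sketch of the "other ways of doing import" mentioned above. All three forms are standard Python; the alias m is just a name picked for this example:

```python
import math                  # simplest form: math.sqrt()
import math as m             # same module under a shorter alias
from math import sqrt, pi    # pulls the names in directly

print(math.sqrt(16))   # 4.0
print(m.sqrt(16))      # 4.0
print(sqrt(16))        # 4.0
print(pi == math.pi)   # True
```

The plain import math form has the advantage that every call site shows which module the function comes from.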

Quit and restart the interpreter without the import, see common error:

>>> # quit and restart interpreter
>>> math.sqrt(2)  # OOPS forgot the import
Traceback (most recent call last):
NameError: name 'math' is not defined
>>>
>>> import math
>>> math.sqrt(2)  # now it works
1.4142135623730951

Module = Dependency

When you write code using a module
Your code now depends on that module's existence
If that module disappeared, your code would stop working

1. "Standard" Modules — Fine

Standard = included/maintained as part of Python3 install
These are the best modules to use
Can rely on this module now and in the future
Very rare for a module to be dropped
No separate install is required
The standard module is installed when python is installed

Many Standard Modules

Do not: memorize whole list of modules
Do: check the list for help when starting a project
Standard Python Modules List
A few examples...
csv module to read CSV files, such as produced by Excel
datetime module of calendar functions
zipfile module for reading/creating .zip files
urllib module for making web http requests from Python
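
As a taste of two of these, here is a short sketch using csv and datetime. The CSV text and the date are made-up sample data, and io.StringIO is used only so the example needs no real file:

```python
import csv
import datetime
import io

# csv: parse comma-separated text. io.StringIO makes a string
# behave like an open file, so no real file is needed here.
text = 'name,count\nalice,3\nbob,5\n'
rows = list(csv.reader(io.StringIO(text)))
print(rows)   # [['name', 'count'], ['alice', '3'], ['bob', '5']]

# datetime: calendar arithmetic that knows about leap years
start = datetime.date(2024, 2, 28)
next_day = start + datetime.timedelta(days=1)
print(next_day)   # 2024-02-29
```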

2. Non-Standard "pip" Modules — Depends

Other modules are valuable but they are not a standard part of Python. For code using non-standard module to work, the module must be installed on that computer via the "pip" Python tool. e.g. for homeworks we had you pip-install the "Pillow" module with this command:

$ python3 -m pip install Pillow
..prints stuff...
Successfully installed Pillow-5.4.1

A non-standard module can be great, although the risk is harder to measure. The history thus far is that popular modules continue to be maintained. Sometimes the maintenance is picked up by a different group than the original module author. A little used module is more risky.
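
Since a non-standard module may be missing on a given computer, code can check for it before relying on it. A minimal sketch, assuming we want to check for Pillow; 'PIL' is the name Pillow is imported under, and the variable have_pillow is made up here:

```python
import importlib.util

# find_spec() looks for a module by name without importing it,
# returning None if the module is not installed.
have_pillow = importlib.util.find_spec('PIL') is not None

if have_pillow:
    print('Pillow is installed')
else:
    print('Pillow missing, try: python3 -m pip install Pillow')
```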

Aside: Module vs. Supply Chain Attack

When you install a module on your machine from somewhere, you are trusting that code to run on your machine. In very rare cases, bad guys have tampered with a module to include malware, which then runs on your machine to steal data, install more malware, etc. This is a so-called "supply chain attack".

Installing code from python.org is very safe, and also very well known modules like Pillow and matplotlib are safe, benefiting from large, active base of users.

Several supply chain attacks have been made on lesser known modules hosted on pypi.org, the public index that pip installs from by default, where anyone can upload a module.

Be more careful if installing a little used module.

Module Docs

Every module has formal "documentation" - "docs"
Explain what its functions do
The "abstraction" of each function
What it does
How to call it
Demo web search: "python math module"
python.org - the official home of python docs
Official python.org math docs
Watch out for SEO
Search Engine Optimization
Some possibly lame site tries to get a better search ranking

Hacker: Use dir() and help() (optional)

Feel like a hacker, use dir() and help() on module
In the interpreter >>>
dir(module) - shows a list of all the defs in the module
help(module.fn) - shows some help text for that function
The """Pydoc""" we write to describe each function
That Pydoc is what help() returns (demo later)
Names starting with double-underbar = internal thing you probably do not need

>>> import math
>>> dir(math)
['__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos', 'cosh', 'degrees', 'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial', 'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'gcd', 'hypot', 'inf', 'isclose', 'isfinite', 'isinf', 'isnan', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'log2', 'modf', 'nan', 'pi', 'pow', 'radians', 'remainder', 'sin', 'sinh', 'sqrt', 'tan', 'tanh', 'tau', 'trunc']
>>>
>>> help(math.sqrt)
Help on built-in function sqrt in module math:

sqrt(x, /)
    Return the square root of x.
>>>
>>> help(math.cos)
Help on built-in function cos in module math:

cos(x, /)
    Return the cosine of x (measured in radians).

How to Create Your Own Module?

You already have! A regular old foo.py file is a module.

wordcount.py Is a Module

How hard is it to write a module? Not hard at all. A regular Python file we have written works as a module too with whatever defs the foo.py file has.

[Figure: wordcount.py is a module named wordcount]

Consider the file wordcount.py in wordcount.zip

Forms a module named wordcount

Suppose you have built some useful functions
Someone else in your lab wants to use them....
Having them paste in their own copy is not ideal
What does a module contain?
We have wordcount.py
python3 wordcount.py - runs main()
wordcount.py is also a module named just "wordcount"
Think of all the defs in wordcount: read_counts(), clean(), print_counts() ..
import works on wordcount (in the same directory)
Access functions as module.xxx just like usual
Run python interpreter in wordcount directory to try this
Try importing wordcount, calling the read_counts() function
Call wordcount.clean()

Try this demo in the wordcount directory. The file wordcount.py has the module name wordcount

>>> # Run interpreter in wordcount directory
>>> import wordcount
>>>
>>> wordcount.read_counts('test1.txt')
{'a': 2, 'b': 2}

A module/file contains many defs
Can import a module/file, call its defs:
module.fn_name()
Style: for a function to be usable from another module...
it should take in data as parameters and return a value
i.e. black box style
we've done this all along, see now the bigger picture
Babygraphics project:
treats babynames.py as a module
import babynames
calls babynames.read_files()

dir() and help() work on wordcount Too

Look at wordcount.py, look at the functions
dir() and help() work here too
See where the """Pydoc""" goes!

>>> dir(wordcount)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'clean', 'main', 'print_counts', 'print_top', 'read_counts', 'sys']
>>> 
>>> help(wordcount.read_counts)

read_counts(filename)
    Given filename, reads its text, splits it into words.
    Returns a "counts" dict where each word
    ...

How babygraphics.py Used babynames.py

# 1. In the babygraphics.py file
# import the babynames.py file in same directory
import babynames

...

    # 2. Call the read_files() function
    names = babynames.read_files(FILENAMES)

Module Example: urllib

Do a quick demo here - just show the power of modules

How Does The Web Work?

The web browser app has url ("client" side)
Web server is running on some machine ("server" side)
Browser sends GET request to server
Server gets request, sends back HTML response data
HTML is a text code
Browser "renders" the HTML on screen
Demo:
Visit python.org or sfgate.com or whatever
Right-click on page, "View Source" to see the HTML text that makes a web page
Think of all the surfing you have done .. HTML code defines each page

HTML

Here is the HTML code for plain text with a bolded word in it. Tags like <b> mark up the text.

This <b>bolded</b> text

HTML Experiment - View Source

Go to python.org. Try view-source command on this page (right click on page). Search for a word in the page text, such as "whether" .. to find that text in the HTML code.

Think of how many web pages you have looked at - this is the code behind those pages. It's a text format! Lines of unicode chars!

Web Page - HTML - Python

Every web page you've ever seen is defined by this HTML text behind the scenes. Hmm. Python is good at working with text.

urllib Demo

Python code to request an HTML page by url
urllib - making requests to a server, getting back data
An example of using a standard module
docs - urllib.request docs on python.org
Makes a URL look like a local file mostly
Read the text of the web page
Use s.find() to display a fragment of it

(See copy of these lines below suitable for copy/paste yourself.)

>>> import urllib.request
>>> f = urllib.request.urlopen('http://www.python.org/')
>>> text = f.read().decode('utf-8')
>>> text.find('Whether')
26997
>>> text[26997:27100]
"Whether you're new to programming or an experienced developer, it's easy to learn and use Python"

Here are the above Python lines, suitable for copy/paste:

import urllib.request
f = urllib.request.urlopen('http://www.python.org/')
text = f.read().decode('utf-8')

f.read() - reads all the bytes in one call
decode('utf-8') - decode raw bytes, returning unicode string
Does not always work: some sites block requests that come from Python
Also some web sites convey their HTML in a different way
Can try http: or https:
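
Since the request does not always work, a program may want to handle the failure instead of crashing. Here is a hedged sketch, where fetch_text() is a hypothetical helper name and the 10 second timeout is an arbitrary choice:

```python
import urllib.error
import urllib.request


def fetch_text(url):
    """
    Given a url, returns the text at that url as a string,
    or None if the request fails.
    """
    try:
        # "with" closes the connection when the block ends
        with urllib.request.urlopen(url, timeout=10) as f:
            return f.read().decode('utf-8')
    except urllib.error.URLError:
        # Covers unreachable servers and HTTP errors such as 403 or 404
        return None
```

Usage would look like text = fetch_text('https://www.python.org/'), followed by an if-test to check for None. This sketch does not handle every case: a page that is not utf-8 encoded will still raise an error in decode().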

Data From the Web vs. Files

Very often data is drawn from a file, into your program
New picture: data is drawn from a URL
urllib packages up a few functions for this
Handling the details
A nice example of a module

Two Math Systems, "int" and "float" (Recall)

Two Systems
int and float are two different worlds
"float" .. floating decimal point, moves around
Float and int - each have their own area on the chip
Look similar, but distinct
6 - the int six
6.0 - the float six

# int
3  100  -2

# float, has a "."
3.14  -26.2  6.022e23

Math Works, but Clickbait:
But float Has This One Crazy Flaw

Math works: + - * / min() max() for both int and float fine:
i.e. mostly don't have to think about it
Need to use int for indexing - [ ], grid.get(x, y)
Foreshadow:
Float mostly works easily
BUT Float has one crazy flaw .. revealed below

Float - One Crazy Flaw - Do Not Panic

Note: do not panic! We can work with this. But it is shocking.
Float arithmetic is a little imprecise
Off at around the 17th digit .. there are erroneous "garbage" digits
1. Idea of 1/10th, mathematically pure
2. In Python code: looks like this 0.1
3. In the computer memory, actually: 0.1000000000000000055
There are some garbage digits way off to the right
The Math Will Not Come Out Exactly Right
This is a deep feature of float numbers in the computer, applies to all languages
The print routine hides a few digits, so often the garbage is hidden
But in the computation, the garbage is there
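
To look at the hidden digits directly, one option is the standard decimal module, which can show the exact value stored for a float. A short sketch:

```python
from decimal import Decimal

# Decimal(0.1) converts the stored float exactly, with no rounding
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625

# Asking for 20 digits when formatting also reveals the garbage
print(format(0.1, '.20f'))   # 0.10000000000000000555
```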

Crazy Flaw Demo - Adding 1/10th

Garbage digits are almost always part of a float value
Printing omits a few stored digits at right
So often do not see the garbage
But eventually the garbage gets big enough to print...

>>> 0.1
0.1
>>> 0.1 + 0.1
0.2
>>> 0.1 + 0.1 + 0.1    # this is why we can't have nice things
0.30000000000000004
>>> 
>>> 0.1 + 0.1 + 0.1 + 0.1
0.4
>>> 0.1 + 0.1 + 0.1 + 0.1 + 0.1
0.5
>>> 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1
0.6
>>> 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1
0.7
>>> 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1
0.7999999999999999     # here the garbage is negative

Another example with 3.14

>>> 3.14 * 3
9.42
>>> 3.14 * 4
12.56
>>> 3.14 * 5
15.700000000000001   # d'oh

Conclusion: float math is slightly wrong

Why Must We Have This Garbage?

The short answer is that with a fixed number of bytes to store a floating point number in memory, there are some unavoidable problems where numbers have these garbage digits on the far right. It is similar to the impossibility of writing the number 1/3 precisely as a decimal number: 0.3333 is close, but falls a little short.

Why? Why Must There Be This Garbage?

We think in base 10
So 0.1 and 2.5 come out "even"
But 1/3 does not come out even
Try to write 1/3 out as a decimal number
Say stop at 10 digits: 0.3333333333
This number differs from 1/3 by a tiny "error" amount
Some fractions come out even in base 10, and some don't
The computer uses base 2 internally - 0's and 1's
In base 2, a different set of numbers don't come out even
The garbage digits off to the right are due to the tiny error
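
The base 2 idea can be seen in the interpreter. Fractions built from powers of 1/2 come out even in base 2, so their arithmetic is exact, while 1/10 does not come out even. A quick sketch:

```python
# 0.5, 0.25 and 0.75 are sums of powers of 1/2,
# so they are stored exactly and the math is exact
print(0.5 + 0.25 == 0.75)   # True

# 0.1, 0.2 and 0.3 do not come out even in base 2
print(0.1 + 0.2 == 0.3)     # False
print(0.1 + 0.2)            # 0.30000000000000004
```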

Crazy, But Not Actually A Problem

Everyone needs to remember:
float arithmetic typically comes out a tiny bit wrong
(int arithmetic, comes out perfect)
The error is typically far less than 1-trillionth part
But the error is not zero
Most computations can handle an error of 1-trillionth part
Actually not a problem
How many digits of accuracy in the inputs, 6 digits?

Must Avoid One Thing: no ==

There is one concrete coding rule
Do not use == with float
Exception: 0.0 is reliable for ==
Any finite float value * 0.0 will == 0.0

>>> a = 3.14 * 5
>>> b = 3.14 * 6 - 3.14
>>> a == b   # Observe == not working right
False
>>> b
15.7
>>> a
15.700000000000001

How To Compare Floats

Compare float values
Instead of ==, look at abs(a - b)
abs(x) - the absolute value function
Check if absolute value of difference is very small
The standard math module also has a function math.isclose() that does this

>>> abs(a - b) < 0.00001
True
>>>
>>> import math
>>> math.isclose(a, b)
True
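
A little more detail on math.isclose(), as a sketch. By default it uses a relative tolerance, which is scaled to the size of the values being compared. That works poorly when one value is 0.0, so there is an optional abs_tol parameter for that case. The tolerance 1e-9 below is an arbitrary choice:

```python
import math

a = 3.14 * 5
b = 3.14 * 6 - 3.14

# Default: relative tolerance of 1e-09, scaled to the size of the values
print(math.isclose(a, b))                       # True

# A relative tolerance does not help when comparing against 0.0,
# so pass an absolute tolerance for that case
print(math.isclose(1e-12, 0.0))                 # False
print(math.isclose(1e-12, 0.0, abs_tol=1e-9))   # True
```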

int Arithmetic is Exact

int arithmetic does not have the error problem of float
int results are exactly correct and repeatable
Except overflow - many languages have a maximum possible int
Int arithmetic that goes over the max will get the wrong answer - aka "overflow"
Unusually, Python does not have a max int
Bank balance example:
Bank balances can be stored as int number of pennies
That way balance add/subtract comes out exactly right

>>> # Int arithmetic is exact
>>> a = 6
>>> b = 24
>>> 
>>> a * 5
30
>>> a * 5 - 6
24
>>> a * 5 - 6 == b
True
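
The bank balance idea can be sketched as follows: make ten deposits of 10 cents, once with float dollars and once with int pennies. The variable names are made up for this illustration:

```python
# Float dollars: add 10 cents ten times
balance_float = 0.0
for i in range(10):
    balance_float += 0.10
print(balance_float)          # 0.9999999999999999
print(balance_float == 1.0)   # False

# Int pennies: same deposits, stored as whole pennies
balance_pennies = 0
for i in range(10):
    balance_pennies += 10
print(balance_pennies)          # 100
print(balance_pennies == 100)   # True
```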

int Bitcoin

Bitcoin wallets use this int strategy - the amount of bitcoin in a wallet is measured in "satoshis". One satoshi is one 100-millionth of 1 bitcoin. Each balance is tracked as an int number of satoshis, e.g. an account with 0.25 Bitcoins actually has 25,000,000 satoshis. Using ints in this way, the addition and subtraction to move bitcoin (satoshis) from one account to another comes out exactly correct. int is precise!