CS193Q - Day 3

> Nick's Python Guide - maybe open this in a new tab so you can get to its chapters as we go

Dict Type

d = {}  # empty dict
d['a'] = 'alpha'   # set
d['b'] = 'beta'
d['a']   -> 'alpha'  # retrieve
'a' in d -> True     # "in" key check
>>> d = {}
>>> d['a'] = 'apple'
>>> d['g'] = 'grape'
>>> d['d'] = 'donut'
>>> 
>>> d  # literal syntax - r/w
{'a': 'apple', 'g': 'grape', 'd': 'donut'}
>>>
>>> d['a']
'apple'
>>>
>>> 'a' in d  # "in" efficient
True
>>> 'x' in d
False
>>> not 'x' in d  # no: not this way
True
>>> 'x' not in d  # yes: "not in" form preferred
True
>>>
>>> d.keys()
dict_keys(['a', 'g', 'd'])
>>> 
>>> # .keys() not a list, but works in loop
>>> for key in d.keys():
...   print(key, d[key])
... 
a apple
g grape
d donut
>>> 
>>> # better: go through keys in sorted order
>>> for key in sorted(d.keys()):
...   print(key, d[key])
... 
a apple
d donut
g grape
>>> 
>>>
>>> d.keys()[0]   # Not a list
TypeError: 'dict_keys' object is not subscriptable
>>> 
>>> lst = list(d.keys())  # Make list of it
>>> lst[0]
'a'
>>>

Dict Count Algorithm - ip-count.py

counts = {}
for word in xxxxxxx:
    if word not in counts:
        counts[word] = 0
    counts[word] += 1
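
As a minimal sketch, here is the count loop above run on a small made-up list. The `words` list stands in for the xxxxxxx placeholder and is an assumption for illustration, not the data used by ip-count.py.

```python
# Hypothetical input standing in for the real data source.
words = ['a', 'b', 'a', 'c', 'b', 'a']

counts = {}
for word in words:
    if word not in counts:   # first time seeing this word
        counts[word] = 0
    counts[word] += 1        # runs for every word, new or not

print(counts)   # {'a': 3, 'b': 2, 'c': 1}
```

The `if` only handles the first-time case; the `+= 1` line is deliberately outside the `if` so it runs for every word.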

Example: ip-count.py - add ip count code

Open up in PyCharm and code it up.

1. Two functions TBD: read_counts() and print_counts()

2. See Doctest - can use local file

3. How does data get from read_counts() to print_counts()? See main(), which passes the data from one function to the other. This is the right way to do it. Note: doing it without global variables is best.
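
One possible shape for the three functions is sketched below. This is not the actual ip-count.py solution: the one-address-per-line file format, the print_counts(counts) signature, and the body of main() are all assumptions made for illustration.

```python
import sys


def read_counts(filename):
    """Return a dict mapping each line in the file to its count.
    Assumes one IP address per line (an assumed file format)."""
    counts = {}
    with open(filename) as f:
        for line in f:
            ip = line.strip()
            if not ip:              # skip blank lines
                continue
            if ip not in counts:
                counts[ip] = 0
            counts[ip] += 1
    return counts


def print_counts(counts):
    """Print each key and its count, one per line, in sorted key order."""
    for ip in sorted(counts.keys()):
        print(ip, counts[ip])


def main(args):
    # The dict moves from read_counts() to print_counts() as a return
    # value and then a parameter. No global variable is involved.
    if len(args) == 1:
        counts = read_counts(args[0])
        print_counts(counts)


if __name__ == '__main__':
    main(sys.argv[1:])
```

Because each function takes its input as a parameter and returns its result, each one can be tested on its own, for example with a Doctest on a small local file.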

Tuple Type

>>> t = (1, 2, 3)
>>> len(t)
3
>>> t[0]
1
>>> t[2]
3
>>> t[0] = 9
TypeError: 'tuple' object does not support item assignment
>>> 
>>> (a, b) = (3, 4)  # assign trick
>>> a
3
>>> b
4
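
A common use of the assign trick is a function that returns two values packed in a tuple. A small sketch, where min_max() is a made-up helper for illustration:

```python
def min_max(nums):
    """Return the smallest and largest values of a non-empty list as a tuple."""
    return (min(nums), max(nums))


(low, high) = min_max([3, 9, 4])   # unpack the 2 values in one step
print(low, high)                   # prints: 3 9
```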

Dict .items()

>>> d = {'a': 'apple', 'g': 'grape', 'd': 'donut'}
>>> 
>>> 
>>> d.items()
dict_items([('a', 'apple'), ('g', 'grape'), ('d', 'donut')])
>>>
>>> for key, value in d.items():
...   print(key, value)
... 
a apple
g grape
d donut
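
Since each element of .items() is a (key, value) tuple, sorted() can put the pairs in key order before the loop. A short sketch using the same dict as above:

```python
d = {'a': 'apple', 'g': 'grape', 'd': 'donut'}

# sorted() compares the tuples by their first element, the key,
# so the loop below visits the keys in sorted order.
pairs = sorted(d.items())
for key, value in pairs:
    print(key, value)
```

This prints the a, d, g lines in that order, matching the sorted(d.keys()) loop shown earlier.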

Topic: Nesting

Everything is pointers. So we can put a pointer to a list inside a dict. We can access that pointer later, editing the "nested" list, even while it is still in the dict.

1. One List and One Dict

Here is code that creates one list and one dict, each with a variable pointing to it.

>>> lst = [1, 2, 3]
>>> d = {}
>>> d['a'] = 1

Memory looks like:
[Figure: lst points to the list, d points to the dict]

2. Store Reference To List inside Dict

>>> d['b'] = lst

What does this do? Key: the = does not make a copy of the list. Instead, it stores an additional reference to the one list inside the dict.

Memory looks like:
[Figure: reference to the list stored inside the dict]

d['b'].append(4) - What Happens?

There is just one list, and there are two references to it. This is fine. What does the following code do?

>>> d['b'].append(4)

The d['b'] is a reference to the [1, 2, 3] list, so the .append() adds 4 to it.

Memory then looks like:
[Figure: the list is modified]

What do these lines of code print now?

>>> lst
???
>>> d['b']
???

Answer

Both lst and d['b'] are references to the one underlying list, which is now [1, 2, 3, 4]
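
The steps so far can be collected into one runnable sketch. The `is` check at the end is an addition here, used to confirm that both references lead to the same list object:

```python
lst = [1, 2, 3]
d = {'a': 1}
d['b'] = lst          # stores a reference, not a copy

d['b'].append(4)      # changes the one shared list

print(lst)            # [1, 2, 3, 4]
print(d['b'])         # [1, 2, 3, 4]
print(lst is d['b'])  # True: both refer to the same list object
```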

3. Use "nums" Variable

Use = to store another reference to list in a "nums" variable. Does this make a copy of the list? No. It's just another reference to the one list. Adding in the "nums" variable makes this complex phrase more readable. What happens when we do nums.append(99)?

>>> nums = d['b']
>>> nums.append(99)
>>> nums
[1, 2, 3, 4, 99]
>>> d['b']
[1, 2, 3, 4, 99]
>>>


[Figure: nums also points to the list]

Summary - Pointers Proliferate

Python does not copy a list or dict when used with, say, =. Instead, Python just spreads around more pointers to the one list. This is a normal way for Python programs to work - a few important lists or dicts, and pointers to those structures spread around in the code. This does not require any action on your part, just have the right picture in mind.

Other computer languages have varied rules, where sometimes there is a copy and sometimes not, and the programmer has to keep this in mind. Python is simple - no copy.
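
When a separate copy really is wanted, it has to be requested explicitly. A minimal sketch using the list .copy() method, which makes a shallow copy; the variable names are made up for illustration:

```python
nums = [1, 2, 3]
alias = nums          # = adds another pointer to the same list
clone = nums.copy()   # .copy() builds a new, separate list (shallow copy)

nums.append(4)

print(alias)          # [1, 2, 3, 4]  sees the change
print(clone)          # [1, 2, 3]     unaffected
```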


1. "Standard" Modules — Fine

Many Standard Modules

Module - Import and Use

At the top of the file: import math

In code, use math.xxx to refer to a thing in the module. e.g. sys.argv in the previous example gets the list of command line arguments from inside the sys module.

>>> import math
>>> math.sqrt(9)
3.0
>>>

Module = Dependency

Non-Standard "pip" Modules — Depends

Other modules are valuable but they are not a standard part of Python. For code using a non-standard module to work, the module must be installed on that computer via the "pip" Python tool. e.g. for homeworks we had you pip-install the "Pillow" module with this command:

$ python3 -m pip install Pillow
..prints stuff...
Successfully installed Pillow-5.4.1

A non-standard module can be great, although the risk is harder to measure. The history thus far is that popular modules continue to be maintained. Sometimes the maintenance is picked up by a different group than the original module author. A little used module is more risky.

Note: Pip Installed Alongside Python

When you upgrade Python, you will lose the pip installed modules which are back in your previous Python install directories. You need to install them again - not hard actually. There's a pattern where you store a list of all the modules in a file "requirements.txt" and then "pip" can read this file and install them all in one step.
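
After an upgrade, one quick way to see which modules need reinstalling is to ask Python whether it can find them. This is only a sketch: missing_modules() is a made-up helper, and the names passed in are examples. Note that the import name can differ from the pip name, e.g. Pillow is imported as PIL.

```python
import importlib.util


def missing_modules(names):
    """Return the names in the list that Python cannot find for import."""
    missing = []
    for name in names:
        # find_spec() returns None when no top-level module has this name
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


# 'PIL' is the import name provided by the Pillow package.
print(missing_modules(['math', 'PIL']))
```

An empty list means every named module was found; anything listed would need a pip install on the new Python.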

Aside: Module vs. Supply Chain Attack

When you install a module on your machine from somewhere, you are trusting that code to run on your machine. In very rare cases, bad guys have tampered with a module to include malware, which then runs on your machine, where it can steal data, install more malware, etc. This is a so-called "supply chain attack".

Installing code from python.org is very safe, and very well known modules like Pillow and matplotlib are also safe, benefiting from a large, active base of users.

Several supply chain attacks have been made through lesser known modules, in particular modules uploaded to pypi.org, the public index that pip installs from by default, where anyone can publish a module.

Be more careful if installing a little used module.


Module Docs


Hacker: Use dir() and help() (optional)

>>> import math
>>> dir(math)
['__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos', 'cosh', 'degrees', 'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial', 'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'gcd', 'hypot', 'inf', 'isclose', 'isfinite', 'isinf', 'isnan', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'log2', 'modf', 'nan', 'pi', 'pow', 'radians', 'remainder', 'sin', 'sinh', 'sqrt', 'tan', 'tanh', 'tau', 'trunc']
>>>
>>> help(math.sqrt)
Help on built-in function sqrt in module math:

sqrt(x, /)
    Return the square root of x.
>>>
>>> help(math.cos)
Help on built-in function cos in module math:

cos(x, /)
    Return the cosine of x (measured in radians).

How to Create Your Own Module?

You already have! A regular old foo.py file is a module.

ipcount.py Is a Module

How hard is it to write a module? Not hard at all. A regular Python file we have written works as a module too with whatever defs the foo.py file has.

Consider the file ipcount.py

Forms a module named ipcount

>>> # Run interpreter in ipcount directory
>>> import ipcount
>>>
>>> ipcount.read_counts('small-ips.txt')
{...

Note: if filename has a dash in it like 'ipcount-solution.py', import this way:

import importlib  
ipcount = importlib.import_module("ipcount-solution")
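
To see the whole idea in one place, the sketch below writes a tiny .py file, then imports it as a module. The file name my-demo.py and its double() function are made up for this demo, and the temp directory is only there to keep the example self-contained.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module source, written to a temp directory for this demo.
source = "def double(n):\n    return n * 2\n"

with tempfile.TemporaryDirectory() as folder:
    Path(folder, 'my-demo.py').write_text(source)
    sys.path.insert(0, folder)        # let import find files in folder
    importlib.invalidate_caches()
    # The dash in the file name means a plain "import my-demo" is a syntax error
    demo = importlib.import_module('my-demo')
    sys.path.remove(folder)

print(demo.double(21))   # 42
```

Normally none of the sys.path work is needed: running the interpreter in the same directory as the file is enough for import to find it.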

(optional) Module Example: urllib

Do a quick demo here - just show the power of modules

How Does The Web Work?

HTML

Here is the HTML code for plain text with a bolded word in it. Tags like <b> mark up the text.

This <b>bolded</b> text

HTML Experiment - View Source

Go to python.org. Try the view-source command on this page (right click on the page). Search for a word in the page text, such as "whether", to find that text in the HTML code.

Think of how many web pages you have looked at - this is the code behind those pages. It's a text format! Lines of unicode chars!

Web Page - HTML - Python

Every web page you've ever seen is defined by this HTML text behind the scenes. Hmm. Python is good at working with text.

urllib Demo

(See copy of these lines below suitable for copy/paste yourself.)

>>> import urllib.request
>>> f = urllib.request.urlopen('http://www.python.org/')
>>> text = f.read().decode('utf-8')
>>> text.find('Whether')
26997
>>> text[26997:27100]
"Whether you're new to programming or an experienced developer, it's easy to learn and use Python"

Here are the above Python lines, suitable for copy/paste:

import urllib.request
f = urllib.request.urlopen('http://www.python.org/')
text = f.read().decode('utf-8')
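
The same steps can be wrapped in functions, as in the sketch below. The names fetch_text() and snippet() are made up for illustration, and the code assumes the page is encoded as UTF-8, as the demo above does.

```python
import urllib.request


def fetch_text(url):
    """Download the url and return its contents decoded as UTF-8 text."""
    with urllib.request.urlopen(url) as f:
        return f.read().decode('utf-8')


def snippet(text, word, width=100):
    """Return up to width chars of text starting at word, or '' if absent."""
    start = text.find(word)
    if start == -1:       # .find() returns -1 when the word is not found
        return ''
    return text[start:start + width]


# Works on any string, no network needed:
print(snippet('This <b>bolded</b> text', 'bolded', 6))   # bolded

# With a network connection, something like this repeats the demo:
#   snippet(fetch_text('http://www.python.org/'), 'Whether')
```

Keeping the download separate from the text search means the search part can be tried on plain strings, as in the last print.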

Data From the Web vs. Files


Comprehensions

Super handy way to compute a new list from an existing list. Best for short computations.

1. Write the outer [ ]

2. Write "for elem in lst" inside

3. Write the expr on the left that computes each elem of the new list

4. Write "if xxx" on the right side, to filter the results if wanted

>>> lst = [1, 2, 3, 4]
>>> 
>>> [n * n  for n in lst ]
[1, 4, 9, 16]
>>> 
>>> [str(n) + '!'  for n in lst ] # type change
['1!', '2!', '3!', '4!']
>>> 
>>> 
>>> [str(n) + '!'  for n in lst if n >= 2]
['2!', '3!', '4!']
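
It can help to see a comprehension next to the loop it replaces. A small sketch, with variable names chosen just for this comparison:

```python
lst = [1, 2, 3, 4]

# Comprehension form: expr on the left, "for" in the middle, "if" on the right
squares = [n * n for n in lst if n >= 2]

# Equivalent loop form
squares_loop = []
for n in lst:
    if n >= 2:
        squares_loop.append(n * n)

print(squares)        # [4, 9, 16]
print(squares_loop)   # [4, 9, 16]
```

Either way the original lst is unchanged; the comprehension always builds a new list.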

End of Class

Python is a big language, but strings, lists, dicts, functions, tests, modules, and files are pretty central for everything.

Homework: see our home page, due end of week 6.