Today: modules, urllib, Jupyter notebooks

Productivity Boost: Modules Of Code To Use

Modules are organized, documented collections of code that solve common problems, ready for your code to use. We say that you build your code "on top of" these libraries.

(Figure: your code built on top of modules like sys)

We ♥ Modules

Module = Dependency

"Standard" Modules

1. Many Standard Modules

2. Non-Standard "pip" Modules

Other modules are valuable but are not a standard part of Python. For code that uses a non-standard module to work, the module must be installed on that computer with the "pip" Python tool. For example, for the homework we had you pip-install the "Pillow" module with this command:

$ python3 -m pip install Pillow
..prints stuff...
Successfully installed Pillow-5.4.1

A non-standard module can be great, although its long-term risk is harder to judge, since it depends on someone continuing to maintain it. So far, popular modules have continued to be maintained, sometimes by a different group than the original module author.
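Because a non-standard module may not be installed on every computer, code can check for it before importing. A minimal sketch using the standard `importlib.util.find_spec()` function; the helper name `is_installed()` is just for illustration. Note that Pillow is installed as "Pillow" but imported under the name `PIL`, so that is the name to check.

```python
import importlib.util


def is_installed(module_name):
    """Return True if module_name can be imported on this computer."""
    # find_spec() returns None when no module by that name can be found
    return importlib.util.find_spec(module_name) is not None


# Pillow is installed with "pip install Pillow" but imported as "PIL"
if is_installed('PIL'):
    print('Pillow is available')
else:
    print('Pillow missing, try: python3 -m pip install Pillow')
```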

Module Docs

Recall: Import
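As a reminder, the usual form is `import module` followed by `module.name` calls, and `from module import name` is a shorter alternative. A small sketch using the standard math module, which also shows that a module's docs are available from inside Python via `help()` or a function's `__doc__` attribute.

```python
# Form 1: import the whole module, then use module.name
import math

print(math.sqrt(16))     # 4.0

# Form 2: import one name directly from the module
from math import pi

print(round(pi, 2))      # 3.14

# Module docs are available inside Python too:
# help(math) prints the full documentation, and
# math.sqrt.__doc__ holds the docstring for one function
print(math.sqrt.__doc__)
```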

How Does The Web Work?

(Figure: a web client makes a request to a server and gets back an HTML response, which the browser draws)

HTML

HTML is plain text that looks like this; tags like <b> mark up the text:

This <b>bolded</b> text
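Because HTML is just text, ordinary string methods work on it. A minimal sketch, with a hypothetical helper name `bold_text()`, that uses `find()` to pull out the text between the first `<b>` and `</b>` tags. Real pages are messier, and a proper HTML parser is more robust.

```python
def bold_text(html):
    """
    Return the text between the first <b> and </b> tags in html,
    or '' if there is no complete <b>...</b> pair.
    """
    start = html.find('<b>')
    if start == -1:
        return ''
    end = html.find('</b>', start)
    if end == -1:
        return ''
    # skip past the '<b>' tag itself to get to the text
    return html[start + len('<b>'):end]


print(bold_text('This <b>bolded</b> text'))   # bolded
```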

HTML Experiment - View Source

Go to python.org and try the view-source command on the page (right-click on the page).

urllib 1

>>> import urllib.request
>>> f = urllib.request.urlopen('http://www.python.org/')
>>> text = f.read().decode('utf-8')
>>> # text is the HTML
>>> # use text.find('xxx') to look for something, show that slice
>>> # like text[5000:5200]
>>>
>>> f = urllib.request.urlopen('https://sfgate.com/')
>>> text = f.read().decode('utf-8')

# without >>>
import urllib.request
f = urllib.request.urlopen('http://www.python.org/')
text = f.read().decode('utf-8')
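The comments in the session above suggest using `text.find()` to locate something in the downloaded HTML, then looking at a slice around it. A sketch of that pattern as a function; the name `show_context()` and the default width of 100 chars are illustrative choices. It works on any string, so it can be tried without a network connection.

```python
def show_context(text, target, width=100):
    """
    Return a slice of text around the first occurrence of target,
    with up to width chars on each side, or '' if target is not found.
    """
    i = text.find(target)
    if i == -1:
        return ''
    start = max(0, i - width)       # avoid a negative start index
    end = i + len(target) + width   # slicing past the end is fine
    return text[start:end]


html = '<html><title>Welcome to Python.org</title></html>'
print(show_context(html, 'Python', width=5))
```

With the page downloaded above, `show_context(text, '<title>')` would show the area around the page title.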

Web hello.txt Example / Exercise

>>> import urllib.request
>>> f = urllib.request.urlopen('http://web.stanford.edu/class/cs106a/hello.txt')
>>> text = f.read().decode('utf-8')
>>> text
'Hello from CS106A url!\nWhat if there were data here?\n12,34\n25,19\n66,0\n1,2\n'
>>> lines = text.splitlines()
>>> lines
['Hello from CS106A url!', 'What if there were data here?', '12,34', '25,19', '66,0', '1,2']
>>> lines = lines[2:]  # one way to get rid of first 2 lines
>>> lines
['12,34', '25,19', '66,0', '1,2']
>>> lines[0].split(',')
['12', '34']
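Continuing the exercise, each remaining line can be split on the comma and converted to ints. A minimal sketch, assuming (as in hello.txt) that the data starts on line 3 and each data line holds exactly two comma-separated numbers; the function name `parse_pairs()` is hypothetical.

```python
def parse_pairs(text):
    """
    Given the full text of hello.txt, return its data lines
    as a list of [int, int] pairs. Skips the first 2 (non-data) lines.
    """
    pairs = []
    for line in text.splitlines()[2:]:
        parts = line.split(',')
        pairs.append([int(parts[0]), int(parts[1])])
    return pairs


text = 'Hello from CS106A url!\nWhat if there were data here?\n12,34\n25,19\n66,0\n1,2\n'
print(parse_pairs(text))   # [[12, 34], [25, 19], [66, 0], [1, 2]]
```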

Data From the Web vs. Files


Traffic Example - Jupyter

traffic.zip Jupyter example

This is a real data set about a particular area (thanks to Jason Lin for getting this data). It measures the total seconds spent in traffic grouped by hour of the day. There is data for every day over a calendar year. It looks like this:

Parsing: we are thinking "for line in f" and "line.split(',')" here

day_of_year,hour_of_day,time_in_secs
01-01-18,0,2549
01-01-18,1,2751
01-01-18,2,2248
01-01-18,3,2440
01-01-18,4,2666
01-01-18,5,2084
01-01-18,6,2302
01-01-18,7,3410
01-01-18,8,3229
01-01-18,9,2367
01-01-18,10,2217
01-01-18,11,2082
01-01-18,12,2055
01-01-18,13,2842
01-01-18,14,2206
01-01-18,15,2178
01-01-18,16,2974
01-01-18,17,2444
01-01-18,18,2965
01-01-18,19,2714
01-01-18,20,2783
01-01-18,21,1951
01-01-18,22,2234
01-01-18,23,2263
01-02-18,0,2311
01-02-18,1,2732
01-02-18,2,2165
01-02-18,3,2377
01-02-18,4,2836
01-02-18,5,2841
01-02-18,6,2719
01-02-18,7,3671
01-02-18,8,3417
01-02-18,9,2578
01-02-18,10,2301
01-02-18,11,2357
01-02-18,12,2165
01-02-18,13,1958
...

traffic.py

{0: 922633,
 1: 870481,
 2: 814756,
 3: 850357,
...
 22: 844618,
 23: 902191}

read_times()

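def read_times(filename):
    """
    Read filename into a "times" dict: keys are hours 0..23,
    values are total seconds in traffic at that hour over all days.
    """
    with open(filename) as f:
        lines = f.readlines()
    lines.pop(0)  # first line is the header, not data
    # start every hour at 0 so += works below
    times = {}
    for hour in range(24):
        times[hour] = 0

    for line in lines:
        # e.g. '01-01-18,0,2549\n' -> ['01-01-18', '0', '2549\n']
        splits = line.split(',')
        hour = int(splits[1])
        time = int(splits[2])  # int() ignores the trailing newline
        times[hour] += time
    return times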
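read_times() does the splitting by hand. As an alternative sketch, the standard csv module can handle the header row and the comma splitting: `csv.DictReader` uses the first line (day_of_year,hour_of_day,time_in_secs) as field names. This hypothetical `read_times_csv()` takes an open file or any iterable of lines, which also makes it easy to try on a few sample lines.

```python
import csv


def read_times_csv(lines):
    """
    Build the same "times" dict as read_times(), hours 0..23 -> total seconds,
    from an open file or any iterable of CSV lines with a header row.
    """
    times = {hour: 0 for hour in range(24)}
    for row in csv.DictReader(lines):
        hour = int(row['hour_of_day'])
        times[hour] += int(row['time_in_secs'])
    return times


sample = [
    'day_of_year,hour_of_day,time_in_secs',
    '01-01-18,0,2549',
    '01-01-18,1,2751',
    '01-02-18,0,2311',
]
times = read_times_csv(sample)
print(times[0], times[1])   # 4860 2751
```

On the real file, the csv docs recommend opening with `newline=''`: `with open('commute-times.txt', newline='') as f: times = read_times_csv(f)`.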

Jupyter Notebook

$ python3 -m pip install jupyter
$ python3 -m pip install matplotlib

Start Jupyter

Traffic Example Setup

Traffic Example - First Half

Here is the code from the traffic notebook. It looks best in the notebook itself, where you can see each cell and its output.

%matplotlib inline
# Use the traffic.read_times() function to read in "times" dict.
# Times dict has keys 0..23 for hours, values are wait-times in seconds
# for traffic at that hour over a whole year.
import traffic
times = traffic.read_times('commute-times.txt')
times

# Use traffic.div_times() to scale it down to seconds for one day.
times = traffic.div_times(times, 365)
times
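The source of traffic.div_times() isn't shown in these notes. Assuming it simply divides every value in the dict by the given number, a sketch might look like this; it returns a new dict rather than changing the original. The numbers are consistent with that assumption: 922633 / 365 is about 2527.76, the hour-0 value in the per-day times dict shown later.

```python
def div_times(times, num):
    """
    Return a new dict with the same keys as times, each value divided by num.
    Hypothetical sketch of traffic.div_times(); the real version may differ.
    """
    result = {}
    for key in times:
        result[key] = times[key] / num
    return result


per_day = div_times({0: 922633, 1: 870481}, 365)
print(round(per_day[0], 2))   # 2527.76
```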

Commands in Jupyter

The Universe of Matplotlib

Matplotlib is an extremely capable and popular Python module for producing visualizations of data. Install it with "pip" as above. Matplotlib is so widely used that it is very unlikely to stop being supported. It has a dizzying number of features; we will just scratch the surface here, so you get a feel for what's there.

For this lecture example, we'll use just the few matplotlib features shown below. For more information about matplotlib, see matplotlib.org.

Functions in matplotlib

# standard import line .. 'plt' is idiomatic here
import matplotlib.pyplot as plt

# 1. plot 1-d list of values. plot() uses lists
plt.plot([5, 13, 2, 7])
plt.show()


# 2. Provide both x-values and y-values lists to plot() - a common pattern
# specify color, titling
plt.title('Some Words Here')
# plot() pattern: plot( [ x-values-list ], [ y-values-list ] )
plt.plot([1, 2, 3, 4], [5, 13, 2, 7], color='red')
plt.show()

Have: times dict, want: y values list

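We have the "times" dict; look at it. How do we make a list of the y values in order by hour? Don't rely on times.values(): it returns the values in the order the keys were inserted into the dict, which is not guaranteed to be hour order. The safe approach is to look up each hour 0..23 explicitly.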

{0: 2527.7616438356163,
 1: 2384.8794520547945,
 2: 2232.2082191780823,
 3: 2329.745205479452,
...
 20: 2421.4301369863015,
 21: 2316.26301369863,
 22: 2314.0219178082193,
 23: 2471.7561643835616}
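In Python 3.7 and later, a dict's values() come out in insertion order. A small sketch with made-up numbers and keys deliberately inserted out of hour order, showing why looking up each hour explicitly is the safer way to build the list:

```python
# Made-up values; keys deliberately inserted out of hour order
times = {2: 30.0, 0: 10.0, 1: 20.0}

print(list(times.values()))                     # [30.0, 10.0, 20.0], insertion order
print([times[hour] for hour in range(3)])       # [10.0, 20.0, 30.0], hour order
print([times[hour] for hour in sorted(times)])  # same, without hard-coding 3
```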

Traffic Plot Code

...
# Plotting works on *lists*: make a len-24 "waits" list
# from the times dict, in hour order 0..23.
waits = [times[hour] for hour in range(24)]
waits

# This is the canonical line to import matplotlib using the name "plt"
import matplotlib.pyplot as plt

# 1. Simplest: plot a 1-d list of values, no x-values specified
plt.plot(waits)
plt.show()

# 2. More typical, provide [ x-values ] and [ y-values ] lists
# Here we narrow the data to the hours 6..20.
# (Comprehension is not strictly needed here, but a common pattern)
plt.plot([i for i in range(6, 21)], waits[6:21])
plt.show()
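plot() needs the x-values and y-values lists to have the same length, so the range and the slice must cover the same hours. A sketch of a hypothetical helper that builds both lists from one start/end pair so they can't get out of sync; the waits data here is a stand-in, not the real commute numbers.

```python
def hour_window(waits, start, end):
    """
    Return (xs, ys) for hours start..end-1: the hour numbers and the
    matching entries of waits. Both lists always have the same length.
    """
    xs = list(range(start, end))
    ys = waits[start:end]
    return xs, ys


waits = [float(hour) for hour in range(24)]  # stand-in data, not real waits
xs, ys = hour_window(waits, 6, 21)
print(len(xs), len(ys))   # 15 15
# With matplotlib, these would be plotted with: plt.plot(xs, ys)
```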

Fancier Plotting

Plotting with more flair!

# 3. Like above, but set size, add labels, can play with line color and width
plt.figure(figsize=(10, 5))  # figsize is (width, height) in inches
plt.title('Commute times')
plt.xlabel('Hour of day')
plt.ylabel('Commute time in seconds')
plt.plot([i for i in range(6,21)], waits[6:21], color='purple', linewidth=9)
plt.show()