Today: modules, urllib, Jupyter notebooks

Python Niche: Productive Programmers

Why is Python Coding So Productive?

Or indeed, why are most modern languages so much more productive than the languages of 20 years ago?

1. Language Features

2. Garbage Collection

3. Libraries Of Code To Use

Libraries (modules) of code for common problems: organized, documented, and waiting for your code to use. We say that you build your code "on top of" the libraries.

[Figure: your code built on top of modules like sys]
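
As a small illustration of building "on top of" the libraries, the sketch below leans on two standard modules, math and sys, rather than writing that code from scratch. The function name hypotenuse() is made up for this example.

```python
import math
import sys


def hypotenuse(a, b):
    """Return the long side of a right triangle with legs a and b."""
    # math.sqrt() comes from the standard math module
    return math.sqrt(a * a + b * b)


print(hypotenuse(3, 4))
# sys.version_info describes the running Python; .major is its major version
print(sys.version_info.major)
```

Our code here is only a few lines; the square root logic and the version lookup are provided by the modules.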

"Standard" Modules

Many Standard Modules

We ♥ Modules

Non-Standard "pip" Modules

Other modules are valuable but not a standard part of Python. For code that uses a non-standard module to work, the module must first be installed on that computer with the "pip" tool. For example, we installed "Pillow", a module for manipulating images, with this command:

$ python3 -m pip install Pillow
..prints stuff...
Successfully installed Pillow-5.4.1
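
One way to check from Python whether a module is available on this computer is sketched below, using the standard importlib.util.find_spec() function. The helper name is_installed() is made up for this example. Note that Pillow is installed under the name "Pillow" but imported under the name PIL.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can find a top-level module with this name."""
    # find_spec() returns None when no such module can be found
    return importlib.util.find_spec(module_name) is not None


print(is_installed('json'))  # a standard module, always present
print(is_installed('PIL'))   # True only if Pillow has been installed
```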

Module Docs

How Does The Web Work?

urllib 1

>>> import urllib.request
>>>
>>> f = urllib.request.urlopen('http://www.python.org/')
>>> text = f.read().decode('utf-8')
>>> # text is the HTML
>>> # use text.find('xxx') to look for something, show that slice
>>> # like text[5000:5200]
>>>
>>> f = urllib.request.urlopen('https://sfgate.com/')
>>> text = f.read().decode('utf-8')
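
The find-then-slice idea from the comments above can be sketched as a small function. To keep it runnable without a network connection, it works on a short made-up HTML string rather than a real page; the name slice_around() and its width parameter are assumptions for this example.

```python
def slice_around(text, target, width=20):
    """Return the slice of text starting at target and running
    width chars past its end, or '' if target is not in text."""
    start = text.find(target)
    if start == -1:  # find() returns -1 when target is absent
        return ''
    return text[start:start + len(target) + width]


# Made-up sample standing in for the text read from a url
html = '<html><head><title>Welcome to Python.org</title></head></html>'
print(slice_around(html, '<title>', 10))
```

With real page text, the same call pattern works: pass the decoded text and a string you expect to appear in it.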

Web hello.txt File Example

>>> import urllib.request
>>> f = urllib.request.urlopen('http://web.stanford.edu/class/cs106a/hello.txt')
>>> text = f.read().decode('utf-8')
>>> text
'Hello from CS106A url!\nWhat if there were data here?\n12,34\n25,19\n66,0\n1,2\n'
>>> lines = text.splitlines()
>>> lines
['Hello from CS106A url!', 'What if there were data here?', '12,34', '25,19', '66,0', '1,2']
>>> lines = lines[2:]  # one way to get rid of first 2 lines
>>> lines
['12,34', '25,19', '66,0', '1,2']
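
Once the lines are isolated, turning them into numbers is ordinary string code. A minimal sketch, assuming every remaining line is two ints separated by a comma; the name parse_pairs() is made up for this example.

```python
def parse_pairs(lines):
    """Given lines like '12,34', return a list of (int, int) tuples."""
    pairs = []
    for line in lines:
        parts = line.split(',')
        pairs.append((int(parts[0]), int(parts[1])))
    return pairs


lines = ['12,34', '25,19', '66,0', '1,2']
print(parse_pairs(lines))
```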

Data From the Web vs. Files


Traffic Example

traffic.zip Jupyter example

We have text data recording the seconds spent in road traffic across many days and hours. This is a typical giant data set.

day_of_year,hour_of_day,time_in_secs
01-01-18,0,2549
01-01-18,1,2751
01-01-18,2,2248
01-01-18,3,2440
01-01-18,4,2666
01-01-18,5,2084
01-01-18,6,2302
01-01-18,7,3410
01-01-18,8,3229
01-01-18,9,2367
01-01-18,10,2217
01-01-18,11,2082
01-01-18,12,2055
01-01-18,13,2842
01-01-18,14,2206
01-01-18,15,2178
01-01-18,16,2974
01-01-18,17,2444
01-01-18,18,2965
01-01-18,19,2714
01-01-18,20,2783
01-01-18,21,1951
01-01-18,22,2234
01-01-18,23,2263
01-02-18,0,2311
01-02-18,1,2732
01-02-18,2,2165
01-02-18,3,2377
01-02-18,4,2836
01-02-18,5,2841
01-02-18,6,2719
01-02-18,7,3671
01-02-18,8,3417
01-02-18,9,2578
01-02-18,10,2301
01-02-18,11,2357
01-02-18,12,2165
01-02-18,13,1958
...

traffic.py

{0: 922633,
 1: 870481,
 2: 814756,
 3: 850357,
...
 22: 844618,
 23: 902191}
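
The output above looks like a dict with a total for each hour of the day. One way such totals could be computed is sketched below; this is an assumption about the approach, not necessarily what traffic.py does. The name sum_by_hour() is made up, and the sample uses a few rows from the data shown earlier.

```python
def sum_by_hour(lines):
    """Given the lines of the traffic data, header line first,
    return a dict mapping each hour to its total time_in_secs."""
    totals = {}
    for line in lines[1:]:  # skip the header line
        parts = line.strip().split(',')
        hour = int(parts[1])
        secs = int(parts[2])
        # .get() supplies 0 the first time an hour is seen
        totals[hour] = totals.get(hour, 0) + secs
    return totals


sample = [
    'day_of_year,hour_of_day,time_in_secs',
    '01-01-18,0,2549',
    '01-01-18,1,2751',
    '01-02-18,0,2311',
    '01-02-18,1,2732',
]
print(sum_by_hour(sample))
```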

Jupyter Notebook

$ python3 -m pip install jupyter
$ python3 -m pip install matplotlib

Start Jupyter

Traffic Example Steps

Why Scientists Love Jupyter

In notebook form, you publish your analysis and its output along with the code that created them, which invites iteration and study. Big picture...

Commands in Jupyter

Functions in matplotlib

# standard import line .. 'plt' is idiomatic here
import matplotlib.pyplot as plt

# 1. plot 1-d list of values. plot() uses lists
plt.plot([5, 13, 2, 7])
plt.show()


# 2. Provide both x-values and y-values lists to plot() - a common pattern
# specify color, titling
plt.title('Some Words Here')
# plot() pattern: plot( [ x-values ], [ y-values] )
plt.plot([1, 2, 3, 4], [5, 13, 2, 7], color='red')
plt.show()
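
Since plot() wants lists of x-values and y-values, a dict of counts first needs to be pulled apart into two lists. A minimal sketch, assuming a dict like the hour totals shown earlier; the names dict_to_xy() and plot_totals() are made up for this example, as is the title text.

```python
def dict_to_xy(totals):
    """Return (xs, ys): the dict's keys in sorted order,
    and the value for each of those keys."""
    xs = sorted(totals.keys())
    ys = [totals[x] for x in xs]
    return xs, ys


def plot_totals(totals):
    """Draw a line plot of the dict's values against its keys."""
    # Imported here so dict_to_xy() stays usable without matplotlib installed
    import matplotlib.pyplot as plt
    xs, ys = dict_to_xy(totals)
    plt.title('Total seconds by hour')
    plt.plot(xs, ys, color='red')
    plt.show()


totals = {1: 870481, 0: 922633, 2: 814756, 3: 850357}
print(dict_to_xy(totals))
```

In a notebook cell, calling plot_totals(totals) should then draw the plot.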