Homework 8 - Flights

This is the last homework, pulling it all together. This homework is relatively small. Due Wed Dec 4th at 11:55pm as usual.

The first part of this project is standard Python code, slicing up and organizing data. The second part uses Jupyter to display the data and produce a notebook.

Urban myth: if you look at the flight take-off data for a city, there is a characteristic little spike in passenger traffic each morning. Every city is different - for a more high-stress city, the peak is earlier. For a more mellow city, the peak is later.

Could this be true? How hard would it be to hack up something in Python to check it out?

Starter project: flights.zip

Flights Data

The takeoff data looks like the following text, with the data on each line separated by commas. Each line is one flight out of a city. The first column is the originating city. The second column is the "time": the int number of seconds into the day, in local time, when the flight departed. The third column is the int count of passengers on that flight. The lines are in increasing order by time, and only one flight leaves a given city at any one time.

Denver den,3,250
Los Angeles lax,9,242
Atlanta atl,10,294
Portland pdx,12,245
Atlanta atl,12,233
Portland pdx,15,237
...

To work with this data, organize it as a "flights" dict with a key for each city, where the value for each key is a list of (time, passengers) tuples for the flights from that city, in increasing order by time.

{
 'Denver den': [(3, 250), (22, 220), ...],
 'Atlanta atl': [(10, 294), ...
 ...
}
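
As a rough illustration of how a dict in this shape can be used once it exists, here is a minimal sketch. The data values are made up for the example (they are not the real flights.txt contents), and the names totals, first_time, and first_passengers are just for illustration.

```python
# Made-up sample data in the shape described above (not the real flights.txt values).
flights = {
    'Denver den': [(3, 250), (22, 220)],
    'Atlanta atl': [(10, 294), (12, 233)],
}

# Look up one city: the value is a list of (time, passengers) tuples.
denver = flights['Denver den']
first_time, first_passengers = denver[0]

# Loop over every city in sorted order and total up its passengers.
totals = {}
for city in sorted(flights.keys()):
    total = 0
    for time, passengers in flights[city]:
        total += passengers
    totals[city] = total
    print(city, len(flights[city]), total)
```

Because each list is already in increasing order by time, the tuples can be handed to later code (such as graphing) without any extra sorting.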

Part-1: parse_flights(text)

This part is familiar PyCharm / Python coding. In the flightlib.py file, complete the parse_flights(text) function, which takes in text lines and returns a flights dict as above.

Use the string function text.splitlines(), which splits a string into a list of its lines, giving you a list to loop over.
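
Here is a small sketch of splitlines() in action on a made-up two-line string, combined with the standard str.split(',') to pull apart one line. The variable names are only for illustration, and this is not the full parse_flights() solution.

```python
# Made-up text in the same format as the flights data.
text = 'Denver den,3,250\nAtlanta atl,10,294\n'

# splitlines() returns the lines without their newline characters.
lines = text.splitlines()
print(lines)

for line in lines:
    # split(',') gives a list of strings, one per column.
    parts = line.split(',')
    city = parts[0]
    time = int(parts[1])        # convert the str '10' to the int 10
    passengers = int(parts[2])
    print(city, time, passengers)
```

Note that split() always produces strings, so the time and passenger columns need int() to become numbers.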

Python reminder: you can form a tuple value in your code with parentheses, just as you form a list with [ ] or a dict with { }:

pair = (6, 7)
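
A few more tuple basics, as a quick sketch with made-up values: indexing, unpacking into two variables, and appending tuples to a list.

```python
pair = (6, 7)

# Access elements by index, just like a list.
first = pair[0]

# Or unpack both elements into variables in one step.
a, b = pair

# Tuples can be stored in a list like any other value.
pairs = []
pairs.append((3, 250))
pairs.append((22, 220))
print(first, a, b, pairs)
```

The extra parentheses in pairs.append((3, 250)) matter: the inner pair forms the tuple, and the outer pair belongs to the function call.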

One simple Doctest is provided. Syntax quirk: within Doctests, the newline character must be written with two backslashes (\\n), as shown in the provided test. Write at least 2 additional Doctests, each with at least 2 cities and at least 3 flights.
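
To show the \\n quirk in isolation, here is a sketch using a hypothetical helper, count_lines(), which is not part of the starter code. Inside the docstring, \\n is what ends up as a newline in the string that the Doctest passes to the function.

```python
def count_lines(text):
    """
    Return the number of lines in text.
    (Hypothetical helper for illustration, not part of flightlib.py.)
    >>> count_lines('a,1\\nb,2\\nc,3')
    3
    >>> count_lines('')
    0
    """
    return len(text.splitlines())
```

Outside of a Doctest, in regular code, the newline is written the usual way with one backslash, as in count_lines('a,1\nb,2').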

The provided main() calls your parse_flights() from the command line as another way to run your code.

The only deliverable for Part-1 is that your parse_flights() is done and tested to work correctly. Make sure your function is correct before moving on to the next part. It's a lot easier to test and perfect your function in PyCharm where you have the Doctests working for you.

Part-2 - Jupyter

Make sure jupyter and matplotlib are installed as in lecture. If the output mentions that "pip" could be upgraded, you can ignore that. (On Windows, the command below uses "python" instead of "python3".)

$ python3 -m pip install jupyter
$ python3 -m pip install matplotlib

Run Jupyter

From a terminal in your flights folder, run jupyter notebook, which should open a Jupyter web page.

$ jupyter notebook

1. In Jupyter, click the New button towards the upper right, and create a new Python 3 notebook. Click the "Untitled" towards the upper left, and change the name to "flights". In the first cell of Jupyter, run these lines (shift-return to run):

%matplotlib inline
import flightlib

The %matplotlib phrase avoids the problem that graphs do not always show up. The import just brings in your python code to call.

2. Web data. Here are the 4 lines from the lecture example to display the contents of the file at http://web.stanford.edu/class/cs106a/hello.txt

import urllib.request
f = urllib.request.urlopen('http://web.stanford.edu/class/cs106a/hello.txt')
text = f.read().decode('utf-8')
len(text)

The flights data is at the following url, so change the code to load that text.

http://web.stanford.edu/class/cs106a/flights.txt

Get the urllib calls working in your notebook so it downloads the flights data text.

3. Call your flightlib code to parse the data.

flights = flightlib.parse_flights(text)
len(flights)

The len(xxx) just prints a number, providing a little bit of confirming output to signal that it worked. The length in this case is 10, since there are 10 cities.

Graphing

Now use Jupyter's strength in graphing. Write code to produce a series of graphs.

Using the default graph appearance is fine, or you can play around with the countless appearance options (Matplotlib docs). The sample graph uses the defaults, except for setting the size to (10, 3) as below.

[Figure: sample graph using the defaults, with the size set to (10, 3)]

Here is a reminder of the 5 lines of matplotlib code to produce a graph:

import matplotlib.pyplot as plt

# for each graph
plt.figure(figsize=(10,3))  # optional "inch" width,height
plt.plot(x_values, y_values)  # placeholders: a list of x values and a list of y values
plt.title(title_str)  # placeholder: the title string for this graph
plt.show()
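
plt.plot() takes the x values and the y values as two separate lists, while the flights dict stores each city's data as one list of (time, passengers) tuples. One possible way to bridge the two, sketched here with made-up data and illustrative variable names, is to loop over the tuples and build up the two lists:

```python
# Made-up flights for one city, in the (time, passengers) shape.
city_flights = [(3, 250), (22, 220), (40, 310)]

times = []   # will hold the x values
counts = []  # will hold the y values
for time, passengers in city_flights:
    times.append(time)
    counts.append(passengers)

print(times)
print(counts)
```

The two resulting lists have the same length and line up position by position, which is the form plt.plot() expects for its first two arguments.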

Once you've got the 10 graphs working, you're all done. You should be able to look at all the graphs to estimate which city appears to be the most mellow according to this urban myth.

This project brings techniques together, using Python testing/dict/loops to wrangle realistic data, and then working on that data in Jupyter for tweaking and graphing.

When your graphs look good, use File > Save and Checkpoint to save the .ipynb file in its current state. Then use File > Close and Halt to get out of the notebook. Back at the Jupyter file list, there's a Quit button at the top and you can close the tab. Please turn in the 2 files on Paperless: flightlib.py and flights.ipynb.