CS193Q - Day 2

> Nick's Python Guide - maybe open this in a new tab so you can get to its chapters as we go

Today: Install PyCharm

Install PyCharm "community edition" (free) on your computer. https://www.jetbrains.com/pycharm/download/

PyCharm is an IDE. It's not required for Python, but it's handy.

cs193q-2.zip today's .zip of code

PyCharm: open the folder containing the code - e.g. cs193q-2. Do not double-click a .py file to open it; that does the wrong thing.

File Reading

"with open(...)" form - opens text files by default, handles closing automatically
'r' for reading (the default)

'w' for file-writing

with open(filename, 'r') as f:
    # use f in here

More likely do it like this, as 'r' is the default anyway:
```
with open(filename) as f:
    # use f in here
```
"with" form automates the f.close()
Specify encoding (default depends on your machine / locale)

utf-8 is what many files use

with open(filename, encoding='utf-8') as f:

File loop - most common
Uses the least memory - one line at a time
```
for line in f:
  # process each line
```

Other forms:

s = f.read()  # read whole file as a string
lines = f.readlines()  # read whole file as list of line strs
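
A minimal sketch comparing the three reading forms above. The file demo_poem.txt and its two lines are made up for this demo (it is not the class poem.txt), and the temp-directory setup is only there so the example runs on its own:

```python
import os
import tempfile

# Hypothetical demo file, created here so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), 'demo_poem.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('Roses are red\nViolets are blue\n')

# Form 1: loop over the file, one line in memory at a time
line_count = 0
with open(path, encoding='utf-8') as f:
    for line in f:
        line_count += 1      # each line still ends with '\n'

# Form 2: whole file as one string
with open(path, encoding='utf-8') as f:
    text = f.read()

# Form 3: whole file as a list of line strings
with open(path, encoding='utf-8') as f:
    lines = f.readlines()

print(line_count)   # 2
print(lines)        # ['Roses are red\n', 'Violets are blue\n']
```

Note that every line keeps its trailing '\n', which matters when printing lines back out.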

str.split() examples - break line into parts
s.split(), s.split(':')
Show also join: ','.join(lst)
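
A quick sketch of split and join on made-up strings (the sample line is invented, not taken from poem.txt):

```python
line = 'alice:42:admin'               # hypothetical colon-separated line

parts = line.split(':')               # split on a specific separator
words = '  the quick   fox '.split()  # no argument: split on runs of whitespace
joined = ','.join(parts)              # join is the reverse: list -> one string

print(parts)    # ['alice', '42', 'admin']
print(words)    # ['the', 'quick', 'fox']
print(joined)   # alice,42,admin
```

The no-argument form is usually what you want for words in a line of text, since it ignores leading, trailing, and repeated spaces.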

Nick demos with poem.txt

print()

In Python 2.x print was not a function and did not take parentheses like this
print() function
Converts elems to strings
Prints them separated by spaces
sep='xxx' separator instead of space
end='xxx' end char instead of '\n'
```
>>> print(1, 2, 'hi')
1 2 hi
```
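
A small sketch of the sep= and end= options described above. The function name demo_print is made up, just to group the calls:

```python
def demo_print():
    # sep= replaces the space that normally goes between items
    print(1, 2, 'hi', sep=':')      # 1:2:hi

    # end= replaces the '\n' that normally ends the line
    print('no newline', end='')
    print(' ... same line')         # continues on the previous line


demo_print()
```

One likely use: when looping over a file, each line already ends with '\n', so print(line, end='') avoids double spacing.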

cat.py Program - Exercise 1

"cat" traditional command to print out a file
Run it
```
python3 cat.py poem.txt
```
Complete the code in echo_file() to print out the file contents
Ignore the "censor" parameter for now

Python Doctests

See in cat.py
Doctest is a syntax for embedding a test in the comments next to the code
This is a fantastic feature
Can test your functions one at a time as you go

See the has_word() function

def has_word(line, word):
    """
    Returns True if word is in line, ignoring case differences.
    >>> has_word('aaa cat bbb', 'cat')
    True
    >>> has_word('aaa CAT bbb', 'cat')
    True
    >>> has_word('aaa cat bbb', 'Cat')
    True
    >>> has_word('aaa cat bbb', 'dog')
    False
    """

Right-click a test to run it (PyCharm)
Or can run from command line like this
```
python3 -m doctest -v foo.py
```
Exercise: write code for has_word (use string lower() and "in")
Exercise: Run the doctests
Extension: modify main() and echo_file() to do censoring with has_word
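
For reference, a sketch of a complete function with Doctests, plus one way to run them from inside the file. The function shout() is made up for illustration and is not part of cat.py; doctest.testmod() is the standard library call that runs every >>> example in the current file:

```python
import doctest


def shout(s):
    """
    Returns s in uppercase with '!' added (hypothetical example function).
    >>> shout('hi')
    'HI!'
    >>> shout('')
    '!'
    """
    return s.upper() + '!'


if __name__ == '__main__':
    # Runs the >>> examples in the docstrings above and reports any failures
    print(doctest.testmod())
```

Each >>> line is run as if typed at the interpreter, and the line after it is the expected result.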

Dict Type

Powerful data structure - hash table
key/value pairs
Fast at looking up by key
pairs used to be in "random" order (since Python 3.7, dicts keep insertion order)

Order is NOT end-user friendly .. will sort later

d = {}  # empty dict
d['a'] = 'alpha'   # set
d['b'] = 'beta'
d['a'] --> 'alpha' # retrieve

dict.keys() - list-like of keys
dict.values() - list-like of values
Strategy: access data by key, .values() is just there for completeness
Typical: start empty, load up, then loop over .keys()
Standard way to dump out dict once loaded - keys in nice order
```
for key in sorted(d.keys()):
   # use key, d[key] in here
```

>>> d = {}
>>> d['a'] = 'apple'
>>> d['g'] = 'grape'
>>> d['d'] = 'donut'
>>> 
>>> d['a']
'apple'
>>> d.keys()
dict_keys(['a', 'g', 'd'])
>>> 
>>> for key in d.keys():
...   print(key, d[key])
... 
a apple
g grape
d donut
>>> 
>>> # better: go through keys in sorted order
>>> for key in sorted(d.keys()):
...   print(key, d[key])
... 
a apple
d donut
g grape
>>> 
>>> 'a' in d
True
>>> '' in d
False

Dict Count Algorithm - ip-count.py

Go through random data (ips.txt)
Dict can make the data coherent by key - key concept!

Count how many times each word appears

counts = {}
for word in xxxxxxx:
    if word not in counts:
        counts[word] = 0
    counts[word] += 1

Look at ip-count.py example, add ip count code
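
A runnable sketch of the count algorithm above. The function name count_words and the sample addresses are made up here; they are not the contents of ip-count.py or ips.txt:

```python
def count_words(words):
    """Returns a dict mapping each word to how many times it appears."""
    counts = {}
    for word in words:
        if word not in counts:    # first time seeing this word
            counts[word] = 0
        counts[word] += 1
    return counts


# Hypothetical sample data standing in for the lines of a file
sample = ['10.0.0.1', '10.0.0.2', '10.0.0.1', '10.0.0.3', '10.0.0.1']
counts = count_words(sample)

for key in sorted(counts.keys()):
    print(key, counts[key])
# 10.0.0.1 3
# 10.0.0.2 1
# 10.0.0.3 1
```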

Tuple Type

```
(1, 2, 3)
```
Store 2 or 3 things together (vs list)
Immutable, no .append()
len(), square brackets .. all work
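
A short sketch of the tuple points above, using a made-up tuple:

```python
food = ('donut', 10, 1)   # made-up example tuple

name = food[0]            # square brackets work like a list
size = len(food)          # len() works too

# Tuples are immutable: assigning into one raises TypeError
try:
    food[1] = 5
    changed = True
except TypeError:
    changed = False

print(name, size, changed)   # donut 3 False
```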

Dict .items()

dict.items()
List-like of len-2 (key, value) tuples
Way to dump out whole contents of dict
Use this with custom sorting later
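
A sketch of .items() on a small made-up dict:

```python
d = {'a': 'apple', 'g': 'grape', 'd': 'donut'}   # made-up data

items = list(d.items())    # list of (key, value) tuples, in insertion order
print(items)               # [('a', 'apple'), ('g', 'grape'), ('d', 'donut')]

# Tuples sort by [0] first, so sorted() orders the pairs by key
for pair in sorted(d.items()):
    print(pair[0], pair[1])
# a apple
# d donut
# g grape
```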

Python No Copies - Shallow

Python does not by default ever make a copy. It's always pointers!

Make a list. Put it in a dict with =. There is just one list! Modify it inside the dict, you are modifying the one original list. This is the "no copies" strategy that runs through Python. It's fine! Your code can just work this way. Python: there is just the one list, and using =, we just send references to that one around.

>>> lst = ['aaa', 'bbb']
>>> d = {}
>>> d[1] = lst
>>> 
>>> lst
['aaa', 'bbb']
>>> d
{1: ['aaa', 'bbb']}
>>> d[2] = []
>>> 
>>> d
{1: ['aaa', 'bbb'], 2: []}
>>> 
>>> 
>>> d[1].append('ccc')
>>> 
>>> d
{1: ['aaa', 'bbb', 'ccc'], 2: []}
>>> 
>>> 
>>> lst
['aaa', 'bbb', 'ccc']

Note: both list and dict have a .copy() method if you need it, but generally you don't need this. I've written tons of production python code, and I never needed to use .copy().
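
For contrast, a sketch of what = does versus what .copy() does. The variable names alias and clone are made up for the demo:

```python
lst = ['aaa', 'bbb']

alias = lst            # = just shares a reference to the one list
clone = lst.copy()     # .copy() makes a separate (shallow) list

lst.append('ccc')      # change the original list

print(alias)           # ['aaa', 'bbb', 'ccc']  - same list, sees the change
print(clone)           # ['aaa', 'bbb']         - separate list, unchanged
print(alias is lst)    # True
```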

Comprehensions

Super handy way to compute a new list from a list. Here are the steps

1. Write outer [ ]

2. Write "for elem in lst" inside

3. Write the expr on the left that computes each elem of the new list

4. Write "if xxx" at the right side, to trim results if wanted

>>> lst = [1, 2, 3, 4]
>>> 
>>> [n * n  for n in lst ]
[1, 4, 9, 16]
>>> 
>>> [str(n) + '!'  for n in lst ]
['1!', '2!', '3!', '4!']
>>> 
>>> 
>>> [str(n) + '!'  for n in lst if n >= 2]
['2!', '3!', '4!']
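
One way to read a comprehension is as shorthand for a loop with .append(). A sketch showing that both forms produce the same list (the names loop_result and comp_result are made up):

```python
lst = [1, 2, 3, 4]

# Loop form: build up the new list one elem at a time
loop_result = []
for n in lst:
    if n >= 2:
        loop_result.append(str(n) + '!')

# Comprehension form: same expr, same for, same if
comp_result = [str(n) + '!' for n in lst if n >= 2]

print(loop_result)                  # ['2!', '3!', '4!']
print(loop_result == comp_result)   # True
```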

Map Lambda (optional)

A lambda is a little function, typically of one param - this section is to just show what lambda does
If you have not seen lambda before, just look at how it works below - it's just a little function
map runs a function over a list, gathering the results
Later we'll see a powerful way to use lambda
works best as a demo!

>>> lst = [2, 1, 3, 6]
>>> 
>>> 
>>> def double(n):
...   return n * 2
... 
>>> list(map(double, lst))   # map with the def'd function
[4, 2, 6, 12]
>>> 
>>> list(map(lambda n: 2 * n, lst))  # use lambda!
[4, 2, 6, 12]
>>>
>>> list(map(lambda n: n + 1, lst))
[3, 2, 4, 7]
>>> 
>>> list(map(lambda n: n * n, lst))
[4, 1, 9, 36]
>>>
>>> list(map(lambda n: str(n) + '!!', lst))
['2!!', '1!!', '3!!', '6!!']

Custom Sort - Food Example

sort: For each element in list
Project a value to use for comparisons
Suppose I have food tuples, each
food = (name, tasty 1-10, healthy 1-10)
e.g. food[1] is how tasty it is

Default sorting of tuples: compare [0] first, then [1], .. so 'apple' is first

>>> foods = [('donut', 10, 1), ('apple', 7, 9), ('radish', 2, 8), ('broccoli', 6, 10)]
>>> 
>>> sorted(foods)
[('apple', 7, 9), ('broccoli', 6, 10), ('donut', 10, 1), ('radish', 2, 8)]
>>>

Want to sort by tasty value

Use lambda to project out that value

Food Sort Examples

Say we want to sort by how tasty the foods are, how?
Project out the tasty int from the tuple
('donut', 10, 1) -> 10, sort by that
e.g. lambda food: food[1]
sorted(lst) - by default result in increasing order

sorted(lst, reverse=True) - reverse option, decreasing order

>>> # sort by tasty - project out the tasty int
>>> sorted(foods, key=lambda food: food[1])
[('radish', 2, 8), ('broccoli', 6, 10), ('apple', 7, 9), ('donut', 10, 1)]
>>> 
>>> sorted(foods, key=lambda food: food[1], reverse=True)  # by tasty, reverse=True
[('donut', 10, 1), ('apple', 7, 9), ('broccoli', 6, 10), ('radish', 2, 8)]
>>> 
>>> sorted(foods, key=lambda food: food[2], reverse=True)  # by healthy
[('broccoli', 6, 10), ('apple', 7, 9), ('radish', 2, 8), ('donut', 10, 1)]
>>> 
>>> sorted(foods, key=lambda food: food[1]*food[2], reverse=True) # by tasty*healthy
[('apple', 7, 9), ('broccoli', 6, 10), ('radish', 2, 8), ('donut', 10, 1)]
>>>

Custom Sort upper/lower

sorted() uses "<" by default

With strings, < places uppercase before lowercase

>>> strs = ['Banana', 'apple', 'Zebra', 'coffee', 'Donut']
>>> sorted(strs)
['Banana', 'Donut', 'Zebra', 'apple', 'coffee']

reverse=True option
"Custom" sort = customize how < works here
e.g. treat uppercase/lowercase the same
How Python does this - lambda

Custom Sort String upper/lower

e.g. for these strs

['Banana', 'apple', 'Zebra', 'coffee', 'Donut']

Project out these to use in comparisons

['banana', 'apple', 'zebra', 'coffee', 'donut']

Do comparisons with the projected list, but sort the original list
"proxy" strategy - use this proxy value for comparison
Q: how to project out these proxy values?
A: lambda

Python Custom Sort Example

Strategy: for each elem, project out XXX value for comparisons
e.g. project out lowercase version of each str to ignore case

Sorted:
-Takes "key" parameter
-A lambda of 1 parameter, returns the proxy value to use

>>> sorted(strs, key=lambda s: s.lower())
['apple', 'Banana', 'coffee', 'Donut', 'Zebra']
>>>
>>> sorted(strs, key=lambda s: s[-1])   # by last char
['Banana', 'Zebra', 'apple', 'coffee', 'Donut']
>>>

Sorted vs. Dict Count Items

Application: organizing dict count data
Say we have a 'counts' style dict
Access .items(), list of (key, count) pairs

What we get from dict.items() when counting...

>>> items = [('z', 1), ('a', 3), ('e', 11), ('b', 3), ('c', 2)]
Q1: How to sort items in decreasing order by count?
Q2: How to access the pair with the largest count?
>>> items = [('z', 1), ('a', 3), ('e', 11), ('b', 3), ('c', 2)]
>>> 
>>> sorted(items, key=lambda pair: pair[1], reverse=True)
[('e', 11), ('a', 3), ('b', 3), ('c', 2), ('z', 1)]
>>> 
>>> max(items, key=lambda pair: pair[1])
('e', 11)

Could go back to ip-count.py - change it to print the ip addrs with the highest count
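
A sketch of how that might look, assuming a counts dict has already been built. The addresses and counts here are made up, and the actual structure of ip-count.py may differ:

```python
# Hypothetical counts dict, as the count algorithm might produce
counts = {'10.0.0.1': 3, '10.0.0.2': 1, '10.0.0.3': 7, '10.0.0.4': 2}

# Sort the (ip, count) pairs by count, largest first
by_count = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)

# Or grab just the single pair with the largest count
top = max(counts.items(), key=lambda pair: pair[1])

for pair in by_count[:3]:      # print the top 3
    print(pair[0], pair[1])
# 10.0.0.3 7
# 10.0.0.1 3
# 10.0.0.4 2

print(top)                     # ('10.0.0.3', 7)
```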

Conclusions

Python has lots of features, but you can do a lot with the core:
functions, strings, lists, dicts, Doctests, files