Today: 1. When do you get a copy? With =? With parameters? 2. More sophisticated dict nesting: file index example

## Python and Copying

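We'll need to lean harder today on how Python does not copy memory: assigning with = or passing a parameter shares a reference to the same object rather than making a copy.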
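A minimal sketch of what "no copying" looks like in practice. The names `a`, `b`, `c` and the helper `add_item` are made up for illustration; the point is only that `=` and parameter passing share one list, while an explicit `list(...)` call makes a separate one.

```python
# Assignment with = does not copy: a and b refer to the same list.
a = [1, 2, 3]
b = a
b.append(4)
print(a)   # [1, 2, 3, 4]


def add_item(lst):
    """Hypothetical helper: appends to the list passed in, no copy is made."""
    lst.append(99)


# Passing a as a parameter also shares the list, so the caller sees the change.
add_item(a)
print(a)   # [1, 2, 3, 4, 99]

# list(a) builds a new, separate list; changing c leaves a alone.
c = list(a)
c.append(0)
print(a)   # [1, 2, 3, 4, 99]
```

This sharing is what lets a function such as index_line() later in these notes modify the dict passed in to it.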

## Dict - Chapter 1 - "Count"

• Our first use of dict was counting
• Super important dict code pattern
• Stereotypical in/not-in logic per data item
• Invariant strategy:
• 1. First write the code assuming "in"
• 2. Then add code above that to fix "not in" case

Suppose "x" is something we're counting

```
    if x not in counts:
        counts[x] = 0      # Init
    counts[x] += 1         # Increment
```

## Init and Increment cases

• Call these 2 cases "init" and "increment"
• 1. "init" value = 0
• 2. "increment" value += 1
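Here is the init/increment pattern placed inside a complete function, as a sketch. The function name `count_words` and its `words` parameter are assumptions chosen for this example, not part of the practice problem.

```python
def count_words(words):
    """Return a dict mapping each word to the number of times it appears."""
    counts = {}
    for word in words:
        if word not in counts:
            counts[word] = 0   # init case
        counts[word] += 1      # increment case, runs for every word
    return counts


print(count_words(['a', 'b', 'a']))   # {'a': 2, 'b': 1}
```

Note there is no "else": after the init case runs, the key is guaranteed to be in the dict, so the increment line works for every item.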

## Dict Count Practice Problem

Try the code for this one - the dict count algorithm

More problems > Dict Count Problems

## Dict - Chapter 2 - Nested

• More sophisticated dict algorithms
• Old: value is int counter 0, 1, 2, 3
• Now: value is "nested" structure, e.g. a list, dict
• email_hosts example ..

## Email Hosts Data

• Have email strings
'abby@foo.com'
has one @
"user" is left of @ -> 'abby'
"host" is right of @ -> 'foo.com'

## Email Hosts Outline

Given a list of email address strings. Each email address has one '@' in it, e.g. 'abby@foo.com', where 'abby' is the user, and 'foo.com' is the host.

Create a nested dict with a key for each host, and the value for that key is a list of all the users for that host, in the order they appear in the original list (repetition allowed).

```
emails:
  ['abby@foo.com', 'bob@bar.com', 'abe@foo.com']

returns:
  {
    'foo.com': ['abby', 'abe'],
    'bar.com': ['bob']
  }
```

## Email Hosts Hints

1. The in/not-in logic remains, with its init and increment cases. But now we are building a list of user strings, not incrementing an int.

2. Here is the code snippet to extract the user/host from an email:

```
        at = email.find('@')
        user = email[:at]
        host = email[at + 1:]
```

## Nested Hosts Example

> email-hosts

Suggestion: build up a list for each value. Old init was: 0. Now the init is: []
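One possible way to put the hints together, offered as a sketch rather than the official solution. The function signature `email_hosts(emails)` is an assumption based on the problem description; the init case is now `[]` and the "increment" case is an append.

```python
def email_hosts(emails):
    """Return a dict with a key per host; each value is a list of users."""
    hosts = {}
    for email in emails:
        # Each email is assumed to contain exactly one '@'
        at = email.find('@')
        user = email[:at]
        host = email[at + 1:]
        if host not in hosts:
            hosts[host] = []          # init case: empty list
        hosts[host].append(user)      # "increment" case: add this user
    return hosts


print(email_hosts(['abby@foo.com', 'bob@bar.com', 'abe@foo.com']))
# {'foo.com': ['abby', 'abe'], 'bar.com': ['bob']}
```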

## Style Note: Decomp By Var

The nested structure is complicated. Looking at your vars, it can be hard to keep track of what's a dict and what's a str and what's a list.

Suppose "outer" is a dict and "outer[s]" is a nested list of names. You want to append "name" to the list. You could write it this way:

```
outer[s].append(name)
```

It can be nice to break it down into smaller pieces - divide and conquer! Use a variable with a good name to break out part of the computation, narrating the steps more clearly for your own mental model, like this:

```
names = outer[s]
names.append(name)
```

The nested dict problems are complicated enough that this is a nice strategy. Perhaps this is an instance of slowing down as you code, working more carefully in the hope of less debugging later. That's a good trade.
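Decomp-by-var works because `=` does not copy: the new variable refers to the very same nested list that is inside the dict. A small sketch, with made-up sample data in `outer`, `s`, and `name`:

```python
# Hypothetical sample data for illustration
outer = {'a': ['alice']}
s = 'a'
name = 'abby'

names = outer[s]       # names refers to the nested list itself, not a copy
names.append(name)     # so this append changes the list inside the dict

print(outer)               # {'a': ['alice', 'abby']}
print(names is outer[s])   # True
```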

## Need today: line.split() -> list

• Builtin str.split()
• With no parameters, splits on whitespace, returns a list of "words"
• Use today to access words within a line of text
```
>>> line = ' this      and  that. '
>>> line.split()
['this', 'and', 'that.']
```
• 1. Trims off all whitespace
• 2. Not smart about punctuation, 'that.' is a "word" here - fine for today

## Bigger Nested Dict Example - Index

• Given text file
• Go through all lines
• Use line.split() -> words
• Go through all the words
• Build index dict:
key = each word in text (lowercase)
value = list of all lines where that word appears

## Example Index Dict

File `tea-time.txt` 2 lines:

```
tea time
coffee time
```

Example index. There is a key for each word, and the value for each key is a list of the lines containing that word. The dict totally captures the text. Note that the keys appear in the order they were added, not in sorted order.

1. When just line 1 `tea time` is indexed:

```
{
  'tea': ['tea time'],
  'time': ['tea time'],
}
```

2. State when line 2 `coffee time` is added:

```
{
  'tea': ['tea time'],
  'time': ['tea time', 'coffee time'],
  'coffee': ['coffee time']
}
```

Program output is just a prettied up version of the dict. Prints the words in sorted order (code below). Puts '**' around each key word, followed by its lines. The function to make this output from the dict is provided.

```
**coffee**
coffee time

**tea**
tea time

**time**
tea time
coffee time
```

## Gettysburg Output

```
Gettysburg address output
...
**which**
living, rather, to be dedicated here to the unfinished work which they
which they gave the last full measure of devotion -- that we here

**who**
portion of that field, as a final resting place for those who here
who struggled here, have consecrated it, far above our poor power to
who fought here have thus far so nobly advanced. It is rather for us

**will**
add or detract. The world will little note, nor long remember what we
....
```

## index_line(index, line)

• Look in index.py
• Look at index_line() function
• Takes in dict .. adds to it .. returns it
• Initially pass in {}
• Examine Doctests
• The Doctests are a sequence story
• The index_file() function is provided
calls your index_line() in a loop to do all the work
• The print_index() function is provided
Loops over the keys to dump out the dict

## index_line(index, line) - Write The Code

• Each word in a line
• Remember value = list of lines
• One strategy:
• 1. Use decomp-by-var to keep value type clear
• 2. Write the code assuming the word is in the index
• 3. Then add if-logic before to fix the case where it's not in
This makes the "invariant" strategy work with no "else"
```
def index_line(index, line):
    """
    Given an index dict and a line of text,
    update the index with the text of that line
    and return the modified dict.
    Use the lowercase form of each word as the key.
    >>> index_line({}, 'tea time')
    {'tea': ['tea time'], 'time': ['tea time']}
    >>> index_line({'tea': ['tea time'], 'time': ['tea time']}, 'coffee time')
    {'tea': ['tea time'], 'time': ['tea time', 'coffee time'], 'coffee': ['coffee time']}
    """
    words = line.split()
    for word in words:
        word = word.lower()
        # Your code here - update index for each word
        pass

    return index
```

## index.py Solution Code

• Look at index_line() - core algorithm
• Look at index_file()
• Look at print_index() - discussion below
```
"""
Stanford CS106A Index Example
Nick Parlante
Shows reading all the words out of a file,
building a "nested" dict structure to analyze them,
printing out the contents of a dict with a standard
sorted(d.keys()) loop.
"""

import sys

"""
We'll say that an "index" dict has a key for the lowercase
version of every word in a text, and its value
is a list of all the lines where that word appears.
"""

def index_line(index, line):
    """
    Given an index dict and a line of text,
    update the index with the text of that line
    and return the modified dict.
    Use the lowercase form of each word as the key.
    >>> index_line({}, 'tea time')
    {'tea': ['tea time'], 'time': ['tea time']}
    >>> index_line({'tea': ['tea time'], 'time': ['tea time']}, 'coffee time')
    {'tea': ['tea time'], 'time': ['tea time', 'coffee time'], 'coffee': ['coffee time']}
    """
    words = line.split()
    for word in words:
        word = word.lower()
        # Init case: first time we see this word, start an empty list
        if word not in index:
            index[word] = []
        lines = index[word]  # Style idea: decomp by var
        lines.append(line)
        # Extension: avoid listing a line mult times per word:
        #   if line not in lines: lines.append(line)
    return index

def index_file(filename):
    """
    (provided)
    Given filename, build and return index of its contents.
    (just calls index_line() for every line)
    """
    # Build the index with every line from file
    index = {}
    with open(filename, 'r') as f:
        for line in f:
            index_line(index, line)
            # Each call to index_line() modifies the index dict
    return index

def print_index(index):
    """
    (provided)
    Given index dict, print out its contents.

    Print out the words in alphabetical order.
    Each word with **'s as shown below,
    followed by all the lines where that word appears,
    followed by a blank line.
    e.g. here is part of the gettysburg address index output

    **who**
    portion of that field, as a final resting place for those who here
    who struggled here, have consecrated it, far above our poor power to
    who fought here have thus far so nobly advanced. It is rather for us

    **will**
    add or detract. The world will little note, nor long remember what we
    """
    # Standard loop to dump out a dict by going through
    # all the keys in sorted order.
    for word in sorted(index.keys()):
        print('**' + word + '**')
        lines = index[word]       # decomp by var
        for line in lines:
            print(line, end='')   # line already has \n
        print()                   # print a blank line

def main():
    args = sys.argv[1:]
    # args: -filename-   - prints index of that text file
    if len(args) == 1:
        index = index_file(args[0])
        print_index(index)


# Run main() when this file is run from the command line
if __name__ == '__main__':
    main()

```
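The extension mentioned in the index_line() comments (avoid listing a line multiple times per word) could look something like the sketch below. The name `index_line_unique` is made up for this example; only the append step differs from index_line().

```python
def index_line_unique(index, line):
    """
    Variant of index_line(): a line is added to a word's list
    only if it is not already there.
    """
    for word in line.split():
        word = word.lower()
        if word not in index:
            index[word] = []
        lines = index[word]        # decomp by var
        if line not in lines:      # skip a line already listed for this word
            lines.append(line)
    return index


print(index_line_unique({}, 'tea tea time'))
# {'tea': ['tea tea time'], 'time': ['tea tea time']}
```

Without the `if line not in lines` check, the line 'tea tea time' would appear twice in the list for 'tea'.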

## Run

```
$ python3 index.py tea-time.txt
**coffee**
coffee time

**tea**
tea time

**time**
tea time
coffee time

$ python3 index.py gettysburg.txt
...
lots
...
**who**
portion of that field, as a final resting place for those who here
who struggled here, have consecrated it, far above our poor power to
who fought here have thus far so nobly advanced. It is rather for us

**will**
add or detract. The world will little note, nor long remember what we

**work**
living, rather, to be dedicated here to the unfinished work which they

**world**
add or detract. The world will little note, nor long remember what we

**years**
Four score and seven years ago our fathers brought forth on this
```

## Loop over a dict - `d.keys()`

• `d.keys()` is a collection of all the keys in a dict
• Works with foreach loop
• But keys are in the order they were added, not sorted order
• Can loop over them to see all dict contents
```
>>> d = {'a': 'alpha', 'g': 'gamma', 'b': 'beta'}
>>> d.keys()
dict_keys(['a', 'g', 'b'])  # not alphabetic order
```

The following loop iterates over the whole dict. There is no need to check "in" for each key, since these are the keys.

```
for key in d.keys():
    # use key and d[key] in here
    print(key, d[key])
```

## Standard Loop Over Dict - `sorted(d.keys())`

• Don't want to print keys in unsorted order
• The `sorted()` function takes in a collection, and makes a sorted version of it
More later!
• Use `sorted(d.keys())` to loop over the keys in sorted order
• This is a standard loop to look at the contents of a dict
```
>>> d = {'a': 'alpha', 'g': 'gamma', 'b': 'beta'}
>>> sorted(d.keys())
['a', 'b', 'g']       # in alphabetic order, loop over this
```

Standard loop to see contents of dict in order:

```
for key in sorted(d.keys()):
    print(key, d[key])
```