Today: 1. When do you get a copy? with =? With parameters? 2. More sophisticated dict nesting - file index example

Python and Copying

We'll lean harder today on the fact that Python does not copy memory, neither with = nor when passing parameters.

See: Python Not Copying
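
A minimal sketch of the two questions above, assuming plain lists. The helper name add_item is made up for illustration. Neither = nor passing a parameter makes a copy, so all the names end up referring to one list. Calling .copy() is one way to ask for a separate list.

```python
def add_item(lst, item):
    # lst refers to the caller's list - no copy is made for the call
    lst.append(item)


a = [1, 2, 3]
b = a              # = does not copy: a and b refer to the same list
b.append(4)        # visible through a as well
add_item(a, 5)     # the function modifies that same list

c = a.copy()       # explicit copy: c is a separate list
c.append(6)        # does not affect a

print(a)           # [1, 2, 3, 4, 5]
print(b is a)      # True
print(c)           # [1, 2, 3, 4, 5, 6]
```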

Dict - Chapter 1 - "Count"

Suppose "x" is something we're counting

    if x not in counts:
        counts[x] = 0      # Init
    counts[x] += 1         # Increment

Init and Increment cases
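
Here is a small runnable sketch of the count algorithm wrapped in a function. The name count_items and the sample list are assumptions for illustration, not part of the practice problems.

```python
def count_items(items):
    """Return a dict mapping each item to how many times it appears."""
    counts = {}
    for x in items:
        if x not in counts:
            counts[x] = 0      # Init: first time seeing x
        counts[x] += 1         # Increment: runs every time
    return counts


result = count_items(['a', 'b', 'a', 'c', 'a'])
print(result)   # {'a': 3, 'b': 1, 'c': 1}
```

Note that the increment line is not inside an else - it runs for every x, including the first time right after the init.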

Dict Count Practice Problem

Try the code for this one - the dict count algorithm

> Int Count

More problems > Dict Count Problems

Dict - Chapter 2 - Nested

Email Hosts Data

Email Hosts Outline

Given a list of email address strings. Each email address has one '@' in it, e.g. 'abby@foo.com', where 'abby' is the user, and 'foo.com' is the host.

Create a nested dict with a key for each host, and the value for that key is a list of all the users for that host, in the order they appear in the original list (repetition allowed).

emails:
  ['abby@foo.com', 'bob@bar.com', 'abe@foo.com']

returns:
  {
   'foo.com': ['abby', 'abe'],
   'bar.com': ['bob']
  }

Email Hosts Hints

1 The in/not-in logic remains, with an init case and an update case. But now we are building a list of user strings, not incrementing an int, so the update is an append instead of += 1.

2 Here is the code snippet to extract the user/host from an email:

        at = email.find('@')
        user = email[:at]
        host = email[at + 1:]

Nested Hosts Example

> email-hosts

Suggestion: build up a list for each value. Old init was: 0. Now the init is: []
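
Putting the hints together, one possible sketch looks like this. The function name email_hosts is an assumption here, and the code relies on each email having exactly one '@' as stated above.

```python
def email_hosts(emails):
    """Return a dict with a key per host, value is list of users at that host."""
    hosts = {}
    for email in emails:
        at = email.find('@')
        user = email[:at]
        host = email[at + 1:]
        if host not in hosts:
            hosts[host] = []          # Init is now an empty list
        hosts[host].append(user)      # "Increment" is now an append
    return hosts


result = email_hosts(['abby@foo.com', 'bob@bar.com', 'abe@foo.com'])
print(result)   # {'foo.com': ['abby', 'abe'], 'bar.com': ['bob']}
```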

Style Note: Decomp By Var

The nested structure is complicated. Looking at your vars, it can be hard to keep track of what's a dict and what's a str and what's a list.

Suppose "outer" is a dict and "outer[s]" is a nested list of names. You want to append "name" to the list. Could write it this way:

outer[s].append(name)

It can be nice to break it down into smaller pieces - divide and conquer! Use a variable with a good name to break out part of the computation, narrating the steps more clearly for your own mental model, like this:

names = outer[s]
names.append(name)

The nested dict problems are complicated enough that this is a nice strategy. Perhaps this is an instance of slowing down as you code, working more carefully in the hope of less debugging later. That's a good trade.
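
Decomp by var works here because = does not copy. A quick sketch with made-up sample data: the variable "names" refers to the very list nested inside "outer", so appending through it changes the dict.

```python
outer = {'x': ['amy']}    # sample data, just for illustration
s = 'x'
name = 'bo'

names = outer[s]          # no copy - names is the nested list itself
names.append(name)

print(outer)              # {'x': ['amy', 'bo']}
print(names is outer[s])  # True
```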


Need today: line.split() -> list

>>> line = ' this      and  that. '
>>> line.split()
['this', 'and', 'that.']
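
One detail worth a quick sketch, since lines read from a file end with '\n': split() with no arguments treats any run of whitespace as a separator, including the newline, while split(' ') splits on every single space and leaves the newline in place. The sample line is made up for illustration.

```python
line = 'tea  time\n'          # two spaces, plus a trailing newline

words = line.split()          # splits on runs of whitespace
print(words)                  # ['tea', 'time']

pieces = line.split(' ')      # splits on each single space
print(pieces)                 # ['tea', '', 'time\n']
```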

Bigger Nested Dict Example - Index

> index.zip

Example Index Dict

File tea-time.txt 2 lines:

tea time
coffee time

Example index. There is a key for each word. The value for each key is a list of the lines containing that word. The dict totally captures the text. Note how the keys are not in alphabetical order - a dict keeps its keys in the order they were added.

1. When just line 1 tea time is indexed:

{
'tea': ['tea time'],
'time': ['tea time'],
}

2. State when line 2 coffee time is added:

{
'tea': ['tea time'],
'time': ['tea time', 'coffee time'],
'coffee': ['coffee time']
}

Program output is just a prettied up version of the dict. Prints the words in sorted order (code below). Puts '**' around each key word, followed by its lines. The function to make this output from the dict is provided.

**coffee**
coffee time

**tea**
tea time

**time**
tea time
coffee time

Gettysburg Output

Gettysburg address output
...
**which**
living, rather, to be dedicated here to the unfinished work which they
which they gave the last full measure of devotion -- that we here

**who**
portion of that field, as a final resting place for those who here
who struggled here, have consecrated it, far above our poor power to
who fought here have thus far so nobly advanced. It is rather for us

**will**
add or detract. The world will little note, nor long remember what we
....

index_line(index, line)

index_line(index, line) - Write The Code

def index_line(index, line):
    """
    Given an index dict and a line of text,
    update the index with the text of that line
    and return the modified dict.
    Use the lowercase form of each word as the key.
    >>> index_line({}, 'tea time')
    {'tea': ['tea time'], 'time': ['tea time']}
    >>> index_line({'tea': ['tea time'], 'time': ['tea time']}, 'coffee time')
    {'tea': ['tea time'], 'time': ['tea time', 'coffee time'], 'coffee': ['coffee time']}
    """
    words = line.split()
    for word in words:
        word = word.lower()
        # Your code here - update index for each word
        pass
        
    return index

index.py Solution Code

"""
Stanford CS106A Index Example
Nick Parlante
Shows reading all the words out of a file,
building a "nested" dict structure to analyze them,
printing out the contents of a dict with a standard
sorted(d.keys()) loop.
"""

import sys


"""
We'll say that an "index" dict has a key for the lowercase
version of every word in a text, and its value
is a list of all the lines where that word appears.
"""


def index_line(index, line):
    """
    Given an index dict and a line of text,
    update the index with the text of that line
    and return the modified dict.
    Use the lowercase form of each word as the key.
    >>> index_line({}, 'tea time')
    {'tea': ['tea time'], 'time': ['tea time']}
    >>> index_line({'tea': ['tea time'], 'time': ['tea time']}, 'coffee time')
    {'tea': ['tea time'], 'time': ['tea time', 'coffee time'], 'coffee': ['coffee time']}
    """
    words = line.split()
    for word in words:
        word = word.lower()
        if word not in index:
            index[word] = []
        lines = index[word]  # Style idea: decomp by var
        lines.append(line)
        # Extension: avoid listing a line mult times per word:
        #   if line not in lines: lines.append(line)
    return index


def index_file(filename):
    """
    (provided)
    Given filename, build and return index of its contents.
    (just calls index_line() for every line)
    """
    # Build the index with every line from file
    index = {}
    with open(filename, 'r') as f:
        for line in f:
            index_line(index, line)
            # Each call to index_line() modifies the index dict
    return index


def print_index(index):
    """
    (provided)
    Given index dict, print out its contents.

    Print out the words in alphabetical order.
    Each word with **'s as shown below,
    followed by all the lines where that word appears,
    followed by a blank line.
    e.g. here is part of the gettysburg address index output

    **who**
    portion of that field, as a final resting place for those who here
    who struggled here, have consecrated it, far above our poor power to
    who fought here have thus far so nobly advanced. It is rather for us

    **will**
    add or detract. The world will little note, nor long remember what we
    """
    # Standard loop to dump out a dict by going through
    # all the keys in sorted order.
    for word in sorted(index.keys()):
        print('**' + word + '**')
        lines = index[word]       # decomp by var
        for line in lines:
            print(line, end='')   # line already has \n
        print()                   # print a blank line


def main():
    args = sys.argv[1:]
    # args: -filename-   - prints index of that text file
    if len(args) == 1:
        index = index_file(args[0])
        print_index(index)


if __name__ == '__main__':
    main()

Run

$ python3 index.py tea-time.txt
**coffee**
coffee time

**tea**
tea time

**time**
tea time
coffee time

$ python3 index.py gettysburg.txt
...
lots
...
**who**
portion of that field, as a final resting place for those who here
who struggled here, have consecrated it, far above our poor power to
who fought here have thus far so nobly advanced. It is rather for us

**will**
add or detract. The world will little note, nor long remember what we

**work**
living, rather, to be dedicated here to the unfinished work which they

**world**
add or detract. The world will little note, nor long remember what we

**years**
Four score and seven years ago our fathers brought forth on this
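
The "Extension" comment in index_line() suggests not listing a line more than once per word. Here is one way that might look, as a sketch. The name index_line_unique is made up so it does not clash with the real function.

```python
def index_line_unique(index, line):
    """Like index_line(), but a line is listed at most once per word."""
    for word in line.split():
        word = word.lower()
        if word not in index:
            index[word] = []
        lines = index[word]       # decomp by var
        if line not in lines:     # skip if this line is already listed
            lines.append(line)
    return index


index = index_line_unique({}, 'tea tea time')
print(index)   # {'tea': ['tea tea time'], 'time': ['tea tea time']}
```

One caveat with this approach: two different lines in the file with exactly the same text are treated as one, so the second is not listed. Whether that matters depends on what the index is for.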

Loop over a dict - d.keys()

>>> d = {'a': 'alpha', 'g': 'gamma', 'b': 'beta'}
>>> d.keys()
dict_keys(['a', 'g', 'b'])  # not alphabetic order

The following loop iterates over the whole dict. No need to check "in" for each key - these are the keys.

for key in d.keys():
    # use key and d[key] in here
    print(key, d[key])

Standard Loop Over Dict - sorted(d.keys())

>>> d = {'a': 'alpha', 'g': 'gamma', 'b': 'beta'}
>>> sorted(d.keys())
['a', 'b', 'g']       # in alphabetic order, loop over this

Standard loop to see contents of dict in order:

for key in sorted(d.keys()):
    print(key, d[key])
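
A small sketch tying this back to copying: sorted() returns a new list and leaves the dict alone, so the dict keeps its original key order after the loop.

```python
d = {'a': 'alpha', 'g': 'gamma', 'b': 'beta'}

keys = sorted(d.keys())       # a new list, in alphabetic order
print(keys)                   # ['a', 'b', 'g']
print(list(d.keys()))         # ['a', 'g', 'b'] - d itself is unchanged

for key in keys:
    print(key, d[key])
```

Writing sorted(d) gives the same list as sorted(d.keys()), since looping over a dict goes through its keys.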