Today: talk about midterm a tiny bit, more next week. Main topic: Dictionaries, dict-count algorithm
Dict - Hash Table - Fast
- Python "dict" type
- A key/value "dictionary"
- Generic term: "hash table"
Sounds like a real hacker thing
CS106B!
- Defining feature: powerful and fast
Python Dict - Advanced
- String and list and int are crucial
- But dict is advanced
- The dict type has a unique power in it
- Many advanced algorithms leverage that power
- Job interview pattern:
Interview question has some messed up data
Best answer inevitably uses a dict to organize the data
Because the dict is advanced and fast, its appearance is sort of inevitable
Python Dict
See also: Python Dict
- Organize data around a key
- For each key, store one value
- Get/set by key is fast
- Key type is typically a str or int (immutable)
- Value type can be anything (str, list, ...)
- Set:
d[key] = value
- Get:
d[key]
- Get d[key] when key is not in the dict = KeyError
- Check if key is present:
key in d
or not present: key not in d
- Danger: before accessing d[key], check that key is in d first
- Note: the order of the keys in the dict can look random
It is actually the order the keys were added (guaranteed since Python 3.7)
Simplest to write code that does not depend on the order
First Dict Code Example
>>> d = {} # start as empty dict {}
>>> d['a'] = 'alpha' # store key/values into d
>>> d['g'] = 'gamma'
>>> d['b'] = 'beta'
>>> d
{'a': 'alpha', 'g': 'gamma', 'b': 'beta'} # curly-brace syntax
# keys appear in the order they were added
>>> d['b']
'beta'
>>> d['a'] = 'apple' # overwrite 'a' key
>>> d['a']
'apple'
>>> d['x']
KeyError: 'x'
>>> 'a' in d
True
>>> 'x' in d
False
>>> # Use += to modify
>>> d['a'] += '!!!'
>>> d['a']
'apple!!!'
>>>
>>> # Can write dict literal with { } syntax
# style: 1 space after colon and comma
>>> d = {'a': 'alpha', 'g': 'gamma', 'b': 'beta'}
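The session above shows d['x'] raising a KeyError when the key is missing. A minimal sketch of the check-first pattern follows; the helper name lookup_or_none is hypothetical, made up for this illustration, and is not part of the lecture code.

```python
def lookup_or_none(d, key):
    """Return d[key] if key is in d, otherwise None.
    (hypothetical helper, for illustration only)"""
    if key in d:          # check that key is present first
        return d[key]     # safe: key is known to be in d
    return None           # key missing, so avoid the KeyError


d = {'a': 'alpha', 'g': 'gamma', 'b': 'beta'}
print(lookup_or_none(d, 'b'))   # beta
print(lookup_or_none(d, 'x'))   # None
```

The point is only the `key in d` test before `d[key]`; real code would usually write that test inline rather than wrap it in a helper.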
Dict = Memory
- At a high level, dict is memory
- Organized by key
- Suppose key is 'a'
- Your code can store at one time:
d['a'] = 12
- Later, code can look up:
d['a']
- Get back the 12 stored earlier
- Speed: even if d contains 10 million keys, get/set of any key stays fast (roughly constant time, no scan through all the keys)
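One rough way to see the speed claim is to time a dict lookup against a search through a list of the same keys. This is a sketch under assumptions: the size n is chosen smaller than 10 million so it runs quickly, the list comparison is added here for contrast, and the measured times will vary by machine, so no timing numbers are shown.

```python
import time

n = 200_000   # assumption: a smaller size so the sketch runs quickly

d = {}
for i in range(n):
    d[i] = i * 2              # key i stores value i * 2

keys_list = list(range(n))    # same keys, but in a plain list

start = time.perf_counter()
found_in_list = (n - 1) in keys_list   # list: checks items one by one
list_seconds = time.perf_counter() - start

start = time.perf_counter()
found_in_dict = (n - 1) in d           # dict: finds the key by hashing
dict_seconds = time.perf_counter() - start

print('list search seconds:', list_seconds)
print('dict lookup seconds:', dict_seconds)
print(d[n - 1])   # 399998
```

The list search time grows with the number of items, while the dict lookup time stays about the same as the dict gets bigger.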
Dict Memory Example
Use a dict to remember that 'snack1' is 'apple' and 'snack2' is 'donut', using 'snack1' and 'snack2' as the keys.
>>> d = {}
>>> d['snack1'] = 'apple'
>>> d['snack2'] = 'donut'
>>>
>>> # time passes, other lines run
>>>
>>> # what was snack2 again?
>>> d['snack2']
'donut'
>>>
Dict-Count Algorithm
- Important class of dict algorithms
- (Read: we'll use it a lot)
- Counts dict:
One key for each distinct value in the data
Value for each key is the count of how many times that key appears
- e.g. strs:
'a', 'c', 'a', 'b'
- creates "counts" dict:
{'a': 2, 'c': 1, 'b': 1}
Dict-Count Steps
- 1. Start with empty dict
- 2. For each string:
- 3. First time str seen? store key=str, value = 1
- 4. Seen before? key=str, value += 1
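The steps above can be traced on the example strs 'a', 'c', 'a', 'b'. The sketch below records a copy of the counts dict after each string; the function name trace_counts and the snapshots list are hypothetical, added only to make each step visible.

```python
def trace_counts(strs):
    """Return a list of copies of the counts dict, one per string.
    (hypothetical helper, for illustration only)"""
    counts = {}                          # step 1: start with empty dict
    snapshots = []
    for s in strs:                       # step 2: for each string
        if s not in counts:
            counts[s] = 1                # step 3: first time seen
        else:
            counts[s] += 1               # step 4: seen before
        snapshots.append(dict(counts))   # copy, so later changes don't alter it
    return snapshots


for snap in trace_counts(['a', 'c', 'a', 'b']):
    print(snap)
# {'a': 1}
# {'a': 1, 'c': 1}
# {'a': 2, 'c': 1}
# {'a': 2, 'c': 1, 'b': 1}
```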
1. str-count1() - if/else
str_count1 demo, canonical dict-count algorithm
1. str-count1
- For each s, the key question: is this the first time seeing it?
- if/else solution
- The if tests: first time?
If first time, do one line: counts[s] = 1
Otherwise, do the other line: counts[s] += 1
- This approach is fine
Solution code
def str_count1(strs):
    counts = {}
    for s in strs:
        # s first time?
        if s not in counts:
            counts[s] = 1   # first time
        else:
            counts[s] += 1  # every later time
    return counts
2. str-count2() - "Invariant" Version, no else
2. str-count2
- Same problem: is this the first time seeing s?
- Invariant approach:
- Run this line for all cases:
counts[s] += 1
- Precede it with if logic to "fix" counts if necessary
- "Invariant" means something which is true in all cases at some line
- Invariant means programmer can count on that being true - simpler
- I weakly prefer this version. It's one line shorter and does not use else.
- All counting goes through that one += 1 line
Standard Dict-Count Code - "invariant" Version
def str_count2(strs):
    counts = {}
    for s in strs:
        if s not in counts:  # make s be in there
            counts[s] = 0
        # Invariant: now s is in counts one way or
        # another, so can do next step unconditionally
        counts[s] += 1
    return counts
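As a quick check that the if/else version and the invariant version compute the same thing, the sketch below runs both on the example strs from earlier. The two functions are repeated here only so the block runs on its own.

```python
def str_count1(strs):
    counts = {}
    for s in strs:
        if s not in counts:
            counts[s] = 1      # first time
        else:
            counts[s] += 1     # every later time
    return counts


def str_count2(strs):
    counts = {}
    for s in strs:
        if s not in counts:    # make s be in there
            counts[s] = 0
        counts[s] += 1         # invariant: s is in counts here
    return counts


strs = ['a', 'c', 'a', 'b']
result1 = str_count1(strs)
result2 = str_count2(strs)
print(result1)              # {'a': 2, 'c': 1, 'b': 1}
print(result1 == result2)   # True
```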
Int Count - You Try It
Apply the dict-count algorithm to a list of int values, return a counts dict, counting how many times each int value appears in the list.
3. int-count