Today: dict and dict-count
Dict - Hash Table - Fast
- Python "dict" type
- A key/value "dictionary"
- Generic term: "hash table"
Sounds like a real hacker thing
CS106B!
- Defining feature: powerful and fast
Python Dict - Advanced
- String and list and int are crucial
- But dict is advanced
- The dict type has a unique power in it
- Many advanced algorithms leverage that power
- Job interview pattern:
Interview question has some messed up data
Best answer inevitably uses a dict to organize the data
Because the dict is advanced and fast, its appearance is sort of inevitable
Python Dict
For more details see guide: Python Dict
- Organize data around a key
- Each key has one associated value in the dict
Draw with an arrow
- Set:
d[key] = value
- Get:
d[key]
- Setting a value overwrites any previous value for that key
- Key type is typically a str or int (immutable)
- Value type can be anything (str, list, ...)
- Get with d[key] when key is not in the dict = KeyError
- Check if key is present:
key in d
or not present: key not in d
- Danger: before accessing d[key] - check that key is in first
- Note: keys are kept in the order they were added (Python 3.7 and later)
That order is not sorted in any way
Simplest not to rely on the order
- Summary: get/set by key, in/not-in test of key
First Dict Code Example
>>> d = {} # start as empty dict {}
>>> d['a'] = 'alpha' # store key/values into d
>>> d['g'] = 'gamma'
>>> d['b'] = 'beta'
>>> d
{'a': 'alpha', 'g': 'gamma', 'b': 'beta'} # curly-brace syntax
# keys appear in the order they were added
>>> d['b']
'beta'
>>> d['a'] = 'apple' # overwrite 'a' key
>>> d['a']
'apple'
>>> d['x']
KeyError: 'x'
>>> 'a' in d
True
>>> 'x' in d
False
>>> # Use += to modify
>>> d['a'] += '!!!'
>>> d['a']
'apple!!!'
>>>
>>> # Can write dict literal with { } syntax
# style: 1 space after colon and comma
>>> d = {'a': 'alpha', 'g': 'gamma', 'b': 'beta'}
Key and Value - Different Roles
- Note that get/set/in are all by key
- The key is the control, value is just dumb payload
- Could say key/value are asymmetric, having specialized roles
- YES:
d['a'] = 'alpha'
- YES:
'a' in d -> True
- NO:
'alpha' in d -> False
in/not-in etc. use key, do not look at values at all
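A small sketch of that asymmetry, reusing the alpha/gamma/beta dict from above. The d.values() call is standard Python but is not covered in these notes; it is shown only to contrast with the key-based test, and it has to scan the values one by one rather than jumping straight to a key.

```python
d = {'a': 'alpha', 'g': 'gamma', 'b': 'beta'}

key_found = 'a' in d                 # True: 'a' is a key
value_as_key = 'alpha' in d          # False: 'alpha' is a value, not a key

# Searching the values is possible, but it is a slow scan,
# not the fast key lookup the dict is built for.
value_found = 'alpha' in d.values()  # True
```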
Dict = Memory
- At a high level, dict is memory
- Organized by key
- Suppose key is 'a'
- Your code can store at one time:
d['a'] = 12
- Later, code can lookup:
d['a']
- Get back the 12 stored earlier
- Speed: even if d contains 10 million keys, looking up any one key is still fast (roughly constant time on average)
Dict vs. List - Keys
- Dict and List both remember things
- What's the difference?
- Keys!
- The "keys" for list are always index numbers
0, 1, 2, 3, ... len-1
- The "keys" for dict, you choose!
Any string or int etc. value is a key
- Can install anything, make that key on the fly...
- e.g.
d['dinner'] = 'radish'
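A minimal sketch of the difference, assuming a two-element list for illustration. A list only accepts index numbers that already exist, while a dict creates a new key on assignment.

```python
lst = ['apple', 'donut']      # valid "keys" are only 0 and 1

try:
    lst[2] = 'radish'         # index 2 does not exist yet
    list_ok = True
except IndexError:
    list_ok = False           # list refuses to make a new slot this way

d = {}
d['dinner'] = 'radish'        # dict makes the key on the fly
```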
Dict Memory Example - Meals
Use dict to remember that 'breakfast' is 'apple' and 'lunch' is 'donut'. Using 'breakfast' and 'lunch' as keys.
>>> d = {}
>>> d['breakfast'] = 'apple'
>>> d['lunch'] = 'donut'
>>>
>>> # time passes, other lines run
>>>
>>> # what was lunch again?
>>> d['lunch']
'donut'
>>>
>>> # did I have breakfast or dinner?
>>> 'breakfast' in d
True
>>> 'dinner' in d
False
>>>
Basic Dict Code Examples - Meals
These all use a "meals" dict which contains key/value pairs like 'lunch' -> 'hot dog'. The possible keys are 'breakfast', 'lunch', 'dinner', although any of these keys may or may not be present in the meals dict.
- bad_start() - check for bad breakfast - return True if no breakfast or if it is 'candy'
- candyish() - check for candy lunch or dinner
- enkale() - if candy for dinner, change it to kale
bad_start() Solution Code
def bad_start(meals):
    if 'breakfast' not in meals:
        return True
    if meals['breakfast'] == 'candy':
        return True
    return False
    # Can be written with "or" / short-circuiting avoids key-error
    # if 'breakfast' not in meals or meals['breakfast'] == 'candy':
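The other two functions follow the same check-the-key-first pattern. The sketch below is one possible reading of the one-line descriptions, not the official solutions: it assumes candyish() returns True when lunch or dinner is present and is 'candy', and that enkale() changes the dict in place and returns it.

```python
def candyish(meals):
    # Assumed behavior: True if lunch or dinner is 'candy'.
    # The "in" test comes first so a missing key never reaches meals[...].
    if 'lunch' in meals and meals['lunch'] == 'candy':
        return True
    if 'dinner' in meals and meals['dinner'] == 'candy':
        return True
    return False


def enkale(meals):
    # Assumed behavior: modify meals in place, then return it.
    if 'dinner' in meals and meals['dinner'] == 'candy':
        meals['dinner'] = 'kale'
    return meals
```

In both functions the "and" short-circuits, so meals['dinner'] is only evaluated when the key is known to be present.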
Dict-Count Algorithm
- Important class of dict algorithms
- (Read: we'll use it a lot)
- Counts dict:
key for each distinct value
value for each key is the count of how many times that key appears
- e.g. strs:
'a', 'c', 'a', 'b'
- creates "counts" dict:
{'a': 2, 'c': 1, 'b': 1}
Dict Count Code Examples
Dict-Count Steps
- 1. Start with empty dict
- 2. For each string, test: not seen before?
- 3. Not seen before: store key = str, value = 1
- 4. Seen before: key = str, value += 1
1. str-count1() - if/else
str_count1 demo, canonical dict-count algorithm
- Central test of this algorithm: not seen before?
- if/else solution
- Test: not seen before?
not seen before: counts[s] = 1
seen before: counts[s] += 1
- This if/else approach is fine, but we'll see another way below
Solution code
def str_count1(strs):
    counts = {}
    for s in strs:
        # s not seen before?
        if s not in counts:
            counts[s] = 1  # first time
        else:
            counts[s] += 1  # every later time
    return counts
2. str-count2() - "Invariant" Version, no else
- A slight "invariant" improvement on the above code
- Same central test: not seen s before?
- Invariant approach:
- If not seen before:
counts[s] = 0
- Fix: if s not in there, make it be in there
- Now can simply use this line for all cases:
counts[s] += 1
- "Invariant" means something which is true in all cases at some line
- Invariant means programmer can count on that being true - simpler
- I have a slight preference for this version
It's one line shorter and does not use else
- All counting goes through that one += 1 line
Standard Dict-Count Code - "invariant" Version
def str_count2(strs):
    counts = {}
    for s in strs:
        if s not in counts:  # fix counts/s if not seen before
            counts[s] = 0
        # Invariant: now s is in counts one way or
        # another, so can do next step unconditionally
        counts[s] += 1
    return counts
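To see the counts dict grow, here is a small tracing sketch built on the same invariant loop. The function name str_count_trace and the snapshots list are hypothetical, added only for illustration; the input is the 'a', 'c', 'a', 'b' example from earlier.

```python
def str_count_trace(strs):
    # Same loop as the invariant version, but saves a copy
    # of counts after each string is processed.
    counts = {}
    snapshots = []
    for s in strs:
        if s not in counts:
            counts[s] = 0
        counts[s] += 1
        snapshots.append(dict(counts))  # copy, so later changes don't alter it
    return snapshots


steps = str_count_trace(['a', 'c', 'a', 'b'])
# steps[0] is {'a': 1}
# steps[1] is {'a': 1, 'c': 1}
# steps[2] is {'a': 2, 'c': 1}
# steps[3] is {'a': 2, 'c': 1, 'b': 1}
```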
Int Count - Exercise
Apply the dict-count algorithm to a list of int values, return a counts dict, counting how many times each int value appears in the list.
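One possible sketch for this exercise, assuming the function is named int_count and takes a list called nums (both names are guesses, not given in the notes). The loop is the invariant dict-count pattern unchanged, since int values work as keys just like strings.

```python
def int_count(nums):
    counts = {}
    for n in nums:
        if n not in counts:  # not seen before: make it be in there
            counts[n] = 0
        counts[n] += 1
    return counts
```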
Char Count - Exercise
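Apply the dict-count algorithm to chars in a string. Build a counts dict of how many times each char appears in a string, so 'Coffee' returns {'c': 1, 'o': 1, 'f': 2, 'e': 2}. Note the lowercase 'c' key in the result: the example implies each char is converted to lowercase before counting.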
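One possible sketch, assuming the function is named char_count (a guess) and that chars are lowercased before counting, which is what the 'Coffee' example output suggests.

```python
def char_count(s):
    counts = {}
    for ch in s.lower():      # assumed: count the lowercase form of each char
        if ch not in counts:
            counts[ch] = 0
        counts[ch] += 1
    return counts
```

Looping over a string yields one char at a time, so the loop body is the same as in the string-counting version.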