Quiz 3 Review

March 7, 2021


Written by Juliette Woodrow, Brahm Capoor, and Nick Parlante

This is a handout with some practice problems to help you study for the upcoming quiz. In addition to these problems, you can do additional problems on the experimental server as well as checkout all problems from the section 6 handout as well as all problems from the section 7 handout. We recommend that you try a problem before looking at the solution. Also, try timing yourselves when working through these to simulate the quiz envrionment. If you have any questions, feel free to come to LaIR, office hours, post on Ed, or email Juliette.

One Liners


Dictionaries


Emails

We'll say an email addr string looks like this 'alice@foo.com', with the following structure: there is exactly one '@', separating a user substring ('alice'), and host ('foo.com'). For this problem, assume that all email addrs are correctly formatted in this way.

Implement a function, parse_hosts(addrs, which given a list of email addr strings, e.g. ['alice@foo.com', 'bob@bar.org', ...] , compute a "hosts" dict with a key for every host string that occurs within the addresses. The value for each host key is a list of all the user strings for that host. For example, the following hosts dict has a key 'foo.com' with 3 user strings:

          
{
...
'foo.com': ['bob', 'alice', 'sal'],
...
}
          
        

The user strings in the nested list may be in any order, but should not include duplicates, e.g. if the addr 'alice@foo.com' is seen twice, 'alice' should be in the list only once.


Sent Messages

For this problem, suppose you are building a new messaging service where every user has an address like '@bob'. You have a log file of lines like this..

          
@bob,@alice,@mo
@mo,@pierre
...
          
        

Each line represents one sent message in the system. The first address on a line is the sender, followed by 1 or more destination (dest) addresses for that message. The dest addresses will not include duplicates. Users can send many messages over time with various sets of 1 or more dests. For this problem, you will process a file of log lines to analyze the flow of messages.

We'll say a "sends" dict has a key for every sender address that has sent a message. The value for each sender is a nested dict which counts how many messages have been sent from that sender to each dest. So for example, the sends dict below shows that '@bob' sent 3 messages including '@alice' as a recipient and 1 message including '@mo' as a recipient.

          
{
...
'@bob': {'@alice': 3, '@mo': 1},
...
}
          
        

Implement a function, parse_sends(filename), which given a filename of a log file, read through all the log lines to build and return a sends dict.


Common Senders

This problem uses the "sends" dict format from the previous problem, but the code here is independent, and you can work on it without solving the previous problem.

Implement a function, commons(sends, src_a, src_b), which given a sends dict, and src_a and src_b which are sender addrs which have sent messages, computes and returns a "commons" list of the dest addresses that each have received messages from both src_a and src_b. The addrs in the commons list may be in any order, but should not include duplicates, i.e. no address should be in the list twice. Several approaches are possible and we are not grading on efficiency here, so any approach that computes the list correctly is fine.

So for example with sends

          
{
...
'@bob': {'@a': 2, '@b': 1, '@x': 12},
'@mo': {'@b': 6, '@x': 1, '@y': 1}
...
}
          
        
Calling the function with src_a: '@bob' and src_b: '@mo', the commons list is ['@b', '@x'].


Make County

Implement a function, make_county(words) which given a list of non-empty words, produces a 'county' dict where each first-char-of-word seen is a key, and its value is a count dict of words starting with that char. For example, make_county(['aaa', 'abb', 'aaa']) would return {'a': {'aaa':2, 'abb':1}}.


Recipes

You are putting together an app that helps people find recipes they can make with the ingredients that they already have on hand. Implement the following function, find_recipes(curr_ingredients, recipe_book), which takes as parameters curr_ingredients, a list of ingredients that the user already has on hand and recipe_book, a dictionary representing a recipe book in which the keys are recipe names and values are the list of ingredients required to make that recipe. The function should return a list of recipe names that the user can make.


Building a Glossary

Implement the following function: make_glossary(filename) that takes as a parameter a string filename and constructs a glossary of the file. A glossary is defined as a dictionary from words to lists of tuples whose first elements are line numbers and second elements are the lines themselves. For example, calling the function on this file:

1st line 
2nd line
        
would return this dictionary: {"1st": [(0, "1st line")], "line": [(0, "1st line"), (1, "2nd line")], "2nd": [(1, "2nd line")]} Be sure to remove newline characters from the line. You may assume that each word appears at most once per line of the file.


Finding Grandparents

Implement a function find_grandchildren, which given a dictionary parents whose keys are strings representing parent names and whose values are lists of those parent's children's names, produces and return s a dictionary mapping from people's names to those lists of those people's grandchildren's names.

For example, given this dictionary:

            
parents_to_children = {
  'Khaled': ['Chibundu', 'Jesmyn'],
  'Daniel': ['Khaled', 'Eve'],
  'Jesmyn': ['Frank'],
  'Eve': ['Grace']
}
            
          

Daniel's grandchildren are Chibundu, Jesmyn and Grace, since those are the children of his children. Additionally, Khaled is Frank's grandparent, since Frank's parent is Khaled's child. Therefore, calling find_grandchildren (parents_to_children) returns this dictionary:

            
{
  'Khaled': ['Frank'],
  'Daniel': ['Chibundu', 'Jesmyn', 'Grace']
}
            
          

Note that the people who aren't grandparents don't show up as keys in the new dictionary

Capstone CS106A Problems

Prime File

The given file contains text data on each line as follows. Each line is made of a mixture of non-empty alphabetic words and non-negative ints, all separated by colons. The numbers can be distinguished since their first char is a digit. Lines like this: aaa:123:420:xyz:xxx:44:zz:a The first element is the "prime" for that line, and is always alphabetic. Implement a function, prime_file(filename) which reads through all the lines and builds a dictwith a key for each prime, and its value is a list of all the int number values from all the lines with that prime. Add every number to the list, even if it is a duplicate. When all the data is loaded, print the primes in alphabetical order 1 per line, each followed by its sorted list of numbers, like this aaa [44, 123, 123, 125, 420, 693] bbb [16, 23, 101, 101]


Dict File

Suppose you are part of a team handling a sudden surge of package deliveries. You have a data file for each day that looks like this:

ca,94301-0001,94025-1122,94301-9999,...
wa,98001-0000,98001-4322,...
...
      
The parts of each line are separated by commas. The first word is a 2-char state code. After the state are 1 or more zip+4 codes, like '94301-0001', each representing one delivery. The format of the zip+4 is: 5 digit zipcode, dash, 4 digits. We will treat the zipcode as a string.

Write code to build a "states" dict with a key for every state, The value for each state key is a nested dict which counts the number of times each zipcode string has a delivery, considering only the zipcode '94301' part of the zip+4 '94301-0001'.

        
{
 'ca': {'94301': 2, '94025': 1, ...}
 'wa': {'98001': 5, '98002': 1, ...}
}
        
      

Once the states dict is built, write code to print each state code on a line by itself, in increasing alphabetic order. Each state should be followed by its zip/count data, with the zipcodes in increasing order, so the overall print output looks like this:

ca
94025 1
94301 2
94563 5
tx
75001 5
75002 1
75011 2
wa
98001 5
98002 1
      
This function does not return anything - it prints its output. The standard code to open a file and loop over its lines is provided.


Lightning

There are many lightning strikes each day across the united states. Each lightning strike is represented by a number 1..100. Suppose you have a data file where each line represents one or more lightning events. Each line is made of one or more lightning strike ints followed by the state name, separated by commas like this:

26,1,ca
99,100,100,tx
12,50,ca
44,28,tx

Write a function to read the lines of this file and build a "states" dict with a key for each state, e.g. 'ca', and its value is a list of all that states's int strike numbers. So for example, the 4 lines of text above make this states dict:

  

{ 'ca': [26, 1, 12, 50],
  'tx': [99, 100, 100, 44, 28],
}
  

When all the data is loaded, print the states in alphabetical order, 1 per line, each followed by its list of numbers, followed by the sum of the numbers, like this:

ak [100, 12] 112
ca [26, 1, 12, 50] 89
tx [99, 100, 100, 44, 28] 371
...

This function does not return anything - it prints its output. The standard code to open a file and loop over its lines is provided.

  
>>> # Reminder of how to print a list of ints:
>>> nums = [1, 2, 3]
>>> print(nums)
[1, 2, 3]

  
def parse_lightning(filename):
    states = {}
    with open(filename, 'r') as f:
        for line in f:
            pass