Section #6: Nested Structures
May 17th, 2020
Written by Brahm Capoor, Juliette Woodrow, Peter Maldonado, Kara Eng,
Tori Qiu and Parth Sarin
Counting by Consonants
Implement the following function:
def count_by_consonants(filename):
    """
    Reads in the file whose name is filename and returns a dictionary
    that contains counts of words that share the same consonants in order.
    """
For example, if the file consonants.txt contains this text:
great
grate
teeny
greet
tiny
bump
calling count_by_consonants('consonants.txt') will return the
dictionary {'grt': 3, 'tny': 2, 'bmp': 1}.
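A minimal sketch follows. It assumes that 'y' counts as a consonant (the 'teeny' to 'tny' example implies this), that words are separated by whitespace, and that case should be ignored. The helper consonants_of is a hypothetical name introduced for illustration.

```python
VOWELS = 'aeiou'  # 'y' is deliberately absent: 'teeny' -> 'tny' in the example


def consonants_of(word):
    """Hypothetical helper: the consonants of word, lowercased, in order."""
    return ''.join(ch for ch in word.lower() if ch.isalpha() and ch not in VOWELS)


def count_by_consonants(filename):
    """
    Reads in the file whose name is filename and returns a dictionary
    that contains counts of words that share the same consonants in order.
    """
    counts = {}
    with open(filename) as f:
        for line in f:
            for word in line.split():
                key = consonants_of(word)
                # .get supplies 0 the first time a consonant pattern appears
                counts[key] = counts.get(key, 0) + 1
    return counts
```

Because dictionaries remember insertion order, the keys come out in the order their first word appears in the file.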
Drawing Friend Graphs
Python is commonly used to analyze data sets by creating visualizations of
that data. One common example of data visualization is examining
relationships between users in social networks. Programmers can use
Python to draw the network. The visual can be used to measure statistics
of the network or to find clusters, which are groups of individuals who
have many of the same friends. There are many reasons to search for
close-knit groups in social networks: for instance, to create targeted
marketing schemes or to suggest groups that users might like to join.
Cluster analysis has also been used to identify terrorist cells.
In this problem, you are going to use your Python programming skills to
create a visual representation of a social network where users can follow
each other. More specifically, you will use information from two text
files and draw lines representing the relationships in the network. If a
user follows another user, your output should have a line connecting the
two people.
You will be given two files through command-line arguments. The first file
lists each person in the network on a separate line, where their name is
followed by a colon and then a list of all the people that they follow.
The second file is a list of coordinates. Each line has the name of a
person in the network, followed by a colon and then a comma-separated
pair of integers representing the x and y coordinates you will use to
place the node representing that person on the canvas. For example, two
matching files, users.txt and coords.txt, might look like this (note that
the users needn't be in the same order in both files):
users.txt
Juliette: Wil, Nick, Julie, Cynthia
Wil: Juliette, Nick, Cynthia, Mehran, Chris, Cynthia
Mehran: Oliver, Chris
Chris: Mehran, Oliver
Nick: Juliette, Julie, Keith
Julie: Juliette, Nick, Cynthia
Oliver: Mehran, Chris
Cynthia: Juliette, Julie, Keith
Keith: Nick, Cynthia
coords.txt
Wil: 141, 343
Chris: 390, 65
Cynthia: 100, 250
Julie: 185, 670
Juliette: 238, 409
Keith: 30, 400
Mehran: 550, 145
Nick: 14, 585
Oliver: 699, 18
Your job is to implement the following function:
def draw_friend_graph(canvas, friends_file, coordinates_file):
    """
    Draws a graph representing the friend network. For each user,
    draw a circle at their respective coordinates and a label with their name.
    Next, draw lines connecting that circle to the circle of each person they
    follow.
    """
As you consider how best to approach the problem and store your data, keep
in mind that relationships aren't necessarily symmetric. For example, note
that Wil follows several users in the example above, but despite his
best efforts to use in-vogue hashtags, has but a single humble follower.
You may assume that the two files provided are valid and represent the
same users, but an interesting extension to this problem might be to
verify that.
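One way to structure this is to read each file into its own dictionary first, then draw. The sketch below assumes canvas behaves like a tkinter Canvas (create_oval, create_line and create_text); read_friends, read_coords and NODE_RADIUS are hypothetical names, and the circle size and label offset are arbitrary choices.

```python
NODE_RADIUS = 10  # hypothetical circle radius, in pixels


def read_friends(friends_file):
    """Hypothetical helper: maps each user to the list of users they follow."""
    friends = {}
    with open(friends_file) as f:
        for line in f:
            if ':' not in line:
                continue  # skip blank or malformed lines
            name, rest = line.split(':', 1)
            friends[name.strip()] = [n.strip() for n in rest.split(',') if n.strip()]
    return friends


def read_coords(coordinates_file):
    """Hypothetical helper: maps each user to an (x, y) pair of ints."""
    coords = {}
    with open(coordinates_file) as f:
        for line in f:
            if ':' not in line:
                continue
            name, rest = line.split(':', 1)
            x, y = rest.split(',')
            coords[name.strip()] = (int(x), int(y))  # int() ignores surrounding spaces
    return coords


def draw_friend_graph(canvas, friends_file, coordinates_file):
    friends = read_friends(friends_file)
    coords = read_coords(coordinates_file)
    # One line per "follows" relationship, read from the follower's side;
    # a mutual follow simply draws the same line twice.
    for user, followed in friends.items():
        x1, y1 = coords[user]
        for other in followed:
            x2, y2 = coords[other]
            canvas.create_line(x1, y1, x2, y2)
    # Circles are drawn after the lines, filled white, so they cover the line ends.
    for user, (x, y) in coords.items():
        canvas.create_oval(x - NODE_RADIUS, y - NODE_RADIUS,
                           x + NODE_RADIUS, y + NODE_RADIUS, fill='white')
        canvas.create_text(x, y - 2 * NODE_RADIUS, text=user)
```

Keeping the friends data keyed by follower means asymmetric relationships need no special handling: Wil's six follows and his single follower are just different lists.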
First Letter Index
Implement the following function:
def first_list(strs):
    """
    Given a list of strings, create and return a dictionary whose
    keys are the unique first characters of the strings and whose
    values are lists of words beginning with those characters, in
    the same order that they appear in strs.
    >>> first_list(['banter', 'wil', 'aardvark', 'python', 'antiquated'])
    {'b': ['banter'], 'w': ['wil'], 'a': ['aardvark', 'antiquated'], 'p': ['python']}
    """
Cryptography
Cryptography is the study of techniques for communicating messages
secretly. Imagine that Alice and Bob want to send messages to each other,
but Eve can snoop on the messages they're sending and read them. Alice
wants to figure out a way to "encrypt" her messages so that if Eve reads the
message, she won't be able to understand it, but Bob will be able to
"decrypt" the message. We're going to write a program to help Alice and
Bob do this.
Alice decides that she'll replace every letter in her original message
with a different letter. She's defined a variable in the program called
ENCRYPTION_DICT that keeps track of these associations:
ENCRYPTION_DICT = {
'A': 'T',
'B': 'H',
'C': 'E',
'D': 'Q',
'E': 'U',
'F': 'I',
'G': 'C',
'H': 'K',
...
}
Alice and Bob exchanged this dictionary before we went into quarantine, so
they both know that this is the strategy, but Eve doesn't know that! Note
that in order to avoid ambiguity, the values in this dictionary are unique
(that is, each letter is a value in the dictionary exactly once).
Encryption
To start us off, implement the following function:
def encrypt(plaintext):
    """
    Takes in plaintext as an input and returns 'ciphertext': the result
    of replacing each letter in the plaintext with its corresponding
    encrypted character in ENCRYPTION_DICT.
    The plaintext consists entirely of uppercase letters and non-alphabetic
    characters like punctuation. Non-alphabetic characters needn't be encrypted,
    but rather should appear in the ciphertext in their original form.
    >>> encrypt("HEY, HOW'S IT GOING?")
    "KUD, KXZ'S BV CXBFC?"
    >>> encrypt("I LOVE CS 106A!")
    'B WXLU ES 106T!'
    >>> encrypt("UNICORNS ARE THE MOST BEAUTIFUL ANIMALS IN EXISTENCE")
    'AFBEXPFS TPU VKU NXSV HUTAVBIAW TFBNTWS BF UYBSVUFEU'
    """
Decryption
Now that Bob has received Alice's encrypted message, he needs to "decrypt"
it, or convert it back to the original message. To help him do so,
implement the following function:
def decrypt(ciphertext):
    """
    Uses ENCRYPTION_DICT to decrypt each of the alphabetic characters of
    ciphertext.
    >>> decrypt("KUD, KXZ'S BV CXBFC?")
    "HEY, HOW'S IT GOING?"
    >>> decrypt('B WXLU ES 106T!')
    'I LOVE CS 106A!'
    >>> decrypt('AFBEXPFS TPU VKU NXSV HUTAVBIAW TFBNTWS BF UYBSVUFEU')
    'UNICORNS ARE THE MOST BEAUTIFUL ANIMALS IN EXISTENCE'
    """
Note that in order to successfully decrypt a message, you need the
'reverse' of ENCRYPTION_DICT: rather than associating
plaintext characters with their encrypted counterparts, we need to
go the other way.
We suggest writing a function
reverse_encryption_dict to decompose this problem. In class, we saw an
example in which the reversed dictionary mapped each value to a list of
keys, because multiple keys can share the same value. That doesn't apply
to ENCRYPTION_DICT, because each of its values is guaranteed to be
unique. How does this affect how you implement reverse_encryption_dict?
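Because the values are unique, the reversed dictionary can map each value straight back to a single key, with no lists involved. A minimal sketch, again using a partial stand-in for ENCRYPTION_DICT that contains only the entries shown above:

```python
# Partial stand-in: only the entries shown in the handout (A through H).
ENCRYPTION_DICT = {
    'A': 'T', 'B': 'H', 'C': 'E', 'D': 'Q',
    'E': 'U', 'F': 'I', 'G': 'C', 'H': 'K',
}


def reverse_encryption_dict(encryption_dict):
    """Maps each encrypted character back to its plaintext character."""
    reversed_dict = {}
    for plain, cipher in encryption_dict.items():
        # Values are unique, so each one maps back to a single key;
        # no list of keys is needed, unlike the general reversal from class.
        reversed_dict[cipher] = plain
    return reversed_dict


def decrypt(ciphertext):
    """Uses ENCRYPTION_DICT to decrypt each of the alphabetic characters of ciphertext."""
    decryption_dict = reverse_encryption_dict(ENCRYPTION_DICT)
    # Characters that aren't encrypted letters pass through unchanged.
    return ''.join(decryption_dict.get(ch, ch) for ch in ciphertext)
```

With this partial dict, decrypt('HTQ, EKUI!') gives back 'BAD, CHEF!'.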
This is an interesting encryption scheme (known formally as a
substitution cipher), but unfortunately isn't very secure. What
issues do you see with it?
Recipes
Inspired by an unhealthy amount of Netflix and quarantine cooking, Parth
and Peter are opening up a bakery. To try and keep costs low, they want to
automate as much of their inventory tracking as possible.
Thanks to 106A, they think they can use dictionaries to help.
All of the ingredients they have available will be stored in a pantry
dictionary, whose keys are ingredients and whose values are weights (of
course, they're using the metric system), like so:
pantry = {
'flour': 400,
'sugar': 300,
'salt': 10,
'chocolate': 150
}
Each recipe is also stored in a dictionary, like the
following uninspiring concoction:
recipe = {
'flour': 200,
'salt': 2.5
}
(As you can tell from the recipe above, it doesn't look like their bakery
is going to do very well, but that's not important right now.)
Recipes and the pantry are first stored as files, with each key-value
pair on its own line and the key and value separated by ':: '. The pantry
and recipe above would look like:
pantry.txt
flour:: 400
sugar:: 300
salt:: 10
chocolate:: 150
recipe.txt
flour:: 200
salt:: 2.5
Our goal is to write a few functions to help Parth and Peter run their
bakery. Begin by implementing the following function:
def read_dict_from_file(filename):
    """
    Takes in the name of a file containing a recipe or
    pantry list and reads it into a dictionary.
    An example doctest using the file above:
    >>> read_dict_from_file('recipe.txt')
    {'flour': 200, 'salt': 2.5}
    """
Once you've developed the infrastructure to construct recipe and pantry
list dictionaries, implement the following functions:
def can_make(recipe, pantry):
    """
    Given the contents of the pantry, returns a boolean indicating
    whether or not it is possible to follow the recipe. Note that
    the parameters to this function are dictionaries, and not
    filenames. The pantry should not be modified in this function.
    """
    pass
def make_recipe(recipe, pantry):
    """
    Given a recipe and a pantry with enough ingredients to make the recipe,
    modify the pantry by subtracting the quantity of each ingredient that the
    recipe requires. You may modify the pantry in place, but return the modified
    pantry in order to test the output using doctests.
    # using the recipe and pantry defined above
    >>> make_recipe(recipe, pantry)
    {'flour': 200, 'sugar': 300, 'salt': 7.5, 'chocolate': 150}
    """
    pass
Note that make_recipe assumes the pantry is sufficient for
the recipe, but this is not always the case; thus, every call
to make_recipe needs to be guarded by a call to
can_make.
While implementing can_make, you might find the dictionary's
.get method helpful: it accepts a second parameter specifying
what to return if the key is not in the dictionary. For
example, calling recipe.get('yeast', 42) will return
42 if 'yeast' is not a key in the
recipe dictionary.
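A minimal sketch of both functions. can_make uses .get with a default of 0, so an ingredient the pantry has never stocked counts as having none on hand; make_recipe relies on can_make having been checked first.

```python
def can_make(recipe, pantry):
    """Returns True if the pantry holds at least as much of every ingredient as the recipe needs."""
    for ingredient, amount in recipe.items():
        # A missing ingredient counts as 0 on hand; the pantry is only read, never changed.
        if pantry.get(ingredient, 0) < amount:
            return False
    return True


def make_recipe(recipe, pantry):
    """Subtracts the recipe's quantities from the pantry (in place) and returns it."""
    # Assumes can_make(recipe, pantry) is True, so every ingredient is a key.
    for ingredient, amount in recipe.items():
        pantry[ingredient] -= amount
    return pantry


pantry = {'flour': 400, 'sugar': 300, 'salt': 10, 'chocolate': 150}
recipe = {'flour': 200, 'salt': 2.5}
while can_make(recipe, pantry):
    print(make_recipe(recipe, pantry))
```

The loop at the bottom mirrors the guarded-call pattern: it succeeds twice, then stops once the flour runs out. Note that 7.5 - 2.5 evaluates to the float 5.0.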
Finally, implement a main function, which first reads in a
pantry file named by the user and then repeatedly asks the user for
recipe filenames. For each recipe, it either prints an error message if the
pantry does not have sufficient ingredients or, if it does, removes the
ingredients from the pantry and prints the updated pantry.
A sample run of the program is below; it repeatedly tries to make the
recipe above in the vain hope that someone will actually want to eat it.
Your program should, however, be able to accept any valid recipe filename.
User input appears after each prompt.
$ python3 pantry_manager.py
Enter pantry filename: pantry.txt
What recipe should we bake next (Press enter to quit.)? recipe.txt
You can make that recipe! Your pantry now looks like this:
{'flour': 200, 'sugar': 300, 'salt': 7.5, 'chocolate': 150}
What recipe should we bake next (Press enter to quit.)? recipe.txt
You can make that recipe! Your pantry now looks like this:
{'flour': 0, 'sugar': 300, 'salt': 5.0, 'chocolate': 150}
What recipe should we bake next (Press enter to quit.)? recipe.txt
You can't make that recipe.
What recipe should we bake next (Press enter to quit.)? (user presses Enter immediately)
Anagrams
Two words are anagrams if they consist of the same letters in a
different order. Your job in this problem is to write a program that
lets users see which words are anagrams of the words they type.
Here's some sample output (user input follows each 'Word: ' prompt):
$ python3 anagrams.py
Word: listen
['enlist', 'inlets', 'listen', 'silent', 'slinte', 'tinsel']
Word: race
['acer', 'acre', 'care', 'cera', 'crea', 'race']
Word: python
['phyton', 'python', 'typhon']
Word: programming
['programming']
Word: piech
piech is not in the dictionary
Word: microphone
['microphone', 'neomorphic']
Word: (user presses Enter immediately)
The design of this program is largely up to you, although we have a few
suggestions for you:
- We've provided a constant LEXICON whose value is the name
of a file containing all of the words in the English language. It may
be helpful to find some way of associating words with their anagrams.
- A useful observation is that two words are anagrams exactly when their
characters, once sorted, are identical. For example, 'listen' and
'silent', when sorted, are both 'eilnst'.
Python has a built-in sorted function, which, given a string, returns a
list of its characters in alphabetical order. This list
can then be converted back to a string like so:
>>> string = 'banter'
>>> sorted_characters = sorted(string)
>>> sorted_characters
['a', 'b', 'e', 'n', 'r', 't']
>>> sorted_string = ''.join(sorted_characters)
>>> sorted_string
'abenrt'
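Following both suggestions, one approach is to build, once, a dictionary from each word's sorted letters to all the lexicon words with those letters; each lookup is then a single dictionary access. In the sketch below, anagram_key, build_anagram_dict and load_words are hypothetical helper names, LEXICON is given a placeholder filename, and the lexicon is assumed to hold one word per line.

```python
LEXICON = 'dictionary.txt'  # placeholder; the starter code defines the real filename


def anagram_key(word):
    """Hypothetical helper: anagrams share the same sorted, lowercased letters."""
    return ''.join(sorted(word.lower()))


def build_anagram_dict(words):
    """Hypothetical helper: maps each anagram key to every word with that key."""
    anagrams = {}
    for word in words:
        anagrams.setdefault(anagram_key(word), []).append(word)
    return anagrams


def load_words(filename):
    """Hypothetical helper: assumes one word per line."""
    with open(filename) as f:
        return [line.strip().lower() for line in f if line.strip()]


def main():
    words = load_words(LEXICON)
    known_words = set(words)  # fast membership test for the error message
    anagrams = build_anagram_dict(words)  # built once, reused for every query
    while True:
        word = input('Word: ').strip().lower()
        if word == '':
            break  # Enter on its own quits
        if word not in known_words:
            print(word, 'is not in the dictionary')
        else:
            print(sorted(anagrams[anagram_key(word)]))
```

Building the dictionary up front costs one pass over the lexicon, but it avoids re-scanning every word for each query.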
Big Tweet Data
In this problem, you'll write a program that reads through a large
collection of tweets and stores data about how often each user uses each
hashtag. This is a great example of how Python can be used in data
analysis tasks.
Our Dataset
For the purposes of this problem, each tweet is represented as a single
line of text in a file. Each line consists of the poster's username
(prefixed by a '@' symbol), followed by a colon and then the text of the
tweet. Characters in this file may come from any language or be emoji,
but you don't need to do anything special to handle them. One such file
in the PyCharm project we provide is small-tweets.txt, which is
reproduced here:
@BarackObama: Missed President Obama's final #SOTU last night? Check out his full remarks. https://t.co/7KHp3EHK8D
@BarackObama: Fired up from the #SOTU? RSVP to hear @VP talk about the work ahead with @OFA supporters: https://t.co/EIe2g6hT0I https://t.co/jIGBqLTDHB
@BarackObama: RT @WhiteHouse: The 3rd annual #BigBlockOfCheeseDay is today! Here's how you can participate: https://t.co/DXxU8c7zOe https://t.co/diT4MJWQ…
@BarackObama: Fired up and ready to go? Join the movement: https://t.co/stTSEUMkxN #SOTU
@kanyewest: Childish Gambino - This is America https://t.co/sknjKSgj8c
@kanyewest: šššš„š„š„ https://t.co/KmvxIwKkU6
@dog_rates: This is Kylie. She loves naps and is approximately two bananas long. 13/10 would snug softly https://t.co/WX9ad5efbN
@GonzalezSarahA: RT @JacobSmithVT: Just spent ten minutes clicking around this cool map #education #vt #realestate https://t.co/iqxXtruqrt
We provide 3 such files for you in the PyCharm Project:
small-tweets.txt, big-tweets.txt and
huge-tweets.txt.
Building a user_tags Dictionary
Central to this program is a user_tags dictionary, in which
each key is a Twitter user's name like '@BarackObama'. The
value for each key in this dictionary is a second, nested dictionary which
counts how frequently that particular user has used particular hashtags.
For example, a very simple user_tags dictionary might be:
{'@BarackObama': {'#SCOTUS': 4, '#Obamacare': 3}}
We'll explore this dictionary in some more detail as we go through this
problem, but as a matter of nomenclature, we'll call the inner dictionary
the 'counts' dictionary. Our high-level strategy is to
update this dictionary for each tweet we read, so that it accumulates all
of the counts as we go through the tweets.
1. Warmup questions
Given the dictionary above, what updates would we make to it in each of
the following cases?
- We encounter a new tweet that reads
'@BarackObama: #Obamacare signups now!'.
- We encounter a new tweet that reads
'@kanyewest: šššš„š„š„ https://t.co/KmvxIwKkU6'.
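One way to work through these by hand, expressed as Python. The answer to the second question is a design choice; the sketch assumes a user with no hashtags is still recorded with an empty counts dict, which is consistent with @kanyewest appearing in the -users output later on.

```python
user_tags = {'@BarackObama': {'#SCOTUS': 4, '#Obamacare': 3}}

# '@BarackObama: #Obamacare signups now!'
# Both the user and the tag already exist, so only the nested count changes.
user_tags['@BarackObama']['#Obamacare'] += 1

# The @kanyewest tweet contains no hashtags.
# Assumed design: record the new user anyway, with an empty counts dict.
user_tags['@kanyewest'] = {}

print(user_tags)
```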
2. Implementing add_tweet
The add_tweet function is the core of this whole program, and
is responsible for performing the update to a
user_tags dictionary described above. The tests shown below
represent a sequence of tweets, expressed as a series of Doctests. For
each call, you can see the dictionary that is passed in, and the
dictionary that is returned on the next line. The first test passes in the
empty dictionary ({}) and gets back a dictionary with 1 user
and 2 tags. The second test then takes that returned dictionary as its input,
and so on. Each call adds more data to the
user_tags dictionary.
We've provided you with two functions named
parse_tags and parse_user, both of which take the tweet in question as
a parameter; they return a list of the tags in the tweet
and the username that posted the tweet, respectively.
def add_tweet(user_tags, tweet):
    """
    Given a user_tags dict and a tweet, parse out the user and tags,
    and add those counts to the user_tags dict which is returned.
    If no user exists in the tweet, return the user_tags dict unchanged.
    Note: call the parse_tags(tweet) and parse_user(tweet) functions to pull
    the parts out of the tweet.
    >>> add_tweet({}, '@alice: #apple #banana')
    {'@alice': {'#apple': 1, '#banana': 1}}
    >>> add_tweet({'@alice': {'#apple': 1, '#banana': 1}}, '@alice: #banana')
    {'@alice': {'#apple': 1, '#banana': 2}}
    >>> add_tweet({'@alice': {'#apple': 1, '#banana': 2}}, '@bob: #apple')
    {'@alice': {'#apple': 1, '#banana': 2}, '@bob': {'#apple': 1}}
    """
3. Implementing parse_tweets
Use add_tweet in a loop to build up and return a
user_tags dict. This should look mostly like other
file-reading functions you've written; your job is to make sure you
understand how to follow the pattern of creating and updating a dictionary
suggested by the add_tweet function. In other words, the
responsibility of add_tweet is to update a dictionary, while
parse_tweets must create that dictionary and keep track of it as it
is updated.
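A minimal sketch, assuming parse_tweets takes a filename (the handout doesn't show its signature). To keep the block runnable on its own, it includes compact hypothetical stand-ins for parse_user, parse_tags and add_tweet; the part specific to parse_tweets is the loop that creates the dictionary once and passes it through add_tweet for every line. The file is opened as UTF-8 because tweets can contain emoji.

```python
import re


def parse_user(tweet):
    # Hypothetical stand-in for the provided function.
    return tweet.split(':', 1)[0] if tweet.startswith('@') and ':' in tweet else None


def parse_tags(tweet):
    # Hypothetical stand-in for the provided function.
    return re.findall(r'#\w+', tweet.partition(':')[2])


def add_tweet(user_tags, tweet):
    user = parse_user(tweet)
    if user is not None:
        counts = user_tags.setdefault(user, {})
        for tag in parse_tags(tweet):
            counts[tag] = counts.get(tag, 0) + 1
    return user_tags


def parse_tweets(filename):
    """Reads every tweet in filename and returns the resulting user_tags dict."""
    user_tags = {}  # created exactly once, here
    with open(filename, encoding='utf-8') as f:
        for line in f:
            # add_tweet updates the dict and hands it back for the next line
            user_tags = add_tweet(user_tags, line.strip())
    return user_tags
```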
Running your program
We provide a main function that calls the
parse_tweets function you implemented in a variety of ways.
To use it, run the program from the terminal. When run with just 1 argument
(a data filename), it reads in all the data from that file and prints out a
summary of each user and all of their hashtag counts:
$ python3 tweets.py small-tweets.txt
@BarackObama
#BigBlockOfCheeseDay -> 1
#SOTU -> 3
@GonzalezSarahA
#education -> 1
#vt -> 1
#realestate -> 1
When run with the '-users' argument, main prints
out all the usernames:
$ python3 tweets.py -users small-tweets.txt
users
@BarackObama
@kanyewest
@dog_rates
@GonzalezSarahA
When run with the '-user' argument followed by a username,
the program prints out the data for just that user.
$ python3 tweets.py -user @BarackObama small-tweets.txt
user: @BarackObama
#BigBlockOfCheeseDay -> 1
#SOTU -> 3
Extension
You probably won't get to this extension in section, but if you have time,
implement this additional function, which you can then use to answer
some interesting questions about hashtag use.
Implementing flat_counts
It's natural to be curious about how often tags are used across users.
This function takes in a user_tags dictionary and computes a
new "flat" count dictionary:
def flat_counts(user_tags):
    """
    Given a user_tags dict, sum up the tag counts across all users and
    return a "flat" counts dict with a key for each tag, whose value
    is the sum of that tag's count across users.
    >>> flat_counts({'@alice': {'#apple': 1, '#banana': 2}, '@bob': {'#apple': 1}})
    {'#apple': 2, '#banana': 2}
    """
main calls that function when run with the
-flat argument, like so:
$ python3 tweets.py -flat small-tweets.txt
flat
#BigBlockOfCheeseDay -> 1
#MAGA -> 2
#SOTU -> 3
...