October 25th, 2020
Written by Brahm Capoor, Juliette Woodrow, Baker Sharp, Peter Maldonado, Kara Eng, Tori Qiu and Parth Sarin
Implement the following function:
def count_by_consonants(filename):
"""
Reads in the file whose name is filename and returns a dictionary
that contains counts of words that share the same consonants in order.
"""
For example, if the file consonants.txt
contains this text:
great
grate
teeny
greet
tiny
bump
calling count_by_consonants('consonants.txt')
will return the
dictionary {'grt': 3, 'tny': 2, 'bmp': 1}
.
The inspiration for this problem came from this video by Veritasium on YouTube
In 2020 more than 12,000 people applied to join NASA's newest class of astronauts. In a few years, 10-12 of those people will be selected to join the Artemis program and become astronauts. Will those 10-12 truly be the best of the bunch, or could there be some luck involved? Source
Let's explore this question by making a simulation using code. Complete function make_applicants(num_applicants)
that creates num_applicants applicants who want to be astronauts, where each applicant is created with a random float of skill and luck. Before writing this function, take a minute to consider what data structures would be helpful here. What information needs to be stored for each applicant? How can we store the information on all of the applicants? This function should return the data structure containing information on all of the applicants.
Then run these applicants through a skill test: each applicant must pass a minimum skill score (like 0.90) to make the first cut. To do this, write a function make_first_cut_applicants(applicants)
which takes as parameter a data structure holding all applicants and their scores and returns a list of all of the skilled astronauts.
After this first cut, all remaining applicants are highly skillful, but it's time to select the best of the best. Find the best applicants by combining their luck and skill scores (made from some proportion such as 95% skill and 5% luck) and make the final list of astronauts from the applicants with the ten highest combined scores. To do so, first write a function calc_combined_score(applicant)
which takes as parameter an applicant and returns their combined luck and skill score. Then use this function to create a list of the top ten applicants.
If you were to ask one of the selected astronauts why they were chosen, what might they say?
Stanford accepted 2,349 out of 45,227 applicants for the class of 2024. What happens when you use those numbers instead? Where do you think this difference comes from? Check out the numbers for other classes at the bottom of this page.
What do the results of this simulation say about the possible effects of luck and skill in highly selective programs? In what ways can this simulation be misleading?
If you like, try altering some of the constants in your code, such as increasing the percentage of luck in the combined score, lowering the skill score needed to pass the first cut, or selecting a greater proportion of applicants.
Python is commonly used to help analyze data sets by creating visualizations of said data. One common example of data visualization is to look at relationships between users social networks. Programmers can use python to help draw the network. The visual can be used to measure statistics of the network or to find clusters, which are groups of individuals who have many of the same friends. There are many reasons to search for close-knit groups in social networks - for instance, to create targeted marketing schemes or to suggest groups they'd like to be a part of. Cluster analysis has been also used to identify terrorist cells.
In this problem, you are going to use your python programming skills to create a visual representation of a social network where users can follow each other. More specifically, you will use information from two text files and draw lines representing the relationships in the network. If a user follows another user, your output should have a line connecting the two people.
You will be given two files through command-line arguments. The first file
is a list of each person in the network, each on a separate line, where
their name is followed by a colon and then a list of all the people that
they follow. The second file is a list of coordinates. Each line has the
name of a person in the network, followed by a colon and then a comma
separated list of two integers representing the x and y coordinates you
will use to place the node representing that person on the canvas. For
example, two matching files, users.txt
and
coords.txt
, might look like this (note that the users needn't
necessarily be in the same order in both files):
Juliette: Brahm, Nick, Julie, Cynthia
Brahm: Juliette, Nick, Cynthia, Mehran, Chris, Cynthia
Mehran: Oliver, Chris
Chris: Mehran, Oliver
Nick: Juliette, Julie, Keith
Julie: Juliette, Nick, Cynthia
Oliver: Mehran, Chris
Cynthia: Juliette, Julie, Keith
Keith: Nick, Cynthia
Brahm: 141, 343
Chris: 390, 65
Cynthia: 100, 250
Julie: 185, 670
Juliette: 238, 409
Keith: 30, 400
Mehran: 550, 145
Nick: 14, 585
Oliver: 699, 18
Your job is to implement the following function:
def draw_friend_graph(canvas, friends_file, coordinates_file):
"""
Draws a graph representing the friend network. For each user,
draw a circle at their respective coordinates and a label with their name.
Next, draw lines connecting that circle to the circle of each person they
follow.
"""
As you consider how best to approach the problem and store your data, keep in mind that relationships aren't necessarily symmetric. For example, note that Brahm follows several users in the example above, but despite his best efforts to use in-vogue hashtags, has but a single humble follower.
You may assume that the two files provided are valid and represent the same users, but an interesting extension to this problem might be to verify that.
Implement the following function:
def first_list(strs):
"""
Given a list of strings, create and return a dictionary whose
keys are the unique first characters of the strings and whose
values are lists of words beginning with those characters, in
the same order that they appear in strs.
>>> first_list(['banter', 'brahm', 'aardvark', 'python', 'antiquated'])
{'b': ['banter', 'brahm'], 'a': ['aardvark', 'antiquated'], 'p': ['python']}
"""
Cryptography is the study of techniques for communicating messages secretly. Imagine that Alice and Bob want to send messages to each other, but Eve can snoop on the messages they're sending and read them. Alice wants to figure out way to "encrypt" her messages so that if Eve reads the message, she won't be able to understand it, but Bob will be able to "decrypt" the message. We're going to write a program to help Alice and Bob do this.
Alice decides that she'll replace every letter in her original message
with a different letter. She's defined a variable in the program called
ENCRYPTION_DICT
that keeps track of these associations:
ENCRYPTION_DICT = {
'A': 'T',
'B': 'H',
'C': 'E',
'D': 'Q',
'E': 'U',
'F': 'I',
'G': 'C',
'H': 'K',
...
}
Alice and Bob exchanged this dictionary before we went into quarantine, so they both know that this is the strategy, but Eve doesn't know that! Note that in order to avoid ambiguity, the values in this dictionary are unique (that is, each letter is a value in the dictionary exactly once).
To start us off, implement the following function:
def encrypt(plaintext):
"""
Takes in plaintext as an input and returns 'ciphertext': the result
of substituting each letter in the plaintext by its corresponding
encrypted character in ENCRYPTION_DICT.
The plaintext comprises entirely of uppercase letters and non-alphabetic
characters like punctuation. Non-alphabetic characters needn't be encrypted,
but rather should appear in the plaintext in their original form.
>>> encrypt("HEY, HOW'S IT GOING?")
"KUD, KXZ'S BV CXBFC?"
>>> encrypt("I LOVE CS 106A!")
'B WXLU ES 106T!'
>>> encrypt("UNICORNS ARE THE MOST BEAUTIFUL ANIMALS IN EXISTENCE")
'AFBEXPFS TPU VKU NXSV HUTAVBIAW TFBNTWS BF UYBSVUFEU'
"""
Now that Bob has received Alice's encrypted message, he needs to "decrypt" it, or convert it back to the original message. To help him do so, implement the following function:
def decrypt(ciphertext):
"""
Uses ENCRYPTION_DICT to decrypt each of the alphabetic characters of
ciphertext.
>>> decrypt("KUD, KXZ'S BV CXBFC?")
"HEY, HOW'S IT GOING?"
>>> decrypt('B WXLU ES 106T!')
'I LOVE CS 106A!'
>>> decrypt('AFBEXPFS TPU VKU NXSV HUTAVBIAW TFBNTWS BF UYBSVUFEU')
'UNICORNS ARE THE MOST BEAUTIFUL ANIMALS IN EXISTENCE'
"""
This is an interesting encryption scheme (known formally as a substitution cipher), but unfortunately isn't very secure. What issues do you see with it?
Inspired by an unhealthy amount of Netflix and quarantine cooking, Parth and Peter are opening up a bakery. To try and keep costs low, they want to automate as much of their inventory tracking as possible.
Thanks to 106A, they think they can use dictionaries to help.
All of the ingredients they have available will be stored in a
pantry
dictionary, whose keys are ingredients and values are weights (Of course
they're using the metric system), like so:
pantry = {
'flour': 400,
'sugar': 300,
'salt': 10,
'chocolate': 150
}
Each recipe
is also stored in a dictionary, like the
following uninspiring concoction:
recipe = {
'flour': 200,
'salt': 2.5
}
(As you can tell from the recipe above, it doesn't look like their bakery is going to do very well, but that's not important right now.)
Recipes and the pantry are first stored as files, where each key value
pair is on a different line, separated by ':: '
. The pantry
list and recipe above would look like:
flour:: 400
sugar:: 300
salt:: 10
chocolate:: 150
flour:: 200
salt:: 2.5
Our goal is to write a few functions to help Parth and Peter run their bakery. Begin by implementing the following function:
def read_dict_from_file(filename):
"""
Takes in the name of a file containing a recipe or
pantry list and reads it into a dictionary.
An example doctest using the file above:
>>> read_dict_from_file('recipe.txt')
{'flour': 200, 'salt': 2.5}
"""
Once you've developed the infrastructure to construct recipe and pantry list dictionaries, implement the following functions:
def can_make(recipe, pantry):
"""
Given the contents of the pantry, returns a boolean indicating
whether or not it is possible to follow the recipe. Note that
the parameters to this function are dictionaries, and not
filenames. The pantry should not be modified in this function
"""
pass
def make_recipe(recipe, pantry):
"""
Given a recipe and a pantry with enough ingredients to make the recipe,
modify the contents of the pantry to remove as many quantities as the
recipe requires. You may modify the pantry in place, but return the modified
pantry in order to test the output using doctests.
# using the recipe and pantry defined above
>>> make_recipe(recipe, pantry)
{'flour': 200, 'sugar': 300, 'salt': 7.5, 'chocolate': 150}
"""
pass
Note that make_recipe
assumes the pantry is sufficient for
the recipe but this is not necessarily always the case; thus, every call
to make_recipe
will need to be guarded by a call to
can_make_recipe
.
Whilst implementing can_make
function, you might find the
dictionary's .get
function helpful, which accepts a parameter
suggesting what to return if the value is not in the dictionary. For
example, calling recipe.get('yeast', 42)
will return
42
if 'yeast'
is not a key in the
recipe
dictionary.
Finally, implement a main
function, which first reads in a
pantry file from a user and then continuously asks for recipe filenames
from the user and either prints an error message if the pantry does not
have sufficient ingredients, or removes the ingredients from the pantry if
it does exist, printing the pantry afterwards.
A sample run of the program is below, which repeatedly tries to make the recipe above in the vain hope that someone will actually want to eat it. Note that your program should be able to accept any valid recipe filename, though. User input is bolded and italicized.
$ python3 pantry_manager.py
Enter pantry filename: pantry.txt
What recipe should we bake next (Press enter to quit.)? recipe.txt
You can make that recipe! Your pantry now looks like this:
{'flour': 200, 'sugar': 300, 'salt': 7.5, 'chocolate': 150}
What recipe should we bake next (Press enter to quit.)? recipe.txt
You can make that recipe! Your pantry now looks like this:
{'flour': 0, 'sugar': 300, 'salt': 5, 'chocolate': 150}
What recipe should we bake next (Press enter to quit.)? recipe.txt
You can't make that recipe.
What recipe should we bake next (Press enter to quit.)? User presses enter immediately
Two words are anagrams if they consist of the same letters in a different order. Your job in this problem is to write a program that allows users to see what words are anagrams of words that they type. Here's some sample output (user input is bolded and italicized):
$ python3 anagrams.py
Word: listen
['enlist', 'inlets', 'listen', 'silent', 'slinte', 'tinsel']
Word: race
['acer', 'acre', 'care', 'cera', 'crea', 'race']
Word: python
['phyton', 'python', 'typhon']
Word: programming
['programming']
Word: piech
piech is not in the dictionary
Word: microphone
['microphone', 'neomorphic']
Word: User presses enter immediately
The design of this program is largely up to you, although we have a few suggestions for you:
We've provided a constant LEXICON
whose value is the name
of a file containing all of the words in the English language. It may
be helpful to find some way of associating words with their anagrams.
A useful observation is that two anagrams are identical when their
characters are sorted. For example, 'LISTEN'
and
'SILENT'
, when sorted are both 'eilnst'
.
Python has an inbuilt sorted
function, which returns a
list of the characters in a string in alphabetical order. This list
can then be converted back to a string like so:
>>> string = 'banter'
>>> sorted_characters = sorted('banter')
>>> sorted_characters
['a', 'b', 'e', 'n', 'r', 't']
>>> sorted_string = ''.join(sorted_characters)
'abenrt'
In this program, you'll write a program that reads through a large collection of tweets and store the data to keep track of how hashtags occur in tweets. This is a great example of how Python can be used in data analysis tasks.
For the purposes of this problem, each tweet is represented as a single
line of text in a file. Each line consists of the poster's username
(prefixed by a '@' symbol), followed by a colon and then the text of the
tweet. Each character in this file can be a character from any language,
or an emoji, although you don't need to do anything special to deal with
these characters. One such file in the PyCharm project we provide is
small-tweets.txt
, which is reproduced here:
@BarackObama: Missed President Obama's final #SOTU last night? Check out his full remarks. https://t.co/7KHp3EHK8D
@BarackObama: Fired up from the #SOTU? RSVP to hear @VP talk about the work ahead with @OFA supporters:
https://t.co/EIe2g6hT0I https://t.co/jIGBqLTDHB
@BarackObama: RT @WhiteHouse: The 3rd annual #BigBlockOfCheeseDay is today! Here's how you can participate:
https://t.co/DXxU8c7zOe https://t.co/diT4MJWQā¦
@BarackObama: Fired up and ready to go? Join the movement: https://t.co/stTSEUMkxN #SOTU
@kanyewest: Childish Gambino - This is America https://t.co/sknjKSgj8c
@kanyewest: šššš„š„š„ https://t.co/KmvxIwKkU6
@dog_rates: This is Kylie. She loves naps and is approximately two bananas long. 13/10 would snug softly
https://t.co/WX9ad5efbN
@GonzalezSarahA: RT @JacobSmithVT: Just spent ten minutes clicking around this cool map #education #vt #realestate
https://t.co/iqxXtruqrt
We provide 3 such files for you in the PyCharm Project:
small-tweets.txt
, big-tweets.txt
and
huge-tweets.txt
.
user_tags
Dictionary
Central to this program is a user_tags
dictionary, in which
each key is a Twitter user's name like '@BarackObama'
. The
value for each key in this dictionary is a second, nested dictionary which
counts how frequently that particular user has used particular hashtags.
For example, a very simple user_tags
dictionary might be:
{'@BarackObama': {'#SCOTUS': 4, '#Obamacare': 3}}
We'll explore this dictionary in some more detail as we go through this problem, but as a matter of nomenclature, we'll call the inner dictionary the 'counts' dictionary. Our high-level strategy is to change the above dict for each tweet we read, so it accumulates all the counts as we go through the tweets.
Given the dictionary above, what updates we would make to it in each of the following cases?
'@BarackObama: #Obamacare signups now!'
.
'@kanyewest: šššš„š„š„ https://t.co/KmvxIwKkU6'
.
add_tweet
The add_tweet
function is the core of this whole program, and
is responsible for performing the update to a
user_tags
dictionary described above. The tests shown below
represent a sequence of tweets, expressed as a series of Doctests. For
each call, you can see the dictionary that is passed in, and the
dictionary that is returned on the next line. The first test passes in the
empty dictionary ({}
) and gets back a dictionary with 1 user
and 2 tags. The 2nd test then takes that returned dictionary as its input,
and so on. Each call adds more data to the
user_tags
dictionary.
We've provided you with two functions entitled
parse_tags
and parse_user
, both of which take as
a parameter the tweet in question and return a list of tags in the tweet
and the username that posted the tweet, respectively.
def add_tweet(user_tags, tweet):
"""
Given a user_tags dict and a tweet, parse out the user and tags,
and add those counts to the user_tags dict which is returned.
If no user exists in the tweet, return the user_tags dict unchanged.
Note: call the parse_tags(tweet) and parse_user(tweet) functions to pull
the parts out of the tweet.
>>> add_tweet({}, '@alice: #apple #banana')
{'@alice': {'#apple': 1, '#banana': 1}}
>>> add_tweet({'@alice': {'#apple': 1, '#banana': 1}}, '@alice: #banana')
{'@alice': {'#apple': 1, '#banana': 2}}
>>> add_tweet({'@alice': {'#apple': 1, '#banana': 2}}, '@bob: #apple')
{'@alice': {'#apple': 1, '#banana': 2}, '@bob': {'#apple': 1}}
"""
parse_tweets
Use add_tweet
in a loop to build up and return a
user_tags
dict. This should look mostly like other
file-reading functions you've written, and your job is to make sure you
understand how to follow the pattern of creating and updating a dictionary
suggested by the add_tweet
function. Restated, the
responsibility of add_tweet
is to update a dictionary, and
parse_tweets
must create and maintain that dictionary as it
is updated.
We provide a main
function that calls the
parse_tweets
function you implemented in a variety of ways.
To use it, run the program from the terminal. Run with just 1 argument (a
data filename), it reads in all the data from that file and prints out a
summary of each user and all their tweets and counts:
$ python3 tweets.py small-tweets.txt
@BarackObama
#BigBlockOfCheeseDay -> 1
#SOTU -> 3
@GonzalezSarahA
#education -> 1
#vt -> 1
#realestate -> 1
When run with the '-users'
argument, main
prints
out all the usernames:
$ python3 tweets.py -users small-tweets.txt
users
@BarackObama
@kanyewest
@dog_rates
@GonzalezSarahA
When run with the '-user'
argument followed by a username,
the program prints out the data for just that user.
$ python3 tweets.py -user @BarackObama small-tweets.txt
user: @BarackObama
#BigBlockOfCheeseDay -> 1
#SOTU -> 3
You probably won't get to this extension in section, but if you have time, implement this additional function which you can then leverage to answer some interesting questions about Hashtag use.
flat_counts
It's natural to be curious about how often tags are used across users.
This function takes in a user_tags
dictionary and computes a
new "flat" count dictionary:
def flat_counts(user_tags):
"""
Given a user_tags dicts, sum up the tag counts across all users,
return a "flat" counts dict with a key for each tag,
and its value is the sum of that tag's count across users.
>>> flat_counts({'@alice': {'#apple': 1, '#banana': 2}, '@bob': {'#apple': 1}})
{'#apple': 2, '#banana': 2}
"""
main
will call that function with the
-flat
argument, like so:
$ python3 tweets.py -flat small-tweets.txt
flat
#BigBlockOfCheeseDay -> 1
#MAGA -> 2
#SOTU -> 3
.
.
.