Written by Nick Parlante, Brahm Capoor, Andrew Tierno and Juliette Woodrow
February 2nd, 2020
First, we'll go through a series of problems intended to increase your
familiarity with the additional parameters of the
range function and nested loops. Implement the following functions:
negative(n): Given a non-negative integer
n, return a list of all the ints from
n down to -n:
[n, n-1, ..., 1, 0, -1, -2, ..., -n].
Note: consider the case where
n == 0. Does your
function need to do any special work to account for this case?
threes(n): Given a non-negative integer
n, create and return a list of n lists,
each of which has length 3. Each inner list should contain 3
consecutive integers. The first inner list should start with a 1,
and every subsequent inner list's first element should increase by
1. For example,
threes(4) would return
[[1,2,3], [2,3,4], [3,4,5], [4,5,6]].
countdown(n): Given a non-negative integer
n that is less than or equal to 10, create and return a list of
n lists, where each inner list counts down from 10 to successively smaller numbers. For example,
countdown(3) would return
[[10], [10, 9], [10, 9, 8]].
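One possible way to approach these three functions is sketched here. It is a minimal sketch rather than the official solution, and it leans on the three-argument form range(start, stop, step), where stop is exclusive and step may be negative.

```python
def negative(n):
    # stop is exclusive, so we stop at -n - 1 in order to include -n itself.
    return list(range(n, -n - 1, -1))


def threes(n):
    result = []
    for start in range(1, n + 1):
        # Each inner list holds three consecutive ints beginning at start.
        result.append(list(range(start, start + 3)))
    return result


def countdown(n):
    result = []
    for length in range(1, n + 1):
        # Count down from 10, keeping `length` numbers in this inner list.
        result.append(list(range(10, 10 - length, -1)))
    return result
```

With this version, negative(0) needs no special case: range(0, -1, -1) yields just 0.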
Implement a function that takes as a parameter a string
representing the name of a file with a single integer on each line,
and returns the smallest unique positive integer in the file.
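An integer is positive if it is greater than 0, and unique if it
occurs exactly once in the file. For example, suppose
filename.txt looks like this:
42
1
13
12
1
-8
20
Then the function would return 12, because 1 occurs twice and 12 is the smallest positive integer that occurs exactly once.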
You may assume that each line of the file contains exactly one integer (although it may not be positive), and that there is at least one positive integer in the file.
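A minimal sketch of one approach follows. The function name smallest_unique_positive is hypothetical, since the handout's own signature is not shown here, and the code assumes the file is small enough to read into a list.

```python
def smallest_unique_positive(filename):
    # Hypothetical name: the handout's actual signature is not shown.
    nums = []
    with open(filename) as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate a trailing blank line
                nums.append(int(line))

    best = None
    for num in nums:
        # Keep num only if it is positive and occurs exactly once.
        if num > 0 and nums.count(num) == 1:
            if best is None or num < best:
                best = num
    return best  # None if no unique positive integer exists
```

Using nums.count() inside the loop is quadratic in the number of lines, which is fine for small files; a dictionary of counts would scale better.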
Implement a function
which takes as a parameter a list
lst and returns a list of
all the unique elements in
lst, in the order that they appear in
lst. A unique element is defined as an element that occurs
only once in lst.
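The definition above translates fairly directly into a loop. The following is a sketch under the assumption that the function is called unique_elements, a hypothetical name, since the handout's signature is not shown here.

```python
def unique_elements(lst):
    # Hypothetical name: the handout's actual signature is not shown.
    result = []
    for elem in lst:
        # An element is unique if it occurs exactly once in the whole list.
        if lst.count(elem) == 1:
            result.append(elem)
    return result
```

Because the loop walks lst from left to right, the unique elements come out in the order they appear.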
Implement the following functions:
exclaim(s): Given a string
s, look for the first exclamation mark. If there is a substring of 1 or more alphabetic characters immediately to the left of the exclamation mark, return this substring including the exclamation mark. Otherwise, return the empty string. For example,
exclaim('xx Hello! yy') returns 'Hello!'.
vowels(s): Given a string
s, look for the first colon. If there is a substring of 1 or more vowels immediately to the right of the colon, return this substring without the colon. Otherwise, return the empty string.
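Both functions follow the same pattern: find a marker character with str.find(), then move an index outward while the neighboring characters qualify. The sketch below is one way to write that, not the official solution; treating uppercase vowels as vowels is an assumption.

```python
def exclaim(s):
    bang = s.find('!')
    if bang == -1:
        return ''
    # Walk left from the exclamation mark over alphabetic characters.
    start = bang
    while start > 0 and s[start - 1].isalpha():
        start -= 1
    if start == bang:
        return ''  # no letters immediately to the left
    return s[start:bang + 1]


def vowels(s):
    colon = s.find(':')
    if colon == -1:
        return ''
    # Walk right from the colon over vowels (assumption: either case counts).
    end = colon + 1
    while end < len(s) and s[end].lower() in 'aeiou':
        end += 1
    return s[colon + 1:end]
```

In vowels, the final slice is empty when no vowel follows the colon, so no separate check is needed there.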
Try out your solutions here.
Download the PyCharm project for this section here.
Now, we're going to turn our attention to a parsing task we'd be more likely to see in the real world: parsing email addresses. For the purposes of this problem, we'll be using a simplified format of an email address as follows:
username@hostname
where hostname is a string with at least 4 characters.
It consists of alphabetic characters and at least one period. In
addition, the username can be any length, including 0 characters.
Some examples are:
firstname.lastname@example.org # valid email address
email@example.com # valid email address
jillian@website # invalid email address, needs at least one period.
firstname.lastname@example.org # invalid, since 1 isn't a letter or period
@gmail.com # valid email address
email@example.com # invalid, less than 4 characters long.
Suppose you have a file called
emails.txt that looks like this:
Please forward this email to firstname.lastname@example.org for me. Thanks!
Can someone tell me who owns the email@example.com email address?
The email firstname.lastname@example.org keeps sending me spam mail.
Please forward this email to email@example.com for me. Thanks!
Omg @ye is my favorite!
Do you think firstname.lastname@example.org is spam?
This one isn't spam: email@example.com
Hello, world!
Why am I getting emails from firstname.lastname@example.org?
which has at most one email address per line of the file. Your job is to write the function
extract_all_hostnames,
which takes in a string representing a file's name and returns a
list of all the unique hostnames in the file.
For example, calling the function with the parameter
'emails.txt'
would have the following result:
>>> extract_all_hostnames('emails.txt')
['d.tv', 'gmail.com', 'spam.com', 'stanford.edu', 'yahoo.com']
In writing this function, think about how best to decompose it into functions that are responsible for subparts of the problem. For example, consider implementing a function which extracts a hostname from a single line and how you might use it.
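The decomposition suggested above might look roughly like this. The helper name extract_hostname is hypothetical, and returning the hostnames in sorted order is an assumption based on the sample output appearing alphabetical.

```python
def extract_hostname(line):
    # Hypothetical helper: returns the hostname on this line, or '' if none.
    at = line.find('@')
    if at == -1:
        return ''
    # Scan right over the characters allowed in a hostname.
    end = at + 1
    while end < len(line) and (line[end].isalpha() or line[end] == '.'):
        end += 1
    hostname = line[at + 1:end]
    if len(hostname) >= 4 and '.' in hostname:
        return hostname
    return ''


def extract_all_hostnames(filename):
    hostnames = []
    with open(filename) as f:
        for line in f:
            hostname = extract_hostname(line)
            if hostname != '' and hostname not in hostnames:
                hostnames.append(hostname)
    # Assumption: sorted order, to match the alphabetical sample output.
    return sorted(hostnames)
```

Note that this sketch would treat a sentence-ending period directly after a hostname as part of the hostname, since a period is a legal hostname character in this simplified format.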
Download the PyCharm project here.
In the last problem, you built a program to retrieve email hostnames from a file. Unfortunately, that program was limited in several ways. For example, it could only parse a single email from each line of the file, only retrieved the hostname of each email, and finally wasn't robust to peculiar cases such as punctuation occurring immediately after the hostname.
This time, you'll leverage your skills with nested loops and string parsing to build a more sophisticated program to grab emails from a file. You'll start by writing a program that simply grabs every email address from the file by implementing functions which we specify and whose definitions we provide for you. Then, you'll make your program a little more flexible by having it support a variety of command line arguments which alter its behaviour.
First, we're going to refine our definition of what constitutes an email address. An email address must be formatted in the following way:
username@hostname
Every character in both the username and the hostname must be a
letter, a digit, a period, a dash, or an underscore (the
'_' character). The username must be at least one
character long, and the hostname must be at least 4 characters long,
one of which is a period. With this in mind, implement the following
useful helper function, is_email_char,
which takes in a character and returns whether that character is a valid part of an email address. This will not be a long function, but will be instrumental in the readability of the more complex functions you write later.
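As a rough sketch of how short this helper can be, assuming the parameter is always a single character (the parameter name ch is an assumption):

```python
def is_email_char(ch):
    # Valid characters: letters, digits, period, dash, underscore.
    # Assumes ch is a string of length 1.
    return ch.isalnum() or ch in '.-_'
```

The '@' character is deliberately not included, since it separates the username from the hostname rather than belonging to either.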
Your job here is to implement the function get_all_emails,
which takes as input a string representing a line of text from a file, and returns a list of all the valid email addresses in that line.
Here's some sample output for the get_all_emails function:
>>> get_all_emails('xx email@example.com firstname.lastname@example.org')
['email@example.com', 'firstname.lastname@example.org']
>>> get_all_emails('_@_ aa-bb@TV.email@example.com')
['aa-bb@TV.org', 'firstname.lastname@example.org']
>>> get_all_emails('abc @ @ 123')
[]
>>> get_all_emails('')
[]
Some words of wisdom:
A useful strategy is to first find the @ character in the email, and then scan backwards and forwards to find the other characters in the email address. As a reminder, the
str.find() function accepts an optional second parameter which specifies which index to begin searching in the string from.
The is_email_char() function you wrote in the previous section will be very helpful here.
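The scan-outward strategy described in these tips could be sketched as follows. This is one possible reading of the rules rather than the sample solution, and it includes its own small is_email_char so the block runs on its own.

```python
def is_email_char(ch):
    # Letters, digits, period, dash, or underscore (ch has length 1).
    return ch.isalnum() or ch in '.-_'


def get_all_emails(line):
    emails = []
    search_from = 0
    while True:
        # The second argument to find() is the index to start searching from.
        at = line.find('@', search_from)
        if at == -1:
            break
        # Scan backwards over the username.
        start = at
        while start > 0 and is_email_char(line[start - 1]):
            start -= 1
        # Scan forwards over the hostname.
        end = at + 1
        while end < len(line) and is_email_char(line[end]):
            end += 1
        username = line[start:at]
        hostname = line[at + 1:end]
        if len(username) >= 1 and len(hostname) >= 4 and '.' in hostname:
            emails.append(line[start:end])
        search_from = at + 1  # continue after this '@'
    return emails
```

Restarting the search at at + 1 means every '@' on the line is examined once, including ones that turn out not to belong to a valid address.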
Finally, implement the function get_emails_from_file,
which takes as input a filename for your function to read through and
returns a list of all the email addresses in the file. For
example, if the file
emails.txt looks like this:
Hello email@example.com this is firstname.lastname@example.org And a.7@d_e.org and email@example.com firstname.lastname@example.org is not nick's email
then the function would behave as below:
>>> get_emails_from_file('emails.txt')
['a.7@d_e.org', 'email@example.com', 'firstname.lastname@example.org', 'email@example.com', 'firstname.lastname@example.org']
We've written a main function for you that puts all of these together, so you don't need to worry about modifying it for this section.
import sys

def main():
    args = sys.argv[1:]
    if len(args) == 1:
        # args[0] is the filename typed on the command line
        emails = get_emails_from_file(args[0])
        for email in emails:
            print(email)
    # some other bookkeeping here
You can use your program as demonstrated below:
$ python3 emails.py emails.txt
a.7@d_e.org
email@example.com
firstname.lastname@example.org
email@example.com
firstname.lastname@example.org
$ python3 emails.py big-emails.txt
--@
--@and.com
--@bill
--@come
--@oh
--@oh.com
--@the.com
....lots and lots of emails....
email@example.com
firstname.lastname@example.org
your@walk
Now that you have a basic version of your program working, you'll turn your attention to making it more flexible and powerful by implementing various optional command line options for the user:
The -max command line option allows you to specify the maximum number of emails you'd like to grab from each line. For example, if one of the lines in the file is
email@example.com and firstname.lastname@example.org and email@example.com, but your program is called as below, only
two of those emails will be printed to the terminal.
$ python3 emails.py emails.txt
... emails from other lines in the file ...
email@example.com
firstname.lastname@example.org
email@example.com
... emails from other lines in the file ...
$ python3 emails.py -max 2 emails.txt
... emails from other lines in the file ...
firstname.lastname@example.org
email@example.com
... emails from other lines in the file ...
The -host command line option allows you to specify that you would only like to grab emails with a particular hostname. For example, calling the program as below will only print
stanford.edu emails in the shell.
$ python3 emails.py emails.txt
firstname.lastname@example.org
email@example.com
firstname.lastname@example.org
$ python3 emails.py -host stanford.edu emails.txt
email@example.com
You can assume that a user will use either the
-max option, or the
-host option, but not both.
Elegantly supporting both these options is primarily a challenge in
decomposition and style - there is no one 'correct' way to do it.
You are free to make whatever modifications you want to the
program's functions, their parameters and return values. As a
reference, the sample solution modifies the
get_all_emails functions, although you are welcome to
pursue an alternative strategy.
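One alternative strategy, sketched here purely as an illustration, keeps the parsing functions unchanged and applies the options to each line's list of emails afterwards. Both parse_options and apply_options are hypothetical helpers, not part of the starter code, and the sketch assumes the filename is always the last argument.

```python
def parse_options(args):
    # Hypothetical helper. args is sys.argv[1:], e.g. ['-max', '2', 'emails.txt'].
    max_count = None
    host = None
    if len(args) == 3 and args[0] == '-max':
        max_count = int(args[1])
    elif len(args) == 3 and args[0] == '-host':
        host = args[1]
    filename = args[-1]  # assumption: the filename comes last
    return filename, max_count, host


def apply_options(line_emails, max_count, host):
    # Hypothetical helper. Filters the emails found on a single line.
    kept = []
    for email in line_emails:
        hostname = email[email.find('@') + 1:]
        if host is None or hostname == host:
            kept.append(email)
    if max_count is not None:
        kept = kept[:max_count]  # keep at most max_count emails per line
    return kept
```

Using None to mean "option not given" lets the same code path handle the plain invocation, -max, and -host without separate branches in the caller.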
You've just begun working for a company whose goal is to help users make new friends based on what they like to watch and read. Users input information about themselves, such as their Netflix history and favorite books. Your program will use this information to calculate a 'compatibility score' between two people, which serves as an estimate of how likely those people are to get along with one another. The compatibility score of two people is calculated as follows:
compatibility = % (books liked in common) + % (shows on Netflix liked in common)
In this problem, we'll represent the books and movies liked by a particular user as separate lists. Given the lists representing, for example, the books liked by two different users, we find the number of elements present in both lists and divide it by the sum of the lengths of the two lists. To that end, your first job is to implement the following function:
def in_common(l1, l2)
which takes in two lists of strings and returns the number of elements the
two lists have in common divided (using float division) by the total number
of elements in both lists. For example,
in_common(['a', 'b', 'c', 'd'], ['c', 'd', 'm', 'n', 'x', 'z'])
returns 0.2, because both lists contain 'c' and
'd' and there are 4 elements in the first list and 6 in the second.
Next, implement the following function:
def calc_score(netflix_history1, netflix_history2, fav_books1, fav_books2)
which takes the Netflix histories and favorite books of two users and returns their
compatibility score. The compatibility score between two users is the
fraction of shows on Netflix in common + the fraction of books in common,
using the calculation you implemented in the in_common
function. You may assume that there are no repeated elements in any of the lists.
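The two calculations described above can be sketched as follows. This is a minimal sketch using the in_common and calc_score signatures given in the handout; returning 0.0 when both lists are empty is an assumption, since the handout does not say what should happen in that case.

```python
def in_common(l1, l2):
    shared = 0
    for item in l1:
        if item in l2:
            shared += 1
    total = len(l1) + len(l2)
    if total == 0:
        return 0.0  # assumption: avoid dividing by zero for two empty lists
    return shared / total  # float division


def calc_score(netflix_history1, netflix_history2, fav_books1, fav_books2):
    shows = in_common(netflix_history1, netflix_history2)
    books = in_common(fav_books1, fav_books2)
    return shows + books
```

Because the lists are assumed to contain no repeats, counting the items of l1 that appear in l2 gives exactly the number of elements the two lists share.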
Finally, implement the following function to predict for a particular user which user they will be the most compatible with:
def new_friend(name_list, compatibility_scores)
which takes in a list of names of all other users and a list of compatibility
scores between the chosen user and all other users, and returns a list where the first
element is the name of the user who is most compatible and the
second element is their compatibility score.
name_list stores the
name for each user at the same index as
compatibility_scores
stores the corresponding compatibility score. For example, for user Barack,
if we have
name_list = ['Michelle', 'Joe'] and
compatibility_scores = [1, 0.8], this means the compatibility
score between Barack and Michelle is 1 and the compatibility score between Barack and Joe is only 0.8.
In this example,
new_friend(name_list, compatibility_scores) would return
['Michelle', 1]. You may break ties between equally-compatible users arbitrarily.
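Since the two lists are aligned by index, one approach is to track the index of the best score seen so far. The sketch below assumes both lists are non-empty and of equal length.

```python
def new_friend(name_list, compatibility_scores):
    # Assumes at least one other user, and that both lists have equal length.
    best = 0
    for i in range(1, len(compatibility_scores)):
        # Strict > keeps the earliest user when scores tie.
        if compatibility_scores[i] > compatibility_scores[best]:
            best = i
    return [name_list[best], compatibility_scores[best]]
```

Tracking an index rather than a score makes it easy to look up the matching name at the end.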
Download the PyCharm project for this problem here.
Your job is to write a program that emulates the 3 calculator functions shown below:
$ python3 calculator.py -square 42
1764 # prints the square of the number passed in
$ python3 calculator.py -exp 2 10
1024 # prints the first number raised to the power of the second number
$ python3 calculator.py -add 1 2 3 4 5
15 # prints the sum of all the numbers typed in
You may assume that you are provided with a
main function
that takes as a parameter the list of arguments typed in the console, as shown below:
import sys

def main(args):
    # your code here
    pass

if __name__ == "__main__":
    main(sys.argv)
Thus, your job is to decompose and implement the
main function so that your program produces the sample
output above. You may assume that
-square will be followed by one number,
-exp will be followed by two numbers and
-add will be followed by at least one number.
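One way to decompose the program is to separate the arithmetic from the argument handling. In the sketch below, calculate is a hypothetical helper, the numbers are assumed to be integers, and main receives sys.argv unchanged, so args[0] is the script name and args[1] is the flag.

```python
import sys


def calculate(flag, nums):
    # Hypothetical helper: returns the result for one calculator operation.
    if flag == '-square':
        return nums[0] ** 2
    if flag == '-exp':
        return nums[0] ** nums[1]
    if flag == '-add':
        return sum(nums)
    return None


def main(args):
    # args is sys.argv: args[0] is the script name, args[1] is the flag.
    if len(args) < 3 or args[1] not in ('-square', '-exp', '-add'):
        print('usage: calculator.py -square|-exp|-add numbers...')
        return
    nums = [int(arg) for arg in args[2:]]  # assumption: integer inputs
    print(calculate(args[1], nums))


if __name__ == "__main__":
    main(sys.argv)
```

Keeping calculate free of printing and of sys.argv makes each operation easy to check on its own.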