Late Policy: All assignments and projects are due at 11:59pm on the due date. Each assignment and project may be turned in up to 24 hours late for a 10% penalty and up to 48 hours late for a 30% penalty. No assignments or projects will be accepted more than 48 hours late. Students have four free late days they may use to turn in work late with no penalty: four 24-hour periods, no pro-rating. This late policy is enforced without exception.
Honor Code: Under the Honor Code at Stanford, you are expected to submit your own original work for assignments, projects, and exams. On many occasions when working on assignments or projects (but never exams!) it is useful to ask others -- the instructor, the TAs, or other students -- for hints, or to talk generally about aspects of the assignment. Such activity is both acceptable and encouraged, but you must indicate on all submitted work any assistance that you received. Any assistance received that is not given proper citation will be considered a violation of the Honor Code. In any event, you are responsible for understanding, writing up, and being able to explain all work that you submit. The course staff will pursue aggressively all suspected cases of Honor Code violations, and they will be handled through official University channels.
Datasets: This assignment makes use of the following two datasets; more information is provided on the Datasets page, but what's here is sufficient for you to complete the assignment.
- National Football League stats ("Football data"). Each row captures one football game with the following information:
- Year - The year of the game
- Week - The week number within the year
- Home - The home team
- HomeScore - The home team's score in the game
- Away - The away team
- AwayScore - The away team's score in the game
- Prediction - The point-spread prediction: a pre-game estimate of HomeScore minus ActualScore
- Schoolchildren's goals and perceptions ("Schoolkids data"). Each row captures one student's demographic information, personal goal, and perception ranking:
- Gender - The student's gender (boy or girl)
- Grade - The student's grade level (4, 5, or 6)
- Age - The student's age
- School - The student's school name
- Type - The type of area in which the school is located (Rural, Suburban, or Urban)
- Goal - The student's most important goal (Grades, Popular, or Looks)
- Grades - A ranking between 1 (most important) and 4 (least important) indicating how important the student thinks grades are in affecting popularity
- Sports - A ranking between 1 and 4 indicating how important the student thinks sports skills are in affecting popularity
- Looks - A ranking between 1 and 4 indicating how important the student thinks looks are in affecting popularity
- Money - A ranking between 1 and 4 indicating how important the student thinks money is in affecting popularity
Setup Instructions: We recommend using Google Sheets for Problems 1-13. If you use Excel or another spreadsheet program we may not be able to help you with special features, and you should explain any nonstandard features that you use.
- Download the two datasets by clicking on the links above.
- Go to the Google Drive website. You may use your personal Gmail account to log in, or you can use your SUNet ID by entering your Stanford email address -- Google should automatically redirect you to WebLogin.
- To upload the CSV files, drag them from your desktop or file browser to the main Google Drive page.
- To open a newly uploaded CSV file, right-click the CSV, select "Open with" and choose "Google Sheets". After you open your CSV file in Google Sheets once, it will save a new version that opens automatically in Sheets from then on.
- You're now ready to solve the problems -- have fun!
- Canvas accounts have been set up automatically for you, so you can simply log in using your SUNet ID.
- For our grading system, it's critical that the answer to each problem is on a separate page. Answers spanning multiple pages are allowed, as long as the answer to the next problem starts on a new page. To make sure the format is correct, please use our template: Google Doc or Latex
- If you'd like to use the template locally (e.g., with Microsoft Word), click on File > Download as > Microsoft Word (.docx).
- If you'd like to use the template through Google Docs, click on File > Make a copy. In the dialog box, rename the document and ensure "Share it with the same people" is not checked. The template should now be copied to and editable on your drive.
- Your answers will consist of text for Problems 1-7, graphics for Problems 8-13, and a combination of text and graphics for Problems 14-15. For adding graphics, screenshots may be easiest, although in Google Sheets you can use the "Insert" button on a chart, then from the menu on the top-right corner of the inserted chart select "Copy chart".
- When you're ready to submit, save your final document as pdf, and please do check it over to make sure it looks how you expect and that each answer is indeed on its own page.
Part 1: Spreadsheet Data Analysis (3 points per problem)
For each of these problems you will submit an answer obtained by performing spreadsheet operations on the provided data, and a description of the how you manipulated the spreadsheet(s) to obtain your answer.
Problem 1 (Football data). Find the average difference between game predictions and actual outcomes, across all games. The HomeScore and AwayScore columns are the actual scores, while the Prediction column is an estimate of how much the home team will win or lose by, i.e., it's an estimate of HomeScore minus AwayScore. For example, if the Home team beats the Away team by 15 points and the Prediction is 10, then the difference between Prediction and outcome is -5; if the Home team loses to the Away team by 8 points and the Prediction is -2, then the difference between Prediction and outcome is 6. In arithmetic terms, Outcome = (HomeScore - AwayScore), and Difference = (Prediction - Outcome). Give a single positive or negative number for the average difference, then explain how you manipulated the spreadsheet to arrive at the number.
Problem 2 (Football data). Which weeks have the smallest and largest point spreads on average across all games played in that week across all three years? The point spread is the difference between the higher and lower actual scores, i.e., |HomeScore - AwayScore|, always a positive number. (Hint: You might want to use the =abs() function.) Give the smallest and largest point-spread weeks, along with the average point spread across all games played in that week in all years, then explain how you manipulated the spreadsheet to arrive at the results.
Problem 3 (Football data). Find the teams that scored the least and the most total points. Make sure to include both home and away games when computing total points. Give the two teams and their total points, then explain how you manipulated the spreadsheet to arrive at the answer.
Problem 4 (Schoolkids data). What is the overall ranking of Grades, Sports, Looks, and Money in perceived popularity impact across all of the students? Note that every row represents one student's ranking from 1 (most important) to 4 (least important) on how important the four factors (Grades, Sports, Looks and Money) are in impacting popularity. State the four factors in order of overall importance (the factor with the highest importance should be first), and explain how you manipulated the spreadsheet to arrive at the answer.
Problem 5 (Schoolkids data). Which area type puts the least emphasis overall on Looks: Rural, Suburban, or Urban? Give your answer and explain how you manipulated the spreadsheet to arrive at your answer.
Problem 6 (Schoolkids data). Find the school with the highest girl-to-boy ratio. Provide the school name and the ratio, and explain how you manipulated the spreadsheet to arrive at your answer.
Problem 7 (Schoolkids data). Considering only boys who are older than 10, what is their most common goal? Give the goal and the number of boys greater than 10 who have that goal, then explain how you manipulated the spreadsheet to arrive at the answer.
Part 2: Data Visualization (3 points per problem)
For each of these problems you will submit a graphic: a bar graph, pie chart, scatterplot, or other visualization. You may either copy and paste visualizations from Google Sheets into your document, or do a screenshot - either is fine. Problems 14 and 15 will also include a text section covering tool used and what the visualization is showing.
Problem 8 (Football data, bar graph). For Arizona, Atlanta, Baltimore, Buffalo, and Carolina, create a bar graph showing their average score as a home team and average score as an away team.
Problem 9 (Schoolkids data, bar graph). Create a stacked bar graph that shows, for each of the three school Types (Rural, Suburban, Urban), the total number of students divided by how many have the three goals (Sports, Popular, Grades). Your graph should have three bars, with each bar divided into three colors.
Problem 10 (Schoolkids data, pie chart). Create a pie chart showing the relative percentage of students attending schools that are Rural, Suburban, and Urban.
Problem 11 (Schoolkids data, pie chart). Create a pie chart that shows the relative percentage of students who rank as most important Grades, Sports, Looks, and Money.
Problem 12 (Football data, pie chart). Every game is either a home win (HomeScore > AwayScore), an away win (HomeScore < AwayScore), or a tie (HomeScore = AwayScore). Create a pie chart that shows the relative percentage of home wins, away wins, and ties across all games played. (Hint: you may find the =if() function helpful.)
Problem 13 (Football data, scatterplot). Create a scatterplot where the x-axis is the Prediction and the y-axis is the actual outcome (HomeScore minus AwayScore). There should be one dot in the scatterplot for every game played. What do you conclude from the scatterplot?
Problems 14-15 (Personal choice). Create two interesting visualizations of your own. You may use the Football and/or Schoolkids dataset, and you may use Google Sheets, Raw, Excel, or any other tool of your choosing. For each one, mention what tool you used, and explain what your visualization is showing.