Assignment 2 - SQL

Due Date: Sunday April 23 at 11:59 PM

Honor Code: Under the Honor Code at Stanford, you are expected to submit your own original work for assignments, projects, and exams. On many occasions when working on assignments or projects (but never exams!) it is useful to ask others -- the instructor, the TAs, or other students -- for hints, or to talk generally about aspects of the assignment. Such activity is both acceptable and encouraged, but you must indicate on all submitted work any assistance that you received. Any assistance received that is not given proper citation will be considered a violation of the Honor Code. In any event, you are responsible for understanding, writing up, and being able to explain all work that you submit. The course staff will pursue aggressively all suspected cases of Honor Code violations, and they will be handled through official University channels.

Datasets: This assignment uses the Kickstarter and CountryContinents datasets from the previous assignment as well as 2 new datasets, Football and Schoolkids.

Setup Instructions:
  1. Download the following zipped file here.
  2. This zipped folder contains the 4 following files
    • assignment2.ipynb
    • Kickstarter2017.csv
    • CountryContinent.csv
    • Football_Schoolkids.db
  3. Ensure that these 4 files are in the same directory when running your jupyter notebook.
  4. You will need to install Pandas for this assignment:
    • Run "pip install pandas" in your terminal.
    • Else if you are using Anaconda, run "conda install pandas" instead.
    • For Windows users, run "pip install pandas" in command prompt.
Submission Instructions: You must submit your completed notebook assignment2.ipynb via Canvas under Assignment 2.

Part 1: SQL on Kickstarter data (3 points per problem)

These questions are exactly the same as the previous assignment. For each problem, write a SQL query to find the answer. Make sure your query is general enough to work on any data in the tables, not just the current data.

Problem 1 Which was the project with the smallest goal? Return the name of the project and the goal amount.

Problem 2 What is the total sum of goal amounts for projects where the pledged amount was greater or equal than the goal amount? Return the total sum.

Problem 3 Which was the most ambitious project (highest goal) where the pledged amount was greater or equal than the goal amount? Return the project's name and goal amount.

Problem 4 Which country has the highest percentage of successful projects (status = successful)? Return the country's name and the percentage rate of success.

Problem 5 Which are the top 5 most popular categories (From highest to lowest) in terms of number of backers? Return the 5 categories along with the number of backers for each category.

Problem 6 What is the average pledge amount by continent? Return the average pledge amount for Europe, North America and Oceania.

Part 2: SQL on Football data (3 points per problem)

For each problem, write a SQL query to find the answer. Make sure your query is general enough to work on any data in the tables, not just the current data.

Problem 7. Find all games where Seattle was the home or away team. For each game, return the two teams and scores.

Problem 8. Find all games where the home team scored more than 40 points and the away team scored fewer than 10 points. For each game, return the home and away team, their scores, and the prediction for the game; return the result sorted by the home-team score in descending order.

Problem 9. Find all teams who lost by more than 20 points as the home team during the first four weeks of some year (i.e., Week < 5). Return the teams in alphabetical order and return each team only once.

Problem 10. Find the average prediction for games where the home team won.

Problem 11. In each of the three years (1998, 1999, 2000), how many games were there where the home team was predicted to win, i.e., prediction > 0, and the actual point spread (Homescore - Awayscore) was more than three times the prediction? Your result should have three tuples, each containing the year and number of games.

Problem 12. Find all pairs of teams where the two teams played each other in 1998 and 1999 in the same configuration (the same team was home and the same team was away), and in 1998 the home team won while in 1999 the away team won. Return the pairs of teams.

Part 3: SQL on Schoolkids data (3 points per problem)

For each problem, write a SQL query to find the answer. Make sure your query is general enough to work on any data in the tables, not just the current data.

Problem 13. Find all students whose goal is Sports but they ranked Sports as fourth in impact on popularity. Return the entire tuple for each such student.

Problem 14. Count how many 4th graders think Looks are more important than Grades. Return just one number.

Problem 15. For each of the three goals (Grades, Sports, Popular), what is the average age of students who have that goal? Your result should have three tuples, one each with the goal and the average age for that goal.

Problem 16. Find all 4th graders in Urban schools where there is no 5th grader in their same school who has the same goal and same ranking of popularity impact factors. Return each student's grade, school, and goal.

Problem 17. Find the oldest students. For each one return their gender, grade, age, and school.