Assignment 2

Logistics

Due Date

Thursday, Jan 31 at 11:59 PM

Policies

Late Policy: All assignments and projects are due at 11:59 PM on the due date. Each assignment and project may be turned in up to 24 hours late for a 10% penalty and up to 48 hours late for a 30% penalty. No assignments or projects will be accepted more than 48 hours late. Students have four free late days they may use to turn in work late with no penalty: four 24-hour periods, no pro-rating. This late policy is enforced without exception.

Honor Code: Under the Honor Code at Stanford, you are expected to submit your own original work for assignments, projects, and exams. On many occasions when working on assignments or projects (but never exams!) it is useful to ask others -- the instructor, the TAs, or other students -- for hints, or to talk generally about aspects of the assignment. Such activity is both acceptable and encouraged, but you must indicate on all submitted work any assistance that you received. Any assistance received that is not given proper citation will be considered a violation of the Honor Code. In any event, you are responsible for understanding, writing up, and being able to explain all work that you submit. The course staff will pursue aggressively all suspected cases of Honor Code violations, and they will be handled through official University channels.

Datasets

This assignment makes use of the following two datasets, which are also available on the Datasets page. You will only need a local copy of these files for Part 1. For Part 2, please make sure the datasets have been copied into your Instabase repository (see the instructions below).

  • Titanic
    • Download link: Titanic.csv
    • Description: Data on passengers of the RMS Titanic. Entries include the name, age, class, fare, gender, and whether or not the passenger survived.
    • Notes:
      • A blank entry for age means that the age is unknown.
      • Fare can have more than two decimal places because British currency was not decimal (base-10) at that time: Titanic Fare Data
  • World Cup
    • Download links: Players.csv, Teams.csv, PlayersExt.csv
    • Description: 2010 World Cup data including last name, team, position, minutes played, and game statistics for each player (Players.csv) as well as world ranking, games played in tournaments, and game statistics for each team (Teams.csv). The joined tables can be found in PlayersExt.csv. Note: keep in mind that since the tables are joined, country data will show up for each player.
    • Notes:
      • Statistics, including yellowCards and redCards, are for the entire tournament (excluding the final game).
      • Team ranking is the world ranking going into the tournament, so it may not be 1-32 even though there are only 32 teams.
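
The two Titanic notes above can be sketched in Python. This is only an illustration, not part of the assignment: the column names (last, age, fare) and the two sample rows are hypothetical stand-ins for Titanic.csv, and the conversion assumes the pre-decimal British system of 20 shillings to the pound and 12 pence to the shilling.

```python
import csv

# Hypothetical sample in the spirit of Titanic.csv; the real column names may differ.
SAMPLE = "last,age,fare\nDoe,22,7.25\nRoe,,7.8958\n"

PENCE_PER_SHILLING = 12
SHILLINGS_PER_POUND = 20
PENCE_PER_POUND = PENCE_PER_SHILLING * SHILLINGS_PER_POUND  # 240


def parse_age(text):
    """Return the age as a float, or None when the entry is blank (age unknown)."""
    text = text.strip()
    return float(text) if text else None


def fare_to_lsd(fare):
    """Convert a decimal fare in pounds to (pounds, shillings, pence)."""
    total_pence = int(round(fare * PENCE_PER_POUND))
    pounds, rest = divmod(total_pence, PENCE_PER_POUND)
    shillings, pence = divmod(rest, PENCE_PER_SHILLING)
    return pounds, shillings, pence


def load_rows(text):
    """Parse CSV text into dicts with a cleaned age and a float fare."""
    rows = []
    for record in csv.DictReader(text.splitlines()):
        rows.append({
            "last": record["last"],
            "age": parse_age(record["age"]),
            "fare": float(record["fare"]),
        })
    return rows


rows = load_rows(SAMPLE)
known_ages = [r["age"] for r in rows if r["age"] is not None]
for r in rows:
    print("%s: age=%s, fare=%s" % (r["last"], r["age"], fare_to_lsd(r["fare"])))
```

A fare such as 7.8958 looks odd as a decimal but corresponds to a whole number of pence (7 pounds, 17 shillings, 11 pence). Keeping blank ages as None rather than 0 keeps unknown ages out of any average.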

Setup Instructions

For this assignment, we will be using Instabase and Tableau. Instabase is a service that allows students to run Jupyter Notebooks without downloading any software on their computers. If you prefer to run Jupyter Notebooks locally, you may do so (make sure you're using a Python 2 kernel), but using Instabase is highly recommended for consistency.
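
If you do run locally, one quick way to confirm which Python version your kernel uses is to run a cell like the following. This is a minimal sketch using only the standard library; the helper name kernel_major_version is our own, not something provided by the assignment.

```python
import sys


def kernel_major_version():
    """Return the major version (2 or 3) of the Python running this cell."""
    return sys.version_info[0]


if kernel_major_version() != 2:
    # The assignment asks for a Python 2 kernel when running locally.
    print("Warning: this kernel is Python %d, not Python 2." % kernel_major_version())
else:
    print("Python 2 kernel detected.")
```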

Instructions for Tableau:

  1. Download the two datasets by clicking on the links in the above section.
  2. Open Tableau. If you have not yet installed Tableau, please read our Piazza post for instructions.
  3. To load a CSV file into Tableau, on the left menu bar, click the "Text File" link under "Connect", navigate to the location of the downloaded files, select the CSV file you wish to work with, and click "Open" to import it. We'll only be using Tableau for Part 1 of this assignment, so you only need to put one of the datasets (your choice) into Tableau. You are encouraged to play around with and create visualizations for both datasets!
  4. Once you've uploaded your CSV, the screen should display your data. In the bottom left corner of the screen, click on the tab that says "Sheet 1." Warning: this tab will say something different if you renamed the tab in your CSV file.
  5. You're now ready to start building your visualization!

If you would like more info on how to use Tableau, check out the Tableau Tutorial.

Instructions for Instabase:

  1. Make sure your Instabase account is set up. If you have not yet done so, please read our Piazza post for setup instructions.
  2. You will need the files found at the following link copied to your personal Instabase account: Assignment 2. To copy files to your personal Instabase account, click on the drop-down menu above the folder listing, choose Select All, then click on the Copy link that appears. You should now have a private copy of the assignment to work on.
  3. Navigate to the personal repository where you copied the files, then double-click on the notebooks folder. The folder should contain SQLAssign.ipynb. Right-click or control-click on this file and select Open With > Jupyter. (If you simply double-click on it, it will show you the file but will not run it as a Jupyter notebook.) Sometimes it will take a minute or so for a new Jupyter server to start up on your behalf. Once it does, you are ready to go!

Submission Instructions

Submission for this assignment will require 3 parts: your Tableau visualization file, your Jupyter Notebook code, and a PDF of your notebook.

Submitting your Tableau Visualization

For your visualization submission, you'll be uploading your Tableau visualization file to Gradescope. To download your visualization, go to File -> Export Packaged Workbook. (You should get a .twbx file.) Upload this saved file to Gradescope Assignment 2: Tableau.

Submitting your Jupyter Notebook

For Notebook submission, we ask that you submit both a PDF of the Notebook and a copy of the Notebook file through Gradescope. We're requiring these steps so that we can both run your code and provide quick feedback.

Submitting through Gradescope

  1. Download your Jupyter Notebook as a PDF. With your Notebook pulled up in Instabase, open the print menu for your browser (File > Print). Change the printer to "Save to PDF", and print (this saves your Notebook as a PDF file). Check that all of your answers are still there.
  2. Also download your Jupyter Notebook as a .ipynb file. In the notebook's File menu, choose Download As, then Notebook (.ipynb).
  3. Go to the CS102 class in Gradescope and click on Assignment 2: SQL Notebook PDF.
  4. Upload your PDF and tag the pages corresponding to each question and your answer. You may submit as many times as you like before the submission deadline, and we will use your latest submission for both grading and the late policy.
  5. Go back to the Gradescope CS102 class and click on Assignment 2: SQL .ipynb File. Upload the .ipynb file you downloaded.


Part 1: Tableau Visualization (15 Points)

Using either the World Cup (PlayersExt.csv) or Titanic (Titanic.csv) data, create an interactive data visualization dashboard using Tableau (please only choose one and explore multiple aspects of the data). This is a very open-ended problem; you should experiment with different features in Tableau. Full points will be awarded to visualizations that meet all the requirements below. While we encourage you to create "detailed" visualizations, you will not be penalized for creating "basic" ones.

Requirements

Create one dashboard with the following:

Hints:

Part 2: SQL (50 Points)

Complete the problems in SQLAssign.ipynb. Please see the setup instructions above to create and edit your own copy of the assignment.
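
As a warm-up that is independent of the notebook, the sketch below uses Python's built-in sqlite3 module to show what the note about joined tables in the Datasets section means in SQL terms. The table names follow the file names Players.csv and Teams.csv, but the column names (surname, team, minutes, ranking) and all rows are hypothetical, and the notebook itself may run SQL in a different way.

```python
import sqlite3

# In-memory database with tiny, made-up stand-ins for Players.csv and Teams.csv.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Teams (team TEXT PRIMARY KEY, ranking INTEGER)")
cur.execute("CREATE TABLE Players (surname TEXT, team TEXT, minutes INTEGER)")
cur.executemany("INSERT INTO Teams VALUES (?, ?)",
                [("Alpha", 3), ("Beta", 40)])
cur.executemany("INSERT INTO Players VALUES (?, ?, ?)",
                [("Ames", "Alpha", 270), ("Diaz", "Alpha", 90), ("Khan", "Beta", 180)])

# Joining players to teams repeats each team's ranking on every player row,
# which is the shape a joined file such as PlayersExt.csv would have.
joined = cur.execute(
    "SELECT P.surname, P.team, P.minutes, T.ranking "
    "FROM Players P JOIN Teams T ON P.team = T.team "
    "ORDER BY P.surname"
).fetchall()

# Averaging a team-level column over the joined rows weights each team
# by its number of players ...
naive_avg = cur.execute(
    "SELECT AVG(T.ranking) FROM Players P JOIN Teams T ON P.team = T.team"
).fetchone()[0]

# ... whereas averaging over the team table counts each team once.
team_avg = cur.execute("SELECT AVG(ranking) FROM Teams").fetchone()[0]

for row in joined:
    print(row)
print("average over joined rows: %.2f" % naive_avg)
print("average over teams: %.2f" % team_avg)
conn.close()
```

The two averages differ (about 15.33 versus 21.5 here) because team Alpha appears twice in the joined result. When a question is about teams rather than players, it is worth checking whether a query over joined data counts a team once per player.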