Assignment 2

Logistics

Due Date

Monday, October 16 at 11:59 PM

Policies

Late Policy: All assignments and projects are due at 11:59pm on the due date. Each assignment and project may be turned in up to 24 hours late for a 10% penalty and up to 48 hours late for a 30% penalty. No assignments or projects will be accepted more than 48 hours late. Students have four free late days they may use to turn in work late with no penalty: four 24-hour periods, no pro-rating. This late policy is enforced without exception.
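
The penalty tiers above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not an official grading tool: the function names are hypothetical, and the sketch deliberately does not model how free late days combine with the 48-hour cutoff, since the policy text does not spell that out.

```python
import math


def penalty_fraction(hours_late):
    """Fraction of the score deducted, or None if the work is not accepted.

    Assumes no free late days are applied (hypothetical helper).
    """
    if hours_late <= 0:
        return 0.0
    if hours_late <= 24:
        return 0.10
    if hours_late <= 48:
        return 0.30
    return None  # more than 48 hours late: not accepted


def late_days_needed(hours_late):
    """Whole 24-hour periods needed to cover the lateness (no pro-rating)."""
    if hours_late <= 0:
        return 0
    return int(math.ceil(hours_late / 24.0))
```

For example, a submission that is 25 hours late falls in the 30% tier, and covering it with free late days instead would take two of them, because a partial 24-hour period counts as a full one.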

Honor Code: Under the Honor Code at Stanford, you are expected to submit your own original work for assignments, projects, and exams. On many occasions when working on assignments or projects (but never exams!) it is useful to ask others -- the instructor, the TAs, or other students -- for hints, or to talk generally about aspects of the assignment. Such activity is both acceptable and encouraged, but you must indicate on all submitted work any assistance that you received. Any assistance received that is not given proper citation will be considered a violation of the Honor Code. In any event, you are responsible for understanding, writing up, and being able to explain all work that you submit. The course staff will pursue aggressively all suspected cases of Honor Code violations, and they will be handled through official University channels.

Datasets

This assignment uses the following two datasets, which are also available on the Datasets page. You will only need a local copy of these files for Part 1. For Part 2, please make sure the datasets have been copied into your Instabase repository (see instructions below).

  • Titanic
    • Download link: Titanic.csv
    • Description: Data on passengers of the RMS Titanic. Entries include the name, age, class, fare, gender, and whether or not the passenger survived
    • Notes:
      • A blank entry for age means that the age is unknown
      • Fare can have more than two digits after the decimal point because money was not base-10 at that time (see Titanic Fare Data)
  • World Cup
    • Download links: Players.csv, Teams.csv, PlayersExt.csv (joined dataset)
    • Description: 2010 World Cup data including last name, team, position, minutes played, and game statistics for each player (Players.csv) as well as world ranking, games played in the tournament, and game statistics for each team (Teams.csv). The joined tables can be found in PlayersExt.csv. Note: because the tables are joined, country (team) data is repeated on every player's row.
    • Notes:
      • Statistics, including yellowCards and redCards, are for the entire tournament (excluding final game).
      • Team ranking is the world ranking going into the tournament, so it may not be 1-32 even though there are only 32 teams
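
Two of the notes above matter as soon as you start computing anything: blank ages are unknown (not zero), and team-level values in the joined table repeat on every player row. The sketch below illustrates both using only Python's standard csv module. The sample rows are made up for illustration, and the column names (age, team, ranking) are assumptions, so check the actual CSV headers before adapting it.

```python
import csv

# Made-up sample rows for illustration only; not taken from the real files.
# Column names here are assumptions, so check the CSV headers before reusing.
TITANIC_SAMPLE = [
    "last,first,age,fare,survived",
    "Doe,Jane,29,7.25,yes",
    "Roe,John,,8.05,no",
    "Poe,Mary,41,26.55,yes",
]
PLAYERS_EXT_SAMPLE = [
    "surname,team,ranking",
    "Alpha,TeamA,3",
    "Beta,TeamA,3",
    "Gamma,TeamB,40",
]


def parse_age(raw):
    """Return the age as a float, or None when the entry is blank (unknown)."""
    raw = raw.strip()
    return float(raw) if raw else None


def known_ages(lines):
    """Collect ages from CSV lines, skipping passengers whose age is unknown."""
    ages = [parse_age(row["age"]) for row in csv.DictReader(lines)]
    return [age for age in ages if age is not None]


def team_rankings(lines):
    """Map each team to its ranking once, however many player rows repeat it."""
    rankings = {}
    for row in csv.DictReader(lines):
        rankings[row["team"]] = int(row["ranking"])
    return rankings


ages = known_ages(TITANIC_SAMPLE)
average_age = sum(ages) / len(ages)  # averages only the known ages

rankings = team_rankings(PLAYERS_EXT_SAMPLE)
average_ranking = sum(rankings.values()) / float(len(rankings))
```

In this toy data, averaging the ranking column directly over the three player rows would weight TeamA twice, which is why the sketch collapses the rows to one entry per team first.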

Setup Instructions

For this assignment, we will be using Instabase and Tableau. Instabase is a service that allows students to run Jupyter Notebooks without downloading any software on their computers. If you prefer to run Jupyter Notebooks locally (please make sure you're using a Python 2 kernel), you will still have to create an Instabase account for submission.

Instructions for Tableau:

  1. Download the two datasets by clicking on the links above.
  2. Open Tableau. If you have not yet installed Tableau, please read our Piazza post for instructions.
  3. To upload a CSV file to Tableau, on the left menu bar, click the "Text File" link under "Connect", navigate to the location of downloaded files, select the CSV file you wish to work with, and click "Open" to import. We'll only be using Tableau for part 1 of this assignment, so you only need to upload one of the datasets (your choice) to Tableau. You are encouraged to play around with and create visualizations for both datasets though.
  4. Once you've uploaded your CSV, the screen should display your data. In the bottom left corner of the screen, click on the tab that says "Sheet 1." Warning: this tab will say something different if you renamed the tab in your CSV file.
  5. You're now ready to start visualizing data!

If you want more info on how to use Tableau, check out the Tableau Tutorial.

Instructions for Instabase:

  1. Make sure your Instabase account is set up. If you have not yet done so, please read our Piazza post for instructions.
  2. You will need the files found at the following link copied to your personal Instabase account: Assignment 2. To copy files to your personal Instabase account, click on the drop-down menu above the folder listing, choose Select All, then click on the Copy link that appears. You should now have a private copy of the assignment to work on.
  3. Navigate to the repository where you copied the files, then double-click on the notebooks folder. The folder should contain SQLBasicAssign.ipynb. Right-click or control-click on this file and select Open With > Jupyter. (If you simply double-click on it, you will see the file's contents, but the notebook will not run in Jupyter.) It can take a minute or so for a new Jupyter server to start up on your behalf. Once it does, you are ready to go!

Submission Instructions

Submission for this assignment has two parts: submission of your Tableau visualization through Canvas and submission of your Jupyter Notebook through Gradescope and Instabase.

Submitting your Tableau Visualization

For your visualization submission, you'll be using Canvas. Go to our Canvas course homepage; under Assignments > Assignment 2, you should be able to upload a file as your submission. We ask that you upload your entire Tableau workbook so that we may assess all the features you've included (to export a Tableau workbook, go to File > Export Packaged Workbook).

Submitting your Jupyter Notebook

For Notebook submission, we ask that you submit both a PDF of the Notebook through Gradescope and a copy of the Notebook through Instabase. We're requiring these steps so that we can both run your code and provide feedback (Instabase currently does not have the infrastructure to support instructor feedback on assignments).

Submitting through Gradescope

  1. Download your Jupyter Notebook as a PDF. With your Notebook pulled up in Instabase, go to File > Download As > PDF via LaTeX. Check that each of your answers is still there.
  2. Log into the Gradescope website (gradescope.com) using your Stanford email address. The TAs have already set up an account for you if you have not used Gradescope before.
  3. Click on the CS 102 class and click on Assignment 2.
  4. Upload your PDF and tag the pages corresponding to each question. You may submit as many times as you like before the submission deadline, and we will use your latest submission both for grading and for the late policy.

For more detailed instructions on submitting homework, take a look at the Gradescope FAQ.

Submitting through Instabase

Submitting through Instabase should be incredibly easy! At the bottom of each assignment, you will find a submission link and instructions. Please keep in mind that submissions are timestamped.



Part 1: Tableau Visualization (35 Points)

Create a visualization in Tableau using either World Cup Data or Titanic data (please only choose one and explore multiple aspects of the data). This is a very open-ended problem; you should experiment with different features in Tableau. Full points will be awarded to visualizations that meet all the requirements below. While we encourage you to create "detailed" visualizations, you will not be penalized for creating "basic" ones.

Requirements

Create one dashboard with the following:

Hints:

Part 2: Basic SQL (65 Points)

Complete the problems in SQLBasicAssign.ipynb. Please reference the setup instructions above to create your own copy of the assignment.