Due: Thursday, April 26 at 11:59 PM
Late Policy: All assignments and projects are due at 11:59pm on the due date. Each assignment and project may be turned in up to 24 hours late for a 10% penalty and up to 48 hours late for a 30% penalty. No assignments or projects will be accepted more than 48 hours late. Students have four free late days they may use to turn in work late with no penalty: four 24-hour periods, no pro-rating. This late policy is enforced without exception.
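As a rough illustration of the arithmetic in the late policy, here is a minimal Python sketch. The function name `late_penalty` and its parameters are hypothetical. It also assumes two things the policy does not spell out: that each free late day cancels one 24-hour period, and that free late days do not extend the 48-hour cutoff. When in doubt, ask the course staff.

```python
from __future__ import print_function
import math

# Penalty by number of penalized 24-hour periods, taken from the policy text.
PENALTY_BY_PERIODS = {0: 0.0, 1: 0.10, 2: 0.30}
MAX_HOURS_LATE = 48


def late_penalty(hours_late, free_days_applied=0):
    """Return the penalty as a fraction, or None if the work is not accepted.

    Assumption (not stated in the policy): each free late day cancels one
    24-hour period, and free days do not extend the 48-hour cutoff.
    """
    if hours_late <= 0:
        return 0.0
    if hours_late > MAX_HOURS_LATE:
        return None
    # No pro-rating: any part of a 24-hour period counts as a full period.
    periods_late = int(math.ceil(hours_late / 24.0))
    penalized = max(0, periods_late - free_days_applied)
    return PENALTY_BY_PERIODS[penalized]


print(late_penalty(30))                       # two periods late, no free days
print(late_penalty(30, free_days_applied=1))  # one period offset by a free day
```

Under these assumptions, a submission 30 hours late counts as two full periods, so applying one free late day leaves one penalized period.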
Honor Code: Under the Honor Code at Stanford, you are expected to
submit your own original work for assignments, projects, and exams. On many
occasions when working on assignments or projects (but never exams!) it is
useful to ask others -- the instructor, the TAs, or other students -- for
hints, or to talk generally about aspects of the assignment. Such activity
is both acceptable and encouraged, but you must indicate on all submitted
work any assistance that you received. Any assistance received that is not
given proper citation will be considered a violation of the Honor Code. In
any event, you are responsible for understanding, writing up, and being able
to explain all work that you submit. The course staff will pursue
aggressively all suspected cases of Honor Code violations, and they will be
handled through official University channels.
This assignment makes use of the following two datasets. These datasets are also available on the Datasets page. You will only need a local copy of these files for Part 1. For Part 2, please make sure the datasets have been copied into your Instabase repository (see instructions below).
- Download link: Titanic.csv
- Description: Data on passengers of the RMS Titanic. Entries include the name, age, class, fare, gender, and whether or not the passenger survived
- A blank entry for age means that the age is unknown
- Fare can have more than two digits after the decimal point because money was not base-10 at that time (see Titanic Fare Data)
- Download links: Players.csv, Teams.csv, PlayersExt.csv (joined dataset)
- Description: 2010 World Cup data including last name, team, position, minutes played, and game statistics for each player (Players.csv) as well as world ranking, games played in tournaments, and game statistics for each team (Teams.csv). The joined tables can be found in PlayersExt.csv. Note: keep in mind that since the tables are joined, country data will show up for each player.
- Statistics, including yellowCards and redCards, are for the entire tournament (excluding final game).
- Team ranking is the world ranking going into the tournament, so it may not be 1-32 even though there are only 32 teams.
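Two of the notes above are easy to trip over when querying the data: blank ages in the Titanic data, and team-level columns repeated for every player in the joined World Cup table. The sketch below uses Python's built-in sqlite3 module with a few invented rows. Apart from redCards, the table and column names are assumptions and may not match the real CSV headers or the setup in SQLAssign.ipynb.

```python
from __future__ import print_function
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# --- Blank ages (Titanic) ---
# Column names and rows are invented for illustration.
cur.execute("CREATE TABLE Titanic (last TEXT, age REAL)")
raw = [("PassengerA", "34"), ("PassengerB", ""), ("PassengerC", "8")]
# Store a blank age as NULL (None) rather than as 0 or an empty string.
cur.executemany(
    "INSERT INTO Titanic VALUES (?, ?)",
    [(name, float(age) if age else None) for name, age in raw],
)
# COUNT(age) and AVG(age) skip NULLs; COUNT(*) counts every row.
known_ages, all_rows, avg_age = cur.execute(
    "SELECT COUNT(age), COUNT(*), AVG(age) FROM Titanic"
).fetchone()

# --- Team columns repeated in a joined table (World Cup) ---
cur.execute("CREATE TABLE Players (surname TEXT, team TEXT)")
cur.execute("CREATE TABLE Teams (team TEXT, redCards INTEGER)")
cur.executemany("INSERT INTO Players VALUES (?, ?)",
                [("P1", "TeamX"), ("P2", "TeamX"), ("P3", "TeamY")])
cur.executemany("INSERT INTO Teams VALUES (?, ?)",
                [("TeamX", 1), ("TeamY", 2)])

# Summing a team-level column over player rows counts each team
# once per player, which overstates the total.
joined_sum = cur.execute(
    "SELECT SUM(T.redCards) FROM Players P JOIN Teams T ON P.team = T.team"
).fetchone()[0]
team_sum = cur.execute("SELECT SUM(redCards) FROM Teams").fetchone()[0]

print(known_ages, all_rows, avg_age)
print(joined_sum, team_sum)
conn.close()
```

With these made-up rows, the average age is computed over the two known ages only, and the sum of redCards over the joined rows is larger than the sum over the Teams table because TeamX appears once per player.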
For this assignment, we will be using Instabase and Tableau. Instabase is a service that allows students to run Jupyter Notebooks without downloading any software on their computers. If you prefer to run Jupyter Notebooks locally (please make sure you're using a Python 2 kernel), you will still have to create an Instabase account for submission.
Instructions for Tableau:
- Download the two datasets by clicking on the links above.
- Open Tableau. If you have not yet installed Tableau, please read our Piazza post for instructions.
- To upload a CSV file to Tableau, on the left menu bar, click the "Text File" link under "Connect", navigate to the location of downloaded files, select the CSV file you wish to work with, and click "Open" to import. We'll only be using Tableau for part 1 of this assignment, so you only need to upload one of the datasets (your choice) to Tableau. You are encouraged to play around with and create visualizations for both datasets though.
- Once you've uploaded your CSV, the screen should display your data. In the bottom left corner of the screen, click on the tab that says "Sheet 1." Warning: this tab will say something different if you renamed the tab in your CSV file.
- You're now ready to start visualizing data!
If you want more info on how to use Tableau, check out the Tableau Tutorial.
Instructions for Instabase:
- Make sure your Instabase account is set up. If you have not yet done so, please read our Piazza post for instructions.
- You will need the files found at the following link copied to your personal Instabase account: Assignment 2. To copy files to your personal Instabase account, click on the drop-down menu above the folder listing, choose Select All, then click on the Copy link that appears. You should now have a private copy of the assignment to work on.
- Navigate to the repository where you copied the files, then double-click on the notebooks folder. The folder should contain SQLAssign.ipynb. Right-click or control-click on this file and select Open With > Jupyter. (If you simply double-click on it, Instabase will show you the file but will not start Jupyter.) Sometimes it will take a minute or so for a new Jupyter server to start up on your behalf. Once it does, you are ready to go!
Submission for this assignment has three parts: your Tableau visualization through Instabase, your Jupyter Notebook through Instabase, and a PDF of your notebook through Gradescope.
Submitting your Tableau Visualization
For your visualization submission, you'll be using both Instabase and Gradescope. Please submit your Tableau workbook at this link. We ask that you upload your entire Tableau workbook so that we can assess all the features you've included (to export a Tableau workbook, go to File > Export Packaged Workbook). In your Gradescope submission, please include your Instabase ID somewhere in your PDF and tag that page under the Tableau question. Because we can't give comments and grades in Instabase, we'll make comments on Gradescope.
Submitting your Jupyter Notebook
For Notebook submission, we ask that you submit both a PDF of the Notebook through Gradescope and a copy of the Notebook through Instabase. We're requiring these steps so that we can both run your code and provide feedback (Instabase currently does not have the infrastructure to support instructor feedback on assignments).
Submitting through Gradescope
- Download your Jupyter Notebook as a PDF. With your Notebook pulled up in Instabase, go to File > Download As > PDF via LaTeX. Check that each of your answers is still there.
- Log into the Gradescope website (gradescope.com) using your Stanford email address. The TAs have already set up an account for you if you have not used Gradescope before.
- Click on the CS 102 class and click on Assignment 2.
- Upload your PDF and tag the pages corresponding to each question. You may submit as many times as you like before the submission deadline, and we will use your latest submission both for grading and for the late policy.
For more detailed instructions on submitting homework, take a look at the Gradescope FAQ.
Submitting through Instabase
For this assignment, submit your Jupyter notebooks at this link.
Part 1: Tableau Visualization (15 Points)
Using either the World Cup (PlayersExt.csv) or Titanic (Titanic.csv) data, create an interactive data visualization dashboard using Tableau (please only choose one and explore multiple aspects of the data). This is a very open-ended problem; you should experiment with different features in Tableau. Full points will be awarded to visualizations that meet all the requirements below. While we encourage you to create "detailed" visualizations, you will not be penalized for creating "basic" ones.
Create one dashboard with the following:
- At least three panes in one dashboard.
- At least three different visualization types in the dashboard (e.g., bar chart, scatterplot, map).
- Each visualization should illustrate different relationships (e.g., don't just make a pie chart and a bar chart from the same data).
- One pane should interactively drive all of the others (see hints below).
- Every visualization should "make sense".
- Examples of visualizations that don't make sense: red cards per position in the World Cup data (the data only has red and yellow cards per team), sum of classes in the Titanic data (classes are more like a category than a number; see hint below).
- Sometimes, categories are represented as numbers (e.g., Class in the Titanic data; Games, Red Cards, or Yellow Cards in the World Cup data). Tableau will automatically process these attributes as numeric, which is not something we want! To change a numeric attribute to a category, drag it from Measures to Dimensions.
- To use a map for teams, in the Team drop-down menu change Geographic role to Country/Region.
- Make good use of the Tableau documentation. Tableau has a wealth of tutorials, like this one on Building a Dashboard, that can help you learn how to fulfill the requirements above. NOTE: Tableau is still a fairly new product and has some glitches, so you may have to create an account again when trying to view these tutorials. Use the same credentials as when you downloaded Tableau Desktop.
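To see why a category stored as a number should be treated as a dimension rather than a measure, consider this small Python sketch. The class values are made up and are not taken from Titanic.csv.

```python
from __future__ import print_function
from collections import Counter

# Made-up passenger classes (1 = first, 2 = second, 3 = third).
classes = [1, 3, 3, 2, 3, 1]

# Treated as a measure: adding class labels gives a number with no meaning.
meaningless_total = sum(classes)

# Treated as a dimension: count how many passengers fall in each class.
passengers_per_class = Counter(classes)

print(meaningless_total)
print(sorted(passengers_per_class.items()))
```

The sum of the class labels says nothing about the passengers, while the per-class counts are the kind of quantity a bar chart can sensibly show.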
Part 2: SQL (50 Points)
Complete the problems in SQLAssign.ipynb. Please reference the setup instructions above to create your own copy of the assignment.