Thursday, Jan 31 at 11:59 PM
Late Policy: All assignments and projects are due at 11:59 PM on the due date. Each assignment and project may be turned in up to 24 hours late for a 10% penalty and up to 48 hours late for a 30% penalty. No assignments or projects will be accepted more than 48 hours late. Students have four free late days they may use to turn in work late with no penalty: four 24-hour periods, no pro-rating. This late policy is enforced without exception.
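As a worked illustration of the penalty tiers above, here is a minimal sketch in Python. The function names are hypothetical, and the sketch deliberately ignores free late days; it only encodes the 24-hour and 48-hour cutoffs stated in the policy.

```python
def late_penalty_percent(hours_late):
    """Return the penalty in percent, or None if the work is not accepted.

    hours_late is the time past the deadline in hours (0 or less means on time).
    Free late days are NOT modeled here.
    """
    if hours_late <= 0:
        return 0       # on time
    if hours_late <= 24:
        return 10      # up to 24 hours late
    if hours_late <= 48:
        return 30      # up to 48 hours late
    return None        # more than 48 hours late: not accepted


def score_after_penalty(raw_score, hours_late):
    """Apply the penalty to a raw score; None means the work is not accepted."""
    percent = late_penalty_percent(hours_late)
    if percent is None:
        return None
    return raw_score * (100 - percent) / 100.0
```

For example, under these assumptions a submission worth 50 points that arrives 30 hours late falls in the 30% tier, so score_after_penalty(50, 30) evaluates to 35.0.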
Honor Code: Under the Honor Code at Stanford, you are expected to
submit your own original work for assignments, projects, and exams. On many
occasions when working on assignments or projects (but never exams!) it is
useful to ask others -- the instructor, the TAs, or other students -- for
hints, or to talk generally about aspects of the assignment. Such activity
is both acceptable and encouraged, but you must indicate on all submitted
work any assistance that you received. Any assistance received that is not
given proper citation will be considered a violation of the Honor Code. In
any event, you are responsible for understanding, writing up, and being able
to explain all work that you submit. The course staff will pursue
aggressively all suspected cases of Honor Code violations, and they will be
handled through official University channels.
This assignment makes use of the following two datasets. These datasets are also available on the Datasets page. You will only need a local copy of these files for Part 1. For Part 2, please make sure the datasets have been copied into your Instabase repository (see instructions below).
- Download link: Titanic.csv
- Description: Data on passengers of the RMS Titanic. Entries include the name, age, class, fare, gender, and whether or not the passenger survived.
- A blank entry for age means that the age is unknown.
- Fare can have more than two decimal places because money was not base-10 at that time: Titanic Fare Data
- Download links: Players.csv, Teams.csv, PlayersExt.csv
- Description: 2010 World Cup data including last name, team, position, minutes played, and game statistics for each player (Players.csv) as well as world ranking, games played in tournaments, and game statistics for each team (Teams.csv). The joined tables can be found in PlayersExt.csv. Note: keep in mind that since the tables are joined, country data will show up for each player.
- Statistics, including yellowCards and redCards, are for the entire tournament (excluding the final game).
- Team ranking is the world ranking going into the tournament, so it may not be 1-32 even though there are only 32 teams.
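The note that a blank age in Titanic.csv means "unknown" matters as soon as you read the file in code. The following is a minimal sketch using only Python's standard csv module. The sample rows are invented for illustration and the column names are assumptions about the file's header, so check them against your own copy.

```python
import csv

# Invented sample rows, NOT taken from Titanic.csv.
# The header names here are assumptions; your file may differ.
SAMPLE = """last,first,gender,age,class,fare,survived
Doe,Jane,F,29,1,211.3375,yes
Roe,John,M,,3,7.25,no
"""


def parse_age(text):
    """Return the age as a float, or None when the entry is blank (unknown)."""
    text = text.strip()
    return float(text) if text else None


def load_passengers(lines):
    """Parse CSV lines into dicts, converting age and fare to numbers."""
    rows = []
    for row in csv.DictReader(lines):
        row["age"] = parse_age(row["age"])
        row["fare"] = float(row["fare"])  # fares may have more than two decimals
        rows.append(row)
    return rows


passengers = load_passengers(SAMPLE.splitlines())
# Exclude unknown ages explicitly instead of treating them as 0.
known_ages = [p["age"] for p in passengers if p["age"] is not None]
```

Keeping unknown ages as None rather than 0 prevents them from silently lowering an average age.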
For this assignment, we will be using Instabase and Tableau. Instabase is a service that allows students to run Jupyter Notebooks without downloading any software on their computers. If you prefer to run Jupyter Notebooks locally (make sure you're using a Python 2 kernel), you may do so, but using Instabase is highly recommended for consistency.
Instructions for Tableau:
- Download the two datasets by clicking on the links in the above section.
- Open Tableau. If you have not yet installed Tableau, please read our Piazza post for instructions.
- To upload a CSV file to Tableau, on the left menu bar, click the "Text File" link under "Connect", navigate to the location of the downloaded files, select the CSV file you wish to work with, and click "Open" to import. We'll only be using Tableau for Part 1 of this assignment, so you only need to put one of the datasets (your choice) into Tableau. You are encouraged to play around with and create visualizations for both datasets!
- Once you've uploaded your CSV, the screen should display your data. In the bottom left corner of the screen, click on the tab that says "Sheet 1." Warning: this tab will say something different if you renamed the tab in your CSV file.
- You're now ready to start building your visualization!
If you would like more info on how to use Tableau, check out the Tableau Tutorial.
Instructions for Instabase:
- Make sure your Instabase account is set up. If you have not yet done so, please read our Piazza post for setup instructions.
- You will need the files found at the following link copied to your personal Instabase account: Assignment 2. To copy files to your personal Instabase account, click on the drop-down menu above the folder listing, choose Select All, then click on the Copy link that appears. You should now have a private copy of the assignment to work on.
- Navigate to the personal repository where you copied the files, then double-click on the notebooks folder. The folder should contain SQLAssign.ipynb. Right-click or control-click on this file and select Open With > Jupyter. (If you simply double-click on it, it will show you the file but will not run Jupyter notebooks.) Sometimes it will take a minute or so for a new Jupyter server to start up on your behalf. Once it does, you are ready to go!
Submission for this assignment will require 3 parts: your Tableau visualization file, your Jupyter Notebook code, and a PDF of your notebook.
Submitting your Tableau Visualization
For your visualization submission, you'll be uploading your Tableau visualization file to Gradescope. To download your visualization, go to File > Export Packaged Workbook. (You should get a .twbx file.) Upload this saved file to Gradescope Assignment 2: Tableau.
Submitting your Jupyter Notebook
For Notebook submission, we ask that you submit both a PDF of the Notebook and a copy of the Notebook file through Gradescope. We're requiring these steps so that we can both run your code and provide quick feedback.
Submitting through Gradescope
- Download your Jupyter Notebook as a PDF. With your Notebook pulled up in Instabase, open the print menu for your browser (File > Print). Change the printer to "Save to PDF", and print (this saves your Notebook as a PDF file). Check that all of your answers are still there.
- Also download your Jupyter Notebook as a .ipynb file. From the notebook's File menu, choose Download As, then Notebook (.ipynb).
- Go to the CS102 class in Gradescope and click on Assignment 2: SQL Notebook PDF.
- Upload your PDF and tag the pages corresponding to each question and your answer. You may submit as many times as you like before the submission deadline, and we will use your latest submission for both grading and the late policy.
- Go back to the Gradescope CS102 class and click on Assignment 2: SQL .ipynb File. Upload the .ipynb file you downloaded.
Part 1: Tableau Visualization (15 Points)
Using either the World Cup (PlayersExt.csv) or Titanic (Titanic.csv) data, create an interactive data visualization dashboard using Tableau (please only choose one and explore multiple aspects of the data). This is a very open-ended problem; you should experiment with different features in Tableau. Full points will be awarded to visualizations that meet all the requirements below. While we encourage you to create "detailed" visualizations, you will not be penalized for creating "basic" ones.
Create one dashboard with the following:
- At least three different visualization types in the dashboard (e.g., bar chart, scatterplot, map).
- Each visualization should illustrate different relationships (e.g., don't just make a pie chart and a bar chart from the same data).
- One pane should interactively drive all of the others (see hints below).
- Every visualization should "make sense".
- Examples of visualizations that don't make sense: red cards per position in the World Cup data (the data only has red and yellow cards per team), sum of classes in the Titanic data (classes are more like a category than a number; see hint below).
- Sometimes, categories are represented as numbers (e.g., Class in the Titanic data; Games, Red Cards, or Yellow Cards in the World Cup data). Tableau will automatically process these attributes as numeric, which is not something we want! To change a numeric attribute to a category, drag it from Measures to Dimensions.
- If you use PlayersExt.csv, remember that all team info (Ranking, Wins, Losses, etc.) is repeated for each player on the team.
- To use a map for teams, in the Team drop-down menu change Geographic role to Country/Region.
- Make good use of Tableau's documentation. Tableau has a wealth of tutorials, like this one on Building a Dashboard, that can help you learn how to fulfill the requirements above. NOTE: you may have to create an account again when trying to view these tutorials. Use the same credentials as when you downloaded Tableau Desktop.
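The hint about repeated team info in PlayersExt.csv is easiest to see with a tiny example. The sketch below uses invented rows and assumed column names (surname, team, wins are placeholders, not necessarily the file's actual headers) to show how a team-level value gets counted once per player unless you reduce the data to one record per team first.

```python
import csv

# Invented rows for illustration only; team names and column names are
# assumptions, not the contents of PlayersExt.csv.
SAMPLE = """surname,team,position,minutes,ranking,wins
Alpha,Freedonia,forward,270,12,2
Beta,Freedonia,defender,180,12,2
Gamma,Sylvania,goalkeeper,270,40,1
"""

rows = list(csv.DictReader(SAMPLE.splitlines()))

# Naive total: each team's wins are added once per player on that team.
naive_wins = sum(int(r["wins"]) for r in rows)

# Keep a single value per team before aggregating a team-level column.
wins_by_team = {}
for r in rows:
    wins_by_team[r["team"]] = int(r["wins"])
total_wins = sum(wins_by_team.values())
```

With these sample rows the naive sum is 5 while the per-team sum is 3. The same inflation can appear in a Tableau chart that sums a team-level measure over player rows, so it is worth checking which level of detail a chart aggregates over.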
Part 2: SQL (50 Points)
Complete the problems in SQLAssign.ipynb. Please see the setup instructions above to create and edit your own copy of the assignment.