Assignment 5


Due Date: Saturday, June 6 at 11:59 PM

Late Policy: All assignments and projects are due at 11:59 PM on the due date. For this assignment, you may use ONE LATE DAY to submit late with no penalty. Sunday, June 7 at 11:59 PM is the HARD DEADLINE, and no submissions will be accepted after that time. Due to the timing of the exam, solutions will be released right after the late deadline.


Honor Code: Under the Honor Code at Stanford, you are expected to submit your own original work for assignments, projects, and exams. On many occasions when working on assignments or projects (but never exams!) it is useful to ask others -- the instructor, the TAs, or other students -- for hints, or to talk generally about aspects of the assignment. Such activity is both acceptable and encouraged, but you must indicate on all submitted work any assistance that you received. Any assistance received that is not given proper citation will be considered a violation of the Honor Code. In any event, you are responsible for understanding, writing up, and being able to explain all work that you submit. The course staff will pursue aggressively all suspected cases of Honor Code violations, and they will be handled through official University channels.


This assignment includes some familiar datasets from past assignments as well as some new ones. All of the data files are included in our assignment repo so there's no need to have them available locally. You'll find descriptions of the new datasets in the assignment notebooks.

Setup Instructions:

We will be using Instabase again. The entire assignment will consist of Jupyter Notebooks with exercises that employ the skills we've learned in class.

  1. You will need the files found at the following link copied to your personal Instabase account: Assignment 5. To copy files to your personal Instabase account, first select all of the files by pressing the "Shift" key while simultaneously clicking on each of the files. This will cause an "Actions" dropdown menu to appear above the file names. Click on this dropdown menu, choose "Copy To", and then copy the files over to your own private Instabase folder (you can create a new private folder at this step if you want). Once you've done this, you should have a private copy of the assignment to work on.
  2. Navigate to the folder where you copied the files. It should contain two separate notebooks: MiningPythonAssign.ipynb and NetworksAssign.ipynb. Each notebook contains a different component of the assignment. Right-click or control-click on the notebook you want to work on and select Open With > Jupyter. (If you simply double-click on it, it will show you the file but will not open it in Jupyter.) Sometimes it will take a minute or so for a new Jupyter server to start up on your behalf. Once it does, you are ready to go! In each notebook you will see clearly where you need to add code for the different steps of each problem.

Assignment Details:

There are two parts to this assignment: Data Mining and Network Analysis. There is a Jupyter Notebook to complete for each of these two parts, both of which can be found in the Assignment 5 class Instabase drive. Start early!

Part 1: Data Mining

See Notebook: MiningPythonAssign.ipynb

Dataset used: Movies.csv (not to be confused with Project #2 movies.tsv)

Part 2: Network Analysis

See Notebook: NetworksAssign.ipynb

Datasets used: Friends.csv, Follows.csv, Dolphins.csv, Dolphins2.csv, Follows2.csv

Submission Instructions:

There are two files to submit on Gradescope: the .ipynb file from Part 1 and the .ipynb file from Part 2.
  1. Download each of your Jupyter Notebooks as an .ipynb file. In the menu bar, choose File > Download As > Notebook (.ipynb). Note: Please make sure the downloaded file name ends with .ipynb and not .json, .html, or another extension. We will take points off if this is not the case. If you are having trouble with the .ipynb format, it may be easier to download the notebooks with Google Chrome.
  2. Go to the CS102 class in Gradescope, and click on Assignment 5: Part 1 - MiningPythonAssign.ipynb. Upload the .ipynb file you downloaded.
  3. Go back to the Gradescope CS102 class, and click on Assignment 5: Part 2 - NetworksAssign.ipynb. Upload the .ipynb file you downloaded.
  4. Make sure that you submit .ipynb files and not .json or .html files!
  5. Lastly, please be sure to run all cells of your .ipynb before submission and delete all print statements you used for debugging; otherwise we will take points off! You can check your .ipynb submissions by clicking on your submission and then clicking Code in the top right. This will render your notebook so you can look it over carefully to make sure you don't have extra print statements or cells that were not run.
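
If you would like a quick local check before uploading, the last step can be roughly sketched in Python. This is only an optional illustration and not part of the assignment or the grading process. It assumes the downloaded file is a standard version 4 notebook (JSON with a "cells" list); the function names check_notebook and check_notebook_file are made up for this example. The print check is a naive text match, so it may flag output that a problem actually asks for; treat the result as a reminder to look, not as a verdict.

```python
import json


def check_notebook(nb):
    """Scan a parsed notebook dict (hypothetical helper, not course-provided).

    Returns two lists of cell indices:
      unrun      - non-empty code cells with no execution count
      with_print - non-empty code cells whose source contains "print("
    """
    unrun, with_print = [], []
    for i, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines.
        if isinstance(src, list):
            src = "".join(src)
        if not src.strip():
            continue  # ignore empty code cells
        if cell.get("execution_count") is None:
            unrun.append(i)
        if "print(" in src:
            with_print.append(i)
    return unrun, with_print


def check_notebook_file(path):
    """Load an .ipynb file from disk and run check_notebook on it."""
    if not path.endswith(".ipynb"):
        raise ValueError("expected a file ending in .ipynb: " + path)
    with open(path, encoding="utf-8") as f:
        return check_notebook(json.load(f))
```

For example, calling check_notebook_file("NetworksAssign.ipynb") on your downloaded copy returns the indices of cells worth a second look. You should still review the rendered notebook on Gradescope as described above.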