Mining Massive Data Sets
Winter 2018

Homework Submission Instructions

Identity management

Please use the same username (your SUNetID) for all of your coursework, so that we can give you appropriate credit for your work.

Example: If your Stanford email address is daffyduck@stanford.edu, then your username for each service should be daffyduck.

If you are having trouble registering for these services under your SUNetID, please send an email to the course staff mailing list during the first week of class, so that we may make a note of it.

Gradiance quizzes

There will be weekly quizzes on Gradiance. All students (including SCPD students) should submit answers to Gradiance quizzes via the Gradiance website.

Biweekly homework

There will be four longer homework assignments. Submit each homework assignment on Gradescope as a PDF. In addition, upload all the code associated with your assignment to http://snap.stanford.edu/submit.

To register for Gradescope,

Students must also upload their code to http://snap.stanford.edu/submit before the assignment due date. Put all the code for a single question into a single file and upload it. Only files in text format (e.g., .txt, .py, .java) will be accepted.

Homework policies


Gradiance quizzes are generally released on Tuesdays and are due 9 days later, on the following Thursday at 11:59pm Pacific Time. Note that we cannot extend the quiz deadline under any circumstances. Once the deadline has passed, students will not be able to submit their quizzes.

You can attempt each quiz as many times as you like, and we hope everyone will eventually get 100%. The secret is that each question is based on a "long-answer" problem, which you should work through fully. Each time you open a quiz, the Gradiance system presents a random selection of right and wrong answers, and thus samples your knowledge of the full problem. While there are ways to game the system, we group several questions together, so it is hard to get 100% without actually working the problems. Also note that you must wait 10 minutes between attempts, so brute-force random guessing will not work.

Solutions appear after the problem set is due. However, you must submit at least once to see them; your most recent submission is then shown with the solutions embedded.


There will be four biweekly homeworks, which involve programming, working with Hadoop, and regular numerical/algebraic theory problems.

Questions: We try very hard to make questions unambiguous, but some ambiguities may remain. If you are confused, ask (i.e., post a question on Piazza) or state your assumptions explicitly. Reasonable assumptions will be accepted for ambiguous questions. As per the extra credit policy, you may receive extra credit for pointing out ambiguities in course material.

Honor code: We strongly encourage students to form study groups. Students may discuss and work on homework problems in groups. However, each student must write down the code and solutions independently, without referring to written notes from the joint session. In other words, each student must understand the solution well enough to reconstruct it by him/herself. In addition, each student should list on the problem set the people with whom she/he interacted.

Since we occasionally reuse problem set questions from previous years, we expect students not to copy, refer to, or look at previous years' solutions when preparing their answers. It is an honor code violation to intentionally refer to a previous year's solutions. This applies both to the official solutions and to solutions that you or someone else may have written up in a previous year.

Finally, we consider it an Honor Code violation to post your homework solutions anywhere other students can easily access them. This includes uploading your solutions to publicly viewable repositories such as GitHub.

The standard penalty for a first offense includes a one-quarter suspension from the University and 40 hours of community service. The standard penalty for multiple violations (e.g., cheating more than once in the same course) is a three-quarter suspension and 40 or more hours of community service. The Stanford Office of Community Standards has more information.

Late assignments: Each student has a total of two late periods to use for homeworks. A late period extends the deadline to midnight (11:59pm Pacific Time) on the day of the next class. For example, if an assignment is due on Thursday, using a late period moves the deadline to 11:59pm Pacific Time on the following Tuesday. No assignments will be accepted after the late period expires. Also note that we cannot extend the deadline of Gradiance quizzes under any circumstances, and students cannot use late periods for Gradiance quizzes.

Assignment submission: All students (SCPD and non-SCPD) should submit their assignments via Gradescope by 11:59pm on the due date. (We allow a small 15-minute grace period, but beyond that and your late periods, all deadlines are final.) You can typeset or scan your assignment, but you should upload it as a PDF rather than as images.

Do not put code in your Gradescope submission. Also, please make sure to tag each part correctly on Gradescope so it is easier for us to grade. There will be a small point deduction for each mistagged page and for each question that includes code.

Regrade policy: Please read the regrade policy and guidelines.