Mining Massive Data Sets: Hadoop Lab
Winter 2016

Course Information

Meeting Times and Locations

Tuesday 15:00 - 16:20 in NVidia Auditorium, Jen-Hsun Huang Engineering Center

Course description

This course covers Apache Hadoop and the projects in the Hadoop ecosystem in the context of massive-scale data mining. The objective is to give students a foundation in Apache Hadoop and related technologies. The class is lecture-only, with a few small assignments to help students cement the concepts discussed in lecture.

Topics will include Apache Hadoop, MapReduce, HDFS, Apache Spark, Apache Hive, Apache Impala, Apache Sqoop, and Apache Flume.

CS246H is intended as a supplementary section to CS246.

The content of this class is derived largely from the Cloudera Developer Training for Spark and Hadoop and the Cloudera Data Analyst Training, which are made available to Stanford through the Cloudera Academic Partnership program. Additional content has been developed specifically for this course in conjunction with Cloudera.

Course outline

Tentative list of topics to be covered. These topics may change slightly as the quarter progresses.


This course will include eight weekly Gradiance quizzes to check that students are learning the concepts. Some of the quizzes will require students to complete short programming assignments to produce the answers.

The Gradiance token for this class is FB710317.

Gradiance quiz                                   Out on                  Due on
Quiz 1: Writing MapReduce Jobs                   Tue, January 12 16:20   Tue, January 19 15:00
Quiz 2: Advanced MapReduce and Hadoop streaming  Tue, January 19 16:20   Tue, January 26 15:00
Quiz 3: Hive and Impala                          Tue, January 26 16:20   Tue, February 9 15:00
Quiz 4: Data Formats and Spark                   Tue, February 2 16:20   Tue, February 9 15:00
Quiz 5: Spark                                    Tue, February 10 00:00  Tue, February 16 15:00
Quiz 6: Spark II                                 Tue, February 17 00:00  Tue, February 25 15:00
Quiz 7: Spark III                                Tue, February 23 00:00  Tue, March 1 15:00
Quiz 8: Spark III                                Tue, March 1 00:00      Tue, March 8 15:00

Course materials

Slides will be posted online.

Course handouts and other reading materials can be downloaded here.

Course work and grading

Coursework will consist of:


General course questions should be posted on Piazza.

If you need to reach the course staff, email cs246h-win1415-staff@lists.stanford.edu (this list reaches the TAs and the professor).