Mining Massive Data Sets: Hadoop Lab
Winter 2018

Course Information

Meeting Times and Locations

Wednesday 11:30 - 13:20 in Skilling Auditorium

Course description

This course covers Apache Hadoop and the projects in the Hadoop ecosystem in the context of massive-scale data mining. The objective is to give students a foundation in Apache Hadoop and related technologies. The class will be lecture-only, with a few small assignments to help students cement the concepts discussed in lecture.

Topics will include Apache Hadoop, MapReduce, HDFS, Apache Spark, Apache Hive, Apache Impala, Apache Sqoop, Apache Flume, and Apache Kafka.

CS246H is intended as a supplementary section to CS246.

The content of this class is derived largely from the Cloudera Developer Training for Spark and Hadoop and Cloudera Data Analyst Training, which are made available to Stanford through the Cloudera Academic Partnership program. Additional content has been developed specifically for this course in conjunction with Cloudera.

Course outline

Tentative list of topics to be covered. These topics may change slightly as the quarter progresses.


This course will include eight weekly Gradiance quizzes to check that students are learning the concepts. Some of the quizzes will require students to complete short programming assignments to produce the answers.

The Gradiance token for this class is 0FBAFFF0.

Gradiance quiz                   Out on                  Due on
Quiz 1: Writing MapReduce Jobs   Wed, January 17 13:20   Wed, January 24 11:30
Quiz 2: Spark                    Wed, January 24 13:20   Wed, January 31 11:30
Quiz 3: DataFrames               Wed, January 31 13:20   Wed, February 7 11:30
Quiz 4: Spark Streaming          Wed, February 7 13:20   Wed, February 14 11:30
Quiz 5: SparkML                  Wed, February 14 13:20  Wed, February 21 11:30
Quiz 6: SQL on Hadoop            Wed, February 22 13:20  Wed, March 1 11:30
Quiz 7: Data Management          Wed, February 28 13:20  Wed, March 7 11:30
Quiz 8: SQL and Data Management  Tue, March 7 13:20      Tue, March 14 11:30
Bonus Quiz!                      Tue, March 15 00:00     Tue, March 22 23:59

Course materials

Slides will be posted online.

Course handouts and other reading materials can be downloaded here.

Course work and grading

The coursework for the course will consist of:


General course questions should be posted on Piazza.

To reach the course staff directly, email cs246h-win1718-staff@lists.stanford.edu. This list goes to the TAs and the professor.