Mining Massive Data Sets: Hadoop Labs
This course gives students a practical understanding of the tools in the Hadoop ecosystem, with an emphasis on MapReduce and Spark.
It concentrates on the practical application of big data technologies rather than on the theory behind them.
This is a partner course to CS246: Mining Massive Datasets
and includes limited additional assignments.
The course is adapted from the professional courses taught by Cloudera.
Important course information will be posted on this web page and announced
in class. You are responsible for all material that appears here and should
check this page for updates frequently.
- 1/11: The first class will be held at 11:30 on Wednesday 1/11, in Skilling Auditorium.
We look forward to seeing you there!
- 1/12: We are organizing a VM clinic to help students set up their VMs. Daniel Templeton will be at the session, assisted by several other TAs. Time and location: Monday, January 16, 6PM to 9PM in Gates 415.
Wednesdays 11:30-13:20 in Skilling Auditorium
Daniel Templeton (daniel at cloudera dot com), Cloudera
Office Hours: By arrangement
Office Hours: Wednesdays 9-10am, Gates InfoLab
You Will Learn to
- Implement and debug complex data processing applications in Hadoop
- Use some of the tools in the Hadoop ecosystem for data mining and machine learning
- Apache Hadoop
- Apache Spark
- Apache Hive
- Apache Impala
- Apache Kafka
- Other ecosystem tools, e.g. Apache Sqoop and Apache Pig
This course will include eight weekly Gradiance quizzes to check that students are learning the concepts. Some of the quizzes will require students to complete short programming assignments to produce the answers. The Gradiance token for this class is 6A8C4765.