Mining Massive Data Sets: Hadoop Labs
This course is designed to give students a practical understanding of the tools in the Hadoop ecosystem, with a focus on MapReduce and Spark.
The focus of this course is on the practical application of big data technologies, rather than on the theory behind them.
This is a partner course to CS246: Mining Massive Datasets
and includes limited additional assignments.
The course is adapted from the professional courses taught by Cloudera.
Important course information will be posted on this web page and announced
in class. You are responsible for all material that appears here and should
check this page for updates frequently.
- 1/5: The first class will be held at 15:00 on Tuesday 1/5, in NVidia Auditorium, Jen-Hsun Huang Engineering Center.
We look forward to seeing you there!
- 1/12: First quiz is posted. It is due by the beginning of class next week.
- 1/19: First bundle of optional homework labs is posted.
- 1/19: Second quiz is posted.
- 1/26: Third quiz is posted.
- 2/2: Fourth quiz is posted.
- 2/10: Fifth quiz is posted.
- 2/18: Sixth quiz is posted.
- 2/24: Seventh quiz is posted.
- 2/24: Eighth quiz is posted.
Tuesdays 15:00-16:20 in NVidia Auditorium, Jen-Hsun Huang Engineering Center.
Watch video lectures on SCPD (any Stanford student can see them here).
Daniel Templeton, Cloudera
Office Hours: By arrangement
Office Hours: Wednesdays 9-10am, Gates InfoLab Lab
You Will Learn to
- Implement and debug complex data processing applications in Hadoop
- Use some of the tools in the Hadoop ecosystem for data mining and machine learning
- Apache Hadoop
- Apache Spark
- Apache Hive
- Apache Impala
- Apache Pig
- Other ecosystem tools, e.g. Apache Sqoop
You can reach us at firstname.lastname@example.org
Use Piazza to post class-related questions: https://piazza.com/stanford/winter2016/cs246h/home