Vincent Duong
David Hallac
Trevor Hebert
Baharan Mirzasoleiman
Andreas Paepcke
Róbert Pálovics
Anand Rajaraman
Rok Sosic
Hongwei Wang
Rex Ying


What is this course about?

CS341 is an advanced project-based course, framed as the natural continuation of CS246 - Mining Massive Data Sets. Students will work on data mining and machine learning algorithms for analyzing very large amounts of data. The course staff and mentors will provide both interesting datasets and computational infrastructure (Google Cloud).


Students are expected to be familiar with the concepts covered in CS246 or similar classes (Hadoop, Spark, large-scale data mining and machine learning algorithms, etc.). Other courses that might be helpful are CS221, CS224N, CS224W, CS228, CS229, CS276, and EE364A.

Reference Text

The following text is useful, but not required. It can be downloaded for free, or purchased from Cambridge University Press.
Leskovec-Rajaraman-Ullman: Mining of Massive Datasets


This schedule is subject to change — please check it regularly.

Date | Description | Course Materials | Deadlines [due by 11:59 PM PDT]
Wed, Apr 3 | Introduction + Google Cloud overview | [slides] |
Wed, Apr 10 | Advanced ML in GCP (1) | [slides] |
Wed, Apr 17 | Advanced ML in GCP (2) | [slides] |
Wed, Apr 24 | Checkpoint 1: Presentations | |
Sun, Apr 28 | | | Checkpoint 1: Report due
Wed, May 15 | Checkpoint 2: Presentations | |
Sun, May 19 | | | Checkpoint 2: Report due
Wed, Jun 5 | Final Presentations | |
Sun, Jun 9 | | | Final Report due