CS345A:
Data Mining
Winter 2010
Course information:
Instructors:
Jure Leskovec
Office Hours: Wednesdays 9-10am, Gates 418
Anand Rajaraman
Office Hours: Tuesday/Thursday 5:30-6:30pm (after the class in the same room)
Room:
Tuesday, Thursday 4:15PM - 5:30PM in 200-203 (History Corner).
Teaching assistants:
Abhishek Gupta (abhig@cs.stanford.edu).
Office Hours: Monday 3:30-5pm in Gates B26A, Friday 3:30-5pm in Gates B24A
Roshan Sumbaly (rsumbaly@cs.stanford.edu).
Office Hours: Monday 1-2:15pm in Gates B24B / Pup Cluster
Staff mailing list:
You can reach us at cs345a-win0910-staff@lists.stanford.edu
Prerequisites:
CS145 or equivalent.
Materials:
Readings have been derived from the book Mining of Massive Datasets. You will also find Chapters 20.2, 22, and 23 of the second edition of Database Systems: The Complete Book (Garcia-Molina, Ullman, Widom) relevant. Slides from the lectures will be made available in PDF format.
Students will use the Gradiance automated homework system, for which a fee will be charged. Note: if you already have Gradiance (GOAL) privileges from CS145 or CS245 within the past year, you should also have access to the CS345A homework without paying an additional fee. Notes and/or slides will be posted online.
You can see earlier versions of the notes and slides covering 2008/09 CS345a Data Mining. Not all these topics will be covered this year.
Requirements:
There will be periodic homeworks (some on-line, using the Gradiance system), a final exam, and a project on web-mining. The homework will count just enough to encourage you to do it, about 20%. The project and final will account for the bulk of the credit, in roughly equal proportions.
Projects:
Course outline
See Handouts for a list of topics and reading materials.
Announcements:
- 1/5: The first class will be held on Tuesday 1/5, in Hewlett 201. See you there!
- 1/11: Classroom changed to 200-203 (History Corner) starting this Tuesday!
- 1/11: Important Dates for Assignments, Final Project and Finals announced!
- 1/13: Set up your Gradiance account as described in the Assignments section!
- 1/25: Assignment1 is out! Assignment1
- 2/8: Challenge Problem 2 is out! Challenge Problem 2
- 2/22: Grades for Assignment 1 and Challenge Problem 2 are out!
- 2/22: Assignment 1 Q3 solutions are out! Assignment1-Q3 Solutions
- 2/22: Challenge Problem 2 Solutions are out! Challenge Problem 2 Solutions
- 3/1: Finals will be held on March 18th from 12:15 pm to 3:15 pm in Herrin T175
- 3/1: Challenge Problem 3 is out! Challenge Problem 3
- 3/3: Poster presentation: It will be held on March 16th from 3:30 to 6:30 pm in the Gates basement. More info
- 3/3: Alternate final exam will be held on March 18th from 9 am to 12 noon. More info
- 3/14: Finals 2009
Important Dates
Challenge Problems (in addition to the Gradiance homeworks):
Gradiance homeworks:
- Homework 1: Due on 1/13 (past due)
- Homework 2: Due on 1/20 (past due)
- Homework 3: Due on 2/4 (past due)
Final Project:
- Project Proposal due on 2/1
- Final Write-up due on 3/14 (11:59 pm; PDF by email to the staff mailing list)
- Poster Presentation: Held on March 16th from 3:30 to 6:30 pm in the basement of the Gates Computer Science building.
- You are required to have at least 1 member of your group present during the entire poster session. You may have 1 member present for an hour or so and then another member of your group can be present for the remaining time. Please be prepared for a 3-5 minute pitch about your project.
- We will provide poster boards on March 16th itself. You can pick up the boards (20 x 30 inches) between 2:45 and 3:20 pm from the database lab (Gates fourth floor).
Finals:
Alternate Finals:
- The alternate final exam will be held on March 18th from 9 am to 12 noon. The exact location will be announced soon. Requests for an alternate exam will be accommodated only in case of a genuine conflict with the time of the CS345a final exam, e.g. another final exam on the same day at an overlapping time. Please email the course staff list immediately if you wish to take the alternate final exam.