CS276B / LING 239J
Web Search and Mining
Meeting Times and Locations
Lecture: TuTh 4:15-5:30 in Gates B12
Review sessions: TBD
From the bulletin: Advanced topics and project in information
retrieval. Web search engines including crawling and indexing, link-based algorithms, and web
metadata. Collaborative filtering and recommender systems. Text-centric
XML indexing and ranked retrieval. User interfaces for IR. Students work
in teams to implement a project of their choosing.
Staff Contact Information:
Students should post most questions on the course newsgroup.
Send questions of an individual nature to the staff mailing list at firstname.lastname@example.org.
Professor: Christopher Manning
Office: Gates 158
Office Hours: Tue 3-4, Wed 2-3
Professor: Prabhakar Raghavan
Office Hours: by appointment
TA: Louis Eisenberg
Office: Gates B26 (during office hours only)
Office Hours: Tuesday and Thursday 10:50-11:50 a.m., Thursday 3:10-4:10 p.m.
Course admin: Sarah Weden
Office: Gates 419
Grading:
- Project: 50%
  - initial proposal: 5%
  - milestone #1: 7.5%
  - milestone #2: 7.5%
  - final submission: 30%
- Midterm: 20%
- Homework (2 problem sets): 20%
- Research paper appraisal/evaluation: 10%
Prerequisites:
Either CS276A or reasonable
background in some text and statistical
machine learning techniques, such as from CS224N, CS229, or
Stat315. (You're not required to have done CS276A to do this
course, and the focus is rather different. On the other
hand, we will only very briefly review material covered
there, so unless you already know appropriate topics
from CS276A, you will need to do additional outside
reading.) The project will require extensive programming.
Texts:
There is no required or recommended text. We will distribute readings
for each topic. Books that contain considerable material
of relevance to the course that you may wish to look at include:
- Soumen Chakrabarti. 2003. Mining the Web: Discovering Knowledge
  from Hypertext Data. Amsterdam: Morgan Kaufmann.
- Pierre Baldi, Paolo Frasconi, and Padhraic Smyth. 2003. Modeling
the Internet and the Web: Probabilistic Methods and
Algorithms. John Wiley.
- Christopher Manning and Hinrich Schütze. 1999. Foundations of
Statistical Natural Language Processing. Cambridge,
MA: MIT Press.
- Ian Witten and Eibe Frank. 2000. Data Mining: Practical Machine
Learning Tools and Techniques with Java
Implementations. San Francisco, CA: Morgan Kaufmann.
- Peter Jackson and Isabelle Moulinier. 2002. Natural Language
Processing for Online Applications: Text Retrieval,
Extraction, and Categorization. John Benjamins.
Assignments must be submitted by 5:30 p.m. Pacific on the due date. Problem sets should be handed to Louis in class or left in the box outside of Professor Manning's office.
- Late days:
Each student has 5 late days to use at his or her
discretion. Please reserve your late days for legitimate emergencies. Each
late day constitutes a 24-hour extension; you cannot split late days into
smaller increments. If project partners want to take a late day, each student must contribute a day from his or her allotment.
- Late penalties:
Once a student runs out of late days, any late submissions are penalized
at a rate of 10% per day. No assignment may be handed in more than 5 days late.
- Collaboration: You may talk to anybody you want about the problem sets, including working through problems together in groups. Indeed, we encourage you to work in groups, and to work with different people through the quarter. However:
- you must state on your written assignment the people you discussed problems with, and
- you may not take detailed notes during group sessions and then reproduce them verbatim in your assignment write-ups. Every student must turn in written homework answers composed solely by himself or herself.
If you feel that we made a mistake in grading one of your assignments, you can resubmit the assignment for a regrade. Please include a brief statement describing which portion(s) you would like us to review and why. Note that when you request a regrade, we reserve the right to review your entire assignment; that is, we may find errors in your work that we missed before.
All actual, detailed work on the solution of problem sets must be individual work. You are encouraged to discuss problem sets with each other in a general way, but if you do so, then you must acknowledge the people with whom you discussed the problem set at the top of your submission.
You should not look for problem answers elsewhere; but again, if material is taken from elsewhere, then you should acknowledge it. For practical exercises, you are not permitted to get programming help from people other than your partner. Normally, you are permitted to use pre-existing code, but you must acknowledge code that you have taken from other sources. In general, we will act and expect you to act according to the Stanford Honor Code.