Text information retrieval systems; efficient text indexing; Boolean, vector space, and probabilistic retrieval models; ranking and rank aggregation; evaluating IR systems. Text clustering and classification methods: latent semantic indexing, taxonomy induction, cluster labeling; classification algorithms and their evaluation; text filtering and routing.

A note on structure: This year, we're teaching a two-quarter sequence (CS276A/B) on information retrieval, text, and web page mining, somewhat as in 2002-03; in 2003-04, there was a compressed one-quarter course (CS276). The organization this year is a little different, however: the first course will focus on information retrieval and on the text mining problems of text clustering and classification. This course will have homeworks, practical exercises, and exams, but no large project. The second course will focus on areas like the web and XML, and will be a large project course.


