CS345A:

Data Mining

Winter 2010

- Section1 [slides1] [slides2]

[Aster Examples used in the section]

[Links to ant and ncluster_loader script]

[AWS & Hive]

- Personalized PageRank, Hubs and Authorities, TrustRank [slides][reading]
- Readings:
- Authoritative sources in a hyperlinked environment by J. Kleinberg, JACM 1999.

- Web spam and TrustRank [slides][reading]
- Readings:
- Combating Web Spam with TrustRank by Z. Gyongyi, H. Garcia-Molina, J. Pedersen, VLDB 2004.

- Random Walks with Restarts and Center-piece subgraphs [slides]
- Readings:
- Center-Piece Subgraphs: Problem Definition and Fast Solutions by H. Tong, C. Faloutsos, KDD 2006.
- Fast Random Walk with Restart and Its Application by H. Tong, C. Faloutsos, J.Y. Pan, ICDM 2006.

- SVD and CUR [slides]
- Readings:
- Fast Monte Carlo Algorithms for Matrices III: Computing a Compressed Approximate Matrix Decomposition by P. Drineas, R. Kannan, M. W. Mahoney, SIAM Journal on Computing 2007.
- Tensor-CUR Decompositions For Tensor-Based Data by M. W. Mahoney, M. Maggioni, and P. Drineas, KDD 2003.
- Less is More: Compact Matrix Decomposition for Large Sparse Graphs by J. Sun, Y. Xie, H. Zhang, C. Faloutsos, SDM 2007.

- k-nearest neighbor, Perceptron [slides]
- Readings:
- Learning Using Large Datasets by L. Bottou, O. Bousquet, MMSD 2009.
- Map-Reduce for Machine Learning on Multicore by C. Chu, S. Kim, Y. Lin, Y. Yu, G. Bradski, A. Ng, K. Olukotun, NIPS 2006.

- Classification and regression trees [slides]
- Readings:
- PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce by B. Panda, J. Herbach, S. Basu, R. Bayardo, PVLDB 2009.

- Support Vector Machines, Cutting plane algorithm, SVM for structured output prediction [slides]
- Readings:
- Pegasos: Primal Estimated sub-GrAdient SOlver for SVM by S. Shalev-Shwartz, Y. Singer, N. Srebro, ICML 2007.
- A Support Vector Method for Optimizing Average Precision by Y. Yue, T. Finley, F. Radlinski, T. Joachims, SIGIR 2007.
- Learning Using Large Datasets by L. Bottou, O. Bousquet, MMSD 2009.

- Submodular functions, outbreak detection in networks, finding influencers in networks [slides]
- Readings:
- Near-optimal Nonmyopic Value of Information in Graphical Models by A. Krause, C. Guestrin, UAI 2005.
- Maximizing the Spread of Influence through a Social Network by D. Kempe, J. Kleinberg, E. Tardos, KDD 2003.
- Cost-effective Outbreak Detection in Networks by J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, N. Glance, KDD 2007.

- Mining the Web for Structured Data [slides]