May 19th, 2016
This page describes in more detail the datasets for the machine learning programming part of the final CS109 assignment. All of the datasets are formatted in exactly the same way; see the problem set handout for formatting details. You don't need to know the details of the features or the prediction task to complete pset6. This information is provided only to give you a deeper understanding of the tasks you are working on.
This dataset was collected by Kurgan et al. and is hosted by the UCI Machine Learning Repository:
http://archive.ics.uci.edu/ml/datasets/SPECT+Heart
Thanks to Jim Notwell and Gill Bejerano from the Stanford Computer Science and Genetics departments for this dataset.
Credit: This dataset was curated by Chris Piech, but it is based on data originally created for the "Netflix Prize". The Netflix Prize data was retracted because of concerns over user privacy. Reed Hastings, the CEO of Netflix, gave the official thumbs up for CS109 to release this anonymized subset of the data. Thanks to Matt Chen for his help in getting the Netflix Prize data.