Stanford MS&E 226 – Fundamentals of Data Science

Class description – Autumn 2020

This course is about understanding “small data”: these are datasets that allow interaction, visualization, exploration, and analysis on a local machine. The material provides an introduction to applied data analysis, with an emphasis on providing a conceptual framework for thinking about data from both statistical and machine learning perspectives. Topics will be drawn from the following list, depending on time constraints and class interest: approaches to data analysis: statistics (frequentist, Bayesian) and machine learning; binary classification; regression; bootstrapping; causal inference and experimental design; multiple hypothesis testing. Homeworks will have a significant practical and computational load to help students apply the concepts discussed in class.

Outline
Logistics

Class times and locations:
Evaluation:
Downloading R

There is a computational component to this class, which requires using R. (If you like you may use Python or Matlab, but officially the class will use R.) An easy interface to R that you can use on your local machine is RStudio Desktop, which is available free for non-commercial use. R is powerful in part because of the range of packages available that increase its capabilities. After downloading and installing R, you will find it helpful to also load the following packages:
To install a package, run install.packages('<package_name>') at the R command prompt. To load a package, run library('<package_name>') at the R command prompt.

Some links to get you started with R:

Course staff

Professor:

TAs:
Je-ok Choi (ICME Ph.D.)
Lin Fan (MS&E Ph.D.)
Hannah Li (MS&E Ph.D.)
Yueyang Liu (MS&E Ph.D.)
Linjia Wu (MS&E Ph.D.)

NOTE: Please use Piazza for course-related communication.
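
For the install and load steps described under Downloading R, the two commands can be combined so that a package is downloaded only when it is not already present. The following is a minimal sketch, not part of the official course materials: the helper name load_or_install is hypothetical, and "stats" is used only because it ships with every R installation (the course's own package list is not reproduced here).

```r
# Hypothetical helper (an illustration, not provided by the course):
# install a package only if it is missing, then attach it.
load_or_install <- function(pkg) {
  # requireNamespace() returns FALSE when the package is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # downloads from the configured CRAN mirror
  }
  # character.only = TRUE lets library() accept the name as a string
  library(pkg, character.only = TRUE)
  invisible(pkg)
}

# "stats" is bundled with R, so this call attaches it without downloading.
load_or_install("stats")
```

To use this with a package from the course list, replace "stats" with that package's name; the first call may prompt you to choose a CRAN mirror.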