Stanford MS&E 226 – Fundamentals of Data Science

Class description

The primary goal of this course is to give you a foundation for asking critical questions about the methods you will encounter over a lifetime of working with data. These methods fall under the headings of prediction (basics of machine learning), inference (basics of statistics), and causality (basics of cause-and-effect relationships).

Outline of topics

  • Prediction. Train-test-validate; cross-validation; binary classification; using optimization to build predictive models (maximum likelihood; linear and logistic regression; regularization, including lasso and ridge; other methods); model complexity and the bias-variance decomposition.

  • Inference. Frequentism and sampling distributions; p-values, confidence intervals, and hypothesis testing; application to linear and logistic regression; bootstrap; multiple hypothesis testing; post-selection inference.

  • Causality. The Rubin causal model, potential outcomes, and counterfactuals; randomized experiments; causal inference from observational data.

  • Bayesian statistics and decision-making. Basics of Bayesian statistics; priors and posteriors; Bayesian vs. frequentist statistics; a Bayesian approach to decision-making.

Course info

All logistical information about the course is available in the syllabus linked from the menu at left.

Enrolled students should use Ed Discussion via Canvas for course announcements.

Professor

Ramesh Johari