DATA SCIENCE 101

The course provides a solid introduction to data science, both exposing students to computational tools they can proficiently use to analyze data and exploring the conceptual challenges of inferential reasoning. Each module/week is a new “data adventure” in which students analyze real datasets, explore different questions, and try out tools.

There will be three traditional lectures per week and two labs with active student participation. Data analysis and computations will be carried out in R, a language that will be introduced during the course. Lecture notes, datasets, lab markdowns, and links to readings and references are available below.

Stats101 is a new course, and as such it does not appear in the lists of required classes for majors. The statistics department believes that the materials in Stats101 cover the topics traditionally taught in Stats60, so talk to your advisors to see whether Stats101 could be accepted as a substitute. There is no calculus prerequisite. For details about course structure, office hours, and so on, please see the syllabus or the course Canvas page.

Topics

  1. Data science: what is the buzz about?
  2. Visualization tools
  3. Numerical summaries of data
  4. Sampling variability and the uncertainty of statistical estimates
  5. Inference
  6. Testing
  7. Linear regression and prediction
  8. High dimensional data and principal component analysis
  9. Nonparametric statistics (transformations of the data, ranking, etc.)
  10. Safeguarding reproducibility

Instructor & TAs

Instructors for Spring 2017

Teaching Assistants

Module materials

0. Getting set up

In this class, we use R heavily in class notes, lab exercises, and assignments. We will also use RStudio to generate documents, debug code, and more. See our install guide to get going with this free software and the additional packages we use.

2. Data Visualization

4. Sampling

5. Inference

6. Prediction

  • Reading Materials