# DATA SCIENCE 101

The course provides a solid introduction to data science, both exposing students to computational tools they can proficiently use to analyze data and exploring the conceptual challenges of inferential reasoning. Each module/week represents a new “data adventure,” analyzing real datasets, exploring different questions and trying out tools.

There will be three traditional lectures per week and two labs with active student participation. Data analysis and computations will be carried out in R, a language that will be introduced during the course. Lecture notes, datasets, lab markdowns, and links to readings and references are available below.

Stats101 is a new course, and as such it does not appear in the lists of required classes for majors. The statistics department believes that the materials in Stats101 cover the topics traditionally taught in Stats60, so talk to your advisors to see whether Stats101 could be accepted as a substitute. There is no calculus prerequisite. For details about course structure, office hours, and so on, please see the syllabus or the course Canvas page.

## Topics

- Data science: what is the buzz about?
- Visualization tools
- Numerical summaries of data
- Sampling variability and the uncertainty of statistical estimates
- Inference
- Testing
- Linear regression and prediction
- High dimensional data and principal component analysis
- Nonparametric statistics (transformations of the data, ranking, etc.)
- Safeguarding reproducibility

## Module materials

### 0. Getting set up

In this class, we use `R` heavily in class notes, lab exercises, and assignments. We will also use `RStudio` to generate documents, debug code, and more. See our install guide to get going with this free software and the additional `packages` we use.
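
Once R is installed, a quick way to see which add-on packages are still missing is sketched below. This is only an illustration: the package names in `course_packages` and the helper `missing_packages` are assumptions made for the example, and the install guide remains the authoritative list.

```r
# Hypothetical package list for illustration; consult the install guide
# for the packages the course actually uses.
course_packages <- c("tidyverse", "rmarkdown")

# Return the names in `pkgs` that are not currently installed.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

to_install <- missing_packages(course_packages)

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # Uncomment to install from CRAN (needs an internet connection):
  # install.packages(to_install)
} else {
  message("All listed packages are already installed.")
}
```

Running the script a second time after installation should report that nothing is missing.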

### 1. Introduction to Data Science

- Lecture Notes
- Reading Materials
- Coding Resources
- Supplemental Files

### 2. Data Visualization

- Lecture Notes
- Reading Materials
  - R for Data Science, Chapters 3 (Data Visualization) and 7 (Exploratory Analysis)
- Supplemental Files

### 3. Data Summaries

- Lecture Notes
- Reading Materials
- Coding Resources

### 4. Sampling

- Lecture Notes
- Labs
- Reading Materials
  - Efron and Tibshirani (1993), "An Introduction to the Bootstrap": Introduction and The Accuracy of a Sample Mean
  - Stigler (1989), "Francis Galton's Account of the Invention of Correlation"
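
The idea behind the bootstrap reading listed for this module, gauging the accuracy of a sample mean by resampling the data, can be sketched in a few lines of R. This is a minimal illustration on simulated data, not course material: the function name `bootstrap_se_mean`, the sample size, and the number of resamples are all assumptions chosen for the example.

```r
set.seed(101)  # make the simulation reproducible

# Simulated sample for illustration only (not a course dataset).
x <- rnorm(50, mean = 10, sd = 2)

# Estimate the standard error of the sample mean by resampling `x`
# with replacement `n_boot` times and taking the standard deviation
# of the resampled means.
bootstrap_se_mean <- function(x, n_boot = 2000) {
  boot_means <- replicate(
    n_boot,
    mean(sample(x, size = length(x), replace = TRUE))
  )
  sd(boot_means)
}

se_boot <- bootstrap_se_mean(x)

# Textbook formula for comparison: s / sqrt(n).
se_formula <- sd(x) / sqrt(length(x))

print(c(bootstrap = se_boot, formula = se_formula))
```

For the mean, the two estimates should land close together. The appeal of the resampling approach is that the same recipe carries over to statistics, such as the median, that have no simple standard-error formula.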

### 5. Inference

- Lecture Notes
- Lab
- Reading Materials