Course Description

This graduate course is an introduction to Applied Statistics for Biology.

This is a three-unit class that requires 9 hours of work a week (more if you miss a class). The course is open to Stanford students. Undergraduates can receive WIM credit by taking it as Stats 155 for a letter grade; graduate students take it as Stats 256, Bios 221 or Stats 366.

Prerequisite: R Basics

For instance:

A class that uses R

Or you have followed the short introductions online available here:


Class Lectures for Fall 2021:

At this time, we are hoping to hold the course in person, meeting Mondays and Wednesdays at 11:30am in room STLC 115 in the SAPP building (old chemistry).

Please bring your laptops to class.

Food is not allowed and masks will be required.

If there are too many COVID cases, we will switch to asynchronous (pre-recorded) videos that will be posted on Coursework.

There will be 8 labs that follow and solidify the material. Please try to do the corresponding lab before the practical session times so that you have questions ready.

Teaching Team and Labs

Name | Email | Lab and office hour
Professor Holmes | susan@stat.stanford.edu | Mon, Wed 1pm in Bowker, Sequoia 207
Saskia Comess | saskiaco@stanford.edu | Tues, 2:00-3:30pm, STLC 118
Zijun Gao | zijungao@stanford.edu | Fri, 3:30-5:00pm, Lathrop 290

Tentative Timetable

(It’s preferable to do the reading before the dates below)

  • 0 - Introduction to the Course and to Bioconductor
  • 1 - Generative probabilistic models for biological data
  • 2 - Statistical analysis of data; simulations, Monte Carlo and maximum likelihood
  • 2b - Dependent data and Markov Chains
  • 3 - Mixture models; EM; bootstrapping
  • 4 - High quality graphics and visualization of large, heterogeneous data; the grammar of graphics and ggplot2
  • 5 - Hypothesis testing and Multiple hypothesis testing correction
  • 6 - Cluster analyses: finding latent groupings (e.g., CyTOF data)
  • 7 - RNA-seq and linear models
  • 8 - Multivariate analyses, PCA, SVD, et al.
  • 9 - RNA-seq revisited: single cell, Gamma-Poisson distribution, shrinkage
  • 10 - Multi-domain, multitable, heterogeneous multi-omics data.
  • 11 - Networks, graphs and phylogenetic trees
  • 12 - Working with image data
  • 13 - Microbial ecology; abundance testing
  • 14 - Supervised Learning methods for heterogeneous data.
  • 15 - Experimental design, analysis good practice, good use of computational tools

The syllabus will be adapted to the audience.
Through the course, you will get acquainted with more than 30 R and Bioconductor packages.

The textbook

We will lean heavily on the book, using exercises and examples that are done in detail in its chapters.

Modern Statistics for Modern Biology, Holmes and Huber.

The book for the course is available from Amazon and Cambridge University Press.

It is also available for free as an online HTML resource.

(You can print the chapters to pdf from your browser)

The data are all available together as a large compressed tar file and will soon be available as an R package.

Computation

This is a course in Applied Statistics; you will need access to a laptop or desktop running a current release of R (version 4.1 or above) and RStudio.
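As a quick way to confirm the version requirement above, the following is a minimal sketch you could run in the R console. The variable names are illustrative only, and the minimum version is simply the 4.1 stated in this section; it is not an official course setup script.

```r
# Minimum version taken from the requirement stated above (R 4.1 or above).
required_version <- package_version("4.1.0")

# getRversion() returns the version of the running R session.
installed_version <- getRversion()

# TRUE when the running R session meets the stated requirement.
meets_requirement <- installed_version >= required_version

if (meets_requirement) {
  message("R ", as.character(installed_version), " meets the course requirement.")
} else {
  message("R ", as.character(installed_version), " is older than ",
          as.character(required_version), "; please update R.")
}
```

If the check reports an older version, updating R before the first lab should avoid package installation problems later on.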

Auditors

Auditor spots are limited for this version of the course, which is not a minicourse but a standard ten-week offering. The few spots we have are for auditors who commit to attending all the live sessions and doing all the coursework.

If you want to audit the course, we can put you on the waiting list; please email Professor Holmes (address listed under Teaching Team and Labs) with a commitment to attend all live sessions, your R skill level, and an agreement from your PI to release you from 10 hours of work a week for the ten weeks from September 20th to December 1st.

Assessment

If you are taking the course for CR/NC:

  • You need to complete 5 of the 8 labs with their accompanying quizzes and submit the one take-home assignment in Week 5 as an Rmd/PDF report.

  • You must attend and participate in the twice-weekly classes on Mondays and Wednesdays at 11:30am (these count as part of the final assessment).

For a letter grade, you will also need to do a course project:

  • Class project (50% of the letter grade)
    • Midterm (10%)
    • Final (oral + writeup) (10% + 30%)

The class project is composed of three major parts:

    1. Project Proposal (10%): 2-page limit (single-spaced, 12 pt, 1 inch margins, not including graphs and tables), giving an overview of the two methods you plan to compare and the real data set on which you will do your analysis. Due on October 27, 17:00 (PST).
    2. Final Written Project (30%): 10-15 pages (single-spaced, 12 pt, 1 inch margins, not including graphs, code and tables, which should appear in the supplementary material). Due on December 6, 20:00 (PST).
    3. Presentation Slides and 5 minute recorded presentation (10%): 6-slide limit; each slide must be no larger than 1024 x 768, and the font size no smaller than 16 pt. The recording should run 5-7 minutes. Due on December 3rd, 20:00 (PST).