Course Description

This graduate course is an introduction to Applied Statistics for Biology.

This is a three-unit class that requires 9 hours of work a week (more if you miss a class). The course is open to Stanford students; graduate students take it as Stats 256, Bios 221 or Stats 366.

Prerequisite: R Basics

For instance, you have taken a class that uses R, or you have followed the short online introductions available here:


Class Lectures for Fall 2023:

At this time, the course will meet Mondays and Wednesdays at 1:30pm in room 203 of the History Corner building (200).

Please bring your laptops to class.

Food and drink are not allowed, and masks are welcome.

There will be 8 labs that follow and solidify the material. Please try to do the corresponding lab before each practical session so that you have questions ready.

Teaching Team and Labs

Name              Email                    Lab and office hours
Professor Holmes  susan@stat.stanford.edu  Mon 3pm in Bowker, Sequoia 207
Paula Gablenz     pgablenz@stanford.edu    Friday 4:00-6:00pm, Fishbowl
Yu Wang           yw1@stanford.edu         Thursday 1:00-3:00pm, Fishbowl

Tentative Timetable

(It’s preferable to do the reading before the dates below)

  • 0 - Introduction to the Course and to Bioconductor
  • 1 - Generative probabilistic models for biological data
  • 2 - Statistical analysis of data; simulations, Monte Carlo and maximum likelihood
  • 2b - Dependent data and Markov Chains
  • 3 - Mixture models; EM; bootstrapping
  • 4 - High quality graphics and visualization of large, heterogeneous data; the grammar of graphics and ggplot2
  • 5 - Hypothesis testing and multiple hypothesis testing correction
  • 6 - Cluster analyses: finding latent groupings (e.g., CyTOF data)
  • 7 - RNA-seq and linear models
  • 8 - Multivariate analyses, PCA, SVD, etc.
  • 9 - RNA-seq revisited: single cells, Gamma-Poisson distribution, shrinkage
  • 10 - Multi-domain, multitable, heterogeneous multi-omics data
  • 11 - Networks, graphs and phylogenetic trees
  • 12 - Working with image data
  • 13 - Microbial ecology; abundance testing
  • 14 - Supervised learning methods for heterogeneous data
  • 15 - Experimental design, analysis good practice, good use of computational tools

The syllabus will be adapted to the audience.
Through the course, you will get acquainted with more than 30 R and Bioconductor packages.

The textbook

We will lean heavily on the book, using exercises and examples that are worked through in detail in its chapters.

Modern Statistics for Modern Biology, Holmes and Huber.

The book for the course is available from Amazon and Cambridge University Press.

It is also available for free as an online HTML resource.

(You can print the chapters to PDF from your browser.)

The data are all available together as a large compressed tar file and will soon be available as an R package.

Computation

This is a course in Applied Statistics. You will need access to a laptop or desktop running the current releases of R (version 4.3) and RStudio.

Auditors

Auditor places are limited for this version of the course, which is not a minicourse but a standard ten-week offering. We have only a limited number of spots, reserved for auditors who commit to attending all the live sessions and doing all the coursework.

If you want to audit the course, we can put you on the waiting list; please email Professor Holmes with a commitment to attend all live sessions, your R skill level, and an agreement from your PI to release you for 10 hours of work a week for the ten weeks from September 26th to December 8th.

Assessment

If you are taking the course for CR/NC:

  • You need to complete 5 of the 8 labs with their accompanying quizzes and submit the one take-home assignment in Week 5 as an Rmd/PDF report.

  • You must attend and participate in the twice-weekly classes on Mondays and Wednesdays at 1:30pm (these count as part of the final assessment).

  • You must hand in the three errors assignment before Thanksgiving (more about this later).

For a letter grade, you will also need to do a course project:

  • Class project (50% of the letter grade)
    • Midterm (10%)
    • Final (oral + write-up) (10% + 30%)

The class project is composed of three major parts:

    1. Project Proposal (10%): 2-page limit (single-spaced, 12 pt, 1 inch margins, not including graphs and tables), giving an overview of the two methods you plan to compare and the real data set on which you will do your analysis. Due on October 31, 17:00 (PST).
    2. Final Written Project (30%): 10-15 pages (single-spaced, 12 pt, 1 inch margins, not including graphs, code and tables, which should appear in the supplementary material). Due on December 12, 20:00 (PST).
    3. Presentation Slides and 5-minute recorded presentation (10%): 6-slide limit; each slide must be no larger than 1024 x 768, and the font size no smaller than 16 pt. The recording should run 5-7 minutes. Due on December 7th, 20:00 (PST).