Research Interests

My research interests are primarily in statistical machine learning, including:

  • Scalable algorithms for massive data sets

  • Multivariate analysis and dimensionality reduction in large data sets

  • Data mining and selective inference

  • Methods for stable estimation and inference in heavy-tailed data

  • Ecological statistics and the so-called “presence-only problem”

Publications and Preprints

Scalable Algorithms for Massive Data Sets

Local Case-Control Sampling: Efficient Subsampling in Imbalanced Data Sets, William Fithian and Trevor Hastie, 2014, Annals of Statistics (to appear).

Scalable Convex Methods for Flexible Low-Rank Matrix Modeling, William Fithian and Rahul Mazumder. Submitted.

Ecological Statistics

Finite Sample Equivalence in Statistical Models for Presence-Only Data, William Fithian and Trevor Hastie, 2013, Annals of Applied Statistics.

Inference from Presence-Only Data: The Ongoing Controversy, Trevor Hastie and William Fithian, 2013, Ecography.

Bias Correction in Species Distribution Models: Pooling Survey and Collection Data for Multiple Species, William Fithian, Jane Elith, Trevor Hastie, and David A. Keith, 2014, Methods in Ecology and Evolution (to appear).

Selection Adjusted Confidence Intervals with More Power to Determine the Sign, Asaf Weinstein, William Fithian, and Yoav Benjamini, 2012, Journal of the American Statistical Association (JASA).

Semiparametric Exponential Families for Heavy-Tailed Data, William Fithian and Stefan Wager. Submitted.

Effective Degrees of Freedom: A Flawed Metaphor, Lucas Janson, William Fithian, and Trevor Hastie. Submitted.

Altitude Training: Strong Bounds for Single-Layer Dropout, Stefan Wager, William Fithian, Sida Wang, and Percy Liang. Submitted.