Research Interests

  • Statistical machine learning
  • High-dimensional statistics
  • Scalable algorithms and data structures
  • Matrix concentration inequalities

In the Pipeline

  • Measuring Sample Quality with Diffusions. (arxiv, code)
    Jackson Gorham, Andrew B. Duncan, Sebastian J. Vollmer, and Lester Mackey.

    Standard Markov chain Monte Carlo diagnostics, like effective sample size, are ineffective for biased sampling procedures that sacrifice asymptotic correctness for computational speed. Recent work addresses this issue for a class of strongly log-concave target distributions by constructing a computable discrepancy measure based on Stein's method that provably determines convergence to the target. We generalize this approach to cover any target with a fast-coupling Ito diffusion by bounding the derivatives of Stein equation solutions in terms of Markov process coupling rates. As example applications, we develop computable and convergence-determining diffusion Stein discrepancies for log-concave, heavy-tailed, and multimodal targets and use these quality measures to select the hyperparameters of biased samplers, compare random and deterministic quadrature rules, and quantify bias-variance tradeoffs in approximate Markov chain Monte Carlo. Our explicit multivariate Stein factor bounds may be of independent interest.

Publications

  • Measuring Sample Quality with Kernels. (arxiv, slides, code)
    Jackson Gorham and Lester Mackey.
    International Conference on Machine Learning (ICML). August 2017.

    Approximate Markov chain Monte Carlo (MCMC) offers the promise of more rapid sampling at the cost of more biased inference. Since standard MCMC diagnostics fail to detect these biases, researchers have developed computable Stein discrepancy measures that provably determine the convergence of a sample to its target distribution. This approach was recently combined with the theory of reproducing kernels to define a closed-form kernel Stein discrepancy (KSD) computable by summing kernel evaluations across pairs of sample points. We develop a theory of weak convergence for KSDs based on Stein's method, demonstrate that commonly used KSDs fail to detect non-convergence even for Gaussian targets, and show that kernels with slowly decaying tails provably determine convergence for a large class of target distributions. The resulting convergence-determining KSDs are suitable for comparing biased, exact, and deterministic sample sequences and simpler to compute and parallelize than alternative Stein discrepancies. We use our tools to compare biased samplers, select sampler hyperparameters, and improve upon existing KSD approaches to one-sample hypothesis testing and sample quality improvement.

  • Improving Gibbs Sampler Scan Quality with DoGS. (arxiv)
    Ioannis Mitliagkas and Lester Mackey.
    International Conference on Machine Learning (ICML). August 2017.

    The pairwise influence matrix of Dobrushin has long been used as an analytical tool to bound the rate of convergence of Gibbs sampling. In this work, we use Dobrushin influence as the basis of a practical tool to certify and efficiently improve the quality of a discrete Gibbs sampler. Our Dobrushin-optimized Gibbs samplers (DoGS) offer customized variable selection orders for a given sampling budget and variable subset of interest, explicit bounds on total variation distance to stationarity, and certifiable improvements over the standard systematic and uniform random scan Gibbs samplers. In our experiments with joint image segmentation and object recognition, Markov chain Monte Carlo maximum likelihood estimation, and Ising model inference, DoGS consistently deliver higher-quality inferences with significantly smaller sampling budgets than standard Gibbs samplers.

  • Empirical Bayesian Analysis of Simultaneous Changepoints in Multiple Data Sequences. (arxiv, code)
    Zhou Fan and Lester Mackey.
    Annals of Applied Statistics. To appear.

    Copy number variations in cancer cells and volatility fluctuations in stock prices are commonly manifested as changepoints occurring at the same positions across related data sequences. We introduce a Bayesian modeling framework, BASIC, that employs a changepoint prior to capture the co-occurrence tendency in data of this type. We design efficient algorithms to sample from and maximize over the BASIC changepoint posterior and develop a Monte Carlo expectation-maximization procedure to select prior hyperparameters in an empirical Bayes fashion. We use the resulting BASIC framework to analyze DNA copy number variations in the NCI-60 cancer cell lines and to identify important events that affected the price volatility of S&P 500 stocks from 2000 to 2009.

  • Predicting Patient "Cost Blooms" in Denmark: A Longitudinal Population-based Study. (pdf, bib)
    Suzanne Tamang, Arnold Milstein, Henrik Toft Sorensen, Lars Pedersen, Lester Mackey, Jean-Raymond Betterton, Lucas Janson, and Nigam Shah.
    BMJ Open. January 2017.

    Objectives: To compare the ability of standard vs. enhanced models to predict future high-cost patients, especially those who move from a lower to the upper decile of per capita healthcare expenditures within one year - i.e., "cost bloomers."
    Design: We developed alternative models to predict being in the upper decile of healthcare expenditures in Year 2 of a sample, based on data from Year 1. Our six alternative models ranged from a standard cost-prediction model with four variables (i.e., traditional model features), to our largest enhanced model with 1,053 nontraditional model features. To quantify any increases in predictive power that enhanced models achieved over standard tools, we compared the prospective predictive performance of each model.
    Participants and setting: We used the population of Western Denmark between 2004 and 2011 (2,146,801 individuals) to predict future high-cost patients and examine characteristics of high-cost cohorts. Using the most recent two-year period (2010-11) for model evaluation, our whole-population model used a cohort of 1,557,950 individuals with a full year of active residency in Year 1 (2010). Our cost-bloom model excluded the 155,795 individuals who were already high cost at the population level in Year 1, resulting in 1,402,155 individuals for prediction of cost bloomers in Year 2 (2011).
    Primary outcome measures: Using unseen data from a future year, we evaluated each model's prospective predictive performance by calculating the ratio of predicted high-cost patient expenditures to the actual high-cost patient expenditures in Year 2 - i.e., cost capture.
    Results: Our best enhanced model achieved a 21 percent and 30 percent improvement in cost capture over a standard diagnosis-based model for predicting population-level high-cost patients and cost bloomers, respectively.
    Conclusions: In combination with modern statistical learning methods for analyzing large datasets, models enhanced with a large and diverse set of features led to better performance, especially for predicting future cost bloomers.


  • Predicting inpatient clinical order patterns with probabilistic topic models vs. conventional order sets. (pdf, bib)
    Jonathan H. Chen, Mary K. Goldstein, Steven M. Asch, Lester Mackey, and Russ B. Altman.
    Journal of the American Medical Informatics Association. September 2016.

    Objective: Build probabilistic topic model representations of hospital admissions processes and compare the ability of such models to predict clinical order patterns with that of preconstructed order sets.
    Materials and Methods: The authors evaluated the first 24 hours of structured electronic health record data for > 10 K inpatients. Drawing an analogy between structured items (e.g., clinical orders) and words in a text document, the authors performed latent Dirichlet allocation probabilistic topic modeling. These topic models use initial clinical information to predict clinical orders for a separate validation set of > 4 K patients. The authors evaluated these topic model-based predictions vs. existing human-authored order sets by area under the receiver operating characteristic curve, precision, and recall for subsequent clinical orders.
    Results: Existing order sets predict clinical orders used within 24 hours with area under the receiver operating characteristic curve 0.81, precision 16%, and recall 35%. This can be improved to 0.90, 24%, and 47% by using probabilistic topic models to summarize clinical data into up to 32 topics. Many of these latent topics yield natural clinical interpretations (e.g., "critical care," "pneumonia," "neurologic evaluation").
    Discussion: Existing order sets tend to provide nonspecific, process-oriented aid, with usability limitations impairing more precise, patient-focused support. Algorithmic summarization has the potential to breach this usability barrier by automatically inferring patient context, but with potential tradeoffs in interpretability.
    Conclusion: Probabilistic topic modeling provides an automated approach to detect thematic trends in patient care and generate decision support content. A potential use case finds related clinical orders for decision support.


  • Efron-Stein Inequalities for Random Matrices. (pdf, bib)
    Daniel Paulin, Lester Mackey, and Joel A. Tropp.
    Annals of Probability. September 2016.

    This paper establishes new concentration inequalities for random matrices constructed from independent random variables. These results are analogous to the generalized Efron-Stein inequalities developed by Boucheron et al. The proofs rely on the method of exchangeable pairs.


  • Multivariate Stein Factors for a Class of Strongly Log-concave Distributions. (arxiv, bib)
    Lester Mackey and Jackson Gorham.
    Electronic Communications in Probability. September 2016.

    We establish uniform bounds on the low-order derivatives of Stein equation solutions for a broad class of multivariate, strongly log-concave target distributions. These "Stein factor" bounds deliver control over Wasserstein and related smooth function distances and are well-suited to analyzing the computable Stein discrepancy measures of Gorham and Mackey. Our proofs are probabilistic and feature the synchronous coupling of multiple overdamped Langevin diffusions.

  • Jet-Images -- Deep Learning Edition. (pdf, code, bib)
    Luke de Oliveira, Michael Kagan, Lester Mackey, Benjamin Nachman, and Ariel Schwartzman.
    Journal of High Energy Physics. July 2016.

    Building on the notion of a particle physics detector as a camera and the collimated streams of high energy particles, or jets, it measures as an image, we investigate the potential of machine learning techniques based on deep learning architectures to identify highly boosted W bosons. Modern deep learning algorithms trained on jet images can outperform standard physically motivated, feature-driven approaches to jet tagging. We develop techniques for visualizing how these features are learned by the network and what additional information is used to improve performance. This interplay between physically motivated, feature-driven tools and supervised learning algorithms is general and can be used to significantly increase the sensitivity to discover new particles and new forces, and gain a deeper understanding of the physics within jets.

  • Fuzzy Jets. (pdf, code, bib)
    Lester Mackey, Benjamin Nachman, Ariel Schwartzman, and Conrad Stansbury.
    Journal of High Energy Physics. June 2016.

    Collimated streams of particles produced in high energy physics experiments are organized using clustering algorithms to form jets. To construct jets, the experimental collaborations based at the Large Hadron Collider (LHC) primarily use agglomerative hierarchical clustering schemes known as sequential recombination. We propose a new class of algorithms for clustering jets that use infrared and collinear safe mixture models. These new algorithms, known as fuzzy jets, are clustered using maximum likelihood techniques and can dynamically determine various properties of jets like their size. We show that the fuzzy jet size adds information to conventional jet tagging variables. Furthermore, we study the impact of pileup and show that with some slight modifications to the algorithm, fuzzy jets can be stable up to high pileup interaction multiplicities.

  • Measuring Sample Quality with Stein's Method. (arxiv, poster, code, bib)
    Jackson Gorham and Lester Mackey.
    Advances in Neural Information Processing Systems (NIPS). December 2015.

    To improve the efficiency of Monte Carlo estimation, practitioners are turning to biased Markov chain Monte Carlo procedures that trade off asymptotic exactness for computational speed. The reasoning is sound: a reduction in variance due to more rapid sampling can outweigh the bias introduced. However, the inexactness creates new challenges for sampler and parameter selection, since standard measures of sample quality like effective sample size do not account for asymptotic bias. To address these challenges, we introduce a new computable quality measure based on Stein's method that quantifies the maximum discrepancy between sample and target expectations over a large class of test functions. We use our tool to compare exact, biased, and deterministic sample sequences and illustrate applications to hyperparameter selection, convergence rate assessment, and quantifying bias-variance tradeoffs in posterior inference.


  • Weighted Classification Cascades for Optimizing Discovery Significance in the HiggsML Challenge. (pdf, bib)
    Lester Mackey, Jordan Bryan, and Man Yue Mo.
    Proceedings of the NIPS Workshop on High Energy Physics, Machine Learning, and the HiggsML Data Challenge. August 2015.

    We introduce a minorization-maximization approach to optimizing common measures of discovery significance in high energy physics. The approach alternates between solving a weighted binary classification problem and updating class weights in a simple, closed-form manner. Moreover, an argument based on convex duality shows that an improvement in weighted classification error on any round yields a commensurate improvement in discovery significance. We complement our derivation with experimental results from the 2014 Higgs boson machine learning challenge.


  • Distributed Matrix Completion and Robust Factorization. (pdf, website, code, bib)
    Lester Mackey, Ameet Talwalkar, and Michael I. Jordan.
    Journal of Machine Learning Research. April 2015.

  • Crowdsourced analysis of clinical trial data to predict amyotrophic lateral sclerosis progression. (pdf, website, bib)
    Robert Kuffner, Neta Zach, Raquel Norel, Johann Hawe, David Schoenfeld, Liuxia Wang, Guang Li, Lilly Fang, Lester Mackey, Orla Hardiman, Merit Cudkowicz, Alexander Sherman, Gokhan Ertaylan, Moritz Grosse-Wentrup, Torsten Hothorn, Jules van Ligtenberg, Jakob H. Macke, Timm Meyer, Bernhard Scholkopf, Linh Tran, Rubio Vaughan, Gustavo Stolovitzky, and Melanie L. Leitner.
    Nature Biotechnology. November 2014.

  • Combinatorial Clustering and the Beta Negative Binomial Process. (pdf, code, bib)
    Tamara Broderick, Lester Mackey, John Paisley, and Michael I. Jordan.
    IEEE Transactions on Pattern Analysis and Machine Intelligence. April 2014.

  • Matrix Concentration Inequalities via the Method of Exchangeable Pairs. (pdf, bib, Joel Tropp's talk)
    Lester Mackey, Michael I. Jordan, Richard Y. Chen, Brendan Farrell, and Joel A. Tropp.
    Annals of Probability. March 2014.

  • Corrupted Sensing: Novel Guarantees for Separating Structured Signals. (pdf, bib)
    Rina Foygel and Lester Mackey.
    IEEE Transactions on Information Theory. February 2014.

  • Distributed Low-rank Subspace Segmentation. (pdf, code, bib)
    Ameet Talwalkar, Lester Mackey, Yadong Mu, Shih-Fu Chang, and Michael I. Jordan.
    IEEE International Conference on Computer Vision (ICCV). December 2013.

  • The Asymptotics of Ranking Algorithms. (pdf, bib)
    John C. Duchi, Lester Mackey, and Michael I. Jordan.
    Annals of Statistics. November 2013.

  • Joint Link Prediction and Attribute Inference using a Social-Attribute Network. (pdf, website, bib)
    Neil Zhenqiang Gong, Ameet Talwalkar, Lester Mackey, Ling Huang, Eui Chul Richard Shin, Emil Stefanov, Elaine (Runting) Shi, and Dawn Song.
    ACM Transactions on Intelligent Systems and Technology. March 2013.

  • Divide-and-Conquer Matrix Factorization. (pdf, website, code, bib)
    Lester Mackey, Ameet Talwalkar, and Michael I. Jordan.
    Advances in Neural Information Processing Systems (NIPS). December 2011.

  • Visually Relating Gene Expression and in vivo DNA Binding Data. (pdf, bib)
    Min-Yu Huang, Lester Mackey, Soile Keranen, Gunther Weber, Michael Jordan, David Knowles, Mark Biggin, and Bernd Hamann.
    IEEE International Conference on Bioinformatics and Biomedicine (BIBM). November 2011.

  • Mixed Membership Matrix Factorization. (pdf, supp info, slides, code, bib)
    Lester Mackey, David Weiss, and Michael I. Jordan.
    International Conference on Machine Learning (ICML). June 2010.
    Handbook of Mixed Membership Models and Their Applications. November 2014.

  • On the Consistency of Ranking Algorithms. (pdf, slides, bib)
    John Duchi, Lester Mackey, and Michael I. Jordan.
    International Conference on Machine Learning (ICML). June 2010.
    • Winner of the ICML 2010 Best Student Paper Award.

  • Deflation Methods for Sparse PCA. (pdf, poster, code, bib)
    Lester Mackey.
    Advances in Neural Information Processing Systems (NIPS). December 2008.

  • Fault-tolerant Typed Assembly Language. (pdf, bib)
    Frances Perry, Lester Mackey, George A. Reis, Jay Ligatti, David I. August, and David Walker.
    ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). June 2007.

  • Static Typing for a Faulty Lambda Calculus. (pdf, bib)
    David Walker, Lester Mackey, Jay Ligatti, George Reis, and David August.
    ACM SIGPLAN International Conference on Functional Programming (ICFP). September 2006.

  • Participatory Design with Proxies: Developing a Desktop-PDA System to Support People with Aphasia. (pdf, bib)
    Jordan Boyd-Graber, Sonya Nikolova, Karyn Moffatt, Kenrick Kin, Joshua Lee, Lester Mackey, Marilyn Tremaine, and Maria Klawe.
    SIGCHI Conference on Human Factors in Computing Systems (CHI). April 2006.

Other Work

  • Deriving Matrix Concentration Inequalities from Kernel Couplings. (arxiv)
    Daniel Paulin, Lester Mackey, and Joel A. Tropp. May 2013.

  • Sparse Representation and Low-Rank Approximation. (website)
    Organized with Francis Bach, Michael Davies, Remi Gribonval, Michael Mahoney, Mehryar Mohri, Guillaume Obozinski, and Ameet Talwalkar.
    Neural Information Processing Systems (NIPS) 2011 workshop. December 2011.

  • Feature-Weighted Linear Stacking. (arxiv, Joe Sill's talk)
    Joint work with Joe Sill, Gabor Takacs, and David Lin. November 2009.

  • Anomaly Detection for Asynchronous and Incomplete Data.
    Joint work with John Duchi and Fabian Wauthier.
    Advanced Topics in Computer Systems (UC Berkeley CS 262A, E. Brewer). December 2008.

  • Scalable Dyadic Kernel Machines. (pdf)
    Advanced Topics in Learning and Decision Making (UC Berkeley CS 281B, P. Bartlett). May 2008.

  • Latent Dirichlet Markov Random Fields for Semi-supervised Image Segmentation and Object Recognition. (pdf)
    Statistical Learning Theory (UC Berkeley CS 281A, M. Jordan) and Computer Vision (UC Berkeley CS 280, J. Malik). December 2007.

Invited Talks

  • Measuring Sample Quality with Kernels. (slides)
    • SAMSI Workshop on Quasi-Monte Carlo and High-Dimensional Sampling Methods, Duke University, Aug. 2017.
    • Borchard Colloquium on Concentration Inequalities, High Dimensional Statistics, and Stein's Method, Missillac, France, July 2017.
    • New England Machine Learning Day, Cambridge, MA, May 2017.
    • Machine Learning Seminar, MIT, Mar. 2017.

  • Statistics for Social Good
    • AI Now Symposium on the Social and Economic Impact of Artificial Intelligence Technologies, MIT, July 2017.
    • Data Science @ Stanford Seminar, Stanford, June 2016.

  • Measuring Sample Quality with Stein's Method. (slides)
    • Gatsby Unit Seminar, University College London, Oct. 2016.
    • Seminar, University of Liege, Sep. 2016.
    • Quetelet Seminar, Ghent University, Sep. 2016.
    • International Conference on Monte Carlo and Quasi-Monte Carlo Methods (MCQMC), Stanford, CA, Aug. 2016.
    • Statistics Seminar, Columbia University, Feb. 2016.
    • Quasi-Monte Carlo Invited Session, IMS-ISBA Joint Meeting (MCMSki V), Jan. 2016.
    • Wharton Statistics Seminar, University of Pennsylvania, Dec. 2015.
    • Neyman Seminar, UC Berkeley, Sep. 2015.
    • IMS-Microsoft Research Workshop: Foundations of Data Science, Cambridge, MA, June 2015.
    • Stochastics and Statistics Seminar, MIT, May 2015.
    • Statistics Seminar, Stanford University, May 2015.

  • Matrix Completion and Matrix Concentration. (slides)
    • IDSS Special Seminar, MIT, Feb. 2016.
    • Statistics Seminar, Harvard University, Nov. 2014.
    • Blackwell-Tapia Conference, Los Angeles, CA, Nov. 2014.
    • Information Systems Laboratory Colloquium, Stanford University, April 2013.
    • Statistics Seminar, Yale University, April 2013.
    • Statistics Seminar, Columbia University, April 2013.
    • Computer Science Seminar, University of Southern California, May 2012.
    • Statistics Seminar, Stanford University, Jan. 2012.

  • Divide-and-Conquer Matrix Factorization. (slides)
    • CS Department Colloquium, Princeton University, Dec. 2015.
    • Workshop on Big Data: Theoretical and Practical Challenges, Paris, France, May 2013.
    • Kaggle, San Francisco, CA, Feb. 2013.
    • Statistical Science Seminar Series, Duke University, Jan. 2012.
    • CMS Seminar, Caltech, Jan. 2012.
    • San Francisco Bay Area Machine Learning Meetup, San Francisco, CA, Nov. 2011.

  • Predicting ALS Disease Progression with Bayesian Additive Regression Trees. (slides)
    • Big Data in Biomedicine Conference, Stanford University, May 2015.
    • Guest Lecture, Stats 202, Stanford University, Nov. 2013.
    • Statistics Seminar, Stanford University, April 2013.
    • RECOMB Conference on Regulatory and Systems Genomics, San Francisco, CA, Nov. 2012.

  • Weighted Classification Cascades for Optimizing Discovery Significance. (slides)
    • NIPS Workshop on High Energy Physics, Machine Learning, and the HiggsML Data Challenge (HEPML), Dec. 2014.

  • Ranking, Aggregation, and You. (slides)
    • Statistics Seminar, University of Chicago, Oct. 2014.
    • Yale MacMillan-CSAP Workshop on Quantitative Research Methods, Yale University, Sep. 2014.
    • Wharton Statistics Seminar, University of Pennsylvania, Sep. 2014.
    • Statistics Seminar, Carnegie Mellon University, Sep. 2014.
    • Western Section Meeting, American Mathematical Society, Nov. 2013.
    • Statistics Seminar, Stanford University, Sep. 2013.
    • Stanford Statistics/Machine Learning Reading Group, Stanford University, Nov. 2012.

  • Dividing, Conquering, and Mixing Matrix Factorizations. (slides)
    Technicolor, Palo Alto, CA, June 2013.

  • Stein's Method for Matrix Concentration. (slides)
    • Institut National de Recherche en Informatique et en Automatique (INRIA), Dec. 2012.
    • Berkeley Probability Seminar, University of California, Berkeley, May 2012.

  • Build a Better Netflix, Win a Million Dollars?
    • SPARC Camp, Aug. 2014. (slides)
    • USA Science and Engineering Festival, Washington, DC, April 2012. (slides)

  • The Story of the Netflix Prize: An Ensembler's Tale. (slides, video)
    National Academies' Seminar, Washington, DC, Nov. 2011.

  • Mixed Membership Matrix Factorization. (slides)
    Joint Statistical Meetings, Miami Beach, FL, July 2011.

  • False Event Identification and Beyond: A Machine Learning Approach.
    Presented with Ariel Kleiner.
    Comprehensive Test Ban Treaty Organization Technical Meeting on Data Mining, Vienna, Austria, Nov. 2009.

  • The Dinosaur Planet Approach to the Netflix Prize.
    • LIDS Seminar Series, MIT, Nov. 2008, presented with David Weiss.
    • Guest Lecture, Stat 157, UC Berkeley, Sep. 2008.
    • Process Driven Trading Group, Morgan Stanley, April 2008, presented with David Lin and David Weiss.