Pritchard Lab Research


Genetic variation within species is the basic stuff of evolution and phenotypic variation. Most of our work uses statistical and computational methods to study aspects of genetic variation in genomics and in evolutionary biology.

We often work on problems where we are tackling new kinds of genomic data or new questions. Thus, a central part of our work involves developing appropriate statistical and computational approaches that can yield new insights into modern genome-scale data sets. Some of our main research interests are described below, with representative references.


Genetic variation in gene expression

Genetic variants that impact gene regulation play central roles in the genetics of complex traits and adaptation. Yet we still have limited understanding of exactly how the genome sequence encodes regulatory information, or how that information is "read" by the cellular machinery in any given cell type or context. The challenge of distinguishing functional regulatory variants from the many millions of nonfunctional variants is a fundamental hurdle to understanding complex traits and evolution.

One major focus of our lab is to understand the main mechanisms by which genetic variation impacts expression, and to be able to predict which variants have regulatory activity. Our work, in collaboration with Yoav Gilad, applies a combination of computational and experimental approaches. We are using QTL mapping for a broad range of cellular phenotypes - ranging from chromatin measurements to mRNA to protein levels - to measure the regulatory effects of genetic variants.
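As a rough illustration of the basic idea behind QTL mapping, the sketch below regresses a quantitative cellular phenotype (for example, an expression level) on genotype dosage at a single variant. It is a minimal sketch under simplifying assumptions and not the lab's actual pipeline: the function name `qtl_effect` and the toy data are hypothetical, and real analyses typically also handle covariates, normalization, and multiple testing.

```python
import math

def qtl_effect(genotypes, phenotype):
    """Least-squares effect of genotype dosage on a quantitative phenotype.

    genotypes: allele counts (0, 1, or 2) at one variant, one per individual.
    phenotype: measured trait values (e.g. expression), same order.
    Returns (beta, se): the estimated per-allele effect and its standard error.
    Hypothetical helper for illustration only.
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotype) / n
    # Sums of squares and cross-products around the means
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotype))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_g
    # Residual variance with n - 2 degrees of freedom
    rss = sum((y - (intercept + beta * g)) ** 2 for g, y in zip(genotypes, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, se

# Toy data (made up): six individuals, one variant
beta, se = qtl_effect([0, 1, 2, 0, 1, 2], [1.1, 1.9, 3.0, 0.9, 2.1, 3.0])
print(beta, se)
```

A variant would be called a QTL for the phenotype when the estimated effect is large relative to its standard error, after accounting for the number of variants tested.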

WASP: allele-specific software for robust molecular quantitative trait locus discovery. van de Geijn et al 2015. Nature Methods. 12:1061-3. [PDF]

Impact of regulatory variation from RNA to protein. Battle et al 2015. Science 347:664-7. [PDF]

Identification of Genetic Variants That Affect Histone Modifications in Human Cells. McVicker et al 2013. Science 342:747-9. [PDF]

Primate Transcript and Protein Expression Levels Evolve under Compensatory Selection Pressures. Khan et al 2013. Science 342:1100-4. [PDF]

DNaseI sensitivity QTLs are a major determinant of human expression variation. Degner et al 2012. Nature 482:390-4. [PDF]

Dissecting the regulatory architecture of gene expression QTLs. Gaffney et al 2012. Genome Biology 13(1):R7. [PDF]

Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Pique-Regi et al 2011. Genome Research 21:447-455. [PDF]

Understanding mechanisms underlying human gene expression variation with RNA sequencing. Pickrell et al 2010. Nature 464:768-72. [PDF]


Inference of population structure and history

A second long-term interest for us has been in the development of methods for interpreting population structure and population history from genetic data. The history of a species is recorded in the patterns of genetic variation within and between individuals and populations.

One class of methods that we have developed makes use of multilocus data from SNPs or other markers, most notably as implemented in our Structure algorithm (in collaboration with Matthew Stephens, Peter Donnelly, Daniel Falush and others). Structure views a sample of individuals as (potentially) representing a mixture from different genetic populations. It uses the marker data to infer both the overall genetic structure and the ancestry of individuals. This type of approach has become widely used in many applications of population genetics. Closely related models - developed independently by David Blei and colleagues - have been very influential in the topic modeling literature.
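To make the mixture idea concrete, the sketch below estimates one individual's ancestry proportions from biallelic marker data. It is a simplified illustration and not the Structure algorithm itself: it assumes the allele frequencies in each population are already known and uses a plain EM iteration, whereas Structure infers frequencies and ancestry jointly in a Bayesian framework. The function name `estimate_ancestry` and the toy inputs are hypothetical.

```python
def estimate_ancestry(genotypes, freqs, n_iter=200):
    """EM estimate of one individual's ancestry proportions.

    genotypes: allele counts (0, 1, or 2) at each biallelic marker.
    freqs: freqs[k][l] is the frequency of the counted allele at marker l
           in population k (assumed known, and strictly between 0 and 1).
    Returns a list q of K ancestry proportions that sum to 1.
    Hypothetical helper for illustration only.
    """
    K = len(freqs)
    L = len(genotypes)
    q = [1.0 / K] * K  # start from equal ancestry in every population
    for _ in range(n_iter):
        counts = [0.0] * K
        for l, g in enumerate(genotypes):
            # Each marker carries g copies of allele 1 and 2 - g copies of allele 0
            for allele, n_copies in ((1, g), (0, 2 - g)):
                if n_copies == 0:
                    continue
                # E-step: probability that an allele copy came from each population
                w = [q[k] * (freqs[k][l] if allele == 1 else 1.0 - freqs[k][l])
                     for k in range(K)]
                total = sum(w)
                for k in range(K):
                    counts[k] += n_copies * w[k] / total
        # M-step: ancestry proportion = expected share of the 2L allele copies
        q = [c / (2 * L) for c in counts]
    return q

# Toy example (made-up frequencies): two populations, five markers
freqs = [[0.9] * 5, [0.1] * 5]
print(estimate_ancestry([2, 2, 2, 2, 2], freqs))  # ancestry close to population 0
print(estimate_ancestry([1, 1, 1, 1, 1], freqs))  # evenly mixed ancestry
```

Each allele copy is treated as a draw from one of the K populations, so the ancestry vector plays the role of mixture weights. This is the same structure that appears in topic models, with populations in place of topics.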

In other early work, we built on key papers from Simon Tavare and Gunther Weiss to develop the first application of Approximate Bayesian Computation, in this case to estimate human demography from Y chromosome data [PDF].

More recently, with Joe Pickrell, we developed the TreeMix algorithm for inferring the relationships among modern populations, while allowing for pulses of gene flow between different clades in the tree (as illustrated at right). Our current work in this area is focused on the problem of fitting complex multi-population historical models using whole-genome data. It's clear that modern whole genome data contain enormous amounts of information about history - the main challenges are now to develop suitable statistical models and computationally feasible algorithms for extracting this information.

fastSTRUCTURE: variational inference of population structure in large SNP data sets. Raj et al 2014. Genetics 197:573-89. [PDF]

Inference of population splits and mixtures from genome-wide allele frequency data. Pickrell and Pritchard 2012. PLoS Genetics 8:e1002967 [PDF] [Software]

Sequencing and Analysis of Neanderthal Genomic DNA. Noonan et al 2006. Science 314:1113-1118. [PDF]

The genetic structure of human populations. Rosenberg et al 2002. Science 298: 2381-2385. [PDF]

Inference of population structure using multilocus genotype data. Pritchard et al 2000. Genetics 155: 945-959. [PDF], [Software]


Natural selection in human populations

A third major area of interest is in understanding natural selection in human populations. We are interested in understanding both the typical modes by which selection acts and the key genes and phenotypes that have been targets of adaptation in different human populations.

In our early work on this problem, our goal was to identify the strongest signals of selective sweeps in the genome (Voight et al 2006).

However, in 2009, with more extensive data, we argued that there appear to be fewer signals of strong classical hard sweeps in recent human evolution than we had believed earlier (see also related work by our colleague Molly Przeworski).

Instead, we have proposed that most adaptation likely occurs through a process of "polygenic adaptation" in which small allele frequency shifts at large numbers of quantitative trait loci allow very rapid phenotypic adaptation but are difficult to detect by standard tests. We are now working actively on new approaches for studying soft sweeps and polygenic adaptation.

The deleterious mutation load is insensitive to recent population history. Simons et al 2014. Nature Genetics 46:220-4. [PDF]

The genetics of human adaptation: hard sweeps, soft sweeps, and polygenic adaptation. Pritchard et al 2010. Current Biology 20:R208-15. [PDF]

How we are evolving. Pritchard 2010. Scientific American 301(10):41-47. [link]

The role of geography in human adaptation. Coop et al 2009. PLoS Genetics 5:e1000500. [PDF]

A Map of Recent Positive Selection in the Human Genome. Voight et al 2006. PLoS Biol 4(3):e72. [PDF] [Recent Blog]



Recombination, LD, deletions, association mapping...

Beyond the areas listed above, we have broad interests in problems where our computational toolbox can provide biological insights; some examples are given below.

One area of interest has been in understanding linkage disequilibrium and recombination (much of this in collaboration with Molly Przeworski and Graham Coop), including providing evidence for variation in hotspot usage across individuals (now known to be due to variation at PRDM9).

When Don Conrad was in the lab, he provided one of the early genome-wide surveys of deletion polymorphisms, at a time when it was first becoming clear that copy number variation is an important aspect of genome variation (see figure at right).

We also helped to introduce the idea of using genotype data to detect and control for the confounding effects of population structure in association mapping. More broadly, we have been interested in population genetic models of complex traits, including work on the role of rare variants in disease.

High-Resolution Mapping of Crossovers Reveals Extensive Variation in Fine-Scale Recombination Patterns Among Humans. Coop et al 2008. Science 319: 1395-1398. [PDF]

A high-resolution survey of deletion polymorphism in the human genome. Conrad et al 2006. Nature Genetics 38:75-81. [PDF]

Clonal origin and evolution of a transmissible cancer. Murgia et al 2006. Cell 126:477-87. [PDF]

Linkage disequilibrium in humans: models and data. Pritchard and Przeworski 2001. Am. J. Hum. Genet. 69:1-14 [PDF]

Use of unlinked genetic markers to detect population stratification in association studies. Pritchard and Rosenberg 1999. Am. J. Hum. Genet. 65:220-228. [PDF]


Research Funding

Our work has been generously supported by the Howard Hughes Medical Institute, the National Institutes of Health, the Packard Foundation, the Sloan Foundation, and the Burroughs Wellcome Fund.