Pritchard Lab Software

Structure 2.3.X

The program structure is a free software package for using multi-locus genotype data to investigate population structure. Its uses include inferring the presence of distinct populations, assigning individuals to populations, studying hybrid zones, identifying migrants and admixed individuals, and estimating population allele frequencies in situations where many individuals are migrants or admixed. It can be applied to most of the commonly used genetic markers, including SNPs, microsatellites, RFLPs and AFLPs. The basic algorithm was described by Pritchard, Stephens and Donnelly (2000). Extensions to the method were published by Falush, Stephens and Pritchard (2003, 2007) and by Hubisz, Falush, Stephens and Pritchard (2009).

Structure Homepage

SDS: Singleton Density Score

SDS is an approach for studying very recent changes in allele frequencies within a population, using whole genome sequence data. Applied to data from the UK10K Project (3000 individuals) we estimate that SDS reflects frequency changes during the past 2,000 years. Large changes in frequencies imply adaptive events. The software was developed by Yair Field, and the paper was jointly written by Yair, Evan Boyle, and Natalie Telis. [software] [preprint] [SDS values in UK10K]


WASP [software] [preprint] is a software package for two related tasks: (1) removing allelic bias in mapped sequencing reads, and (2) identifying molecular quantitative trait loci (QTLs) using next-generation sequencing data (e.g. gene expression QTLs or histone mark QTLs). WASP identifies molecular QTLs using a statistical test that combines information about the total depth and allelic imbalance of mapped reads. WASP can call QTLs with much smaller sample sizes (as few as 10) than traditional QTL mapping approaches require. WASP was developed by Bryce van de Geijn and Graham McVicker.

TreeMix: estimation of population trees with admixture

TreeMix uses large numbers of SNPs to estimate the historical relationships among populations, using a graph representation that allows both population splits and migration events. You can download the TreeMix paper by Pickrell and Pritchard (2012) here.

CENTIPEDE: software for inference of TF binding sites

CENTIPEDE is a method developed by Roger Pique-Regi and Jacob Degner that uses PWM information plus experimental data such as DNase1, histone marks or FAIRE to infer transcription factor binding sites with high specificity. Software and data are available here.

Transcriptome Assembler

Software for transcriptome assembly used in RNA-seq of 16 mammalian species.
Download assembler.

BFCounter: Memory efficient k-mer counting

BFCounter is a program for counting k-mers from DNA sequencing data. It uses a Bloom filter data structure to filter out unique k-mers, which are likely generated by sequencing errors. BFCounter Homepage
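The idea of using a Bloom filter to screen out k-mers seen only once can be sketched as follows. This is a minimal illustration, not BFCounter's actual implementation: the class `BloomFilter`, the function `count_repeated_kmers`, the filter size and the hash choice are all assumptions made for the example, and reverse complements are not merged. A k-mer enters the Bloom filter the first time it is seen, and only k-mers that appear to have been seen before are kept for exact counting in a second pass, which also discards Bloom filter false positives.

```python
import hashlib
from collections import Counter


class BloomFilter:
    """A small Bloom filter backed by a bytearray of bits (illustrative only)."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive one bit position per hash function from salted BLAKE2b digests.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def kmers(read, k):
    """Yield every substring of length k in a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def count_repeated_kmers(reads, k):
    """Return exact counts for k-mers that occur at least twice."""
    seen_once = BloomFilter()
    candidates = set()
    # Pass 1: first sighting goes into the Bloom filter; a k-mer that is
    # (apparently) already present becomes a candidate for exact counting.
    for read in reads:
        for kmer in kmers(read, k):
            if kmer in seen_once:
                candidates.add(kmer)
            else:
                seen_once.add(kmer)
    # Pass 2: count candidates exactly, then drop any with count 1
    # (these can only arise from Bloom filter false positives).
    counts = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            if kmer in candidates:
                counts[kmer] += 1
    return {kmer: n for kmer, n in counts.items() if n > 1}
```

Because the reads are traversed twice, `reads` should be a list or another re-iterable collection rather than a one-shot generator. The memory saving comes from the fact that singleton k-mers occupy only a few bits in the filter instead of a full entry in the count table.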

Analysis and modeling of natural selection

Scripts for computing two test statistics for detecting positive selection (iHS and XP-EHH), as well as a flexible tool for performing Wright-Fisher simulations with selection, can be found here. (Note that data from our selection scans in humans [Voight et al. 2006 and Pickrell et al. 2009] can be found here and here, respectively.)

Two programs written by Graham Coop when he was in the lab, one for simulating sweeps on standing variation and one for testing for correlations between allele frequencies and environmental variables, can be obtained via Graham's website at UC Davis.
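For readers unfamiliar with Wright-Fisher simulations with selection, the basic model can be sketched in a few lines. This is a generic textbook version under stated assumptions (a haploid population of constant size, one biallelic locus, relative fitness 1 + s for the selected allele), not the lab's simulation tool; the function name `wright_fisher` and its parameters are hypothetical.

```python
import random


def wright_fisher(pop_size, init_freq, s, generations, seed=None):
    """Simulate the frequency trajectory of a selected allele.

    Each generation applies deterministic selection to the allele
    frequency, then binomial sampling of pop_size offspring (drift).
    The simulation stops early if the allele is lost or fixed.
    """
    rng = random.Random(seed)
    freq = init_freq
    trajectory = [freq]
    for _ in range(generations):
        # Selection: expected frequency after weighting by fitness 1 + s.
        expected = freq * (1 + s) / (1 + s * freq)
        # Drift: each of pop_size offspring carries the allele
        # independently with probability `expected`.
        count = sum(rng.random() < expected for _ in range(pop_size))
        freq = count / pop_size
        trajectory.append(freq)
        if freq in (0.0, 1.0):
            break
    return trajectory


if __name__ == "__main__":
    # Example: a new beneficial allele at 5% in a population of 1000.
    path = wright_fisher(pop_size=1000, init_freq=0.05, s=0.05,
                         generations=200, seed=1)
    print(f"final frequency after {len(path) - 1} generations: {path[-1]:.3f}")
```

Setting `s` to 0 gives a neutral model, so the same function can be used to compare trajectories with and without selection.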

Software for RNA-seq analysis, eQTLs, etc

Some of the software that we have developed for analyzing RNA-seq data, including programs for counting reads, de novo identification of splice junctions, and detection of poly-A sites, can be found here.

TreeLD 1.0

TreeLD is a software tool for mapping complex trait loci, developed by Zollner and Pritchard (2005). TreeLD performs a multipoint LD-analysis by inferring the ancestry of a genomic region and analyzing this ancestry for signals of disease mutations. The generated likelihoods can be used to test for the presence of a disease locus and to fine-map its location, providing a point estimate and a credible region. Furthermore, the package provides a novel way of visualizing the association signal in a sample. TreeLD is designed for high-density SNP haplotypes and can be applied to case-control data, TDT trio data and quantitative trait data. Download TreeLD 1.0


STRAT is a companion program to structure. It implements a structured association method for use in association mapping, enabling valid case-control studies even in the presence of population structure. The method was described in an article in Am. J. Hum. Genet. 2000 (67:170-181). Collaborators: Matthew Stephens, Noah Rosenberg, Peter Donnelly. [Abstract] [Manuscript] [Review of structured association methods] [Download software]


MALDSoft is a program for admixture mapping of complex trait loci, using case-control data. The samples should come from a recently admixed population; additional 'learning' samples from the parental populations are helpful. The method was described in a paper by Giovanni Montana and Jonathan Pritchard [Abstract] [Manuscript]. Download MALDSoft

Programs from 'rare variants' paper

Download the simulation programs used for Pritchard's 2001 paper on rare variants. One program is an implementation of the ancestral selection graph (for simulating genealogies with selection). The other program simulates a multi-locus model of complex disease.