Selected software packages are linked below. For others (especially for more recent work) please follow links from the journal articles. Let us know if you cannot find what you need and we will do our best to help.
Structure 2.3.X
The program structure is a free software package for using
multi-locus genotype data to investigate population structure. Its
uses include inferring the presence of distinct populations, assigning
individuals to populations, studying hybrid zones, identifying
migrants and admixed individuals, and estimating population allele
frequencies in situations where many individuals are migrants or
admixed. It can be applied to most of the commonly-used genetic
markers, including SNPs, microsatellites, RFLPs and AFLPs. The basic
algorithm was described by Pritchard, Stephens & Donnelly
(2000). Extensions to the method were published by Falush,
Stephens and Pritchard (2003, 2007)
and Hubisz, Falush, Stephens and Pritchard (2009).
Structure Homepage
SDS: Singleton Density Score
SDS is an approach for studying very recent changes in allele frequencies
within a population, using whole genome sequence data. Applied to data
from the UK10K Project (3,000 individuals), we estimate that SDS reflects
frequency changes during the past 2,000 years. Large changes in
frequencies imply adaptive events. The software was developed by Yair Field,
and the paper was jointly written by Yair, Evan Boyle, and Natalie Telis.
[software]
[preprint]
[SDS values in UK10K]
WASP
WASP is a software package for two related tasks: (1) removing allelic
bias in mapped sequencing reads, and (2) identifying molecular
quantitative trait loci (QTLs) using next-generation sequencing data
(e.g. gene expression QTLs or histone mark QTLs). WASP identifies
molecular QTLs using a statistical test that combines information
about the total depth and allelic imbalance of mapped reads. WASP can
call QTLs with much smaller sample sizes (as few as 10) than
traditional QTL mapping approaches. WASP was developed by Bryce van de Geijn and
Graham McVicker.
[software]
[preprint]
TreeMix: estimation of population trees with admixture
TreeMix uses large
numbers of SNPs to estimate the historical relationships among
populations, using a graph representation that allows both
population splits and migration events. You can download the TreeMix
paper by Pickrell and Pritchard (2012) here.
CENTIPEDE: software for inference of TF binding sites
CENTIPEDE is a method developed by Roger Pique-Regi and Jacob
Degner that uses PWM information plus experimental data such as
DNase1, histone marks or FAIRE to infer transcription factor binding
sites with high specificity. Software and data are available here.
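As a rough illustration of the PWM ingredient only, the sketch below scores candidate sites in a sequence by their log-odds under a position weight matrix. It is not CENTIPEDE's model, which also uses the experimental data described above. The function names, the uniform background, and the toy matrix in the usage note are all assumptions made for this example.

```python
import math


def pwm_log_odds(site, pwm, background=None):
    """Log2-odds of `site` under a PWM versus a background base model.

    `pwm` is a list of dicts, one per motif position, mapping each base
    to its probability at that position (hypothetical input format).
    """
    if background is None:
        # Assumption: uniform base composition.
        background = {base: 0.25 for base in "ACGT"}
    if len(site) != len(pwm):
        raise ValueError("site length must equal motif width")
    return sum(
        math.log2(column[base] / background[base])
        for base, column in zip(site, pwm)
    )


def best_site(sequence, pwm):
    """Return (score, start) of the highest-scoring window in `sequence`."""
    width = len(pwm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    return max(
        (pwm_log_odds(sequence[i:i + width], pwm), i)
        for i in range(len(sequence) - width + 1)
    )
```

For example, with a toy four-column matrix that favors T, A, T, A at probability 0.7 each (and 0.1 for every other base), best_site("GGTATAGG", pwm) picks the window starting at index 2.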
Transcriptome Assembler
Software for transcriptome assembly used in RNA-seq of 16 mammalian species.
Download assembler.
BFCounter: Memory efficient k-mer counting
BFCounter is a program for counting k-mers from DNA sequencing data.
It uses a Bloom filter data structure to filter out unique k-mers, which are likely generated
by sequencing errors.
BFCounter Homepage
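The Bloom filter idea can be sketched in a few lines of Python. This is a minimal illustration, not BFCounter's implementation: the class and function names, the filter size, and the hash function are all assumptions chosen for the example. A k-mer seen for the first time is recorded only in the Bloom filter; it gets an entry in the count table when it is seen again, so k-mers that occur once never use table memory.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter over strings (illustrative sizes, not tuned)."""

    def __init__(self, num_bits=1 << 20, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive independent bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        """Insert `item`; return True if it was (probably) already present."""
        already_present = True
        for pos in self._positions(item):
            byte, mask = pos >> 3, 1 << (pos & 7)
            if not self.bits[byte] & mask:
                already_present = False
                self.bits[byte] |= mask
        return already_present


def count_repeated_kmers(reads, k, bloom=None):
    """Count k-mers that occur at least twice across `reads`."""
    if bloom is None:
        bloom = BloomFilter()
    counts = {}
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in counts:
                counts[kmer] += 1
            elif bloom.add(kmer):
                # Second sighting: the first one was absorbed by the filter.
                counts[kmer] = 2
    return counts
```

Because Bloom filters can return false positives, a k-mer that occurs once may occasionally be reported with a count of 2. This sketch does not correct for that; an exact count would need an additional check against the data.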
Analysis and modeling of natural selection
Scripts for computing two test statistics for detecting positive
selection (iHS and XP-EHH), as well as a flexible tool for
performing Wright-Fisher simulations with selection, can be found here. (Note that data
from our selection scans in humans [Voight et al. 2006 and Pickrell et al. 2009]
can be found here and here, respectively.)
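For readers unfamiliar with Wright-Fisher simulations with selection, the following is a minimal sketch of the general technique, not the lab's tool. It assumes a single biallelic locus, genic (additive) selection with relative fitness 1 + s for the favored allele, a constant diploid population size, and no mutation. The function name and parameters are hypothetical.

```python
import random


def wright_fisher(pop_size, init_freq, sel_coef, generations, seed=None):
    """Simulate the frequency trajectory of a selected allele.

    pop_size    -- number of diploid individuals (2 * pop_size chromosomes)
    init_freq   -- starting frequency of the selected allele, in [0, 1]
    sel_coef    -- selection coefficient s (assumed genic selection)
    generations -- maximum number of generations to simulate
    """
    if not 0.0 <= init_freq <= 1.0:
        raise ValueError("init_freq must lie in [0, 1]")
    rng = random.Random(seed)
    n_chrom = 2 * pop_size
    freq = init_freq
    trajectory = [freq]
    for _ in range(generations):
        # Selection: deterministic shift in the expected frequency.
        expected = freq * (1 + sel_coef) / (1 + sel_coef * freq)
        # Drift: binomial sampling of the next generation's chromosomes.
        count = sum(rng.random() < expected for _ in range(n_chrom))
        freq = count / n_chrom
        trajectory.append(freq)
        if freq in (0.0, 1.0):
            break  # allele lost or fixed; nothing further can change
    return trajectory
```

Calling wright_fisher(pop_size=50, init_freq=0.1, sel_coef=0.05, generations=200, seed=1) returns the list of allele frequencies per generation, stopping early if the allele is lost or fixed.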
Two programs written by Graham Coop when he was in the lab, one for simulating sweeps
on standing variation and one for testing for correlations between allele frequencies
and environmental variables, can be obtained via
Graham's website at UC Davis.
Software for RNA-seq analysis, eQTLs, etc
Some of the software that we have developed for analyzing RNA-seq data, including programs
for counting reads, de novo identification of splice junctions, and detection of poly-A sites,
can be found here.
TreeLD 1.0
TreeLD is a software tool for mapping complex trait
loci, developed by Zollner and Pritchard (2005).
TreeLD performs a multipoint LD-analysis by inferring the ancestry
of a genomic region and analyzing this ancestry for signals of disease mutations. The generated likelihoods can be used to test for the presence of a disease locus and to fine-map its location, providing a point estimate and a credible region. Furthermore, the package provides a novel way of visualizing the association signal in a sample. TreeLD is designed for high-density SNP haplotypes and can be applied to case-control data, TDT trio data and quantitative trait data.
Download TreeLD 1.0
STRAT
STRAT is a companion program to
structure. It implements
a structured association method, for use in association mapping,
enabling valid case-control studies even in the presence of population
structure. This method was described in an article in Am. J. Hum. Genet.
2000 (67:170-181). Collaborators: Matthew Stephens, Noah Rosenberg,
Peter Donnelly.
[Abstract]
[Manuscript]
[Review of structured association methods]
[Download software]
MALDSoft
MALDSoft is a program for admixture mapping of complex trait loci, using
case-control data. The samples should come from a recently admixed population;
additional 'learning' samples from the parental populations are helpful.
The method was described in a paper by Giovanni Montana and Jonathan Pritchard
[Abstract] [Manuscript].
Download MALDSoft
Programs from 'rare variants' paper
Download the simulation programs used for
Pritchard's 2001 paper
on rare variants. One program is an implementation of the ancestral
selection graph (for simulating genealogies with selection). The
other program simulates a multi-locus model of complex disease.