
Correlations model:

We have modified our version of the correlated frequencies model from Pritchard et al. 2000a. The implementation and interpretation of the new "$F_{ST}$" model will be described in detail in a forthcoming paper (Falush et al., 2003); brief details are provided below. We introduce a new (multidimensional) vector, $P_A$, which records the allele frequencies in a hypothetical "ancestral" population. We assume that each of the $K$ populations represented in our sample has undergone independent drift away from these ancestral frequencies, at a rate parameterized by $F_1, F_2, F_3, \dots, F_K$, respectively. The estimated $F_k$ values should be numerically similar to $F_{ST}$ values, apart from differences that stem from the slightly different model and from differences in estimation. Note also that $F_k$ is difficult to estimate accurately for data with substantial admixture. $P_A$ is assumed to have a Dirichlet prior of the same form as that used above for the population frequencies:
$$ p_{Al\cdot} \sim \mathcal{D}(\lambda_1,\lambda_2,\dots,\lambda_{J_l}), \tag{1} $$

independently for each $ l$. Then the prior for the frequencies in population $ k$ is

$$ p_{kl\cdot} \sim \mathcal{D}\left(p_{Al1}\frac{1-F_k}{F_k},\; p_{Al2}\frac{1-F_k}{F_k},\; \dots,\; p_{AlJ_l}\frac{1-F_k}{F_k}\right), \tag{2} $$

independently for each $k$ and $l$. In this model, the $F_k$ values are closely related to the standard measure of genetic distance, $F_{ST}$. In the standard parametrization of $F_{ST}$, the expected frequency in each population is given by the overall mean frequency, and the variance in frequency across subpopulations of an allele at overall frequency $p$ is $p(1-p)F_{ST}$. The model here is much the same: because the Dirichlet parameters in (2) sum to $(1-F_k)/F_k$, an allele with ancestral frequency $p$ has expected frequency $p$ in population $k$ and variance $p(1-p)/\left(\frac{1-F_k}{F_k}+1\right) = p(1-p)F_k$. We generalize the model slightly by allowing each population to drift away from the ancestral population at a different rate ($F_k$), as might be expected if populations have different sizes. We also try to estimate "ancestral frequencies" rather than using the mean frequencies.

We have placed independent priors on the $F_k$, proportional to a gamma distribution with mean 0.01 and standard deviation 0.05 (but with ${\rm Pr}[F_k\geq1]=0$). The user can modify the parameters of the gamma prior. Some experimentation suggests that the prior mean of 0.01, which corresponds to very low levels of subdivision, often leads to good performance for data that are difficult for the independent frequencies model. In other problems, where the differences among populations are more marked, the data usually seem to overwhelm this prior on $F_k$.
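As a rough illustration (not STRUCTURE's own code), the sketch below draws from the priors described above using NumPy. The helper names `gamma_shape_scale`, `sample_f_prior` and `sample_population_freqs` are hypothetical, and truncating the gamma prior by rejection sampling is simply one easy way to impose ${\rm Pr}[F_k\geq1]=0$, assumed here for illustration. The sketch converts the prior's mean 0.01 and standard deviation 0.05 into a gamma shape of 0.04 and scale of 0.25, and checks empirically that Eq. (2) gives each allele mean $p$ and variance $p(1-p)F_k$.

```python
import numpy as np

# Hypothetical helpers for illustration only; not taken from STRUCTURE.


def gamma_shape_scale(mean, sd):
    """Convert a gamma mean and standard deviation into numpy's (shape, scale)."""
    # For a gamma distribution, mean = shape * scale and var = shape * scale**2.
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return shape, scale


def sample_f_prior(rng, size, mean=0.01, sd=0.05):
    """Draw F_k from a gamma(mean, sd) prior truncated to 0 < F_k < 1.

    Truncation is done by rejection: draws outside (0, 1) are discarded
    and sampling repeats until `size` values have been kept.
    """
    shape, scale = gamma_shape_scale(mean, sd)
    kept = []
    n_kept = 0
    while n_kept < size:
        draws = rng.gamma(shape, scale, size=size)
        ok = draws[(draws > 0.0) & (draws < 1.0)]
        kept.append(ok)
        n_kept += ok.size
    return np.concatenate(kept)[:size]


def sample_population_freqs(rng, p_ancestral, f_k, size=None):
    """Eq. (2) for one locus: p_kl ~ Dirichlet(p_Al * (1 - F_k) / F_k)."""
    alpha = np.asarray(p_ancestral, dtype=float) * (1.0 - f_k) / f_k
    return rng.dirichlet(alpha, size=size)


rng = np.random.default_rng(0)

# Prior on F_k with mean 0.01 and sd 0.05.
f_draws = sample_f_prior(rng, size=100_000)
print(gamma_shape_scale(0.01, 0.05), f_draws.mean())

# Drift check: with ancestral frequency p and drift F, each allele's
# population frequency should have mean p and variance p * (1 - p) * F.
p_anc = np.array([0.5, 0.3, 0.2])
F = 0.1
pops = sample_population_freqs(rng, p_anc, F, size=200_000)
print(pops.mean(axis=0), pops[:, 0].var(), 0.5 * 0.5 * F)
```

With a shape of 0.04 the gamma prior is extremely skewed: most of its mass lies far below the mean of 0.01, which fits the description of this prior as favoring very low levels of subdivision.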


William Wen 2002-07-18