
4. Using prior population information

The default mode for structure uses only genetic information to learn about population structure. However, there is often other information (e.g., physical characteristics of sampled individuals or geographic sampling location) that can be used to assist the clustering. In the section "Model with prior population information" in Pritchard et al. (2000a), we described our framework for incorporating this type of information into the inference procedure.

There are at least two reasons for making use of this sort of extrinsic population information. One is that we may want to test whether any individuals in the sample are immigrants to their supposed populations, or have recent immigrant ancestors (see the example in Pritchard et al. 2000a). A second is that we may want to make use of learning samples: i.e., we have some individuals of known origin, and we want to use them to help us classify individuals of unknown origin. For example, in Beaumont et al. (2001), we wanted to learn about the ancestry of Scottish wildcats (many of which are hybridized with feral domestic cats). We had genetic data from a number of pet house cats, which we defined as being in one population, while we inferred $Q$ for the wildcats (with $K=2$). Use of this sort of prior information will normally improve the accuracy of the inference.

To use these options you need to set USEPOPINFO to 1, and choose a value of MIGRPRIOR (which is $\nu$ in Pritchard et al. 2000a). You might choose something in the range 0.001 to 0.1 for $\nu$. Even when using learning samples, it may be sensible to allow for some misclassification by setting MIGRPRIOR larger than 0. The pre-defined population for each individual is set in the input data file (see PopData) and should be an integer between 1 and MAXPOPS, inclusive.
If PopData for any individual is outside this range, their $q$ will be updated in the normal way (i.e., without prior population information, according to the model that would be used if USEPOPINFO were turned off).

Learning samples are implemented through the use of the PopFlag column in the data file. The pre-defined population is used for those individuals for whom PopFlag=1, and it is ignored for individuals for whom PopFlag=0. If there is no PopFlag column in the data file, then when USEPOPINFO is turned on, PopFlag is set to 1 for all individuals. In general, we advocate that the user should first run the program without population information, to check that the pre-defined populations are in rough agreement with the genetic information.
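
The rules above for deciding which individuals are treated with prior population information can be summarized in a short sketch. This is an illustration of the logic as described in this section, not the code used by structure itself: the `Individual` class, the function `uses_prior_population`, and the sample records are hypothetical, and the parameter values are simply ones consistent with the wildcat example ($K=2$) and the suggested range for $\nu$.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative settings only; these names mirror the structure parameters
# discussed in the text, but this is not structure's own parameter file.
MAXPOPS = 2        # K, as in the wildcat example
USEPOPINFO = 1     # turn on the use of prior population information
MIGRPRIOR = 0.05   # nu; the text suggests a value between 0.001 and 0.1


@dataclass
class Individual:
    """Hypothetical record for one row of the input data file."""
    label: str
    pop_data: int                   # PopData column
    pop_flag: Optional[int] = None  # PopFlag column; None if the column is absent


def uses_prior_population(ind: Individual,
                          usepopinfo: int = USEPOPINFO,
                          maxpops: int = MAXPOPS) -> bool:
    """Return True if the pre-defined population is used for this individual."""
    if not usepopinfo:
        return False
    # With no PopFlag column, PopFlag is treated as 1 for everyone.
    flag = 1 if ind.pop_flag is None else ind.pop_flag
    # PopData outside 1..MAXPOPS means q is updated in the normal way.
    in_range = 1 <= ind.pop_data <= maxpops
    return flag == 1 and in_range


sample = [
    Individual("housecat_01", pop_data=1, pop_flag=1),  # learning sample
    Individual("wildcat_01", pop_data=2, pop_flag=0),   # ancestry to be inferred
    Individual("unknown_01", pop_data=0, pop_flag=1),   # PopData out of range
    Individual("noflag_01", pop_data=2),                # no PopFlag column
]

decisions = {ind.label: uses_prior_population(ind) for ind in sample}
```

Here only `housecat_01` and `noflag_01` would be handled with prior population information; the other two would have their $q$ updated as if USEPOPINFO were turned off.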
William Wen 2002-07-18