Next:Components of the dataUp:Documentation for structure software:Previous:OverviewContents


Format for the data file

The format for the genotype data is shown in Table 2, and Table 1 shows an example. Essentially, the entire data set is arranged as a matrix in a single file, in which the data for individuals are in rows and the loci are in columns. The user can make several choices about the format, and most of the data (apart from the genotypes!) are optional. The first few columns store several kinds of non-genotype information; after that, each column stores the genotype data for a single locus. The columns are separated by spaces or other whitespace.

For a diploid organism, the data for each individual can be stored either as 2 consecutive rows, where each locus is in one column, or in one row, where each locus is in two consecutive columns. When an individual occupies 2 rows, the pre-genotype data columns (see below) are recorded on both rows. (Similarly, for $n$-ploid organisms, the data for each individual are stored in $n$ consecutive rows, or in one row where each locus is in $n$ consecutive columns.) Except for linked loci (see below), the order of the alleles for a single individual does not matter.

The user can also include up to two rows at the beginning of the file, containing marker names and map distances. For diploid data, the user can include phase information as an additional row following the genotypes for each individual. Full details of the file format are given below.
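The equivalence between the two layouts can be illustrated with a short Python sketch. The function `to_one_row` below is hypothetical and is not part of structure; it assumes the caller knows how many pre-genotype columns (`n_pre`) precede the genotypes. It keeps those columns once and interleaves the alleles so that each locus occupies $n$ consecutive columns, using George's two rows from Table 1 as input.

```python
def to_one_row(rows, ploidy, n_pre):
    """Merge the `ploidy` consecutive rows of one individual into a single row.

    rows  : list of `ploidy` token lists, one per row of the individual
    n_pre : number of pre-genotype columns (label, population, ...), which
            are repeated on every row and are kept only once in the output
    """
    if len(rows) != ploidy:
        raise ValueError(f"expected {ploidy} rows, got {len(rows)}")
    pre = rows[0][:n_pre]
    # The pre-genotype columns should be identical on every row.
    for r in rows[1:]:
        if r[:n_pre] != pre:
            raise ValueError("pre-genotype columns differ between rows")
    genotypes = [r[n_pre:] for r in rows]
    n_loci = len(genotypes[0])
    if any(len(g) != n_loci for g in genotypes):
        raise ValueError("rows have different numbers of loci")
    merged = list(pre)
    for locus in range(n_loci):
        # Each locus becomes `ploidy` consecutive columns, one allele per row.
        merged.extend(g[locus] for g in genotypes)
    return merged


george = ["George 1 -9 145 66 0 92".split(), "George 1 -9 -9 64 0 94".split()]
print(" ".join(to_one_row(george, ploidy=2, n_pre=2)))
# prints: George 1 -9 -9 145 -9 66 64 0 0 92 94
```

The same function handles any ploidy, since it only interleaves the $n$ rows column by column.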
 
 
George    1   -9  145   66  0   92
George    1   -9   -9   64  0   94
Paula     1  106  142   68  1   92
Paula     1  106  148   64  0   94
Matthew   2  110  145   -9  0   92
Matthew   2  110  148   66  1   -9
Bob       2  108  142   64  1   94
Bob       2   -9  142   -9  0   94
Anja      1  112  142   -9  1   -9
Anja      1  114  142   66  1   94
Peter     1   -9  145   66  0   -9
Peter     1  110  145   -9  1   -9
Carsten   2  108  145   62  0   -9
Carsten   2  110  145   64  1   92

Table 1: Sample data file. Here LABEL=1, POPDATA=1, NUMINDS=7, NUMLOCI=5, and MISSING=-9. Also, POPFLAG=0, PHENOTYPE=0, and EXTRACOLS=0. The second column shows the geographic sampling location of each individual. The data can also be stored with one row per individual, in which case George's row would read "George 1 -9 -9 145 -9 66 64 0 0 92 94".
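As a rough check of the parameter values in the caption, the Python sketch below reads the Table 1 data in its two-rows-per-individual layout and recovers NUMINDS=7 and NUMLOCI=5. The reader function `read_structure` is hypothetical; its keyword arguments only mirror the LABEL, POPDATA, and MISSING settings, and it deliberately ignores the optional marker-name, map-distance, and phase rows described above.

```python
SAMPLE = """\
George    1   -9  145   66  0   92
George    1   -9   -9   64  0   94
Paula     1  106  142   68  1   92
Paula     1  106  148   64  0   94
Matthew   2  110  145   -9  0   92
Matthew   2  110  148   66  1   -9
Bob       2  108  142   64  1   94
Bob       2   -9  142   -9  0   94
Anja      1  112  142   -9  1   -9
Anja      1  114  142   66  1   94
Peter     1   -9  145   66  0   -9
Peter     1  110  145   -9  1   -9
Carsten   2  108  145   62  0   -9
Carsten   2  110  145   64  1   92
"""


def read_structure(text, ploidy=2, label=True, popdata=True, missing=-9):
    """Group `ploidy` consecutive rows into one individual.

    Returns a list of (label, population, loci) tuples, where `loci` holds
    one list of `ploidy` alleles per locus and missing alleles become None.
    """
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    n_pre = int(label) + int(popdata)  # number of pre-genotype columns
    if len(lines) % ploidy:
        raise ValueError("number of rows is not a multiple of the ploidy")
    individuals = []
    for i in range(0, len(lines), ploidy):
        rows = lines[i:i + ploidy]
        name = rows[0][0] if label else None
        pop = int(rows[0][int(label)]) if popdata else None
        alleles = [[int(tok) for tok in r[n_pre:]] for r in rows]
        # zip(*alleles) transposes rows into loci: one tuple of alleles per locus.
        loci = [[None if a == missing else a for a in locus]
                for locus in zip(*alleles)]
        individuals.append((name, pop, loci))
    return individuals


individuals = read_structure(SAMPLE, ploidy=2, label=True, popdata=True, missing=-9)
print("NUMINDS =", len(individuals))        # NUMINDS = 7
print("NUMLOCI =", len(individuals[0][2]))  # NUMLOCI = 5
```

Because whitespace is the only separator, the aligned columns in Table 1 and a single-space layout parse identically with `str.split()`.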



William Wen 2002-07-18