
Format for the data file

The format for the genotype data is shown in Table 2 (Table 1 shows an example). Essentially, the entire data set is arranged as a matrix in a single file, with the data for individuals in rows and the loci in columns. The user can make several choices about format, and most of these data (apart from the genotypes!) are optional. For a diploid organism, data for each individual can be stored either as 2 consecutive rows, where each locus is in one column, or in one row, where each locus is in two consecutive columns. Except for linked loci (see below), the order of the alleles for a single individual does not matter. When two rows are used, the pre-genotype data columns (see below) are recorded twice for each individual. (Similarly, for n-ploid organisms, data for each individual are stored in n consecutive rows, or in one row where each locus is in n consecutive columns.)

The first few columns store several kinds of non-genotype information. After that, each column stores the genotype data for a single locus. The columns are separated by spaces or other whitespace. The user can also include up to two rows at the beginning of the file, consisting of marker names and map distances. For diploid data, the user can include phase information as an additional row following the genotypes for each individual. Full details of the file format are given below.
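The row layout described above can be illustrated with a short Python sketch. It is not part of structure itself: the function name read_individuals and the parameters ploidy and n_pre (the number of leading non-genotype columns) are hypothetical, and the sketch assumes a file with no marker-name, map-distance, or phase rows. The sample lines are the first two individuals of Table 1.

```python
def read_individuals(lines, ploidy=2, n_pre=2):
    """Group `ploidy` consecutive rows into one record per individual.

    n_pre is the assumed number of leading non-genotype columns
    (in Table 1: the label and the sampling location).
    """
    rows = [line.split() for line in lines if line.strip()]
    if len(rows) % ploidy != 0:
        raise ValueError("row count is not a multiple of the ploidy")
    individuals = []
    for i in range(0, len(rows), ploidy):
        block = rows[i:i + ploidy]
        pre = block[0][:n_pre]
        # The pre-genotype columns are repeated on every row of an individual.
        if any(row[:n_pre] != pre for row in block):
            raise ValueError(f"pre-genotype columns differ for {pre[0]}")
        # zip(*...) turns rows of alleles into one tuple of alleles per locus.
        loci = list(zip(*(row[n_pre:] for row in block)))
        individuals.append({"pre": pre, "loci": loci})
    return individuals


sample = [
    "George 1 -9 145 66 0 92",
    "George 1 -9 -9 64 0 94",
    "Paula 1 106 142 68 1 92",
    "Paula 1 106 148 64 0 94",
]
inds = read_individuals(sample)
print(inds[0]["pre"], inds[0]["loci"][1])
```

Alleles are kept as strings so that the missing-data code (-9 here) is passed through unchanged; splitting on any whitespace means tab-separated and space-separated files are read the same way.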

George	1	-9	145	66	0	92
George	1	-9	-9	64	0	94
Paula	1	106	142	68	1	92
Paula	1	106	148	64	0	94
Matthew	2	110	145	-9	0	92
Matthew	2	110	148	66	1	-9
Bob	2	108	142	64	1	94
Bob	2	-9	142	-9	0	94
Anja	1	112	142	-9	1	-9
Anja	1	114	142	66	1	94
Peter	1	-9	145	66	0	-9
Peter	1	110	145	-9	1	-9
Carsten	2	108	145	62	0	-9
Carsten	2	110	145	64	1	92

Table 1: Sample data file. Here LABEL=1, POPDATA=1, NUMINDS=7, NUMLOCI=5, and MISSING=-9. Also, POPFLAG=0, PHENOTYPE=0, EXTRACOLS=0. The second column shows the geographic sampling location of individuals. The data can also be stored with one row per individual, in which case the first row would read "George 1 -9 -9 145 -9 66 64 0 0 92 94".
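The conversion from the two-row layout to the one-row layout mentioned in the caption can be sketched in a few lines of Python. The helper to_one_row and its parameter n_pre (the number of leading non-genotype columns, 2 in Table 1) are hypothetical names used only for illustration.

```python
def to_one_row(rows, n_pre=2):
    """Merge the rows of one individual into the one-row layout."""
    fields = [row.split() for row in rows]
    # Keep a single copy of the pre-genotype columns.
    merged = list(fields[0][:n_pre])
    # Place the alleles of each locus in consecutive columns.
    for alleles in zip(*(f[n_pre:] for f in fields)):
        merged.extend(alleles)
    return " ".join(merged)


george = ["George 1 -9 145 66 0 92", "George 1 -9 -9 64 0 94"]
one_row = to_one_row(george)
print(one_row)  # George 1 -9 -9 145 -9 66 64 0 0 92 94
```

Because the function interleaves however many rows it is given, the same sketch would apply to n rows for an n-ploid individual.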


William Wen 2002-07-18