
Row

  1. Marker Names (Optional; string) The first row in the file can contain a list of identifiers for each of the markers in the data set. This row contains $L$ strings of integers or characters, where $L$ is the number of loci.
  2. Inter-Marker Distances (Optional; real) The next row in the file is a set of inter-marker distances, for use with linked loci. These should be genetic distances (e.g., centiMorgans), or some proxy for them based, for example, on physical distances. The actual units of distance do not matter too much, provided that the marker distances are (roughly) proportional to recombination rate (the algorithm estimates an appropriate scaling from the data). The markers must be in map order within linkage groups. When consecutive markers are from different linkage groups (e.g., different chromosomes), this should be indicated by the value -1. The first marker is also assigned the value -1. All other distances are non-negative. This row contains $L$ real numbers.
  3. Individual Data (Required) Data for each sampled individual is arranged into one or more rows as described above (further details below).
  4. Phase Information (Optional; diploid data only; real number in the range [0,1]) This is for use with linked loci only. It is a single row of $L$ probabilities that appears after the genotype data for each individual. If phase is known completely, or no phase information is available, these rows are unnecessary. They may be useful when there is partial phase information from family data, or when haploid X chromosome data from males and diploid autosomal data are input together. There are two alternative representations for the phase information: (1) the two rows of data for an individual are assumed to correspond to the paternal and maternal contributions, respectively, and the phase line indicates the probability that the ordering is correct at the current marker (set MARKOVPHASE=0); (2) the phase line indicates the probability that the phase of one allele relative to the previous allele is correct (set MARKOVPHASE=1). The first entry should be filled in with 0.5 to fill out the line to $L$ entries. For example, the following data input would represent the information from a male with 5 unphased autosomal microsatellite loci followed by three X chromosome loci, using the maternal/paternal phase model:

    102 156 165 101 143 105 104 101    
    100 148 163 101 143 -9 -9 -9    
    0.5 0.5 0.5 0.5 0.5 1.0 1.0 1.0    
                       
    where -9 indicates "missing data" (here missing because there is no second X chromosome), the 0.5s indicate that the autosomal loci are unphased, and the 1.0s indicate that the X chromosome loci have been maternally inherited with probability 1.0, and hence are phased. The same information can be represented with the markovphase model. In this case the input file would read:
    102 156 165 101 143 105 104 101    
    100 148 163 101 143 -9 -9 -9    
    0.5 0.5 0.5 0.5 0.5 0.5 1.0 1.0    
                       
    Here, the two 1.0s indicate that the first and second, and the second and third, X chromosome loci are perfectly in phase with each other. Note that the site-by-site output under these two models will be different. In the first case, structure would output the assignment probabilities for maternal and paternal chromosomes. In the second case, it would output the probabilities for each allele listed in the input file.
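
The row rules above can be checked before a file is handed to the program. The following Python sketch is not part of structure; the function names `split_linkage_groups` and `parse_phased_individual` are hypothetical, and the sketch assumes that each individual's rows contain only the $L$ genotype or phase values (no leading label or population columns) and that alleles are integers with -9 as the missing-data code, as in the examples above.

```python
def split_linkage_groups(distances):
    """Group marker indices by linkage group from an inter-marker distance row.

    A value of -1 starts a new linkage group (the first marker must be -1);
    every other distance must be non-negative.
    """
    if not distances or distances[0] != -1:
        raise ValueError("the first marker must be assigned the value -1")
    groups = []
    for i, d in enumerate(distances):
        if d == -1:
            groups.append([i])        # marker i opens a new linkage group
        elif d >= 0:
            groups[-1].append(i)      # marker i is linked to the previous marker
        else:
            raise ValueError(f"invalid distance {d} at marker {i}")
    return groups


def parse_phased_individual(lines, n_loci):
    """Parse two allele rows followed by one phase row for a diploid individual.

    Assumes the rows hold only the n_loci values (no extra leading columns).
    """
    if len(lines) != 3:
        raise ValueError("expected two allele rows and one phase row")
    row_a = [int(x) for x in lines[0].split()]
    row_b = [int(x) for x in lines[1].split()]
    phase = [float(x) for x in lines[2].split()]
    for name, row in (("first allele row", row_a),
                      ("second allele row", row_b),
                      ("phase row", phase)):
        if len(row) != n_loci:
            raise ValueError(f"{name} has {len(row)} entries, expected {n_loci}")
    if any(not 0.0 <= p <= 1.0 for p in phase):
        raise ValueError("phase entries must lie in the range [0,1]")
    return row_a, row_b, phase


# Two linkage groups: markers 0-2 and markers 3-4.
groups = split_linkage_groups([-1, 2.5, 1.0, -1, 0.3])

# The maternal/paternal example from the text (8 loci).
row_a, row_b, phase = parse_phased_individual(
    ["102 156 165 101 143 105 104 101",
     "100 148 163 101 143 -9 -9 -9",
     "0.5 0.5 0.5 0.5 0.5 1.0 1.0 1.0"],
    n_loci=8,
)
```

The checks only cover what is stated above (row lengths, the -1 convention, and the [0,1] range); they do not distinguish between the MARKOVPHASE=0 and MARKOVPHASE=1 interpretations, since both use the same row layout.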

William Wen 2004-07-13