We have implemented reasonably careful error checking to make sure
that the data set is in the correct format, and the program will
attempt to provide some indication about the nature of any problems
that exist. The front end requires a return at the end of each row
and does not allow returns within a row; the command-line version of
structure treats returns in the same way as spaces or tabs.
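Because the front end expects each row to occupy exactly one line, a quick count of fields per line can help locate a row that has been split by a stray return. The following is a minimal sketch, not part of structure itself; the function name find_irregular_rows and the sample data are hypothetical, and fields are assumed to be separated by spaces or tabs.

```python
from collections import Counter


def find_irregular_rows(text):
    """Return (line_number, field_count) for every non-blank line whose
    field count differs from the most common field count in the file."""
    counts = [
        (number, len(line.split()))
        for number, line in enumerate(text.splitlines(), start=1)
        if line.strip()
    ]
    if not counts:
        return []
    # The most frequent field count is taken as the "typical" row width.
    typical, _ = Counter(count for _, count in counts).most_common(1)[0]
    return [(number, count) for number, count in counts if count != typical]


# Hypothetical data: the third individual row has been split across two lines.
sample = (
    "loc_a loc_b\n"
    "ind1 1 10 20\n"
    "ind1 1 11 21\n"
    "ind2 2 10\n"
    "22\n"
    "ind2 2 12 20\n"
)
print(find_irregular_rows(sample))  # [(1, 2), (4, 3), (5, 1)]
```

Header rows such as a row of marker names normally contain fewer fields than the rows for individuals, so line 1 is reported here even though it is valid; lines 4 and 5 are the ones that point to a return inside a row.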
One problem that can arise is that editing programs used to assemble
the data prior to importing them into structure can introduce
hidden formatting characters, often at the ends of lines, or at the
end of the file. The front end can remove many of these
automatically, but this type of problem may be responsible for errors
when the data file seems to be in the right format. If you are
importing data to a UNIX system, the dos2unix utility can be helpful
for cleaning these up.
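Where dos2unix is not available, a similar cleanup can be sketched in a few lines of Python. The list of characters removed here (a UTF-8 byte-order mark, carriage returns, the DOS end-of-file marker Ctrl-Z, NUL bytes, and trailing whitespace) is an assumption about what editing programs commonly leave behind, not a list taken from structure; the function name is hypothetical.

```python
def clean_hidden_characters(raw: bytes) -> bytes:
    """Return a copy of raw with common hidden formatting bytes removed."""
    # Drop a UTF-8 byte-order mark at the start of the file, if present.
    if raw.startswith(b"\xef\xbb\xbf"):
        raw = raw[3:]
    # Convert DOS (CR LF) and old Mac (CR) line endings to UNIX (LF).
    raw = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    # Remove the DOS end-of-file marker (Ctrl-Z) and any NUL bytes.
    raw = raw.replace(b"\x1a", b"").replace(b"\x00", b"")
    # Strip trailing whitespace from each line.
    lines = [line.rstrip() for line in raw.split(b"\n")]
    # Drop blank lines at the end of the file.
    while lines and not lines[-1]:
        lines.pop()
    return b"\n".join(lines) + b"\n" if lines else b""


# Hypothetical file contents with a BOM, CR LF endings, and a trailing Ctrl-Z.
dirty = b"\xef\xbb\xbfind1 1 10 20 \r\nind1 1 11 21\r\n\x1a"
print(clean_hidden_characters(dirty))
```

The function works on bytes rather than text so that it does not depend on guessing the file's encoding. It is best applied to a copy of the data file, keeping the original for comparison.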
Table 2:
Format of the data file, in two-row format.
Most of these components are optional (see text for details).
The components are: an identifier for each marker; the distance
between adjacent markers; a label for each individual; the
geographic origin of each individual (PopData); a flag used to
incorporate learning samples (PopFlag); a phenotype for each
individual; columns for storing extra data (ignored by the program);
the genotype of each individual at each locus; and the phase
information for each marker in each individual.
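To make the two-row layout concrete, the sketch below reads a small data set into a dictionary. It rests on several assumptions that are not stated in the caption: individuals are diploid, the two rows for an individual are consecutive and repeat the same leading columns, the file has no header rows, and the only leading columns are the label and PopData. The function name read_two_row and the parameter n_leading are hypothetical.

```python
def read_two_row(text, n_leading=2):
    """Parse two-row data into {label: {"leading": [...], "genotypes": [...]}}.

    n_leading is the assumed number of non-genotype columns at the start
    of each row (here: label and PopData).
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if len(rows) % 2:
        raise ValueError("two-row format needs an even number of rows")
    individuals = {}
    # Pair row 1 with row 2, row 3 with row 4, and so on.
    for first, second in zip(rows[0::2], rows[1::2]):
        if first[:n_leading] != second[:n_leading]:
            raise ValueError(f"leading columns differ for {first[0]!r}")
        if len(first) != len(second):
            raise ValueError(f"row lengths differ for {first[0]!r}")
        # Each locus gets one allele from each of the two rows.
        genotypes = list(zip(first[n_leading:], second[n_leading:]))
        individuals[first[0]] = {
            "leading": first[1:n_leading],
            "genotypes": genotypes,
        }
    return individuals


# Hypothetical data: two individuals, two loci, columns = label, PopData, alleles.
sample_data = (
    "ind1 1 10 20\n"
    "ind1 1 11 21\n"
    "ind2 2 10 22\n"
    "ind2 2 12 20\n"
)
parsed = read_two_row(sample_data)
print(parsed["ind1"]["genotypes"])  # [('10', '11'), ('20', '21')]
```

Files that use more of the optional components would need a larger n_leading, and any header rows (marker identifiers, distances between markers) would have to be removed before the individual rows are paired.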