Byte Order Mark
Older versions of UTF-8 mode in xfst could not accommodate the
initial Byte Order Mark that some UTF-8 editors automatically insert to
the beginning of a file. The problem has been fixed. Thanks to Mohammed
Attia for flushing out this bug.
What is a Byte Order Mark?
Besides UTF-8, there are other UTFs (Unicode Transport Format). In some
of them bytes are interpreted differently depending on whether the
machine is "big-endian" (Sun, MacOSX on PPC) or "little-endian" (Windows,
most Linux machines, and MacOSX on Intel machines). For this reason,
the Unicode standard specifies that a file may begin with a BOM,
a sequence of reserved bytes that indicate byte order as well as the type
of UTF encoding. But the UTF-8 standard (by far the most common UTF) is
the same for both big-endian and little-endian machines. No BOM is needed
in a UTF-8 file. Some UTF-8 editors insert a BOM, some don't.
It does not indicate byte order; it just serves to indicate that the
encoding is UTF-8 rather than something else.
If you look at your UTF-8 file in a text editor such as Emacs, you may see
an initial BOM mark appearing as a strange blob. You can remove it with a
Control-D and save the file. Everything will work fine without the mark.
The current version of xfst
ignores a BOM in the beginning of a UTF-8 file and will signal
an error if it sees some conflicting character encoding directive.