FAQ: When to use lexc vs. xfst?

The existence of two separate programming languages, lexc and xfst, is a challenge for many users, and especially for beginners. The two languages have different syntax, different special characters, different semantics, etc.

While the arbitrary syntactic differences are unfortunate, there is a good case for using lexc instead of xfst to define non-trivial dictionaries: lexc compiles them faster, because it is optimized to perform large unions efficiently. Thus a lexc lexicon such as

LEXICON NRoots
dog N ;
cat N ;
horse N ;
elephant N ;
...
zebra N ;

denotes a potentially large union that might contain tens of thousands of operands. Lexc performs a quick-union of each entry as it is parsed, skipping the expensive steps of determinization and minimization for that union. Only after every 1000 (or perhaps 10000; the exact interval is uncertain) quick-unions does it run the expensive determinization and minimization algorithms.
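The effect of this batching can be illustrated with a toy cost count. The Python sketch below does not build real networks; it only counts how many times the expensive determinize-and-minimize step would run under each strategy. The function names and the batch size of 1000 are assumptions made for illustration, since the actual interval used by lexc is not confirmed here.

```python
def count_minimizations_batched(entries, batch_size):
    """lexc-style: cheap quick-union per entry, expensive cleanup once per batch."""
    minimizations = 0
    pending = 0
    for _ in entries:
        pending += 1            # quick-union only: no determinize/minimize here
        if pending == batch_size:
            minimizations += 1  # expensive determinize + minimize for the batch
            pending = 0
    if pending:
        minimizations += 1      # final cleanup for any leftover entries
    return minimizations


def count_minimizations_eager(entries):
    """xfst-style: every union is followed by determinize + minimize."""
    # n operands are joined by n - 1 unions (none for an empty or single list).
    return max(len(entries) - 1, 0)


# Hypothetical lexicon of 25,000 entries; batch size of 1000 is an assumption.
entries = [f"word{i}" for i in range(25000)]
batched = count_minimizations_batched(entries, batch_size=1000)
eager = count_minimizations_eager(entries)
print(batched, eager)
```

For 25,000 entries and a batch size of 1000, the batched count is 25 against 24,999 for the eager strategy. The count says nothing about the cost of each individual step, only about how often the expensive step is invoked.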

In xfst, you can write something like

define NRoots {dog} | {cat} | {horse} | {elephant} | ... | {zebra} ;

or

read regex {dog} | {cat} | {horse} | {elephant} | ... | {zebra} ;

but xfst always performs each union as a full operation, including determinization and minimization each time. The result is the same, but the xfst "dictionary" will compile more slowly.

Note also that if the lexicon is really as simple as the examples above, involving a simple list of words with no internal structure, the xfst command 'read text' is as efficient as lexc in compiling the list. For example, if "mylist" is a word-list file with one word per line:

dog
cat
horse
elephant
...
zebra

then you can compile it efficiently into a finite-state network from the xfst command line:

read text < mylist

or with a regular expression command:

read regex @txt"mylist";

These alternatives give the same result. Recall that @txt is a regular expression operator that invokes the same process as the 'read text' command.
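For a flat word list, the three notations used in this FAQ differ only in punctuation, so one source list can be rendered into any of them mechanically. The Python sketch below is one way to do that; the helper names are hypothetical, the lexicon name NRoots and continuation class N are taken from the examples above, and the sketch assumes purely alphabetic words, because characters that are special in lexc or xfst would need escaping that it does not attempt.

```python
def _check_plain(words):
    """Reject words that might need lexc/xfst escaping (not handled here)."""
    for word in words:
        if not word.isalpha():
            raise ValueError(f"not a plain alphabetic word: {word!r}")


def to_wordlist(words):
    """One word per line, suitable as input to 'read text' or @txt."""
    _check_plain(words)
    return "\n".join(words) + "\n"


def to_lexc(words, lexicon="NRoots", continuation="N"):
    """A lexc sublexicon with one entry per word."""
    _check_plain(words)
    lines = [f"LEXICON {lexicon}"]
    lines += [f"{word} {continuation} ;" for word in words]
    return "\n".join(lines)


def to_xfst_define(words, name="NRoots"):
    """An xfst 'define' command whose body is a union of the words."""
    _check_plain(words)
    union = " | ".join("{" + word + "}" for word in words)
    return f"define {name} {union} ;"


words = ["dog", "cat", "horse"]
print(to_wordlist(words))
print(to_lexc(words))
print(to_xfst_define(words))
```

Note that the lexc rendering sends every entry on to the continuation class N, so the resulting network matches the plain word list only if that sublexicon adds nothing further.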


Last Modified: Sunday, 07-Mar-2004 22:39:24 PST