Spelling Correction


A robust morphological analyzer should not only recognize words that are spelled correctly; it should also deal intelligently with common misspellings. Words ending in -ible, -able, -ant, and -ent are often misspelled: irresistible ~ irresistable, indispensible ~ indispensable, dominant ~ dominent, inadvertant ~ inadvertent. Doubled consonants provide many opportunities for wrong spellings: misspelling ~ mispelling, occurence ~ ocurrence ~ occurrence, embarasment ~ embarrasment ~ embarrassment. A Spelling Test by Mindy McAdams gives you a chance to test your skills on 50 commonly misspelled words.

The Task

Pick at least a dozen correctly spelled words from the McAdams list and write one or more replace rules that produce at least one misspelled variant for each word on your list. For example, to produce irresistable from irresistible and dominent from dominant, you can use rules such as
   
{ible} -> {able}, {ant} -> {ent} ;
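Before writing the rules in xfst, it can help to prototype what they should generate. The Python sketch below is only a model and not xfst code. It assumes each rule is applied optionally, so the correct spelling survives next to the misspelled one, which is what the task requires of the lower side. The function name variants and the RULES table are made up for illustration.

```python
# Hypothetical rule table: correct substring -> misspelled substring.
RULES = {"ible": "able", "ant": "ent"}

def variants(word, rules=RULES):
    """Return every spelling obtained by optionally rewriting each
    occurrence of a rule's left-hand side (the original word included)."""
    if not word:
        return {""}
    # Option 1: copy the first character unchanged.
    out = {word[0] + rest for rest in variants(word[1:], rules)}
    # Option 2: rewrite a left-hand side that starts at this position.
    for lhs, rhs in rules.items():
        if word.startswith(lhs):
            out |= {rhs + rest for rest in variants(word[len(lhs):], rules)}
    return out

print(sorted(variants("irresistible")))  # ['irresistable', 'irresistible']
print(sorted(variants("dominant")))      # ['dominant', 'dominent']
```

Because the rewritten output is never scanned again, a rule cannot feed itself or another rule in this sketch.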

The task is to make a lexical transducer that has only correctly spelled words on the upper side. On the lower side, it should have both the correct spellings and the incorrect spellings produced by the misspelling rules. Each misspelled word on the lower side is paired with the correctly spelled form. That is, the commands

xfst[1]: apply up irresistable
xfst[1]: apply up irresistible

should both produce the result

irresistible

On the other hand, the command

xfst[1]: apply down irresistable

should not yield any output, because irresistable is an incorrect spelling. And the command

xfst[1]: apply down irresistible

should only produce the output

irresistible

and not the misspelled variant. To get this behavior you need to have the failure flag diacritic, @F@, on the upper side of each path that contains a misspelled variant of the lower side. For example,

Upper side:   @F@ i r r e s i s t i b l e
Lower side:       i r r e s i s t a b l e

The position of the @F@ along the path does not matter. It could also be where the error occurs or at the end. What is important is that the @F@ flag is mapped to an epsilon on the lower side. Therefore, it is not visible to the apply up routine but blocks the incorrect realization in the apply down case.
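The intended effect of the flag can be mimicked in a few lines of Python. This is a rough model of the behavior described above, not of how xfst implements flag diacritics. It assumes a path is a pair of an upper-side symbol list and a lower-side string. The names PATHS, apply_up and apply_down are chosen here only to echo the xfst commands.

```python
FAIL = "@F@"  # the failure flag from the text, treated as a single symbol

# Each path pairs a list of upper-side symbols with a lower-side string.
PATHS = [
    (list("irresistible"), "irresistible"),           # correct spelling
    ([FAIL] + list("irresistible"), "irresistable"),  # misspelling, flagged
]

def apply_up(lower, paths=PATHS):
    """Look up a lower-side string; the flag is dropped from the result."""
    return sorted({"".join(sym for sym in upper if sym != FAIL)
                   for upper, low in paths if low == lower})

def apply_down(upper_string, paths=PATHS):
    """Generate lower-side strings; any path carrying the flag is blocked."""
    return sorted({low for upper, low in paths
                   if FAIL not in upper and "".join(upper) == upper_string})

print(apply_up("irresistable"))    # ['irresistible']
print(apply_down("irresistible"))  # ['irresistible']
print(apply_down("irresistable"))  # []
```

Moving FAIL to another position in the symbol list changes nothing in this model, which matches the remark that the position of the flag does not matter.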

To make this exercise more realistic, let us throw in a few words such as banjo that have two possible legal spellings in the plural. The commands

xfst[1]: apply up banjoes
xfst[1]: apply up banjos

should both produce the output

banjos

and the command

xfst[1]: apply down banjos

should produce two outputs

banjoes
banjos
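The difference between a legal variant and a misspelling can be checked against a small table before building the real transducer. The Python sketch below is a hypothetical model in which each entry records whether its lower form is a misspelling, that is, whether its path would carry the failure flag. The ENTRIES table and the function names are assumptions made for illustration.

```python
# Hypothetical entries: (canonical upper form, lower form, is_misspelling).
ENTRIES = [
    ("banjos", "banjos", False),
    ("banjos", "banjoes", False),            # legal variant: no failure flag
    ("irresistible", "irresistible", False),
    ("irresistible", "irresistable", True),  # misspelling: path would be flagged
]

def apply_up(lower):
    """Map any known spelling to its canonical form."""
    return sorted({up for up, low, _ in ENTRIES if low == lower})

def apply_down(upper):
    """Generate only the legal spellings of a canonical form."""
    return sorted({low for up, low, bad in ENTRIES if up == upper and not bad})

print(apply_up("banjoes"))         # ['banjos']
print(apply_down("banjos"))        # ['banjoes', 'banjos']
print(apply_down("irresistible"))  # ['irresistible']
```

Both spellings of banjos come back from apply_down because neither entry is marked as a misspelling, while irresistable is filtered out.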

The purpose of this lexical transducer is twofold: to correct incorrect spellings and to normalize variant spellings into a single canonical form.
The xfst script that creates it should leave the transducer on the stack for testing.