Optional Replacement

The Beesley and Karttunen Book gives an incorrect explanation of the semantics of the optional replace opearator (->). It says, on p. 66,

[A (->) B] denotes an optional replacement, that is, the union of [A -> B] with the identity relation on A.

That is incorrect, as you can see by comparing the networks compiled with [a (->) b] and [[a -> b] | a]:

Regular expression
Network
[a (->) b] fs0:  ? -> fs0, a -> fs0, b -> fs0, a:b -> fs0.
[[a -> b] | a] fs0:  ? -> fs1, a -> fs2, b -> fs1, a:b -> fs1.
fs1:  ? -> fs1, b -> fs1, a:b -> fs1.
fs2:  (no arcs)

The correct statement is

[A (->) B] denotes an optional replacement. It is equivalent to [[A -> B] | ?*]*.

The right hand side of the union must be ?*, here the universal identity relation that maps any string into itself. Furthermore, we need the zero-plus (Kleene-star) operator around the union to ensure that any instance of "a" in the upper side string corresponds both to a "b" and to an "a".  For example,  the input "aa" must produce four output strings:

xfst[0]: read regex [[a -> b] | ?*]*;
360 bytes. 1 state, 4 arcs, Circular.

xfst[1]: apply down aa
aa
ab
ba
bb

Without the outmost * operator,  we would only get two outputs, "aa" and "bb".

Thanks to Gary A. Coen for spotting the error!


Last Modified: Tuesday, 10-Jul-2007 18:39:49 PDT