*Q: Why is there a "dotted brackets" operator [..] ->
operator and how is it different from the normal square bracket []
-> operator in Replace Rules? *

A: OK. To start with, the rule operator is really just the right
arrow `->`

.
The [..] is a special
"dotted-bracket" notation in Replace
Rules that can be used to surround a regular expression to be replaced.
It is typically needed in cases where the language to be replaced
contains the empty string (epsilon).

Here's the problem: your basic right-arrow rewrite rule is always of the following form

` ````
upper -> lower
|| left _ right
```

where the left and right contexts are optional. I'll stick to the simpler

` ````
upper -> lower
```

type of Replace Rule for now.

Looking at generation, if the upper expression can match the input
string at any point, then the rule must apply and fire. In the
following case, the
rule states that every *a* in the upper string must be mapped
to a
*b* in the lower string. E.g. applying

` `` a -> b `

downward to the input string "cacacac" will yield "cbcbcbc". The rule matches the input string in three places in this case and applies three times.

Now consider the following rules, built on the exact same pattern. [] is the same as 0 and matches the empty string.

` ````
[] -> b
```

is equivalent to

` `` 0 -> b `

Here the [] (or the 0) is the upper expression, b is the lower
expression, and -> is (as before) the rule operator. For generation
(downward application), the rule [] -> b states that wherever the
input string contains an empty string, then the empty string must be
mapped to a *b* in the output string.

Syntactically and semantically, the rule `[] -> b`

is completely parallel to `a -> b`

. In practice, the
problem with `[] -> b`

is that any input string contains
an infinite number of empty strings. E.g. the input string "ab"
contains an infinite number of empty strings before the *a*, an
infinite number of empty strings between *a* and *b*,
and an infinite number of empty strings after the
final *b*. So if you apply

` `` [] -> b `

or the equivalent

` `` 0 -> b `

to the input string "ab", then the rule will match an infinite number of times and will try to generate an infinite number of output strings. This lamentable situation comes from the basic semantics of replace rules and the fact that any input string contains an infinite number of empty strings.

When xfst beginners write a rule like `0 -> b`

or ```
[]
-> b
```

, what they usually intend is for the rule to fire a
single time (rather than an infinite number of times) wherever the rule
can match an empty string. And that's what the special "dotted-bracked"
notation allows you to specify. When you write ` [..] -> b `

and apply
that
downwards to the input string "ac", the single result is "babcb". You
can
test this easily in xfst:

xfst[0]: regex [] -> b ;

124 bytes. 1 state, 3 arcs, Circular.

xfst[1]: down ac

Network has an epsilon loop on the input side.

ac

[N.B. above that xfst sees that the rule can match an infinite number of times and prints a warning about the epsilon (empty string) loop.]

xfst[1]: regex [..] -> b ;

200 bytes. 3 states, 4 arcs, Circular.

xfst[2]: down ac

babcb

With this second test, using an upper expression `[..]`

, a single *b* is
inserted wherever an empty string can be matched in the
input expression, i.e. once at the beginning, once at the end, and once
between
symbols.

Next step: It's important to understand that the dotted brackets [.
and
.] can appear alone as in `[..] -> b`

,
but this in in fact a
special case. They are in fact separate operators that *surround*
an expression designating the language to be replaced. And you
typically need to use them whenever the language to be replaced *can*
match an
empty string.

Consider the following (bad) rule:

` ````
a*b* -> c
```

This states (for generation) that whenever the input string matches
"zero
or more *a*s followed by zero or more *b*s, then the
rule must
fire and map them to a *c*". The problem is that `a*b*`

can (and therefore will) match the empty string. If you apply

` ````
a*b* -> c
```

to any input string, e.g. "d", then the rule will once again match an infinite number of times and try to output an infinite number of solutions. (There are an infinite number of empty strings at the beginning and at the end of "d".) This can be shown easily in xfst:

xfst[0]: regex a*b* -> c ;

344 bytes. 3 states, 18 arcs, Circular.

xfst[1]: down d

Network has an epsilon loop on the input side.

ccd

ccdc

cd

cdc

d

dc

xfst[1]: regex [.a*b*.] -> c ;

324 bytes. 5 states, 11 arcs, Circular.

xfst[2]: down d

cdc

As the example shows, the way to prevent this "infinite epsilon loop" problem is to surround the input expression with the [. and .] dotted brackets. This causes the rule to be compiled differently. That way, wherever the input string contains an infinite number of empty strings, the rule treats it like a single empty string and does a single mapping. Apply [. a*b* .] -> c to the input string "d" and you get just "cdc".

For left-arrow rules, as in

` `` a <- b `

the right-hand (lower) side is the input side, so

` ``a <- 0`

or

` ``a <- []`

or something like

` ``a <- a*b*c*`

will cause the same infinite-match problem. Applying such rules upwards to an input string will result in an infinite number of matches, technically requiring an infinite number of outputs. You can get the desired behavior by surrounding the lower-side (input) expression with dotted brackets:

` ````
a <- [..]
```

` `` a <- [. a*b*c* .] `

as in the following xfst example:

xfst[2]: regex a <- [. a*b*c* .] ;

388 bytes. 6 states, 15 arcs, Circular.

xfst[3]: up m

ama

So, in conclusion, you typically need to surround the expression that matches the input string with [. and .] whenever the input expression can match the empty string.

[an error occurred while processing this directive][an error occurred while processing this directive]