Morphological Analyses in an XML Format
Kenneth R. Beesley
Xerox Research Centre Europe
6, chemin de Maupertuis
38240 Meylan, France
Ken.Beesley@xrce.xerox.com
October 11, 2004
1 Introduction
Traditionally, a Xerox morphological analysis string consists of a baseform followed by a simple sequence of tags, as in the following lexc example.
    Multichar_Symbols [Noun] [Sg] [Pl]

    LEXICON Root
    dog    N ;
    cat    N ;

    LEXICON N
    [Noun][Sg]:0    # ;
    [Noun][Pl]:s    # ;
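The lexicon works by continuation classes. Each Root entry continues to LEXICON N, where the tags are added on the upper (analysis) side and either nothing (0) or "s" is added on the lower (surface) side. The toy Python sketch below makes that concrete by enumerating every path. It is an illustrative assumption only and is not how lexc compiles a network. The names LEXICONS, paths and analyze are hypothetical, and multicharacter symbols are treated here as plain substrings.

```python
# Toy emulation of the lexc lexicon above (NOT lexc itself).
# Each entry is (upper side, lower side, continuation class);
# the continuation "#" marks the end of a word.
LEXICONS = {
    "Root": [("dog", "dog", "N"), ("cat", "cat", "N")],
    "N": [("[Noun][Sg]", "", "#"),   # [Noun][Sg]:0  (0 = empty lower side)
          ("[Noun][Pl]", "s", "#")], # [Noun][Pl]:s
}


def paths(lexicon="Root", upper="", lower=""):
    """Yield (analysis, surface) pairs for every complete path."""
    for up, low, cont in LEXICONS[lexicon]:
        if cont == "#":
            yield upper + up, lower + low
        else:
            yield from paths(cont, upper + up, lower + low)


PAIRS = list(paths())


def analyze(surface):
    """Return all analysis strings paired with a surface word."""
    return [analysis for analysis, s in PAIRS if s == surface]


if __name__ == "__main__":
    for analysis, surface in PAIRS:
        print(surface, "->", analysis)
```

For example, analyze("dogs") looks up the only path whose lower side is "dogs" and returns its upper side, dog[Noun][Pl].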
Analyses from such a system look like dog[Noun][Sg] and dog[Noun][Pl], where [Noun], [Sg] and [Pl] are single symbols, declared in the Multichar_Symbols statement, rather than sequences of characters.
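A short illustration of what "single symbols" means: once the tags are declared as multicharacter symbols, an analysis string such as dog[Noun][Sg] is a sequence of five symbols, not thirteen characters. The Python sketch below splits a string into symbols. Its left-to-right, longest-match strategy is an assumption made for illustration and is not a description of how the Xerox tools tokenize strings. The names MULTICHAR_SYMBOLS and symbols are hypothetical.

```python
# Hypothetical tokenizer: splits an analysis string into symbols,
# treating declared multicharacter symbols as single units.
MULTICHAR_SYMBOLS = ["[Noun]", "[Sg]", "[Pl]"]


def symbols(analysis, multichar=MULTICHAR_SYMBOLS):
    """Split a string into symbols, trying longer multichar symbols first
    (an illustrative assumption, not lexc's documented behavior)."""
    ordered = sorted(multichar, key=len, reverse=True)
    result = []
    i = 0
    while i < len(analysis):
        for sym in ordered:
            if analysis.startswith(sym, i):
                result.append(sym)   # whole tag counts as one symbol
                i += len(sym)
                break
        else:
            result.append(analysis[i])  # ordinary single-character symbol
            i += 1
    return result


if __name__ == "__main__":
    print(symbols("dog[Noun][Sg]"))
```

Here symbols("dog[Noun][Pl]") yields the list d, o, g, [Noun], [Pl].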
2 XML

2.1 Analyses in an XML Format?
As XML increases in popularity, it has occurred to several people that they might like to build a transducer that produces output strings in an XML format, to facilitate further processing by XML-savvy programs. There is no reason why this