Exploiting XLE's Finite State Interface in LFG-based Statistical Machine Translation
We present the addition of a morphological generation component to an LFG-based statistical machine translation system, taking advantage of existing morphological grammars and the FST (Finite State Transducer) processing pipeline of the XLE system. The extended syntax-driven translation system makes separate stochastic decisions for lemmata and morphological tags; the role of the finite-state morphological grammars is to generate full forms from the bundle of morphological tags produced by the translation component. This technique can lead to a more effective use of a given amount of training data from a parallel corpus, since lexical and morphosyntactic translation patterns can be induced independently.
The existing FST processing cascade for German, when added to the statistical machine translation system, suffers from generation failures. These failures are caused by overgeneralisation in the syntax-driven translation process and originate from (i) the use of various underspecification tags in the morphological grammar, or (ii) erroneous assignment of certain tags to a given lemma. To address them, we add a set of replacement/correction rules on top of the cascade. The augmented FST cascade increases generation coverage from 47.90% to 75.35%. We give a detailed error analysis of the remaining 24.65% of cases.
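The idea of layering replacement/correction rules over a morphological generator can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the dictionary `TOY_GENERATOR` stands in for the finite-state generator, the tag names (including the underspecification tag `+AnyCase`) are hypothetical and are not the tags of the actual German grammar, and the `EXPANSIONS` table only mimics the two failure sources named above. It is not the rule set or the XLE interface used in this work.

```python
from itertools import product
from typing import Optional, Sequence

# Toy stand-in for a finite-state morphological generator (hypothetical data):
# maps a lemma plus a bundle of morphological tags to a full form.
TOY_GENERATOR = {
    ("Haus", ("+NN", "+Neut", "+Nom", "+Sg")): "Haus",
    ("Haus", ("+NN", "+Neut", "+Nom", "+Pl")): "Häuser",
    ("Kind", ("+NN", "+Neut", "+Nom", "+Pl")): "Kinder",
}

# Hypothetical replacement/correction rules, keyed by the tag they rewrite.
EXPANSIONS = {
    # (i) an underspecification tag is replaced by concrete alternatives
    "+AnyCase": ("+Nom", "+Gen", "+Dat", "+Acc"),
    # (ii) a tag that may have been wrongly assigned to the lemma
    "+Masc": ("+Fem", "+Neut"),
}


def generate(lemma: str, tags: Sequence[str]) -> Optional[str]:
    """Look up the full form; None models a generation failure."""
    return TOY_GENERATOR.get((lemma, tuple(tags)))


def generate_with_corrections(lemma: str, tags: Sequence[str]) -> Optional[str]:
    """Try the plain generator first, then every rule-licensed tag bundle."""
    form = generate(lemma, tags)
    if form is not None:
        return form
    # Tags without a rule are kept; tags with a rule are replaced by alternatives.
    options = [EXPANSIONS.get(tag, (tag,)) for tag in tags]
    for candidate in product(*options):
        form = generate(lemma, candidate)
        if form is not None:
            return form
    return None


def coverage(requests, generator) -> float:
    """Percentage of (lemma, tags) requests for which a form is generated."""
    hits = sum(1 for lemma, tags in requests if generator(lemma, tags) is not None)
    return 100.0 * hits / len(requests)


if __name__ == "__main__":
    requests = [
        ("Haus", ("+NN", "+Neut", "+Nom", "+Sg")),
        ("Haus", ("+NN", "+Masc", "+AnyCase", "+Pl")),
    ]
    print(coverage(requests, generate))
    print(coverage(requests, generate_with_corrections))
```

In this sketch the correction layer is a fallback that is consulted only after the unmodified generator fails, so requests that already succeed are left untouched. Returning the first candidate that generates is a simplification; a real system would need a principled way to choose among several licensed forms.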