LFG Proceedings
CSLI Publications

Parsing Arabic Using Treebank-based LFG Resources

Lamia Tounsi, Mohammed Attia, and Josef van Genabith


In this paper we present initial results on parsing Arabic using treebank-based parsers and automatic LFG f-structure annotation methodologies. The Arabic Annotation Algorithm (A3) (Tounsi et al., 2009) exploits the rich functional annotations in the Penn Arabic Treebank (ATB) (Bies and Maamouri, 2003; Maamouri and Bies, 2004) to assign LFG f-structure equations to trees. For parsing, we modify Bikel's (2004) parser to learn ATB functional tags, merging phrasal categories with their functional tags in the training data. Functional tags in the parser output trees are then "unmasked" and made available to A3, which uses them to assign f-structure equations. We evaluate the resulting f-structures against the DCU250 Arabic gold standard dependency bank (Al-Raheb et al., 2006). Currently we achieve a dependency f-score of 77%.
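
The merging and unmasking steps, and the dependency f-score, can be illustrated with a minimal sketch. It is not the authors' implementation: the separator `MASK`, the function names, the toy bracketed tree, and the triple format are all assumptions made for illustration. The idea is that a label such as `NP-SBJ` is rewritten as a single atomic category before training, so that a parser which would otherwise strip functional tags learns it as a whole, and is restored afterwards.

```python
import re

# Hypothetical separator, assumed never to occur in treebank labels.
MASK = "_FT_"

# A node label is the token that directly follows an opening bracket.
LABEL = re.compile(r"\(([^\s()]+)")


def mask_tree(tree: str) -> str:
    """Merge category and functional tags into one atomic label (NP-SBJ -> NP_FT_SBJ)."""
    def repl(match):
        label = match.group(1)
        # Leave labels such as -NONE- untouched (assumption: they start with a hyphen).
        if label.startswith("-"):
            return match.group(0)
        return "(" + label.replace("-", MASK)
    return LABEL.sub(repl, tree)


def unmask_tree(tree: str) -> str:
    """Restore the hyphenated functional tags in parser output labels."""
    return LABEL.sub(lambda m: "(" + m.group(1).replace(MASK, "-"), tree)


def dependency_fscore(gold, predicted) -> float:
    """Harmonic mean of precision and recall over sets of dependency triples."""
    gold, predicted = set(gold), set(predicted)
    matched = len(gold & predicted)
    if matched == 0:
        return 0.0
    precision = matched / len(predicted)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy tree in Penn-style bracketing; the words are illustrative only.
    tree = "(S (VP (VERB kataba) (NP-SBJ (NOUN Al-walad)) (NP-OBJ (NOUN risAlap))))"
    masked = mask_tree(tree)
    print(masked)
    print(unmask_tree(masked) == tree)
```

Only node labels are rewritten, so hyphens inside word tokens (here `Al-walad`) are left alone. Coindexation suffixes and other label conventions of the ATB are not handled in this sketch.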
