Building a Robust QA System Via Diverse Backtranslation

While question answering (QA) has been an active area of research in recent years, QA models typically perform poorly on out-of-domain datasets. The goal of our project was therefore to build a question answering system that is robust to distributional shift. Using a pretrained DistilBERT model as our baseline, we tested two adaptation methods: backtranslation and few-sample fine-tuning. Backtranslation, which involves translating input data into an intermediate language and then back into the original language, is a common data augmentation technique in NLP. We found that applying standard backtranslation to out-of-domain training examples yielded significant gains in Exact Match (EM) and F1 scores over our baseline model. We compared these results against several modified backtranslation schemes, including one that combined backtranslation with techniques from few-sample fine-tuning. Ultimately, we found that combining few-sample fine-tuning techniques with backtranslation did not improve performance. Our best model achieved an EM of 42.225 and an F1 of 59.162 on the test set, and an EM of 38.74 and an F1 of 51.19 on the development set.
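
To make the augmentation step concrete, here is a minimal backtranslation sketch using Hugging Face MarianMT checkpoints. The Helsinki-NLP model names, the choice of French as the pivot language, and the `backtranslate` helper are illustrative assumptions, not the exact configuration used in our experiments:

```python
from transformers import MarianMTModel, MarianTokenizer

def backtranslate(texts, src="en", pivot="fr"):
    """Translate texts src -> pivot -> src to produce paraphrased variants.

    Model names below are assumed Helsinki-NLP Marian checkpoints;
    any installed translation pair would work the same way.
    """
    fwd_name = f"Helsinki-NLP/opus-mt-{src}-{pivot}"
    bwd_name = f"Helsinki-NLP/opus-mt-{pivot}-{src}"
    fwd_tok = MarianTokenizer.from_pretrained(fwd_name)
    fwd = MarianMTModel.from_pretrained(fwd_name)
    bwd_tok = MarianTokenizer.from_pretrained(bwd_name)
    bwd = MarianMTModel.from_pretrained(bwd_name)

    # Source -> pivot language
    batch = fwd_tok(texts, return_tensors="pt", padding=True, truncation=True)
    pivot_ids = fwd.generate(**batch)
    pivot_texts = fwd_tok.batch_decode(pivot_ids, skip_special_tokens=True)

    # Pivot language -> back to source
    batch = bwd_tok(pivot_texts, return_tensors="pt", padding=True, truncation=True)
    back_ids = bwd.generate(**batch)
    return bwd_tok.batch_decode(back_ids, skip_special_tokens=True)

# Example: augment an out-of-domain training question
print(backtranslate(["Where was the treaty signed?"]))
```

The round trip through the pivot language typically preserves meaning while varying surface form, which is what makes the output usable as additional training data for the out-of-domain sets.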