Building a Robust QA system with Data Augmentation

Pre-trained neural models, such as our baseline model fine-tuned on a BERT-based pre-trained transformer to perform natural language question answering, usually show high accuracy on in-context data but often lack robustness on out-of-context data. We hypothesize that this issue is not primarily caused by the pre-trained model's limitations, but rather by a lack of diverse training data conveying important contextual information at the fine-tuning stage. We explore several methods of augmenting the standard training data with syntactically informative data, generated by randomly changing the grammatical tense of words, removing words associated with gender, race, or economic means, and replacing words in question sentences with synonyms drawn from a lexicon. The augmentation method that performed best was changing the grammatical tense of more than one word in every question. Although it yielded an increase of less than one point in F1 and EM scores, we believe that applying this method to the context and answer training data as well would produce more significant improvements. We were also surprised that removing associations with gender, race, or economic status performed relatively well, given how many words it removed from the dataset.
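
To make the augmentation strategies concrete, below is a minimal Python sketch of two of them: the demographic-word-removal filter and lexicon-based synonym replacement (here using NLTK's WordNet as a stand-in lexicon). The `DEMOGRAPHIC_WORDS` list and the replacement probability `p` are hypothetical illustrations, not the exact lists or settings used in the experiments; the tense-rewriting augmentation is omitted because it would additionally require a morphological inflection tool.

```python
# Hedged sketch of two augmentation strategies described above.
# Assumes NLTK is installed and WordNet data has been fetched once via
# nltk.download('wordnet').
import random
from nltk.corpus import wordnet as wn

# Hypothetical word list for the de-biasing augmentation; the real list
# used in the experiments is not specified in this report.
DEMOGRAPHIC_WORDS = {"he", "she", "his", "her", "man", "woman", "rich", "poor"}

def remove_demographic_words(question: str) -> str:
    """Drop words associated with gender, race, or economic status."""
    return " ".join(w for w in question.split()
                    if w.lower() not in DEMOGRAPHIC_WORDS)

def synonym_replace(question: str, p: float = 0.2) -> str:
    """Replace each word with a random WordNet synonym with probability p."""
    out = []
    for word in question.split():
        synsets = wn.synsets(word)
        if synsets and random.random() < p:
            lemmas = {l.name().replace("_", " ")
                      for s in synsets for l in s.lemmas()}
            lemmas.discard(word)  # avoid replacing a word with itself
            out.append(random.choice(sorted(lemmas)) if lemmas else word)
        else:
            out.append(word)
    return " ".join(out)
```

In a setup like this, each augmented question would be paired with the original context and answer span, so the gold answer remains valid while the question's surface form varies.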