Improving Robustness of Question-Answering Systems Using Domain-Adaptive Pretraining, Adversarial Training, Data Augmentation, and Finetuning
From previous work, we know that Question-Answering (QA) systems based on neural language models (NLMs) are highly sensitive to the knowledge domain of their training data and often perform poorly on out-of-domain QA tasks. In this project, we combine several published methods to improve the robustness of a QA system on out-of-domain data: domain-adversarial training, domain-adaptive pretraining, finetuning on a small number of samples, and data augmentation. We applied these methods to improve the robustness of our baseline model on out-of-domain test sets, given two groups of training data: three large in-domain datasets and three very small out-of-domain datasets. We evaluated the effects of these methods both individually and in combination, and found that while individual methods produce mixed results, their combination improves the robustness of the baseline model on the out-of-domain QA datasets to the greatest extent. We also include a qualitative analysis of our results, shedding light on the actual capabilities of our model.