Improving Out-of-Domain Question Answering with Auxiliary Loss and Sequential Layer Unfreezing

The proliferation of pretrained language models such as BERT and T5 has been a key development in Natural Language Processing (NLP) over the past several years. In this work, we adapt a DistilBERT model, pretrained on masked language modeling (MLM), for the task of question answering (QA). We train the DistilBERT model on a set of in-domain data and finetune it on a smaller set of out-of-domain (OOD) data, with the goal of developing a model that generalizes well to new datasets. We significantly alter the baseline model by incorporating an auxiliary language modeling loss, adding an additional DistilBERT layer, and training with sequential layer unfreezing. We find that adding an additional layer with sequential layer unfreezing yields the largest gain, producing a final model that achieves a 5% improvement over a naive baseline.
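
To make the idea of sequential layer unfreezing concrete, the following is a minimal sketch of one possible unfreezing schedule. It is an illustration under stated assumptions rather than our exact training procedure: the function name `trainable_layers`, the top-down unfreezing order, and the `epochs_per_stage` parameter are all hypothetical. The example assumes seven layers, i.e. the six transformer layers of the base DistilBERT model plus the one additional layer.

```python
def trainable_layers(num_layers: int, epoch: int, epochs_per_stage: int = 1) -> list[int]:
    """Return the indices of layers that are trainable at a given epoch.

    Layers are indexed from 0 (closest to the input) to num_layers - 1
    (closest to the output). ASSUMPTION: layers are unfrozen from the top
    down, one more layer every `epochs_per_stage` epochs.
    """
    if num_layers <= 0 or epoch < 0 or epochs_per_stage <= 0:
        raise ValueError("num_layers and epochs_per_stage must be positive, epoch non-negative")
    # One layer is trainable in the first stage; each later stage adds one more,
    # capped at the total number of layers.
    num_unfrozen = min(num_layers, epoch // epochs_per_stage + 1)
    return list(range(num_layers - num_unfrozen, num_layers))


if __name__ == "__main__":
    # Hypothetical setting: 6 DistilBERT layers + 1 added layer.
    for epoch in range(8):
        print(epoch, trainable_layers(7, epoch))
```

In a PyTorch training loop, a schedule like this would typically be applied at the start of each epoch by setting `requires_grad` to `True` on the parameters of the listed layers and to `False` on all others.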