Improve DistilBERT-based Question Answering model performance on out-of-domain datasets by Mixing Right Experts

In this work, we built a mixture-of-experts (MoE) model by mixing 7 DistilBERT-based QA expert models that are task-fine-tuned on in-domain training datasets. We gained insight into the data by carefully examining the performance correlation across in-domain and out-of-domain datasets, and found that domain fine-tuning on a small target out-of-domain dataset whose distribution differs substantially from that of the in-domain training data does not necessarily translate into out-of-domain performance on the target dataset. Leveraging these insights, we carefully select a set of expert models for each out-of-domain dataset. We achieved an F1 score of 61.7 (ranked 6th out of 74 on the test leaderboard) and an EM score of 44.4 (ranked 2nd out of 74 on the test leaderboard) on the out-of-domain test datasets as of March 19, 2021.
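
As a rough illustration of what mixing a selected subset of experts can look like for extractive QA, the sketch below averages the start and end probabilities of the chosen experts and then picks the highest-scoring answer span. It is a minimal sketch under stated assumptions, not the method used in this work: the uniform averaging, the expert names (`expert_a`, `expert_b`, `expert_c`), the `max_len` limit, and the helper names `mix_experts` and `best_span` are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical per-expert output: (start_probs, end_probs) over context tokens.
ExpertOutput = Tuple[Sequence[float], Sequence[float]]


def mix_experts(outputs: Dict[str, ExpertOutput],
                selected: Sequence[str]) -> Tuple[List[float], List[float]]:
    """Uniformly average start/end probabilities over the selected experts.

    Uniform weighting is an assumption made for illustration only.
    """
    if not selected:
        raise ValueError("select at least one expert")
    n_tokens = len(outputs[selected[0]][0])
    start = [0.0] * n_tokens
    end = [0.0] * n_tokens
    for name in selected:
        s, e = outputs[name]
        for i in range(n_tokens):
            start[i] += s[i] / len(selected)
            end[i] += e[i] / len(selected)
    return start, end


def best_span(start: Sequence[float], end: Sequence[float],
              max_len: int = 15) -> Tuple[int, int]:
    """Return (start_idx, end_idx) maximizing start[i] * end[j] with i <= j."""
    best, best_score = (0, 0), -1.0
    for i, p_start in enumerate(start):
        # Only consider spans of at most max_len tokens.
        for j in range(i, min(i + max_len, len(end))):
            score = p_start * end[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best


if __name__ == "__main__":
    # Toy probabilities over a 3-token context (made-up numbers).
    outputs = {
        "expert_a": ([0.1, 0.7, 0.2], [0.1, 0.2, 0.7]),
        "expert_b": ([0.6, 0.3, 0.1], [0.5, 0.4, 0.1]),
        "expert_c": ([0.2, 0.6, 0.2], [0.1, 0.1, 0.8]),
    }
    # Per-dataset expert selection: here only two experts are mixed.
    start, end = mix_experts(outputs, ["expert_a", "expert_c"])
    print(best_span(start, end))
```

In this sketch the per-dataset selection is simply the `selected` list, so a different out-of-domain dataset would pass a different subset of expert names.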