Self-Attention in Question Answering

For the default final project, our task was to build a model that performs question answering over the Stanford Question Answering Dataset (SQuAD). Our goal was to improve on the F1 and EM (exact match) scores of the baseline BiDAF model. To do so, we made two additions to the model: character embeddings and a self-attention layer, both of which are used in R-Net. We found that while these additions improved the F1 and EM scores, they also required significantly more memory and training time.
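
For readers unfamiliar with the two metrics, here is a minimal sketch of how EM and F1 can be computed for a single predicted answer against a single reference answer. It follows the general shape of SQuAD-style evaluation (lowercasing, stripping punctuation and English articles, then comparing tokens), but it is a simplified illustration and not the evaluation code used in this project. The function names are our own, and we assume that a full evaluation would also take the best score over several reference answers and average over all questions.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, truth: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(truth))


def f1_score(prediction: str, truth: str) -> float:
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(truth).split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not truth_tokens:
        return float(pred_tokens == truth_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "eiffel tower."))
    print(f1_score("in Paris France", "Paris"))
```

EM gives no credit for a partially correct span, while F1 rewards token overlap, which is why the two scores are usually reported together.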