Question Answering with Binary Objective

We added a secondary binary objective, predicting answerability, to QANet. As shown in the picture, this objective is computed from the three outputs of the modeling layer in QANet. More specifically, we take the representation of the 0th word from each of m0, m1, and m2 (the outputs of the first, second, and third passes of the modeling encoder), concatenate them, and pass the result through a single feed-forward layer with sigmoid activation. Adding this secondary objective yielded meaningful improvements in both EM and F1 over our implementation of QANet. That implementation mostly follows the official QANet, except that we added a projection layer on the output of the context-query attention layer to reduce memory usage. We were also able to reproduce the performance gains from adding character-level encoding, replacing the RNN with multi-head self-attention and convolutions, and applying layer-wise dropout (stochastic depth).
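
The answerability head described above can be illustrated with a minimal NumPy sketch. It is not our actual implementation: the function names, tensor shapes, random inputs, the use of binary cross-entropy as the loss, and the convention that label 1 means "answerable" are all assumptions made for illustration. How this loss is weighted against the span-prediction loss is not shown.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def answerability_prob(m0, m1, m2, weight, bias):
    """Hypothetical answerability head.

    m0, m1, m2: modeling-encoder outputs, each of shape (batch, seq_len, hidden).
    weight:     parameters of the single feed-forward layer, shape (3 * hidden,).
    bias:       scalar bias.
    Returns a probability per example, shape (batch,).
    """
    # Take the representation at position 0 from each encoder pass
    # and concatenate along the feature axis: (batch, 3 * hidden).
    first = np.concatenate([m0[:, 0, :], m1[:, 0, :], m2[:, 0, :]], axis=-1)
    # Single linear layer followed by a sigmoid.
    logits = first @ weight + bias
    return sigmoid(logits)


def binary_cross_entropy(prob, label, eps=1e-7):
    """Mean binary cross-entropy (assumed loss for the secondary objective)."""
    prob = np.clip(prob, eps, 1.0 - eps)
    losses = -(label * np.log(prob) + (1.0 - label) * np.log(1.0 - prob))
    return float(np.mean(losses))


# Toy inputs standing in for real QANet activations.
rng = np.random.default_rng(0)
batch, seq_len, hidden = 2, 5, 4
m0, m1, m2 = (rng.standard_normal((batch, seq_len, hidden)) for _ in range(3))
weight = rng.standard_normal(3 * hidden) * 0.1
bias = 0.0

prob = answerability_prob(m0, m1, m2, weight, bias)
labels = np.array([1.0, 0.0])  # assumed convention: 1 = answerable
loss = binary_cross_entropy(prob, labels)
```

In training, a loss of this kind would be added to the usual start/end span loss, so the shared modeling encoder receives gradient from both objectives.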