Transformer Exploration

In this project we build a question answering model for the SQuAD 2.0 dataset. Starting from a baseline BiDAF model, we make two extensions to improve it. First, we add character embeddings to match the model in the original BiDAF paper. Second, we swap out the LSTM encoder for the more parallelizable Transformer block.

The resulting pipeline works as follows. After creating our word and character embeddings, we add positional encodings. We then apply a single transformer encoder block, featuring convolution and self-attention, to the embeddings of the context and the query. Next we perform bidirectional attention, and then apply three more transformer blocks in the modeling layer. Finally, we output a prediction of the answer, or "no answer" if one does not exist.
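As a concrete illustration of the character-embedding extension, here is a minimal PyTorch sketch of a BiDAF-style char-CNN: each word's characters are embedded, convolved over, and max-pooled, and the result is concatenated with the word embedding. The class name, hyperparameters, and the choice to freeze pretrained word vectors are assumptions for illustration, not the project's actual settings.

```python
import torch
import torch.nn as nn

class CharWordEmbedding(nn.Module):
    """Concatenate word embeddings with CNN-derived character embeddings.

    Hyperparameters (char_dim, num_filters, kernel_size) are illustrative
    assumptions, not this project's actual settings.
    """
    def __init__(self, word_vectors, char_vocab_size,
                 char_dim=64, num_filters=100, kernel_size=5):
        super().__init__()
        # Pretrained word vectors (e.g. GloVe) are typically kept frozen.
        self.word_emb = nn.Embedding.from_pretrained(word_vectors, freeze=True)
        self.char_emb = nn.Embedding(char_vocab_size, char_dim, padding_idx=0)
        self.char_conv = nn.Conv1d(char_dim, num_filters, kernel_size,
                                   padding=kernel_size // 2)

    def forward(self, word_idxs, char_idxs):
        # word_idxs: (batch, seq_len); char_idxs: (batch, seq_len, word_len)
        w = self.word_emb(word_idxs)                    # (B, T, word_dim)
        B, T, L = char_idxs.shape
        c = self.char_emb(char_idxs.reshape(B * T, L))  # (B*T, L, char_dim)
        c = self.char_conv(c.transpose(1, 2))           # (B*T, filters, L)
        c = torch.relu(c).max(dim=-1).values            # max-pool over characters
        c = c.reshape(B, T, -1)                         # (B, T, filters)
        return torch.cat([w, c], dim=-1)                # (B, T, word_dim + filters)
```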
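Because self-attention is order-agnostic, positional information has to be injected into the embeddings explicitly. Below is a minimal sketch of the standard sinusoidal positional encodings from "Attention Is All You Need", assuming that is the variant used here and that `d_model` is even.

```python
import math
import torch

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encodings: sin on even dims, cos on odd dims.

    Returns a (seq_len, d_model) tensor that is added to the embeddings so
    the transformer blocks can distinguish token positions.
    """
    pos = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)        # (T, 1)
    div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float)
                    * (-math.log(10000.0) / d_model))                  # (d_model/2,)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

# Usage: x = embeddings + positional_encoding(embeddings.size(1), embeddings.size(2))
```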
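An encoder block combining convolution and self-attention follows the QANet recipe: a few depthwise-separable convolutions, then multi-head self-attention, then a feed-forward layer, each sub-layer wrapped in layer norm and a residual connection. The sketch below is one plausible reading of that design; the sub-layer counts, pre-norm placement, and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """QANet-style encoder block: [conv x N] -> self-attention -> feed-forward,
    each sub-layer with pre-layer-norm and a residual connection.
    Hyperparameters are assumptions for illustration.
    """
    def __init__(self, d_model=128, num_convs=2, kernel_size=7, num_heads=8):
        super().__init__()
        self.conv_norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(num_convs)])
        # Depthwise-separable convolutions, as in QANet.
        self.convs = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(d_model, d_model, kernel_size,
                          padding=kernel_size // 2, groups=d_model),  # depthwise
                nn.Conv1d(d_model, d_model, 1),                       # pointwise
            )
            for _ in range(num_convs)
        ])
        self.attn_norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.ffn_norm = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_model * 4), nn.ReLU(),
                                 nn.Linear(d_model * 4, d_model))

    def forward(self, x, key_padding_mask=None):
        # x: (batch, seq_len, d_model)
        for norm, conv in zip(self.conv_norms, self.convs):
            y = conv(norm(x).transpose(1, 2)).transpose(1, 2)
            x = x + torch.relu(y)
        y = self.attn_norm(x)
        y, _ = self.attn(y, y, y, key_padding_mask=key_padding_mask)
        x = x + y
        return x + self.ffn(self.ffn_norm(x))
```

In this design the same block class can be instantiated once to encode the context and query embeddings, and three more times, stacked, for the modeling layer described above.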