Question Answering on SQuAD 2.0 using QANet with Performer FastAttention
Transformers are highly effective but their self-attention scales quadratically with sequence length, creating a bottleneck on long sequences. Performers introduce a provably accurate and practical approximation of regular softmax attention with linear space and time complexity. In this project, we implement the QANet model for the SQuAD 2.0 challenge, then replace the self-attention layers in its encoder blocks with Performer FastAttention, improving training speed by 18%.
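To make the linear-complexity claim concrete, the sketch below shows the core idea behind Performer-style FAVOR+ attention: softmax attention is approximated by mapping queries and keys through a positive random-feature map and then exploiting matrix-product associativity, so the sequence-length-squared attention matrix is never formed. This is a minimal NumPy illustration, not the project's actual implementation; the function names and the feature count are our own, and practical Performers additionally use orthogonal random features and feature redrawing.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Exact softmax attention: O(L^2) time and memory in sequence length L."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def positive_random_features(X, W):
    """FAVOR+ positive feature map: phi(x) = exp(W x - ||x||^2 / 2) / sqrt(m)."""
    m = W.shape[0]
    sq_norm = (X ** 2).sum(axis=-1, keepdims=True) / 2.0
    return np.exp(X @ W.T - sq_norm) / np.sqrt(m)

def performer_attention(Q, K, V, num_features=256, seed=0):
    """Linear-complexity approximation of softmax attention via FAVOR+."""
    d = Q.shape[-1]
    rng = np.random.default_rng(seed)
    # Random projection matrix; rows drawn i.i.d. from N(0, I_d).
    W = rng.standard_normal((num_features, d))
    # Fold the 1/sqrt(d) softmax temperature into Q and K symmetrically.
    Qs, Ks = Q / d ** 0.25, K / d ** 0.25
    Qp = positive_random_features(Qs, W)  # (L, m)
    Kp = positive_random_features(Ks, W)  # (L, m)
    # Associativity: phi(Q) @ (phi(K)^T @ V) costs O(L * m * d), not O(L^2).
    KV = Kp.T @ V                      # (m, d_v)
    normalizer = Qp @ Kp.sum(axis=0)   # (L,) row-wise softmax denominator
    return (Qp @ KV) / normalizer[:, None]
```

For short sequences the two functions return nearly identical outputs (the approximation is unbiased in expectation), but `performer_attention` avoids materializing the L-by-L attention matrix, which is what yields the memory and speed gains when it is substituted into the QANet encoders.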