Recursive Transformer: A Novel Neural Architecture for Generalizable Mathematical Reasoning

Recent work in deep learning investigates whether neural models can learn to reason mathematically. A common finding is that models tend to take non-logical shortcuts when generating an answer, causing them to fail to generalize to more complex arithmetic problems. We create a model that decomposes a complex problem into its constituent subparts, taking logical intermediate steps to arrive at an answer. We do so by introducing a recursive framework into the traditional transformer architecture via two approaches: 1) a strongly supervised variant that teacher-forces each recursive step, and 2) a weakly supervised variant that does not constrain the model's intermediate solutions. The strongly supervised approach not only learns complex addition and subtraction but also extrapolates, performing well as the number of operators increases. We also found that some models trained with our approach learned human-interpretable representations of numbers, as well as attention parameters that illustrate their problem-solving process. These results are a testament to the promise of the recursive transformer approach.
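
To make the recursive framing concrete, below is a minimal PyTorch sketch of a recursive reduction loop. The class and function names, tokenization, and training details here are illustrative assumptions rather than the released implementation: a standard encoder-decoder transformer is applied repeatedly, each pass rewriting the current expression into a simpler one, and the strongly supervised variant teacher-forces every intermediate expression in the gold reduction chain.

```python
import torch
import torch.nn as nn


class RecursiveTransformer(nn.Module):
    """One reduction step: map the current expression to a simpler one."""

    def __init__(self, vocab_size, d_model=128, nhead=4, num_layers=2, max_len=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model,
            nhead=nhead,
            num_encoder_layers=num_layers,
            num_decoder_layers=num_layers,
            batch_first=True,
        )
        self.out = nn.Linear(d_model, vocab_size)

    def _encode(self, tokens):
        positions = torch.arange(tokens.size(1), device=tokens.device)
        return self.embed(tokens) + self.pos(positions)

    def forward(self, src, tgt):
        # Causal mask so the decoder only attends to earlier target tokens.
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt.size(1)).to(src.device)
        hidden = self.transformer(self._encode(src), self._encode(tgt), tgt_mask=tgt_mask)
        return self.out(hidden)


def strongly_supervised_loss(model, steps, loss_fn):
    """Teacher-force every intermediate reduction, e.g. the chain
    "2+3-4" -> "5-4" -> "1". `steps` is a list of token tensors of shape
    (batch, seq_len), one tensor per step of the gold reduction chain."""
    total = 0.0
    for src, tgt in zip(steps[:-1], steps[1:]):
        logits = model(src, tgt[:, :-1])          # gold prefix as decoder input
        total = total + loss_fn(
            logits.reshape(-1, logits.size(-1)),  # predict the next gold token
            tgt[:, 1:].reshape(-1),
        )
    return total
```

Under the same assumptions, the weakly supervised variant would apply the same recursive loop but compute the loss only on the final answer, leaving the model free to choose its own intermediate expressions.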