Exploring RoBERTa's theory of mind through textual entailment

Can transformer models reason about the thoughts of other people the way we can? In psychology, philosophy, and cognitive science, theory of mind refers to the cognitive ability to reason about the mental states of others, recognizing them as having beliefs, knowledge, intentions, and emotions of their own. In this project, we construct a natural language inference (NLI) dataset that targets theory-of-mind inferences about knowledge and belief. We evaluate the dataset on RoBERTa-large finetuned on the MNLI dataset. Experimental results show that the model struggles with such inferences, even after further finetuning.
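To make the setup concrete, here is a minimal sketch of how theory-of-mind NLI items could be generated from templates. The agent names, templates, and labeling rules below are illustrative assumptions for this post, not the project's actual dataset; the real items may be more varied.

```python
# Hypothetical template-based generation of theory-of-mind NLI items.
# Each item is a (premise, hypothesis, label) triple, matching the
# standard NLI format used by MNLI-style models.

AGENTS = ["Anna", "Omar"]
OBJECTS = ["the ball", "the key"]
LOCATIONS = ["the basket", "the drawer"]

def make_items():
    items = []
    for agent in AGENTS:
        for obj in OBJECTS:
            for loc in LOCATIONS:
                premise = f"{agent} saw that {obj} was placed in {loc}."
                # Knowledge follows from direct perception: entailment.
                items.append(
                    (premise, f"{agent} knows that {obj} is in {loc}.", "entailment")
                )
                # Claiming the agent is unaware contradicts the premise.
                items.append(
                    (premise, f"{agent} does not know where {obj} is.", "contradiction")
                )
    return items

items = make_items()
```

Pairs like these can then be fed to an MNLI-finetuned model (e.g. `roberta-large-mnli` on the HuggingFace Hub) by encoding premise and hypothesis together and reading off the predicted entailment label.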