SleepTalk: Textual DeepDream for NLP Model Interpretability

We propose SleepTalk, a technique for improving the interpretability of pre-trained NLP models. Greater interpretability of black-box neural networks is imperative for mitigating bias, developing trust in deployed models, and building intuition for better transfer learning. SleepTalk provides an approach for obtaining human-interpretable views of the representations learned by individual neurons in large neural networks. We also extend SleepTalk to the adjacent task of unsupervised textual style transfer, synthesizing output text from only a content reference and a style reference. We assess SleepTalk's interpretations, examine its behavior at different layers, and qualitatively evaluate its outputs on pre-trained NLP models. Finally, drawing on SleepTalk's results, we suggest similarities between NLP neural network models and layers of the human brain's temporal and parietal lobes, structures critical to the formation of thoughts.
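
To make the core "textual DeepDream" idea concrete, the sketch below shows one plausible way to realize it: freeze a pre-trained model, run gradient ascent on a continuous sequence of input embeddings so that a chosen neuron activates strongly, and then project the optimized embeddings back to their nearest vocabulary tokens. This is only an illustrative assumption of how such a method could look, not the paper's actual implementation; the model choice (`bert-base-uncased`), `target_layer`, `target_neuron`, sequence length, and optimizer settings are all placeholder values.

```python
# Illustrative sketch of a textual-DeepDream loop (not SleepTalk's exact method):
# maximize a chosen neuron's activation by optimizing input embeddings directly,
# then decode the result to the nearest vocabulary tokens.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()
for p in model.parameters():
    p.requires_grad_(False)  # keep the pre-trained weights fixed

embedding_matrix = model.embeddings.word_embeddings.weight  # (vocab_size, hidden_dim)
seq_len, target_layer, target_neuron = 8, 6, 123            # assumed hyperparameters

# Start from a random "soft" token sequence and optimize it with gradient ascent.
soft_tokens = torch.randn(1, seq_len, embedding_matrix.size(1), requires_grad=True)
optimizer = torch.optim.Adam([soft_tokens], lr=0.1)

for step in range(200):
    optimizer.zero_grad()
    outputs = model(inputs_embeds=soft_tokens, output_hidden_states=True)
    hidden = outputs.hidden_states[target_layer]     # (1, seq_len, hidden_dim)
    activation = hidden[0, :, target_neuron].mean()  # neuron we want to excite
    (-activation).backward()                         # ascend the activation
    optimizer.step()

# Project each optimized embedding onto its nearest vocabulary token
# to obtain a human-readable "dream" for that neuron.
with torch.no_grad():
    dists = torch.cdist(soft_tokens[0], embedding_matrix)  # (seq_len, vocab_size)
    token_ids = dists.argmin(dim=-1)
print("neuron 'dream':", tokenizer.decode(token_ids))
```

The same optimize-then-decode loop could, in principle, be repointed at a style objective (e.g., matching activation statistics of a style reference while staying close to a content reference), which is the spirit of the style-transfer extension described above.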