Selectively Editable Language Models

Pretrained language models abound in NLP applications, but they are largely static once trained and retain stale representations of the world around them. We seek to selectively edit knowledge learned by a language model without affecting its outputs on unrelated samples. This project explores approaches that alter a model's understanding of named entities through novel training techniques applied to DistilGPT2, a pretrained language model with 82M parameters. We build on methods first developed in Model-Agnostic Meta-Learning (MAML) frameworks, which allow us to train model parameters on the base language modeling objective as well as a secondary "adaptability" task. Our results show that this technique improves knowledge editing with less performance degradation on unrelated samples than standard fine-tuning.
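The core idea above, training parameters so that a later fine-tuning step applies an edit while behavior on unrelated samples is preserved, can be sketched in a MAML-style inner/outer loop. The sketch below is a toy first-order approximation (as in FOMAML) on a linear model with squared error standing in for the language modeling loss; it is not the project's actual DistilGPT2 implementation, and the names `meta_step`, `inner_lr`, `edit_weight`, and `loc_weight` are illustrative assumptions.

```python
import numpy as np

def loss(theta, x, y):
    # Squared error as a stand-in for the language modeling loss.
    pred = x @ theta
    return float(np.mean((pred - y) ** 2))

def grad(theta, x, y):
    # Gradient of the squared-error loss with respect to theta.
    pred = x @ theta
    return 2 * x.T @ (pred - y) / len(y)

def meta_step(theta, edit, unrelated,
              inner_lr=0.1, outer_lr=0.01,
              edit_weight=1.0, loc_weight=1.0):
    x_e, y_e = edit          # sample encoding the desired knowledge edit
    x_u, y_u = unrelated     # sample whose behavior should not change
    # Inner loop: simulate one fine-tuning (edit) step on the edit sample.
    theta_adapted = theta - inner_lr * grad(theta, x_e, y_e)
    # Outer objective: the edit should succeed after adaptation, while the
    # unadapted parameters keep their behavior on unrelated data.
    # First-order approximation: second derivatives are ignored.
    g_edit = grad(theta_adapted, x_e, y_e)
    g_loc = grad(theta, x_u, y_u)
    return theta - outer_lr * (edit_weight * g_edit + loc_weight * g_loc)
```

In the real setting, `theta` would be the weights of DistilGPT2, both losses would be cross-entropy over tokens, and the locality term would be computed over a held-out set of unrelated samples rather than a single example.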