Automatically Neutralizing Ableist Language in Text

Ableism is the systemic oppression of, and discrimination against, people with disabilities. It is often reinforced through language that perpetuates harmful biases and stigmatizes people with disabilities, yet such language can be difficult to detect because of its pervasiveness in mainstream media. To address this issue, we introduce the first parallel corpus of ableist language, along with a natural language generation model that automatically rewrites ableist text into a neutral point of view. Our corpus contains 1,500 sentence pairs drawn from movie scripts, news articles, and speech transcripts. Our generation model is a concurrent system that uses a BERT encoder to identify and replace ableist words and phrases as part of the generation process. We also contribute a self-training pipeline that can generate additional training data for the task of neutralizing ableism, as well as a novel evaluation method for quantitatively assessing a model's ability to reduce bias. Human evaluation and our novel evaluation method suggest that these data and models are a first step toward the automatic identification and reduction of ableism in text.
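As a minimal illustrative sketch (not the released model), the identify-and-replace step can be pictured as tagging biased spans and substituting neutral alternatives. Here a small hand-written lexicon stands in for the BERT encoder's token-level predictions, which the real system learns from the parallel corpus:

```python
# Sketch of the identify-and-replace step. The lexicon below is
# hypothetical: in the full system, a BERT encoder predicts which
# spans are ableist, and the generation model produces the rewrite.
NEUTRAL_LEXICON = {
    "crippled": "disabled",
    "wheelchair-bound": "wheelchair user",
    "suffers from": "has",
}

def neutralize(sentence: str) -> str:
    """Replace flagged ableist spans with neutral alternatives."""
    out = sentence
    # Try longer phrases first so multi-word spans match
    # before any of their sub-spans.
    for phrase in sorted(NEUTRAL_LEXICON, key=len, reverse=True):
        out = out.replace(phrase, NEUTRAL_LEXICON[phrase])
    return out

print(neutralize("She suffers from arthritis."))  # → She has arthritis.
```

A learned tagger generalizes far beyond such a fixed list, handling context-dependent usages that simple string matching cannot.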