GLARE: Generative Left-to-right AdversaRial Examples

Although well studied in computer vision, adversarial examples are difficult to produce in the NLP domain, partially due to the discrete nature of text. Previous approaches, which draw on constrained methods such as rule-based heuristics and synonym substitution, have attained relatively limited success, largely because they consider neither syntactic nor semantic structure. As a result, the adversarial examples they yield often lack grammaticality, idiomaticity, and overall fluency.

Recently, transformer models have been applied to adversarial example generation with considerable success, vastly outperforming previous state-of-the-art approaches. Current transformer-based textual adversarial frameworks use the masked language models (MLMs) BERT or RoBERTa to generate word-level replacements, essentially re-purposing their pretext task (masked token prediction). Yet this is not an ideal fit: ultimately, an MLM's objective is not text generation. Text generation is, however, precisely the explicit objective of another class of models: generative language models.

We therefore propose a novel textual adversarial example generation framework based on generative LMs rather than MLMs. Those familiar with GPT-2 may object that, unlike BERT, it does not benefit from bidirectional context. To address this shortcoming, we adopt the ILM (infilling language model) framework introduced by Donahue et al. (2020), which allows the model to read the entire sentence before infilling. Our method, GLARE, generates word- and span-level perturbations of input examples using a fine-tuned ILM combined with a word importance ranking algorithm. Notably, our algorithm can easily insert spans of arbitrary length, something that neither CLARE (the current state of the art) nor earlier approaches achieve. Combining the ease of generation of a left-to-right LM with full bidirectional context, GLARE outperforms CLARE on a variety of metrics (attack success rate and cosine similarity between the perturbed and original text) while simultaneously improving output fluency.
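
Below is a minimal sketch of the kind of attack loop described above: leave-one-out word importance ranking followed by greedy infilling of the highest-ranked positions. The `victim_prob` and `infill` callables are placeholders introduced for illustration (not GLARE's actual API): the first returns the victim classifier's probability for the gold label, and the second stands in for a fine-tuned infilling LM that rewrites a blanked position given its full left and right context.

```python
from typing import Callable, List, Tuple


def rank_word_importance(
    words: List[str],
    victim_prob: Callable[[str], float],
) -> List[Tuple[int, float]]:
    """Score each position by the drop in gold-label probability when
    that word is removed; a larger drop means a more important word."""
    base = victim_prob(" ".join(words))
    scores = []
    for i in range(len(words)):
        ablated = " ".join(words[:i] + words[i + 1:])
        scores.append((i, base - victim_prob(ablated)))
    # Most important positions first.
    return sorted(scores, key=lambda t: t[1], reverse=True)


def perturb(
    words: List[str],
    victim_prob: Callable[[str], float],
    infill: Callable[[str], str],
    max_edits: int = 5,
) -> str:
    """Greedily replace the most important words with infilled text,
    stopping once the victim's gold-label probability falls below 0.5."""
    words = list(words)
    for idx, _ in rank_word_importance(words, victim_prob)[:max_edits]:
        masked = " ".join(words[:idx] + ["[blank]"] + words[idx + 1:])
        # `infill` is a stand-in for the fine-tuned infilling model,
        # which sees the whole masked sentence before generating.
        words[idx] = infill(masked)
        if victim_prob(" ".join(words)) < 0.5:
            break
    return " ".join(words)
```

In practice the infilling step can return a multi-word span rather than a single token, which is what allows span-level insertions of arbitrary length.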