Consistent Estimation of the Average Treatment Effect with Text as Confounder
I show that embedding representations of text can be used to construct a root-n consistent estimator of the average treatment effect under confounding by text. I explore both GloVe embeddings and transformer-based document embeddings as ways to integrate text data into the double machine learning framework for causal inference. Using a large dataset of consumer complaints from 2018-2021 published by the Consumer Financial Protection Bureau (CFPB), I estimate the causal effect of a complainant identifying themselves as an older American on the probability that their complaint is resolved with monetary or non-monetary compensation. I show that including a representation of the text reduces the estimated treatment effect.
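To make the double machine learning idea concrete, the following is a minimal sketch of a cross-fitted partialling-out estimator in which a matrix of document embeddings serves as the confounder. Everything in it is an illustrative assumption rather than the estimator or data used here: the helper names `ridge_fit_predict` and `dml_partial_out` are hypothetical, the nuisance functions are fit with a simple ridge regression, the "embeddings" are simulated uniform draws, the outcome is continuous rather than binary, and the true effect is fixed at 1.0 by construction.

```python
import numpy as np


def ridge_fit_predict(X_train, y_train, X_test, penalty=1e-3):
    """Fit ridge regression with an unpenalized intercept; predict on X_test."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    B = np.column_stack([np.ones(len(X_test)), X_test])
    reg = penalty * np.eye(A.shape[1])
    reg[0, 0] = 0.0  # leave the intercept unpenalized
    coef = np.linalg.solve(A.T @ A + reg, A.T @ y_train)
    return B @ coef


def dml_partial_out(y, d, X, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of a constant treatment effect.

    y: outcomes, d: treatment indicator, X: embedding matrix (hypothetical input).
    Returns the point estimate and a plug-in standard error.
    """
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds  # balanced random fold labels
    res_y = np.empty(len(y))
    res_d = np.empty(len(y))
    for k in range(n_folds):
        test = folds == k
        train = ~test
        # Nuisance models are fit on the other folds and evaluated out of fold.
        res_y[test] = y[test] - ridge_fit_predict(X[train], y[train], X[test])
        res_d[test] = d[test] - ridge_fit_predict(X[train], d[train], X[test])
    theta = (res_d @ res_y) / (res_d @ res_d)
    psi = res_d * (res_y - theta * res_d)  # orthogonal score at the estimate
    se = np.sqrt(np.mean(psi ** 2) / len(y)) / np.mean(res_d ** 2)
    return theta, se


# Simulated data (an assumption for illustration, not the CFPB complaints).
rng = np.random.default_rng(42)
n, dim = 4000, 8
X = rng.uniform(-1.0, 1.0, size=(n, dim))  # stand-in for text embeddings
d = (rng.uniform(size=n) < 0.5 + 0.4 * X[:, 0]).astype(float)
y = 1.0 * d + 2.0 * X[:, 0] + rng.normal(size=n)  # true effect is 1.0

naive = y[d == 1].mean() - y[d == 0].mean()  # ignores the embeddings
theta_hat, se = dml_partial_out(y, d, X)
print(f"naive difference in means: {naive:.3f}")
print(f"cross-fitted estimate: {theta_hat:.3f} (se {se:.3f})")
```

In this simulation the first embedding coordinate raises both the treatment probability and the outcome, so the naive difference in means overstates the effect, while the cross-fitted estimate that adjusts for the embeddings lands close to the true value. In practice the ridge learner could be swapped for any sufficiently accurate machine learning method, and a doubly robust score may be preferable when treatment effects are heterogeneous.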