Data-Driven Policy Learning: Generalization and Optimization

Zhengyuan Zhou
Graduate Student, Stanford University
Date: Oct. 5th, 2018


The problem of learning good treatment assignment rules from observational data lies at the heart of many challenges in data-driven decision making. While a growing body of literature is devoted to this problem, most existing results focus on the binary-action case (i.e., where one action corresponds to assignment to control and the other to assignment to treatment). In this paper, we study the offline multi-action policy learning problem with observational data and, building on the theory of efficient semi-parametric inference, propose and implement a policy learning algorithm that achieves asymptotically minimax-optimal regret. To the best of our knowledge, this is the first result of this type in the multi-action setup, and it provides a substantial performance improvement over existing learning algorithms. We additionally investigate the practical aspects of policy learning by working with decision trees, and discuss two approaches for solving the key step of the learning algorithm to exact optimality: one using a mixed integer program formulation and the other using a tree-search based algorithm.
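
To make the two ingredients named above more concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a standard doubly robust (AIPW) score for each action and then searches exhaustively over depth-1 decision trees for the policy with the highest estimated value. The function names doubly_robust_scores and best_stump, and the inputs mu_hat (estimated outcome means) and e_hat (estimated propensities), are hypothetical and are assumed to be supplied by some earlier estimation step.

```python
import numpy as np


def doubly_robust_scores(y, a, mu_hat, e_hat):
    """Return the (n, K) matrix of AIPW scores (illustrative sketch).

    y: (n,) observed outcomes
    a: (n,) observed integer actions in {0, ..., K-1}
    mu_hat: (n, K) assumed estimates of E[Y | X, action]
    e_hat: (n, K) assumed estimates of P(action | X)
    """
    idx = np.arange(y.shape[0])
    gamma = mu_hat.astype(float)  # astype returns a copy, inputs are untouched
    # Add the inverse-propensity-weighted residual only for the observed action.
    gamma[idx, a] += (y - mu_hat[idx, a]) / e_hat[idx, a]
    return gamma


def best_stump(x, gamma):
    """Exhaustive search over depth-1 trees of the form x[:, j] <= t.

    Each leaf is assigned the action with the largest total score in that
    leaf. Returns the best split and its estimated policy value.
    """
    n, d = x.shape
    totals = gamma.sum(axis=0)
    # Baseline: no split, one action for everyone.
    best = {"value": totals.max() / n, "feature": None, "threshold": None,
            "left": int(totals.argmax()), "right": int(totals.argmax())}
    for j in range(d):
        order = np.argsort(x[:, j], kind="stable")
        xs = x[order, j]
        left_cum = np.cumsum(gamma[order], axis=0)  # running per-action sums
        for i in range(n - 1):
            if xs[i] == xs[i + 1]:
                continue  # cannot split between equal feature values
            left = left_cum[i]
            right = totals - left
            value = (left.max() + right.max()) / n
            if value > best["value"]:
                best = {"value": value, "feature": j,
                        "threshold": (xs[i] + xs[i + 1]) / 2,
                        "left": int(left.argmax()),
                        "right": int(right.argmax())}
    return best
```

Deeper trees require searching over nested splits, which is where exact methods such as the mixed integer program or tree-search approaches mentioned above come in; this sketch covers only the depth-1 case.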

This is joint work with Susan Athey and Stefan Wager.


Zhengyuan Zhou is a sixth-year PhD candidate in Electrical Engineering. He received a B.E. in Electrical Engineering and Computer Sciences and a B.A. in Mathematics from UC Berkeley. His research interests include learning, optimization, control, game theory, and applied probability.