Learning to Optimize via Information Directed Sampling

Daniel Russo
Graduate Student, Stanford University
Date: May 16th, 2013

Abstract

We propose information directed sampling -- a new algorithm for balancing exploration and exploitation in online optimization problems in which a decision-maker must learn from partial feedback. The algorithm quantifies the amount learned by selecting an action through an information-theoretic measure: the mutual information between the true optimal action and the algorithm's next observation. Actions are then selected by optimizing an objective that balances earning high immediate reward against acquiring information.
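To make the selection rule concrete, here is a minimal sketch of one way an information-directed choice could be computed for a simple Bernoulli bandit with independent Beta posteriors. Everything in it is assumed for illustration: the function name ids_action, the Monte Carlo estimation, and the variance-based proxy used in place of exact mutual information are not taken from the talk, and the sketch picks a single arm rather than randomizing over actions.

import numpy as np

def ids_action(alpha, beta, n_samples=10_000, rng=None):
    # Information-directed choice for a Bernoulli bandit with independent
    # Beta(alpha[k], beta[k]) posteriors, estimated by Monte Carlo.
    rng = np.random.default_rng() if rng is None else rng
    k = len(alpha)
    # Posterior samples of each arm's mean reward, shape (n_samples, k).
    theta = rng.beta(alpha, beta, size=(n_samples, k))
    best = theta.argmax(axis=1)                          # sampled optimal arm A*
    p_star = np.bincount(best, minlength=k) / n_samples  # posterior P(A* = a)
    mean = theta.mean(axis=0)                            # E[theta_a]
    # Expected regret of playing arm a: E[theta_{A*}] - E[theta_a].
    delta = theta[np.arange(n_samples), best].mean() - mean
    # Variance-based proxy for the information gain about A* from playing a:
    #   g(a) = sum_{a'} P(A* = a') * (E[theta_a | A* = a'] - E[theta_a])^2
    gain = np.zeros(k)
    for a_star in range(k):
        mask = best == a_star
        if mask.any():
            gain += p_star[a_star] * (theta[mask].mean(axis=0) - mean) ** 2
    # Information ratio: squared expected regret over information gain;
    # play an arm with the smallest ratio.
    ratio = delta ** 2 / np.maximum(gain, 1e-12)
    return int(ratio.argmin())

# Example: posteriors after a few observations of a 3-armed bandit.
alpha = np.array([2.0, 5.0, 1.0])
beta = np.array([3.0, 4.0, 1.0])
print(ids_action(alpha, beta))

The key design choice this illustrates is that arms are ranked not by reward alone but by how much regret is incurred per unit of information gained about the identity of the optimal arm.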

We establish general regret bounds for this algorithm that scale with the entropy of the optimal action distribution. In addition, for several widely studied problem settings, we demonstrate state-of-the-art performance in simulation trials. Finally, we present simple and transparent examples in which information directed sampling dramatically outperforms popular approaches such as upper confidence bound algorithms and Thompson sampling, which do not quantify the information provided by actions.
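As a rough indication of what an entropy-scaled guarantee looks like (the notation below is assumed for illustration, not quoted from the talk): if the algorithm's information ratio is uniformly bounded by a constant $\lambda$, a bound of the shape

\[
\mathbb{E}\big[\mathrm{Regret}(T)\big] \;\le\; \sqrt{\lambda \, H(A^*) \, T}
\]

reflects this scaling, where $H(A^*)$ is the entropy of the prior distribution of the optimal action and $T$ is the time horizon.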

This talk is based on joint work with Benjamin Van Roy.

Bio

Daniel Russo is a third-year PhD student in the Department of Management Science and Engineering at Stanford. He is advised by Prof. Benjamin Van Roy.