Multi-Agent Online Learning under Imperfect Information: Algorithms, Theory and Applications

Zhengyuan Zhou
Graduate Student, Stanford University
Date: Oct. 6th, 2017

Abstract

We consider a model of multi-agent online learning under imperfect information, where the reward structures of agents are given by a general continuous game. After introducing a general equilibrium stability notion for continuous games, called variational stability, we examine the well-known online mirror descent (OMD) learning algorithm and show that the “last iterate” (that is, the actual sequence of actions) of OMD converges to variationally stable Nash equilibria provided that the feedback delays faced by the agents are synchronous and bounded. We then extend the result to almost sure convergence to variationally stable Nash equilibria under both unbiased noise and synchronous and bounded delays. Subsequently, to tackle fully decentralized, asynchronous environments with unbounded feedback delays, we propose a variant of OMD which we call delayed mirror descent (DMD), and which relies on the repeated leveraging of past information. With this modification, the algorithm converges to variationally stable Nash equilibria, with no feedback synchronicity assumptions, and even when the delays grow super-linearly relative to the game’s horizon. We then again extend it to the case where there are both delays and noise.

In the second part of the talk, we present two applications of the multi-agent online learning framework. The first application is on non-convex stochastic optimization, where we characterize almost sure convergence of the well-known stochastic mirror descent algorithm to global optima for a large class of non-convex stochastic optimization problems (strictly including convex, quasi-convex and start-convex problems). A step further, our results also include as a special case the large-scale stochastic optimization problem, where stochastic mirror descent is applied in a distributed, asynchronous manner across multiple machines/processors. Time permitting, we will discuss how these results help (at least in part) clarify and affirm the recent successes of mirror-descent type algorithms in large-scale machine learning. The second application concerns power management on random wireless networks, where we use a game-design approach to derive robust power control algorithms that converge (almost surely) to the optimal power allocation in the presence of randomly fluctuating networks.

This is joint work with Nick Bambos, Stephen Boyd, Panayotis Mertikopoulos, Peter Glynn and Claire Tomlin.

Bio

Zhengyuan Zhou is a 5th-year PhD candidate in Electrical Engineering and has received a B.E. in Electrical Engineering and Computer Sciences and a B.A. in mathematics from UC Berkeley. His research interests include learning, optimization, control, game theory and applied probability.