About Me

I am a PhD candidate in the Management Science & Engineering department at Stanford University, where I work on machine learning. My research focuses on reinforcement learning, contextual bandits and causal inference. I am advised by Benjamin Van Roy and Susan Athey. My PhD is funded by the "Arvanitidis" Stanford Graduate Fellowship in Memory of William K. Linvill and by the Onassis Foundation Graduate Fellowship.

In 2015, I received an MSc in Operations Research from Stanford University, graduating first in my class. In 2014, I received a BSc and MSc in Electrical Engineering & Computer Science from the National Technical University of Athens (NTUA), Greece, graduating with the highest GPA in NTUA's 200-year history. Between 2012 and 2015, I spent time at Google Research, where I worked on the design and deployment of large-scale optimization algorithms for Google Technical Infrastructure and Google Ad Exchange. In 2016, I led the design and launch of the multi-touch attribution product at Krux (now Salesforce Einstein). In 2018, I joined the Machine Learning group at Microsoft Research NYC, where I worked with Miroslav Dudik and Robert Schapire on reinforcement learning decomposition and off-policy evaluation for high-dimensional contextual bandits.

I have been the recipient of the Intel Honorary Award, the Google Anita Borg Memorial Award, the Google Excellence Award, the Stanford Graduate Fellowship in Science and Engineering, and the Stanford Outstanding Academic Achievement Award.

I enjoy swimming, travelling across the world with great company, and exploring impressionist and surrealist art.


Scalable Coordinated Exploration in Concurrent Reinforcement Learning

Maria Dimakopoulou, Ian Osband, Benjamin Van Roy (NIPS 2018)

We consider a team of reinforcement learning agents that concurrently operate in a common environment, and we develop an approach to efficient coordinated exploration that is suitable for problems of practical scale. Our approach builds on seed sampling (Dimakopoulou and Van Roy, 2018) and randomized value function learning (Osband et al., 2016). We demonstrate that, for simple tabular contexts, the approach is competitive with previously proposed tabular model learning methods (Dimakopoulou and Van Roy, 2018). With a higher-dimensional problem and a neural network value function representation, the approach learns quickly with far fewer agents than alternative exploration schemes.
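A minimal sketch of how seed sampling can be combined with randomized value function learning, under stated assumptions: each agent's seed fixes a random prior parameter vector and a perturbation for every shared observation, and the agent fits regularized least squares toward its own prior on perturbed targets. The paper uses neural network value functions; the linear features, Gaussian perturbations, and the name `seeded_value_fit` with its parameters are hypothetical simplifications, not the paper's implementation.

```python
import numpy as np

def seeded_value_fit(agent_seed, features, targets, prior_scale=1.0, noise_std=1.0):
    """Fit linear value-function weights for one agent under its own fixed seed.

    Hypothetical simplification: linear features stand in for a neural network.
    features: (n, d) array of state-action features shared by all agents.
    targets:  (n,) array of regression targets (e.g. bootstrapped returns).
    """
    rng = np.random.default_rng(agent_seed)
    n, d = features.shape
    # The seed fixes a random prior vector, then one perturbation per observation,
    # drawn sequentially so observation i always receives the same perturbation.
    theta_prior = rng.normal(0.0, prior_scale, size=d)
    noise = rng.normal(0.0, noise_std, size=n)
    # Regularized least squares on perturbed targets, pulled toward the agent's prior.
    A = features.T @ features / noise_std**2 + np.eye(d) / prior_scale**2
    b = features.T @ (targets + noise) / noise_std**2 + theta_prior / prior_scale**2
    return np.linalg.solve(A, b)
```

Because the prior vector and perturbations depend only on the seed, an agent's fit changes only when the shared data changes, while agents with different seeds produce different fits; with plenty of data all fits concentrate around the same weights.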

Coordinated Exploration in Concurrent Reinforcement Learning

Maria Dimakopoulou, Benjamin Van Roy (ICML 2018; long talk)

We consider a team of reinforcement learning agents that concurrently learn to operate in a common environment, while sharing data in real-time. We identify three properties that are essential to efficient coordinated exploration: real-time adaptivity to shared observations, commitment to carry through with action sequences that reveal new information, and diversity across learning opportunities pursued by different agents. We demonstrate that optimism-based approaches fall short with respect to diversity, while naive extensions of Thompson sampling lack commitment. We propose seed sampling, which offers a general approach to designing effective coordination algorithms for concurrent reinforcement learning and has substantial advantages over alternative exploration schemes.
[Paper] [Demo] [ICML 2018 Slides] [ICML 2018 Video]
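To illustrate the three properties, here is a minimal sketch of seed sampling in a deliberately simplified setting: a Gaussian multi-armed bandit with independent arms, rather than the reinforcement learning setting of the paper. Each agent's integer seed (an assumption of this sketch) fixes a prior draw and a perturbation for each observation of each arm; the agent's sample is the Gaussian posterior-mean formula applied to those seeded values. Recomputing from shared data gives adaptivity, a fixed seed gives commitment, and distinct seeds give diversity. The function names and parameters are hypothetical.

```python
import numpy as np

def seed_sample(agent_seed, arm_rewards, prior_mean=0.0, prior_var=1.0, noise_var=1.0):
    """Return one agent's sampled mean reward for every arm.

    agent_seed:  non-negative int fixed for the agent's lifetime (hypothetical).
    arm_rewards: one list of observed rewards per arm, shared by all agents.
    """
    samples = []
    for arm, rewards in enumerate(arm_rewards):
        # Per-arm generator derived from the agent's seed: the draws depend only
        # on (agent_seed, arm) and on how many rewards the arm has so far.
        rng = np.random.default_rng([agent_seed, arm])
        prior_draw = rng.normal(prior_mean, np.sqrt(prior_var))
        noise = rng.normal(0.0, np.sqrt(noise_var), size=len(rewards))
        perturbed = np.asarray(rewards, dtype=float) + noise
        # Posterior-mean formula on the seeded prior draw and perturbed rewards;
        # over random seeds this is distributed as a posterior sample of the mean.
        precision = 1.0 / prior_var + len(rewards) / noise_var
        samples.append((prior_draw / prior_var + perturbed.sum() / noise_var) / precision)
    return np.array(samples)

def choose_arms(agent_seeds, arm_rewards):
    """Each agent acts greedily with respect to its own seeded sample."""
    return [int(np.argmax(seed_sample(s, arm_rewards))) for s in agent_seeds]
```

With no data, agents with different seeds hold different samples and tend to spread across arms; as shared observations accumulate, their samples concentrate and they converge on the same choice.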

Estimation Considerations in Contextual Bandits

Maria Dimakopoulou, Susan Athey, Guido Imbens

We study a new consideration for the exploration vs. exploitation framework: the way exploration is conducted in the present may affect the bias and variance of the potential outcome model estimates in subsequent stages of learning. We show that contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as to the exploration method used, particularly in the presence of rich heterogeneity or complex outcome models, which can lead to difficult estimation problems along the path of learning. We propose new contextual bandit designs that combine parametric and nonparametric statistical estimation methods with causal inference methods to reduce the estimation bias, and we provide empirical evidence that guides the choice among the alternatives in different scenarios.
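One common way to bring causal inference into outcome model estimation is to weight each observation by the inverse of the probability with which its arm was chosen, so that each arm's model is fit as if its contexts had not been filtered by the exploration policy. The minimal sketch below assumes epsilon-greedy exploration (whose assignment probabilities are known exactly), linear outcome models, and inverse-propensity-weighted ridge regression; it is an illustrative example, not the designs proposed in the paper, and all names are hypothetical.

```python
import numpy as np

def weighted_ridge(X, y, w, reg=1.0):
    """Ridge regression where observation i carries weight w[i]."""
    d = X.shape[1]
    A = X.T @ (w[:, None] * X) + reg * np.eye(d)
    b = X.T @ (w * y)
    return np.linalg.solve(A, b)

def ipw_epsilon_greedy(contexts, reward_fn, n_arms, eps=0.1, seed=0):
    """Epsilon-greedy contextual bandit whose per-arm linear outcome models are
    fit with inverse-propensity weights 1 / p(chosen arm | context)."""
    rng = np.random.default_rng(seed)
    d = contexts.shape[1]
    history = [([], [], []) for _ in range(n_arms)]  # per arm: contexts, rewards, weights
    thetas = np.zeros((n_arms, d))
    for x in contexts:
        # Exact assignment probabilities of epsilon-greedy for this context.
        probs = np.full(n_arms, eps / n_arms)
        probs[int(np.argmax(thetas @ x))] += 1.0 - eps
        a = int(rng.choice(n_arms, p=probs))
        r = reward_fn(x, a)
        xs, rs, ws = history[a]
        xs.append(x)
        rs.append(r)
        ws.append(1.0 / probs[a])
        # Refit only the arm that was played, weighting each point by 1 / propensity.
        thetas[a] = weighted_ridge(np.array(xs), np.array(rs), np.array(ws))
    return thetas
```

Weighting matters mainly when the outcome model is misspecified or the data each arm sees are strongly imbalanced; in a well-specified, noiseless example both weighted and unweighted fits recover essentially the same coefficients.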

Market-based dynamic service mode switching in wireless networks

Maria Dimakopoulou, Nicholas Bambos, Martin Valdez-Vivas, John Apostolopoulos (PIMRC 2017)

We consider a virtualized wireless networking architecture, where infrastructure access points of different carriers form a marketplace of resources and bid service deals to a mobile device. At each point in time, the mobile evaluates the available service deals and dynamically decides which one to accept and use in the next transmission interval. Its objective is to minimize the long-term cumulative service cost and latency cost of transmitting the packets in its buffer. We develop a model of this architecture, which allows us to formulate and compute the optimal control by which the mobile accepts one of several offered deals and switches into the corresponding service mode. We evaluate the performance of the optimal control and of low-complexity heuristic controls via simulation.
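Minimizing long-term cumulative cost over a buffer state is a Markov decision problem. As a minimal sketch, assume a hypothetical simplified model: a fixed menu of deals, each with a price per interval and a probability of successfully transmitting one packet, Bernoulli packet arrivals, a bounded buffer, and a per-packet holding cost standing in for latency. Discounted value iteration then yields the deal to accept at each buffer level. None of these modeling choices, names, or parameters come from the paper.

```python
import numpy as np

def solve_mode_switching(deals, max_buffer, arrival_prob, holding_cost, gamma=0.95, tol=1e-9):
    """Discounted value iteration for a hypothetical deal-selection MDP.

    deals: list of (price, success_prob) pairs; the state is the buffer occupancy.
    Returns the optimal cost-to-go per buffer level and the chosen deal index.
    """
    V = np.zeros(max_buffer + 1)
    while True:
        Q = np.empty((max_buffer + 1, len(deals)))
        for b in range(max_buffer + 1):
            for k, (price, p_succ) in enumerate(deals):
                # A packet can only be sent if the buffer is non-empty.
                ps = p_succ if b > 0 else 0.0
                expected_next = 0.0
                for sent, p_s in ((1, ps), (0, 1.0 - ps)):
                    for arrived, p_a in ((1, arrival_prob), (0, 1.0 - arrival_prob)):
                        nb = min(b - sent + arrived, max_buffer)  # overflow is dropped
                        expected_next += p_s * p_a * V[nb]
                # Service price plus latency proxy plus discounted future cost.
                Q[b, k] = price + holding_cost * b + gamma * expected_next
        V_new = Q.min(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmin(axis=1)
        V = V_new
```

For example, with an idle option `(0.0, 0.0)` and a paid deal `(1.0, 0.9)`, the computed policy stays idle when the buffer is empty and pays for the deal when the buffer is full and holding costs are high.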

Reliable and Efficient Performance Monitoring in Linux

Maria Dimakopoulou, Stéphane Eranian, Nectarios Koziris, Nicholas Bambos (Supercomputing 2016)

We address a published erratum in the Performance Monitoring Unit (PMU) of Intel Sandy Bridge, Ivy Bridge and Haswell processors with hyper-threading enabled, which causes cross hyper-thread hardware counter corruption and may produce unreliable results. We propose a cache-coherence style protocol, which we implement in the Linux kernel, to address the issue by introducing cross hyper-thread dynamic event scheduling. Additionally, we improve event scheduling efficiency by introducing a bipartite graph matching algorithm that schedules events onto hardware counters optimally and consistently. The improvements have been contributed to the upstream Linux kernel v4.1.
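Scheduling events onto hardware counters can be framed as maximum bipartite matching between events and the counters each event is allowed to use. A minimal sketch using Kuhn's augmenting-path algorithm follows; the bitmask representation of counter constraints and the function name are assumptions for illustration, not the kernel's code. Iterating over events and counters in a fixed order makes the result deterministic for a given input, which is one simple way to obtain consistent assignments.

```python
def schedule_events(event_masks, n_counters):
    """Maximum bipartite matching of events to counters (Kuhn's algorithm).

    event_masks[i] is a hypothetical bitmask: bit c set means event i may use counter c.
    Returns a dict {event index: counter index} for every scheduled event.
    """
    counter_owner = [-1] * n_counters  # event currently holding each counter

    def try_assign(event, visited):
        # Look for a free counter, or one whose owner can be moved elsewhere.
        for c in range(n_counters):
            if ((event_masks[event] >> c) & 1) and not visited[c]:
                visited[c] = True
                if counter_owner[c] == -1 or try_assign(counter_owner[c], visited):
                    counter_owner[c] = event
                    return True
        return False

    for e in range(len(event_masks)):
        try_assign(e, [False] * n_counters)
    return {e: c for c, e in enumerate(counter_owner) if e != -1}
```

With masks `[0b011, 0b001, 0b100]`, a greedy first-fit pass would put event 0 on counter 0 and leave event 1 unscheduled; the augmenting path instead moves event 0 to counter 1, so all three events are scheduled.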



I defended my PhD thesis!
The committee of my defense was Benjamin Van Roy, Susan Athey, Emma Brunskill, Balaji Prabhakar and Guido Imbens. [Photo]


The paper "Scalable Coordinated Exploration in Concurrent Reinforcement Learning" has been accepted to NIPS 2018.


A new demo has been uploaded showcasing seed sampling with generalization from the paper
"Scalable Coordinated Exploration in Concurrent Reinforcement Learning".


Slides and video from the long talk I gave at ICML 2018 on "Coordinated Exploration in Concurrent Reinforcement Learning" are now available. [Slides] [Video]


The slides from my seminar talk at Netflix can be found here.


An animated demo has been uploaded showcasing seed sampling from the paper
"Coordinated Exploration in Concurrent Reinforcement Learning".


From June to September 2018, I will join the Machine Learning group at Microsoft Research NYC.
