Assignment 2: Playing Pong with Deep Q-Learning

Part 1 (Questions 0-4): due 2/5 (Fri), 6:00 PM (18:00) PST.
Part 2 (Questions 5-6): due 2/12 (Fri), 6:00 PM (18:00) PST.
See course webpage for the late day policy.

In this assignment, you will use Q-learning with function approximators (including a deep Q-network) to play games! We will use OpenAI Gym for the Atari environment. Training on Pong takes more than 8 hours, so please start early.

The goals of this assignment are to:

  • understand the basic Q-learning method.
  • understand the use of experience replay and target networks, and how to select hyperparameters.
  • implement and apply a linear approximator on the Pong game.
  • implement and apply the DQN on the Pong game.
  • understand the differences and tradeoffs between these two approximators.
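
To make the ideas in the list above concrete, here is a minimal sketch of Q-learning with a linear approximator, an experience replay buffer, and a separate set of target weights. It uses only the Python standard library and is not the starter code: the names `ReplayBuffer`, `q_values`, `td_target`, `q_learning_step`, and `sync_target` are hypothetical, and the one-weight-vector-per-action layout is an assumption made for illustration.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # deque with maxlen silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks temporal correlation.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def q_values(weights, state):
    """Linear approximator (assumed layout): Q(s, a) = w_a . s."""
    return [sum(w_i * s_i for w_i, s_i in zip(w, state)) for w in weights]


def td_target(target_weights, reward, next_state, done, gamma):
    """Q-learning target r + gamma * max_a' Q_target(s', a'); just r at episode end."""
    if done:
        return reward
    return reward + gamma * max(q_values(target_weights, next_state))


def q_learning_step(weights, target_weights, batch, lr, gamma):
    """Apply one gradient step per sampled transition, updating weights in place."""
    for state, action, reward, next_state, done in batch:
        target = td_target(target_weights, reward, next_state, done, gamma)
        error = target - q_values(weights, state)[action]
        # For a linear model, the gradient of Q(s, a) w.r.t. w_a is the state itself.
        weights[action] = [w + lr * error * s for w, s in zip(weights[action], state)]


def sync_target(weights):
    """Copy the online weights into a fresh set of target weights."""
    return [list(w) for w in weights]
```

In a training loop, transitions would be added to the buffer as the agent plays, `q_learning_step` would be called on sampled batches, and `sync_target` would be called only every so often so that the targets stay fixed between syncs. A DQN follows the same pattern with a neural network in place of the linear model.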

Tasks

This assignment has two components: a written component and a coding component.

The coding assignment, including some starter code, is available here.

The written homework questions are available here.

If you would like to typeset your solution, we have provided our assignment2 LaTeX file here. PLEASE DO NOT SHARE THIS FILE WITH ANYONE OUTSIDE THE CLASS!

Setup

Working remotely on Azure

(highly recommended for training on Pong)

As part of this course, you can use Azure for your assignments. We recommend this route for anyone who is having trouble with installation and setup, or who would like better CPU/GPU resources than are available locally. Please see the set-up tutorial here for more details.

Working locally

While you are waiting for the Azure subscriptions to be set up, you can get started on all problems except q5 and q6 (Pong).

Note: Please be sure you have Anaconda or Miniconda installed. The following instructions should work on all platforms. If you have any trouble getting set up, please come to office hours and the TAs will be happy to help.

Create a conda environment on your local system, replacing <your-system> with either mac or windows:

    cd starter_code_torch
    conda env create -f cs234-torch-<your-system>.yml
    conda activate cs234-torch
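
After activating the environment, a quick check like the sketch below can confirm that the packages you expect are importable. The package list (`torch`, `gym`, `numpy`) is an assumption based on the environment name and the use of Gym in this assignment; the .yml file is the authoritative list. The helper name `missing_packages` is hypothetical.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be found in the active Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed package list; check the .yml file for what is actually installed.
    required = ["torch", "gym", "numpy"]
    missing = missing_packages(required)
    print("missing:", missing if missing else "none")
```

If anything is reported missing, make sure the environment is active (conda activate cs234-torch) before assuming the install failed.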
  

Notes

NOTE 1: Please start early!

Submitting your work

Submit both your written assignment and coding assignment by following the Submission Instructions.

Reference papers

  • Human-level control through deep reinforcement learning, V. Mnih et al., Nature, 2015.
  • Playing Atari with Deep Reinforcement Learning, V. Mnih et al., NIPS Workshop, 2013.