Hardware Accelerators for Training Deep Neural Networks

Ardavan Pedram & Kunle Olukotun
Stanford University

The 46th International Symposium on Computer Architecture (ISCA-2019)

Sunday Afternoon, June 23rd, 2019
Room 102A, Phoenix Convention Center, Phoenix, Arizona, USA

Funding for this Tutorial was provided by the National Science Foundation Division of Computing and Communication Foundations under award number 1563113.



In this tutorial we cover the challenges of training deep neural networks and the architectural techniques used to design accelerators for training systems. Training neural networks has far greater computational and memory demands than inference, and researchers have developed several techniques to overcome these challenges. We discuss the limitations of current solutions and approaches to overcoming them. We introduce the training problem by examining the ubiquitous synchronous stochastic gradient descent algorithm and its asynchronous variants. We cover techniques such as lowering arithmetic precision and their impact on network convergence and performance. We also cover the challenges of exploiting sparsity, especially during training. We examine the problem of scaling DNN training across distributed multiprocessors and its attendant problems of increasing batch size and balancing computation and communication. Finally, we broadly survey architectures that support training, including special-purpose accelerators.
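As background for the synchronous SGD discussion, the following is a minimal sketch (not from the tutorial materials) of one synchronous data-parallel update step: each worker computes a gradient on its shard of the batch, the gradients are averaged, and a single update is applied to the shared parameters. The function and variable names are illustrative assumptions.

```python
import numpy as np

def sync_sgd_step(params, grads_per_worker, lr=0.01):
    """One synchronous SGD step: average the gradients contributed by
    all workers, then apply a single update to the shared parameters."""
    avg_grad = np.mean(grads_per_worker, axis=0)
    return params - lr * avg_grad

# Toy example: 4 workers, each reporting a gradient over 3 parameters.
params = np.zeros(3)
grads = np.array([[1.0, 2.0, 3.0]] * 4)  # identical gradients for illustration
params = sync_sgd_step(params, grads, lr=0.1)
```

Because every worker must report its gradient before the update is applied, the step is gated by the slowest worker; asynchronous variants relax this barrier at the cost of applying stale gradients.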

Recommended Background