Just this year, deep learning has fueled significant progress in computer vision, speech recognition, and natural language processing. We have seen a computer beat the world champion in Go with help from deep learning, and a single deep learning algorithm learn to recognize two vastly different languages, English and Mandarin. At Baidu, we think that this is just the beginning, and high performance computing is poised to help.
It turns out that deep learning is compute limited, even on the fastest machines in the world. This talk will provide empirical evidence from our Deep Speech work that application level performance (e.g. recognition accuracy) scales with data and compute, transforming some hard AI problems into problems of computational scale.
It will describe the performance characteristics of Baidu's deep learning workloads in detail, focusing on the recurrent neural networks used in Deep Speech as a case study. It will cover challenges to further improving performance, describe techniques that have allowed us to sustain 250 TFLOP/s when training a single model on a cluster of 128 GPUs, and discuss straightforward improvements that are likely to deliver even better performance.
Our three big hammers are improving algorithmic efficiency, building faster and more power efficient processors, and strong scaling training to larger clusters. The talk will conclude with open problems in these areas, and suggest directions for future work.
Download the slides for this presentation in PDF format.
About the speaker:
|Greg Diamos is a senior researcher at Baidu's Silicon Valley AI Lab (SVAIL). Previously he was on the research team at NVIDIA, where he contributed to the Volta GPU. Greg holds a PhD from the Georgia Institute of Technology, where he contributed to the development of the GPU-Ocelot dynamic compiler, which targeted CPUs and GPUs from the same program representation. His PhD thesis pioneered execution models for heterogeneous processors.|