From Murmann Mixed-Signal Group
SBEE, Massachusetts Institute of Technology, 2012
MSEE, Stanford University, 2015
Admitted to Ph.D. Candidacy: 2013-2014
Email: dbankman AT stanford DOT edu
Research: Charge Domain Signal Processing for Machine Learning
During the past decade, advances in machine learning have catalyzed an artificial intelligence renaissance. Researchers using the latest parallel computing hardware have trained neural networks that surpass human performance in image classification [1], among other recent successes. Image classification, speech recognition, and machine translation have reached the accuracy necessary for deployment as cloud services. As cloud-based artificial intelligence becomes increasingly prevalent on mobile devices, latency and battery life are likely to limit its applicability. One solution is to move certain algorithms into embedded hardware, thereby eliminating both the latency of cloud access and the large energy cost per bit of transmitting data to the cloud.
The deep neural networks that achieve state-of-the-art classification accuracy use several orders of magnitude more neurons than their predecessors from ten years ago, with the number of neurons approximately doubling every 2.4 years [2]. To accommodate this growing complexity while maintaining battery life, it is necessary to design more energy-efficient architectures and circuits for the realization of neural networks. Our research focuses on exploiting the error tolerance, data parallelism, and structural regularity of neural networks in the design of low-energy switched-capacitor neuron arrays [3, 4]. As a proof of concept, we have tested an 8-bit, 16-input switched-capacitor (SC) dot product circuit for use in a three-layer neural network. The SC dot product circuit consumes 3.2 pJ per dot product operation while classifying images from the MNIST handwritten digit dataset at 98.0% accuracy. The chip demonstrates that, in addition to 8-bit quantization, the three-layer neural network can tolerate the systematic nonlinearity and ADC input-referred thermal noise that limit the precision of the SC dot product. Using the switched-capacitor approach, the energy per dot product at large vector dimension N scales as 70 fJ per vector element, assuming that the energy of the 8-bit ADC is a fixed cost. In comparison, the energy per dot product using a digital static CMOS combinational block scales as 270 fJ per vector element [5]. Furthermore, at large N, the SC multiplier array can occupy a narrow column of fixed width, whereas the area of a digital adder tree expands in two dimensions. Thus, SC neurons can be packed more densely, reducing the per-neuron wire energy and memory access energy in a data-parallel array of fixed width.
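The scaling comparison above can be illustrated with a rough sketch. Only the two per-element slopes (70 fJ and 270 fJ) come from the text; the fixed ADC energy `adc_fj`, the function names, and the simplification that the digital block has no fixed cost are assumptions made purely for illustration, not measured values from the chip.

```python
# Rough energy-scaling sketch. Only the two per-element slopes come from the
# text; everything else (ADC cost, function names) is a hypothetical placeholder.

SC_FJ_PER_ELEMENT = 70.0        # switched-capacitor slope at large N (from the text)
DIGITAL_FJ_PER_ELEMENT = 270.0  # static CMOS slope at large N (from the text)


def sc_energy_fj(n: int, adc_fj: float) -> float:
    """Energy of one N-element SC dot product: fixed ADC cost plus per-element cost."""
    return adc_fj + SC_FJ_PER_ELEMENT * n


def digital_energy_fj(n: int) -> float:
    """Energy of one N-element digital dot product, assuming no fixed cost."""
    return DIGITAL_FJ_PER_ELEMENT * n


def break_even_n(adc_fj: float) -> float:
    """Vector dimension above which the SC model uses less energy than the digital one."""
    return adc_fj / (DIGITAL_FJ_PER_ELEMENT - SC_FJ_PER_ELEMENT)


if __name__ == "__main__":
    adc_fj = 2000.0  # hypothetical fixed ADC energy in fJ, NOT a measured value
    print(f"break-even N for this ADC cost: {break_even_n(adc_fj):.1f}")
    for n in (16, 64, 256, 1024):
        sc = sc_energy_fj(n, adc_fj)
        dig = digital_energy_fj(n)
        print(f"N={n:5d}  SC={sc / 1000:8.2f} pJ  digital={dig / 1000:8.2f} pJ  ratio={dig / sc:.2f}")
```

Under this model the fixed ADC cost is amortized as N grows, so the digital-to-SC energy ratio approaches 270/70, or roughly 3.9, regardless of the ADC energy assumed.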
[1] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," arXiv preprint arXiv:1512.03385v1, 2015.
[2] I. Goodfellow, Y. Bengio, and A. Courville, "Deep Learning," book in preparation for MIT Press, 2016.
[3] D. Bankman and B. Murmann, "Passive charge redistribution digital-to-analogue multiplier," Electronics Letters, vol. 51, no. 5, pp. 386-388, March 5, 2015.
[4] B. Murmann, D. Bankman, E. Chai, D. Miyashita, and L. Yang, "Mixed-Signal Circuits for Embedded Machine-Learning Applications," Asilomar Conference on Signals, Systems and Computers, Asilomar, CA, Nov. 2015.
[5] M. Horowitz, "Computing's energy problem (and what we can do about it)," in ISSCC Dig. Tech. Papers, 2014, pp. 10-14.