Danny Bankman

From Murmann Mixed-Signal Group



SBEE, Massachusetts Institute of Technology, 2012
MSEE, Stanford University, 2015
Admitted to Ph.D. Candidacy: 2013-2014
Email: dbankman AT stanford DOT edu

Google Scholar Profile: scholar.google.com/citations

Research: Mixed-Signal Processing for Machine Learning

Motivation

Deep learning has several emerging applications that require very low latency and very high bandwidth, such as conversational agents and augmented reality [1]. These applications have created demand for hardware that can run deep-learning-based inference on-device rather than as a cloud-based service. Because deep neural networks (DNNs) with the capacity for practical tasks require millions of parameters and perform billions of arithmetic operations per inference, they can quickly drain a battery when run on a general-purpose microprocessor, which is designed for programmability first and energy efficiency second. My research focuses on custom circuits and microarchitectures for deep-learning-based inference, with energy efficiency as the first priority.

Microarchitecture

A key challenge in DNN hardware architecture is managing the energy cost of memory access, which can exceed the energy cost of arithmetic operations by an order of magnitude for on-chip memory and three orders of magnitude for off-chip memory [2]. For small-scale deep learning applications with memory-efficient DNN architectures, such as image classification with binarized convolutional neural networks, all memory can be integrated on chip [3]. We have demonstrated a weight-stationary, parallel-processing architecture for binary CNNs (Fig. 1) that overcomes the memory energy bottleneck by integrating all memory on chip and amortizing the energy cost of access across many computations [4, 5]. The remaining energy bottleneck is arithmetic computation.
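The amortization argument above can be illustrated with a back-of-the-envelope model. This is a minimal sketch, not the analysis from [4, 5]: the function name `energy_per_op` and the normalized energy values are assumptions chosen only to reflect the order-of-magnitude ratios quoted from [2].

```python
def energy_per_op(e_arith, e_access, reuse):
    """Average energy per arithmetic operation when one memory access
    is shared by `reuse` operations (all energies in the same unit)."""
    if reuse < 1:
        raise ValueError("reuse must be at least 1")
    return e_arith + e_access / reuse


# Normalized, illustrative values (assumptions, not measured data):
# one arithmetic operation costs 1 unit, an on-chip access about 10x
# that, and an off-chip access about 1000x that.
E_ARITH = 1.0
E_ON_CHIP = 10.0
E_OFF_CHIP = 1000.0

for reuse in (1, 10, 100, 1000):
    on_chip = energy_per_op(E_ARITH, E_ON_CHIP, reuse)
    off_chip = energy_per_op(E_ARITH, E_OFF_CHIP, reuse)
    print(f"reuse={reuse:5d}  on-chip={on_chip:8.2f}  off-chip={off_chip:8.2f}")
```

Under these assumed ratios, the on-chip memory term falls below the arithmetic term once each access is reused more than about ten times, which is the sense in which arithmetic becomes the remaining bottleneck.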

Mixed-Signal Circuits

In the low-SNR regime where neural networks operate, mixed-signal processing can consume less energy than digital processing [6, 7, 8]. In our binary CNN processor, we demonstrated a switched-capacitor (SC) neuron array that operates at an order of magnitude lower energy consumption than an equivalent digital neuron array designed with an RTL-to-GDSII flow [9]. The SC neuron (Fig. 2) achieves these energy savings owing to the precise matching between metal-oxide-metal fringe capacitors in the 28 nm technology, which allows a unit capacitor as small as 1 fF to stay within the statistical variation that BinaryNet tolerates without degrading accuracy. With this unit capacitor size, the SC neuron consumes significantly less dynamic energy than the equivalent digital neuron. The binary CNN processor has additionally been used to study how bit errors due to memory voltage over-scaling affect top-level CNN accuracy, and to develop digital circuit techniques that render the CNN robust to the variations exhibited by carbon nanotube FETs [10, 11].
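The operation of the SC neuron (XNOR multiplication, summation by voltage superposition, 1-bit comparison) can be sketched as a behavioral model. This is an idealized illustration, not the circuit from [4, 5]: the function names, the single-ended midpoint comparison, the Gaussian mismatch model, and the 1% relative sigma are all assumptions made for the example. Only the 1 fF unit capacitor comes from the text.

```python
import random


def xnor(a, b):
    """XNOR of two bits. With bit 1 encoding +1 and bit 0 encoding -1,
    the result encodes the product of the two values."""
    return 1 - (a ^ b)


def ideal_neuron(weights, inputs):
    """Digital reference: sign of the +1/-1 dot product (1 if >= 0)."""
    total = sum(2 * xnor(w, x) - 1 for w, x in zip(weights, inputs))
    return 1 if total >= 0 else 0


def mismatched_caps(n, unit_cap, sigma_rel, rng):
    """Unit capacitors with Gaussian relative mismatch (assumed model)."""
    return [unit_cap * (1.0 + rng.gauss(0.0, sigma_rel)) for _ in range(n)]


def sc_neuron(weights, inputs, caps):
    """Idealized charge-domain neuron: each XNOR output drives one
    capacitor, the shared node settles to the capacitance-weighted
    average, and a comparator checks it against the midpoint."""
    charge = sum(c * xnor(w, x) for c, w, x in zip(caps, weights, inputs))
    total_cap = sum(caps)
    return 1 if 2.0 * charge >= total_cap else 0


# Illustrative run: count how often mismatch flips the 1-bit output.
rng = random.Random(0)
n = 1023                      # hypothetical fan-in, odd to avoid exact ties
caps = mismatched_caps(n, unit_cap=1e-15, sigma_rel=0.01, rng=rng)
trials = 200
flips = 0
for _ in range(trials):
    w = [rng.randint(0, 1) for _ in range(n)]
    x = [rng.randint(0, 1) for _ in range(n)]
    flips += sc_neuron(w, x, caps) != ideal_neuron(w, x)
print(f"{flips} of {trials} outputs differ from the ideal neuron")
```

With perfectly matched capacitors the model reduces to the digital reference. Mismatch only matters when the ideal sum is close to zero, which is one way to see why a binarized network can tolerate a small unit capacitor.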


Fig_datapath.png

Fig. 1. Weight-stationary architecture with input reuse and binary-CNN-specific sliced datapath.


Fig_scneu.png

Fig. 2. SC neuron multiplies with XNOR gates, adds using voltage superposition, and resolves its 1-bit output using a voltage comparator.


References

[1] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, pp. 436-444, May 2015.

[2] M. Horowitz, "1.1 Computing's energy problem (and what we can do about it)," ISSCC Dig. Tech. Papers, San Francisco, CA, Feb. 2014, pp. 10-14.

[3] M. Courbariaux and Y. Bengio, "BinaryNet: Training deep neural networks with weights and activations constrained to +1 or -1," CoRR, vol. abs/1602.02830, 2016.

[4] D. Bankman, L. Yang, B. Moons, M. Verhelst and B. Murmann, "An Always-On 3.8μJ/86% CIFAR-10 Mixed-Signal Binary CNN Processor with All Memory on Chip in 28nm CMOS," ISSCC Dig. Tech. Papers, San Francisco, CA, Feb. 2018, pp. 222-223.

[5] D. Bankman, L. Yang, B. Moons, M. Verhelst and B. Murmann, "An Always-On 3.8 μJ/86% CIFAR-10 Mixed-Signal Binary CNN Processor with All Memory on Chip in 28 nm CMOS," IEEE J. Solid-State Circuits, vol. 54, no. 1, Jan. 2019.

[6] B. Murmann, D. Bankman, E. Chai, D. Miyashita, and L. Yang, "Mixed-Signal Circuits for Embedded Machine-Learning Applications," Asilomar Conference on Signals, Systems and Computers, Asilomar, CA, Nov. 2015.

[7] D. Bankman and B. Murmann, "Passive charge redistribution digital-to-analogue multiplier," Electronics Letters, vol. 51, no. 5, pp. 386-388, March 5 2015.

[8] D. Bankman and B. Murmann, "An 8-Bit, 16 Input, 3.2 pJ/op Switched-Capacitor Dot Product Circuit in 28-nm FDSOI CMOS," Proc. IEEE Asian Solid-State Circuits Conf., Toyama, Japan, Nov. 2016, pp. 21-24.

[9] B. Moons, D. Bankman, L. Yang, B. Murmann, and M. Verhelst, "BinarEye: An Always-On Energy-Accuracy-Scalable Binary CNN Processor With All Memory On Chip In 28nm CMOS," Proc. CICC, San Diego, CA, Apr. 2018.

[10] L. Yang, D. Bankman, B. Moons, M. Verhelst, and B. Murmann, "Bit Error Tolerance of a CIFAR-10 Binarized Convolutional Neural Network Processor," in Proc. IEEE Int. Symp. Circuits Syst., Florence, Italy, May 2018.

[11] G. Hills, D. Bankman, B. Moons, L. Yang, J. Hillard, A. B. Kahng, R. Park, M. Verhelst, B. Murmann, M. Shulaker, H.-S. P. Wong, and S. Mitra, "TRIG: Hardware Accelerator for Inference-Based Applications and Experimental Demonstration Using Carbon Nanotube FETs," Design Automation Conference (DAC), San Francisco, CA, Jun. 2018.
