Lita Yang

From Murmann Mixed-Signal Group

LitaYang.jpg

BSEE, California Institute of Technology, 2012

MSEE, Stanford University, 2015

Admitted to Ph.D. Candidacy: 2013-2014

Email: yanglita AT stanford DOT edu
Research: Energy-efficient, Approximate Memory for Error Tolerant Systems


As transistor scaling comes to a halt, systems today are becoming increasingly power limited. Given recent trends toward larger networks and the need to process ever more data (as in Deep Learning and Big Data applications), the cost of storing and moving data in a system can far exceed the computation cost, with data movement contributing energy overheads of over 80%.
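
To make the memory-versus-compute gap concrete, the back-of-the-envelope Python sketch below tallies the energy of a 1024-term weighted sum. The per-operation pJ figures are illustrative assumptions of roughly the right order of magnitude, not measured values.

```python
# Back-of-the-envelope energy breakdown for an n-term weighted sum.
# All per-operation energies are illustrative placeholders (order-of-magnitude
# assumptions), not measured numbers.

E_MAC_PJ       = 4.0    # one 32-bit multiply-accumulate (assumed)
E_SRAM_READ_PJ = 10.0   # one 32-bit on-chip SRAM read (assumed)
E_DRAM_READ_PJ = 640.0  # one 32-bit off-chip DRAM read (assumed)

def energy_breakdown(n, operands_in_dram):
    """Return (compute_pJ, memory_pJ, memory_fraction) for an n-term dot product."""
    compute = n * E_MAC_PJ
    # Each term needs one weight and one activation fetched from memory.
    per_word = E_DRAM_READ_PJ if operands_in_dram else E_SRAM_READ_PJ
    memory = 2 * n * per_word
    return compute, memory, memory / (compute + memory)

for in_dram in (False, True):
    compute, memory, frac = energy_breakdown(n=1024, operands_in_dram=in_dram)
    where = "DRAM" if in_dram else "SRAM"
    print(f"operands in {where}: compute {compute:.0f} pJ, "
          f"memory {memory:.0f} pJ ({100 * frac:.0f}% of total)")
```

With these placeholder numbers, operand fetches dominate: memory accounts for roughly 80% of the total energy even when everything stays in on-chip SRAM, and nearly all of it when operands come from off-chip DRAM.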


Recently, there has been growing interest in the field of Approximate Computing, which explores how an algorithm's performance (accuracy) behaves under reduced precision. Convolutional Neural Networks (ConvNets) are one example of a class of stochastic algorithms that can tolerate reduced precision with little degradation in algorithmic performance. Recent work on hardware accelerators for ConvNets, together with fixed-point simulations, indicates that much lower bit precisions can be used than the conventional 64/32-bit floating-point precision of GPUs. Since power and area scale with precision (number of bits), this implies that significant energy and area savings can be achieved by exploiting the algorithm's tolerance to noise.
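
As a minimal illustration of this point (a sketch with random stand-in data, not the actual accelerator or network model), the Python snippet below uniformly quantizes a layer's weights and activations to a few fixed-point bit widths and reports the error of the weighted-sum outputs relative to a 64-bit floating-point reference.

```python
import numpy as np

def quantize(x, n_bits, x_max):
    """Uniformly quantize x to a signed n_bits fixed-point grid on [-x_max, x_max)."""
    step = x_max / 2 ** (n_bits - 1)
    q = np.clip(np.round(x / step), -2 ** (n_bits - 1), 2 ** (n_bits - 1) - 1)
    return q * step

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(256, 1024))  # stand-in layer weights
a = rng.normal(scale=1.0, size=1024)         # stand-in input activations

ref = W @ a                                   # 64-bit floating-point reference
for n_bits in (16, 8, 6, 4):
    out = quantize(W, n_bits, 0.5) @ quantize(a, n_bits, 4.0)
    rel_rms = np.linalg.norm(out - ref) / np.linalg.norm(ref)
    print(f"{n_bits:2d}-bit fixed point: relative RMS output error {rel_rms:.2%}")
```

The clipping ranges (0.5 for the weights, 4.0 for the activations) are assumptions chosen to cover the stand-in data; in a real network they would be set from the observed dynamic range of each layer.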


We propose to reduce system energy by exploiting the algorithm's error tolerance through approximate memory and interconnect design. From a memory designer's perspective, this is rarely considered a viable option, since most general-purpose systems require robust storage and communication. By designing application-specific memory, however, we can achieve orders-of-magnitude improvements in energy, area, and performance. We further propose to improve the system's classification performance by embedding known circuit nonidealities (i.e., noise and coupling) into the algorithm's training phase, to better model the translation from software to hardware.
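
One hypothetical way to fold such nonidealities into training is to perturb each layer's forward pass with a simple noise-and-coupling model, as sketched below in Python. The noise level and coupling coefficient are placeholder assumptions, not characterized circuit parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_weighted_sum(W, a, sigma=0.02, coupling=0.01):
    """Forward pass of one layer with simple hardware nonideality models injected.

    sigma    -- std of additive Gaussian noise, relative to the output spread
                (placeholder for readout noise of an approximate memory).
    coupling -- fraction of each neighbouring output that leaks in
                (crude stand-in for wire-to-wire coupling).
    Both parameters are illustrative assumptions, not measured circuit values.
    """
    y = W @ a
    y = y + rng.normal(scale=sigma * np.std(y), size=y.shape)   # additive noise
    y = y + coupling * (np.roll(y, 1) + np.roll(y, -1))         # neighbour coupling
    return y

# During training, this noisy forward pass would stand in for the ideal W @ a
# so that the learned weights adapt to the modelled hardware behaviour.
W = rng.normal(scale=0.1, size=(256, 1024))
a = rng.normal(scale=1.0, size=1024)
print("max |noisy - ideal| =", np.abs(noisy_weighted_sum(W, a) - W @ a).max())
```

Training against a model like this lets the learned weights absorb part of the modelled hardware error, which is the intuition behind embedding nonidealities in the training phase.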



