From Murmann Mixed-Signal Group
BSEE, Stanford University, 2010
MSEE, Stanford University, 2011
Admitted to Ph.D. Candidacy: 2011-2012
Towards Always-On Mobile Object Detection: Energy vs. Performance Tradeoffs for Embedded HOG Feature Extraction
Recent vision applications such as augmented reality and advanced driver assistance systems (ADAS) require real-time object detection. As stated in [1], energy efficiency is crucial for both applications because of the limited battery life of mobile devices and the heat-dissipation limits of automotive systems. To identify promising areas for energy reduction, we must analyze object detection at the system level. We focus on Histograms of Oriented Gradients (HOG) features [2], which capture localized pixel gradient information, because they are suitable for hardware implementation and have been shown to achieve high object detection performance.
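The core HOG operation, a magnitude-weighted histogram of gradient orientations over one cell, can be illustrated with a minimal pure-Python sketch. It assumes the default configuration of Dalal and Triggs (centered [-1, 0, 1] derivatives, 8×8-pixel cells, 9 unsigned orientation bins over 0 to 180 degrees). It uses hard binning instead of their interpolation between neighboring bins and omits block normalization. `cell_histogram` is a hypothetical helper, not code from this project.

```python
import math

def cell_histogram(img, x0, y0, cell=8, bins=9):
    """Return a magnitude-weighted histogram of unsigned gradient orientations.

    img is a 2-D list of intensities indexed img[y][x]; (x0, y0) is the
    top-left pixel of one cell x cell block. Border pixels are skipped
    because the centered [-1, 0, 1] derivative needs both neighbors.
    """
    hist = [0.0] * bins
    bin_width = 180.0 / bins  # unsigned orientations span 0..180 degrees
    height, width = len(img), len(img[0])
    for y in range(y0, y0 + cell):
        for x in range(x0, x0 + cell):
            if not (0 < x < width - 1 and 0 < y < height - 1):
                continue
            gx = img[y][x + 1] - img[y][x - 1]  # horizontal derivative
            gy = img[y + 1][x] - img[y - 1][x]  # vertical derivative
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Hard vote into a single bin (simplification: Dalal and Triggs
            # interpolate the vote between neighboring bins).
            hist[int(angle // bin_width) % bins] += mag
    return hist

# Vertical step edge: dark left half, bright right half.
img = [[0.0 if x < 5 else 10.0 for x in range(10)] for _ in range(10)]
print(cell_histogram(img, 1, 1))  # all weight lands in bin 0 (horizontal gradient)
```

In a full HOG pipeline, these cell histograms are then grouped into overlapping 2×2-cell blocks and normalized before being passed to the detector.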
Object detection can be divided into three main steps: image capture, feature extraction, and detection. Conventionally, these steps are partitioned as shown in Fig. 1(a). Because the feature-extraction and detection steps performed by the backend digital processor are computationally complex, memory intensive, and highly parallelizable, significant energy savings can be achieved through custom ASIC design. Reference [1] presents such an ASIC, which performs HOG feature extraction and detection on 1080HD 60 fps video while consuming only 45.3 mW (0.36 nJ/pixel). However, the lowest-power commercial 1080HD 60 fps image sensor currently consumes nearly twice as much power, 86.7 mW (0.70 nJ/pixel) [3]. Therefore, additional system-level energy savings may be achieved by reducing the energy requirements of frontend image capture.
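The per-pixel energy figures follow from dividing average power by the pixel rate. This short sketch assumes that 1080HD means 1920×1080 pixels read out at 60 frames per second. Under that assumption it reproduces the 0.36 and 0.70 nJ/pixel values and the roughly 2× gap between the Suleiman and Sze ASIC and the OmniVision OV2740 sensor. The helper name is hypothetical.

```python
# 1080HD at 60 fps (assumed 1920 x 1080 pixels per frame).
WIDTH, HEIGHT, FPS = 1920, 1080, 60

def energy_per_pixel_nj(power_mw, width=WIDTH, height=HEIGHT, fps=FPS):
    """Convert average power in mW to energy per pixel in nJ."""
    pixels_per_second = width * height * fps
    joules_per_pixel = (power_mw * 1e-3) / pixels_per_second
    return joules_per_pixel * 1e9

asic = energy_per_pixel_nj(45.3)    # HOG extraction + detection ASIC (Suleiman and Sze)
sensor = energy_per_pixel_nj(86.7)  # OmniVision OV2740 image sensor
print(f"ASIC:   {asic:.2f} nJ/pixel")        # ASIC:   0.36 nJ/pixel
print(f"Sensor: {sensor:.2f} nJ/pixel")      # Sensor: 0.70 nJ/pixel
print(f"Sensor/ASIC: {sensor / asic:.2f}x")  # Sensor/ASIC: 1.91x
```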
It has been shown in [4] that typical commercial mobile CMOS image sensors consume over 200 mW, of which the two dominant sources are analog-to-digital conversion (70–85%) and chip-to-chip I/O (10–15%). Reducing the pixel bit depth below the conventional 12 bits, and compressing the output data by performing partial feature extraction (histogram generation) on-chip, could reduce both ADC and I/O energy in the image capture step. Furthermore, partial on-chip feature extraction could reduce the memory, computation, and energy requirements of the backend digital processor, as shown in Fig. 1(b) and demonstrated in [5].
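If chip-to-chip I/O energy is taken to scale roughly with the number of bits transferred (a simplifying assumption), the benefit of on-chip histogram generation can be estimated by counting bits per HOG cell. This sketch assumes 8×8-pixel cells and 9 orientation bins, as in Dalal and Triggs. It also assumes a hypothetical 8-bit width per histogram bin. Both helper names are hypothetical.

```python
def raw_bits_per_cell(cell=8, bit_depth=12):
    """Bits to read out every pixel of a cell x cell block at a given bit depth."""
    return cell * cell * bit_depth

def histogram_bits_per_cell(bins=9, bin_bits=8):
    """Bits to read out one orientation histogram in place of the raw pixels."""
    return bins * bin_bits

for bit_depth in (12, 8):
    raw = raw_bits_per_cell(bit_depth=bit_depth)
    hist = histogram_bits_per_cell()
    print(f"{bit_depth}-bit pixels: {raw} bits raw vs {hist} bits histogram "
          f"({raw / hist:.1f}x fewer bits)")
```

With these assumed parameters, a 12-bit 8×8 cell needs 768 bits, compared with 72 bits for its histogram. Lowering the bit depth alone to 8 bits reduces the raw cell to 512 bits.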
Leveraging the fact that HOG features are based on gradients, we go one step further and propose the pipeline shown in Fig. 1(c), which uses analog memory to store three rows of pixel values and a ratio-to-digital converter (RDC) to digitize ratios of neighboring pixels (representing gradients) rather than absolute pixel values. Because the ratio of two neighboring pixels is unchanged when scene illumination is scaled, this approach allows scenes of a given dynamic range to be represented with fewer bits per pixel than in a standard imager, further reducing I/O energy, backend computation, and memory requirements without sacrificing object detection performance.
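The quantization scheme of the RDC is not specified here. The following numeric model is a sketch under stated assumptions: a logarithmic (log2) transfer characteristic, a hypothetical 4-bit signed output, and a clipping range of ±2 octaves (ratios from 1/4 to 4). It illustrates two ideas from the paragraph. First, horizontal and vertical ratios for the center row are formed from a three-row buffer. Second, the resulting codes stay the same when the whole scene becomes brighter. `rdc_code` and `center_row_codes` are hypothetical names and do not describe the actual circuit.

```python
import math

def rdc_code(num, den, bits=4, max_log2=2.0, eps=1e-6):
    """Quantize log2(num / den) to a signed code (assumed log2 characteristic).

    Ratios outside [2**-max_log2, 2**max_log2] are clipped; eps avoids
    division by zero for completely dark pixels.
    """
    levels = 2 ** (bits - 1) - 1          # e.g. codes -7..7 for 4 bits
    r = math.log2((num + eps) / (den + eps))
    r = max(-max_log2, min(max_log2, r))  # clip to the converter's range
    return round(r / max_log2 * levels)

def center_row_codes(rows, bits=4, max_log2=2.0):
    """Ratio codes for the center row of a three-row buffer [above, center, below].

    Horizontal codes compare right and left neighbors; vertical codes compare
    the pixels below and above. Border columns are skipped.
    """
    above, center, below = rows
    xs = range(1, len(center) - 1)
    horizontal = [rdc_code(center[x + 1], center[x - 1], bits, max_log2) for x in xs]
    vertical = [rdc_code(below[x], above[x], bits, max_log2) for x in xs]
    return horizontal, vertical

rows = [
    [100, 100, 100, 100, 100],   # row above
    [100, 400, 100, 100, 1600],  # center row
    [100, 400, 100, 25, 100],    # row below
]
print(center_row_codes(rows))  # ([0, -7, 7], [7, 0, -7])
# Scaling every pixel by 10 (brighter illumination) leaves the codes unchanged.
print(center_row_codes([[10 * v for v in row] for row in rows]))
```

A log-domain characteristic is one natural choice because it maps each ratio to a difference of logarithms. Other transfer curves are equally possible.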
To simulate the object detection performance of our proposed system, we have created a database of over 4,000 annotated RAW images, modeled after the PASCAL VOC database. Unlike processed JPEG images, RAW images consist of 12-bit photosensor outputs, which approximate analog scene illumination levels. We plan to open-source this database to the academic community through the Stanford Digital Repository.
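One way to use 12-bit RAW data in such simulations is to emulate a lower-resolution conventional ADC by discarding least-significant bits. This sketch assumes simple truncation through a hypothetical `requantize` helper. The project's actual simulation procedure is not described here. The example shows why absolute-value readout needs many bits: at 6 bits, a 2× edge in a dark region disappears, while the same 2× edge in a bright region survives. In both cases the pixel ratio is 2.

```python
def requantize(raw, in_bits=12, out_bits=8):
    """Emulate a lower-resolution ADC by truncating the least-significant bits."""
    if not 0 < out_bits <= in_bits:
        raise ValueError("out_bits must be between 1 and in_bits")
    shift = in_bits - out_bits
    return [value >> shift for value in raw]

dark_edge = [8, 16]         # 2x intensity step in a dark region (12-bit codes)
bright_edge = [1024, 2048]  # the same 2x step in a bright region
print(requantize(dark_edge, out_bits=6))    # [0, 0]   -> edge lost
print(requantize(bright_edge, out_bits=6))  # [16, 32] -> edge kept
```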
Fig. 1. Conventional object detection pipeline (a), generic object detection pipeline with embedded feature extraction (b), and proposed object detection pipeline with ratio-to-digital converter (RDC) and embedded feature extraction (c).
[1] A. Suleiman and V. Sze, “Energy-efficient HOG-based object detection at 1080HD 60 fps with multi-scale support,” in Proc. IEEE Workshop Signal Process. Syst. (SiPS), Belfast, 2014.
[2] N. Dalal and B. Triggs, “Histograms of Oriented Gradients for Human Detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2005, pp. 886–893.
[3] OmniVision OV2740. http://www.ovt.com/products/sensor.php?id=153
[4] R. LiKamWa et al., “Energy Characterization and Optimization of Image Sensing Toward Continuous Mobile Vision,” in Proc. Int. Conf. Mobile Syst., Appl., Services (MobiSys), Jun. 2013, pp. 69–82.
[5] J. Choi, J. Cho, S. Park, and E. Yoon, “A 3.4-µW Object-Adaptive CMOS Image Sensor with Embedded Feature Extraction Algorithm for Motion-Triggered Object-of-Interest Imaging,” IEEE J. Solid-State Circuits, vol. 49, no. 1, pp. 289–300, Jan. 2014.
We would like to acknowledge undergraduate researcher David Ta for his contributions to this project.
Email: alexoz AT stanford DOT edu