ARDAVAN PEDRAM

email: Perdavan at Stanford dot edu

Office: Gates 456

 

CV

 

 

 

 

 

Research Associate

Department of Electrical Engineering

Stanford University

 

I am a member of the Pervasive Parallelism Laboratory (PPL), working with Professor Kunle Olukotun.

 

My work on algorithm/architecture codesign of specialized accelerators led to two National Science Foundation awards [2012 and 2016] and forms a core part of a third [2014].

Specifically, I work on hardware/software codesign (algorithms for architectures) of special-purpose accelerators for high-performance linear algebra, machine learning, and signal processing applications.

Here you can find the poster of my research, which I presented at Stanford's Science Teaching through ARt (STAR) exhibition for high school students.

 

I received my PhD in Computer Engineering from the Department of Electrical and Computer Engineering at The University of Texas at Austin in 2013.
My PhD supervisors were Professors Robert van de Geijn and Andreas Gerstlauer.

PRISM Project:

 

Before I joined Stanford, my goal was to integrate the Linear Algebra Processor (LAP) concept into a parameterized, customizable accelerator design platform that encapsulates the details of architecture design, from floating-point units up to high-level algorithmic mappings and optimizations. My work on mapping FFTs and matrix factorizations comprises example efforts toward this goal.
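As a purely illustrative Python sketch (this is not the PRISM interface; every name and parameter below is hypothetical), one can think of such a platform as exposing a structured design-space point whose knobs range from the floating-point unit up to the processing-element (PE) mesh:

from dataclasses import dataclass

@dataclass
class DesignPoint:
    # Hypothetical knobs of a parameterized LAP-style accelerator.
    mesh_rows: int = 4          # PE mesh rows
    mesh_cols: int = 4          # PE mesh columns
    local_store_kib: int = 16   # per-PE local store capacity
    fpu: str = "double-fma"     # floating-point unit flavor
    freq_mhz: int = 1000

    def peak_gflops(self) -> float:
        # One fused multiply-add = 2 flops per PE per cycle.
        return self.mesh_rows * self.mesh_cols * 2 * self.freq_mhz / 1e3

print(DesignPoint(mesh_rows=8, mesh_cols=8).peak_gflops())  # 128.0 GFLOPS at 1 GHz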

 

At Stanford, with the help of Professor Mark Horowitz, we matured this idea by emphasizing that specialization tends to achieve orders-of-magnitude better efficiency for applications with high locality or high compute intensity (see the Dark Memory paper). Many of these compute-intensive applications, such as Convolutional Neural Networks (CNNs), share the same building blocks, namely linear algebra kernels.
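As a concrete illustration of that shared building block (my own NumPy sketch, not code from any of the papers), a convolution layer lowers to a matrix multiply via the standard im2col transformation:

import numpy as np

def conv2d_as_gemm(x, w):
    # Valid, stride-1 2-D convolution of image x (H, W) with filter w (KH, KW),
    # lowered to a single matrix product.
    H, W = x.shape
    KH, KW = w.shape
    OH, OW = H - KH + 1, W - KW + 1
    # im2col: each output pixel's receptive field becomes one matrix row.
    cols = np.stack([x[i:i + KH, j:j + KW].ravel()
                     for i in range(OH) for j in range(OW)])
    # The convolution is now a GEMM against the flattened filter(s).
    return (cols @ w.ravel()).reshape(OH, OW)

x = np.arange(16.0).reshape(4, 4)
w = np.ones((2, 2))
assert np.allclose(conv2d_as_gemm(x, w),
                   [[10, 14, 18], [26, 30, 34], [42, 46, 50]])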

 

While many machine-learning algorithms do indeed contain compute-intensive operations of the kind we know how to accelerate efficiently, others operate on sparse datasets. Sparse computation is much more difficult to handle in traditional accelerator frameworks because of the memory wall; our challenge going forward will therefore include finding novel ways to increase the efficiency of sparse computation and to manage sparsity in hardware in harmony with the dense kernels, drawing on intuitions from the work on Elemental by my friend Jack Poulson.
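A back-of-the-envelope comparison of arithmetic intensity (flops per byte of memory traffic) shows why sparse kernels hit the memory wall; the byte counts below are illustrative assumptions (double-precision values, 4-byte indices, CSR row pointers omitted), not measurements:

def gemm_intensity(n):
    # Dense GEMM: 2*n^3 flops over three n-by-n double matrices.
    flops = 2 * n ** 3
    bytes_moved = 3 * n * n * 8
    return flops / bytes_moved        # grows as O(n): blockable, cache-friendly

def csr_spmv_intensity(nnz, n):
    # Sparse matrix-vector: 2 flops per nonzero; each nonzero streams an
    # 8-byte value and a 4-byte column index, plus the x and y vectors.
    flops = 2 * nnz
    bytes_moved = nnz * 12 + 2 * n * 8
    return flops / bytes_moved        # bounded near 1/6 flop per byte

print(gemm_intensity(1024))               # ~85 flops/byte
print(csr_spmv_intensity(10**6, 10**5))   # ~0.15 flops/byte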

 

Download the presentation

Award: National Science Foundation (NSF) Grant

Dissertation:

 

“Algorithm/Architecture Codesign of Low Power and High Performance Linear Algebra Compute Fabrics”

 

Download the PDF

Poster: TCPP Best Poster Award at the IPDPS 2013 PhD Forum

Award: National Science Foundation (NSF) Grant

Software:

 

The Linear Algebra Processor (LAP) simulator is available under a free BSD license.

The cycle-accurate simulator engine is functional and performs the actual computations on the simulated hardware, which makes debugging in this environment straightforward.
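The sketch below (hypothetical, and far simpler than the LAP simulator's actual interface) illustrates the idea: because every simulated cycle executes the real arithmetic, the simulated output can be checked directly against a software reference while timing is accumulated:

class MacPE:
    # One processing element with a multiply-accumulate datapath.
    def __init__(self):
        self.acc = 0.0

    def step(self, a, b):
        self.acc += a * b          # real arithmetic, not just a latency count

def simulate_dot(xs, ys):
    pe, cycles = MacPE(), 0
    for a, b in zip(xs, ys):
        pe.step(a, b)
        cycles += 1                # toy timing model: one MAC per cycle
    return pe.acc, cycles

result, cycles = simulate_dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
assert result == 32.0              # matches the reference dot product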

 


 

Dark Memory:

 

Looking at power dissipation in modern architectures, the memory system contributes well over 50% of total system power. Given Amdahl's Law, changing the compute engine without improving the memory can therefore yield only a modest (less than 2x) improvement in energy efficiency.
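A one-line bound makes the 2x figure concrete. If the memory system accounts for a fraction m of total energy (m >= 0.5 per the figure above) and only the compute energy is reduced, by a factor s, the overall efficiency gain is

\[ G(s) \;=\; \frac{1}{\,m + (1-m)/s\,} \;<\; \frac{1}{m} \;\le\; 2 \qquad \text{for } m \ge \tfrac{1}{2}, \]

so even an infinitely efficient compute engine (s to infinity) cannot beat 1/m unless memory energy also drops.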

Large gains in efficiency are only possible if the DRAM and memory hierarchy are mostly idle. We refer to this desirable state as Dark Memory, and it only occurs for applications with an extreme form of locality.

 


 

Refereed Publications:

 

1-      Yuanfang Li and Ardavan Pedram:

CATERPILLAR: Coarse Grain Reconfigurable Architecture for Accelerating the Training of Deep Neural Networks,

The 28th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP2017).

Best Paper Award

 

2-      Raghu Prabhakar, Yaqi Zhang, David Koeplinger, Matt Feldman, Tian Zhao, Stefan Hadjis, Ardavan Pedram, Christos Kozyrakis, and Kunle Olukotun:

Plasticine: A Reconfigurable Architecture For Parallel Patterns,

The 44th International Symposium on Computer Architecture (ISCA 2017).

 

3-      Ardavan Pedram, Stephen Richardson, Sameh Galal, Shahar Kvatinsky, and Mark A. Horowitz:

Dark Memory and Accelerator-Rich System Optimization in the Dark Silicon Era,

IEEE Design and Test Magazine, April 2017.

 

4-      Artem Vasilyev, Nikhil Bhagdikar, Ardavan Pedram, Stephen E Richardson, Shahar Kvatinsky, and Mark Horowitz:

Evaluating Programmable Architectures for ISP and Computer Vision,

The 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016.

 

5-      Heonjae Ha, Ardavan Pedram, Stephen Richardson, Shahar Kvatinsky, and Mark Horowitz:

Improving Energy Efficiency of DRAM by Exploiting Half Page Row Access,

The 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016.

 

6-      Mochamad Asri, Ardavan Pedram, Lizy K. John, and Andreas Gerstlauer:

Simulator Calibration for Accelerator-Rich Architecture Studies,

International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS 2016).

 

7-      Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, and William J. Dally:

EIE: Efficient Inference Engine on Compressed Deep Neural Network,

The 43rd International Symposium on Computer Architecture (ISCA 2016).

 

8-      Ardavan Pedram, John McCalpin, and Andreas Gerstlauer:

A Highly Efficient Multicore Floating-Point FFT Architecture Based on Hybrid Linear Algebra/FFT Cores,

The Journal of Signal Processing Systems, Springer, 2014.

 

9-      Ardavan Pedram, Andreas Gerstlauer, and Robert van de Geijn:

Algorithm, Architecture, and Floating-Point Unit Codesign of a Matrix Factorization Accelerator,

IEEE Transactions on Computers (TC) Special Section on Computer Arithmetic, August 2014.

 

10-      Ardavan Pedram, John McCalpin, and Andreas Gerstlauer:

Transforming a Linear Algebra Core to an FFT Accelerator,

The 24th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP2013).

 

11-      Ardavan Pedram, Andreas Gerstlauer, and Robert van de Geijn:

Floating Point Architecture Extensions for Optimized Matrix Factorization,

The 21st IEEE International Symposium on Computer Arithmetic (ARITH21).

 

12-      Ardavan Pedram, Robert van de Geijn, and Andreas Gerstlauer:

Codesign Tradeoffs for High-Performance, Low-Power Linear Algebra Architectures,

IEEE Transactions on Computers (TC) Special Issue on Energy Efficient Computing, Volume 61, Issue 12, pp. 1724–1736, December 2012.

 

13-      Ardavan Pedram, Andreas Gerstlauer, and Robert van de Geijn:

On the Efficiency of Register File versus Broadcast Interconnect for Collective Communications in Data-Parallel Hardware Accelerators,

The 24th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2012). (Acceptance rate 28%)

 

14-      Ardavan Pedram, Syed Gilani, Nam Sung Kim, Robert van de Geijn, Mike Schulte, and Andreas Gerstlauer:

A Linear Algebra Core Design For Efficient Level-3 BLAS,

The 23rd IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP2012).

 

15-      Ardavan Pedram, Andreas Gerstlauer, and Robert van de Geijn:

A High-performance, Low-power Linear Algebra Core,

The 22nd IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP2011). (Acceptance rate 25%)

 

16-      Ardavan Pedram, David Craven, and Andreas Gerstlauer:

Modeling Cache Effects at the Transaction Level,

International Embedded Systems Symposium (IESS2009).

Best Paper Runner Up

 

17-      Ardavan Pedram, Mohammad Reza Jamali, Caro Lucas, and Syed Mehdi Fakhraie:

Local Linear Model Tree (LOLIMOT) Reconfigurable Parallel Hardware,

Transactions on Engineering, Computing and Technology, Volume 13, pp. 96–101, May 2006.

Photography