ARDAVAN PEDRAM

email: Perdavan at Stanford dot edu

Office: Gates 456

*Research Associate*

Department of Electrical Engineering

Stanford University

I am a member of the Pervasive Parallelism Laboratory (PPL), working with Professor Kunle Olukotun.

My work on algorithm/architecture codesign of specialized accelerators led to two National Science Foundation awards (2012 and 2016) and is a core part of a third (2014).

Specifically, I work on hardware/software co-design (algorithms for architectures) of special-purpose accelerators for high-performance linear algebra, machine learning, and signal processing applications.

Here you can find the poster of my research that I presented at Stanford's Science Teaching through ARt (STAR) exhibition for high school students.

I received my PhD in Computer Engineering from the Department of Electrical and Computer Engineering at The University of Texas at Austin in 2013.

My PhD supervisors were Professors Robert van de Geijn and Andreas Gerstlauer.

Before I joined Stanford, my goal was to integrate the Linear Algebra Processor (LAP) concept into a parametrized, customizable accelerator design platform that encapsulates the details of architecture design, from floating-point units all the way up to high-level algorithmic mappings and optimizations. My work on mapping FFTs and matrix factorizations is one example of efforts toward this goal.

At Stanford, with the help of Professor Mark Horowitz, we matured this idea by emphasizing that specialization tends to achieve orders-of-magnitude better efficiency for applications with high locality or high compute intensity (see the Dark Memory paper). Many of these compute-intensive applications, such as Convolutional Neural Networks (CNNs), share the same building blocks, namely linear algebra kernels, as the sketch below illustrates.
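To make the connection concrete, here is a minimal sketch (illustrative only; `im2col` is a helper written for this example, not from any of the systems above) of how a 2-D convolution, the workhorse of a CNN, lowers to a single dense matrix-matrix multiply (GEMM):

```python
import numpy as np

def im2col(x, k):
    """Unfold every k x k patch of a single-channel image into a column."""
    H, W = x.shape
    out_h, out_w = H - k + 1, W - k + 1
    cols = np.empty((k * k, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            cols[:, i * out_w + j] = x[i:i + k, j:j + k].ravel()
    return cols

# A convolution layer becomes one GEMM: each filter is a row of the
# left matrix, each image patch a column of the right matrix.
image = np.random.rand(6, 6)
filters = np.random.rand(4, 3 * 3)     # 4 filters of size 3x3, flattened
conv_out = filters @ im2col(image, 3)  # shape (4, 16): four 4x4 output maps
```

Once the convolution is expressed as a GEMM, it inherits the locality and compute intensity that dense linear algebra accelerators are built to exploit.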

While many machine-learning algorithms do contain compute-intensive operations of the kind we know how to accelerate efficiently, others operate on sparse datasets. Sparse computation is much harder to handle in traditional accelerator frameworks because of the memory wall (the sketch below shows why); our challenge going forward will therefore include finding novel ways to increase the efficiency of sparse computation and to manage sparsity in hardware in harmony with dense kernels, drawing on intuitions from Elemental and the work of my friend Jack Poulson.
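As a minimal illustration of the difficulty (a sketch written for this page, not code from Elemental or any accelerator above), consider a sparse matrix-vector multiply over the Compressed Sparse Row (CSR) format; the gather through the column-index array is irregular and data-dependent, which is exactly where the memory wall bites:

```python
import numpy as np

def spmv_csr(values, col_idx, row_ptr, x):
    """y = A @ x for A stored in Compressed Sparse Row (CSR) form."""
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        for j in range(row_ptr[i], row_ptr[i + 1]):
            # x[col_idx[j]] is an irregular, data-dependent gather: the
            # access pattern that defeats the regular data movement
            # dense accelerators rely on.
            y[i] += values[j] * x[col_idx[j]]
    return y

# A = [[5, 0, 0], [0, 0, 8], [0, 3, 0]] in CSR form:
values, col_idx, row_ptr = [5.0, 8.0, 3.0], [0, 2, 1], [0, 1, 2, 3]
print(spmv_csr(values, col_idx, row_ptr, np.array([1.0, 2.0, 3.0])))
# -> [ 5. 24.  6.]
```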

**Award:** National Science Foundation (NSF) Grant

**Dissertation:**

“Algorithm/Architecture Codesign of Low Power and High Performance Linear Algebra Compute Fabrics”

**Poster:** TCPP Best Poster Award at the IPDPS 2013 PhD Forum

**Award:** National Science Foundation (NSF) Grant

**Software:**

The Linear Algebra Processor (LAP) simulator is available under a free BSD license.

The cycle-accurate simulator engine is functional: it performs the actual computations on the simulated hardware, so results can be checked directly and debugging in this environment is easy.
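To illustrate the idea (a toy sketch only; it does not reproduce the LAP simulator's actual interface), here is a cycle-accurate model of a pipelined multiply-accumulate unit that computes real values while tracking timing, so its output can be asserted against a reference result:

```python
from collections import deque

class MACUnit:
    """Toy cycle-accurate model of a multiply-accumulate unit with a
    3-cycle multiplier pipeline. It models timing *and* computes real
    values, so simulated results can be checked against a reference."""

    def __init__(self, latency=3):
        self.pipe = deque([None] * latency)  # in-flight products
        self.acc = 0.0                       # accumulator register

    def cycle(self, a=None, b=None):
        """Advance one clock cycle, optionally issuing a new multiply."""
        self.pipe.append(a * b if a is not None else None)
        done = self.pipe.popleft()           # product leaving the pipeline
        if done is not None:
            self.acc += done
        return self.acc

mac = MACUnit()
for a, b in [(1.0, 2.0), (3.0, 4.0)]:
    mac.cycle(a, b)
for _ in range(3):                           # drain the pipeline
    mac.cycle()
assert mac.acc == 1.0 * 2.0 + 3.0 * 4.0      # matches the reference dot product
```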

**Dark Memory:**

Looking at power dissipation in modern architectures, the memory system contributes well over 50% of total system power. So, by an Amdahl's Law argument, changing the compute engine without improving the memory can yield only a modest (less than 2x) improvement in energy efficiency.
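A back-of-the-envelope calculation makes the bound explicit (illustrative numbers; 60% is assumed here as a working figure for the memory's share of energy):

```python
# Amdahl-style bound on whole-system energy efficiency.
mem_frac = 0.6           # assumed fraction of energy spent in the memory system
compute_speedup = 1e6    # even an arbitrarily more efficient compute engine...
total_gain = 1 / (mem_frac + (1 - mem_frac) / compute_speedup)
print(f"{total_gain:.2f}x")  # ~1.67x: capped at 1/mem_frac, i.e. below 2x
```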

Large gains in efficiency are only possible if the DRAM and memory
hierarchy are mostly idle. We refer to this desirable state as Dark Memory, and it only occurs for
applications with an extreme form of locality.

**Refereed Publications:**

1- Yuanfang Li and **Ardavan Pedram**:

The 28th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP2017).

**Best Paper Award**

2- Raghu Prabhakar, Yaqi Zhang, David Koeplinger, Matt Feldman, Tian Zhao, Stefan Hadjis, **Ardavan Pedram**, Christos Kozyrakis, and Kunle Olukotun:

"Plasticine: A Reconfigurable Architecture For Parallel Patterns,"

The 44th International Symposium on Computer Architecture (ISCA 2017).

3- **Ardavan Pedram**, Stephen Richardson, Sameh Galal, Shahar Kvatinsky, and Mark A. Horowitz:

"Dark Memory and Accelerator-Rich System Optimization in the Dark Silicon Era,"

IEEE Design and Test Magazine, April 2017.

4- Artem Vasilyev, Nikhil Bhagdikar, **Ardavan Pedram**, Stephen E. Richardson, Shahar Kvatinsky, and Mark Horowitz:

"Evaluating Programmable Architectures for ISP and Computer Vision,"

The 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016.

5- Heonjae Ha, **Ardavan Pedram**, Stephen Richardson, Shahar Kvatinsky, and Mark Horowitz:

"Improving Energy Efficiency of DRAM by Exploiting Half Page Row Access,"

The 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016.

6- Mochamad Asri, **Ardavan Pedram**, Lizy K. John, and Andreas Gerstlauer:

"Simulator Calibration for Accelerator-Rich Architecture Studies,"

International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS 2016).

7- Song Han, Xingyu Liu, Huizi Mao, Jing Pu, **Ardavan Pedram**, Mark A. Horowitz, and William J. Dally:

"EIE: Efficient Inference Engine on Compressed Deep Neural Network,"

The 43rd International Symposium on Computer Architecture (ISCA 2016).

8- **Ardavan Pedram**, John McCalpin, and Andreas Gerstlauer:

The Journal of Signal Processing Systems, Springer, 2014.

9- **Ardavan Pedram**, Andreas Gerstlauer, and Robert van de Geijn:

"Algorithm, Architecture, and Floating-Point Unit Codesign of a Matrix Factorization Accelerator,"

IEEE Transactions on Computers (TC) Special Section on Computer Arithmetic, August 2014.

10- **Ardavan Pedram**, John McCalpin, and Andreas Gerstlauer:

"Transforming a Linear Algebra Core to an FFT Accelerator,"

The 24th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP2013).

11- **Ardavan Pedram**, Andreas Gerstlauer, and Robert van de Geijn:

"Floating Point Architecture Extensions for Optimized Matrix Factorization,"

The 21st IEEE International Symposium on Computer Arithmetic (ARITH21).

12- **Ardavan Pedram**, Robert van de Geijn, and Andreas Gerstlauer:

"Codesign Tradeoffs for High-Performance, Low-Power Linear Algebra Architectures,"

IEEE Transactions on Computers (TC) Special Issue on Energy Efficient Computing, Volume 61, Issue 12, Pages 1724-1736, December 2012.

13- **Ardavan Pedram**, Andreas Gerstlauer, and Robert van de Geijn:

The 24th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2012). (Acceptance rate 28%)

14- **Ardavan Pedram**, Syed Gilani, Nam Sung Kim, Robert van de Geijn, Mike Schulte, and Andreas Gerstlauer:

"A Linear Algebra Core Design For Efficient Level-3 BLAS,"

The 23rd IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP2012).

15- **Ardavan Pedram**, Andreas Gerstlauer, and Robert van de Geijn:

"A High-performance, Low-power Linear Algebra Core,"

The 22nd IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP2011). (Acceptance rate 25%)

16- **Ardavan Pedram**, David Craven, and Andreas Gerstlauer:

"Modeling Cache Effects at the Transaction Level,"

International Embedded Systems Symposium (IESS2009).

**Best Paper Runner Up**

17- **Ardavan Pedram**, Mohammad Reza Jamali, Caro Lucas, and Syed Mehdi Fakhraie:

"Local Linear Model Tree (LOLIMOT) Reconfigurable Parallel Hardware,"

Transactions on Engineering, Computing and Technology, Volume 13, Pages 96-101, May 2006.