Course Description

In parallel computing, the current research emphasis is on multi-core processing, but a variety of alternative array-type processors have demonstrated significant potential in different applications. Recent state-of-the-art coprocessors, such as Google's Pixel Visual Core (PVC) and Tensor Processing Unit (TPU), leverage these architectures to drastically improve the power/performance ratio, especially for mobile applications. The difficulty in using these alternatives is the limited scope of applications that readily map to them. This course introduces several parallel architectures and then focuses on Single Instruction-Multiple Data (SIMD) massively-parallel processor arrays, which offer order-of-magnitude improvements in performance and efficiency over standard multi-core processors but also present a significant programming challenge. The course explores SIMD programming through a hands-on component with weekly programming assignments on a SIMD simulator. This gives students an exciting first-hand experience of the power of such machines and of how one needs to completely re-think the algorithms for known problems when making them SIMD-specific. The course concludes with lectures on another exciting architecture: the dataflow (or Multiple Instruction-Single Data, MISD) machine. These lectures are taught by an expert from Maxeler Technologies and include step-by-step hands-on tutorials and a programming assignment on the actual commercial development platform. Recently, Maxeler technology has become available to program FPGA accelerators on Amazon Web Services.
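
As an illustration of what re-thinking an algorithm for SIMD can look like, the sketch below emulates odd-even transposition sort on a linear array of processing elements (PEs), one value per PE. It is plain Python, not code for the course simulator; the function name and the list-based model of the PE array are assumptions made for this example. In each phase, every active pair of neighbouring PEs performs the same compare-exchange, which a SIMD array would carry out in a single lockstep step.

```python
def odd_even_transposition_sort(values):
    """Sort by emulating a linear array of PEs holding one value each.

    Hypothetical illustration: the inner loop visits disjoint neighbour
    pairs one at a time, but on a SIMD array all of them would execute
    the same compare-exchange instruction simultaneously.
    """
    pe = list(values)          # pe[i] is the value stored in PE i
    n = len(pe)
    for phase in range(n):     # n phases are enough to sort n values
        start = phase % 2      # even phases pair (0,1),(2,3)...; odd phases (1,2),(3,4)...
        for i in range(start, n - 1, 2):
            # Compare-exchange between PE i and its right neighbour.
            if pe[i] > pe[i + 1]:
                pe[i], pe[i + 1] = pe[i + 1], pe[i]
    return pe


if __name__ == "__main__":
    print(odd_even_transposition_sort([5, 1, 4, 2, 3]))
```

A sequential sort such as quicksort depends on data-dependent control flow, whereas this version uses a fixed communication pattern between neighbours, which is the kind of restructuring a lockstep machine tends to require.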

Course Staff


Andrea Di Blas

Andrea Di Blas is a software engineer at Google, working on the Pixel Visual Core. Until recently he was briefly a software development engineer at Amazon, and before that he was a research scientist at Oracle Labs, working on massively-parallel machines for big data processing. Previously he was an assistant adjunct professor at the School of Engineering at the University of California, Santa Cruz, where he taught computer architecture and parallel programming. His main interests revolve around parallel programming models, parallel computing architectures, and combinatorial optimization problems on graphs. He received his M.S. in Electrical Engineering and his Ph.D. in Computer Engineering from Politecnico di Torino, Italy.

Richard Veitch

Richard Veitch graduated from the University of Aberdeen with a degree in Electrical and Electronic Engineering in 2005. For the last 10 years Richard has worked in various roles related to the application of FPGA technology to HPC problems such as speech recognition, digital holography and seismic exploration. Richard currently leads the California office of Maxeler Technologies.

Michael J. Flynn

Michael J. Flynn, professor of electrical engineering at Stanford University, is best known for the SIMD, SISD, MISD, MIMD classification of computer architectures, and for the first detailed discussion of super-scalar design. He was founder and senior consultant to Palyn Associates, a leading computer design company; founder and vice president of American Supercomputers; and a partner at Paragon Partners, a venture capital partnership. He is chairman emeritus of Maxeler Technologies. Prof. Flynn received the IEEE/ACM Eckert-Mauchly and Harry Goode Memorial Awards in 1992 and 1995, respectively.

Guest Speakers

David Patterson

David Patterson was a Professor of Computer Science at the University of California at Berkeley, which he joined after graduating from UCLA. He retired after 40 years and became a Distinguished Engineer at Google in 2016, where he works on domain-specific computer architectures for machine learning. He is also Google’s representative to the RISC-V Foundation and serves on its Board of Directors. The foundation's goal is to make the free and open RISC-V instruction set architecture as popular for hardware as Linux is for operating systems.

Teaching Assistant

Rakesh Ramesh

Rakesh Ramesh is a PhD student in Electrical Engineering at Stanford University, working in the Stanford MobiSocial Computing Lab under the guidance of Dr. Monica S. Lam. His research interests are broadly in designing end-user programmable systems and developing techniques for semantic understanding of end-user tasks from natural language. He has previous experience in designing architecture solutions for heterogeneous memory and cache systems.

Course Schedule

  • Schedule: Tuesday/Thursday 4:30PM - 5:50PM, Hewlett 103
  • TA Office Hours: Monday/Friday 3:30PM - 5:00PM, Packard 104


  • Week 1: course introduction, Flynn’s Taxonomy, survey of parallel computer architectures, introduction to Kestrel architecture, ISA, and simulator, principles of SIMD programming, first SIMD program.
  • Week 2: debugging, SIMD conditionals and active set, comparisons, parallel sorting algorithms, sorting networks, parallel sorting on SIMD.
  • Week 3: multi-precision arithmetic, selection and comparison, SRAM addressing modes, controller scratch register and loop counter, problems on strings, string matching algorithms, edit distance and Smith-Waterman algorithms on SIMD.
  • Week 4: multiplication, multi-precision signed/unsigned multiplication, matrix multiplication on SIMD.
  • Week 5: parallel communication and global reduction, image filters, convolutional image filters, Gaussian filter on SIMD.
  • Week 6: analysis of parallel performance, speedup and efficiency, Amdahl's law, Gustafson-Barsis' law, synchronous implementation of Mandelbrot set on SIMD.
  • Week 7: implementation of asynchronous problems on SIMD, the “SIMD Phase Programming Model” (SPPM), optimization in SPPM, asynchronous implementation of Mandelbrot set on SIMD.
  • Week 8: analysis of parallel performance, Karp-Flatt metric, isoefficiency metric, invited talk.
  • Week 9: MISD and dataflow processing, first dataflow program.
  • Week 10: MISD and dataflow processing, convolutional image filters on a dataflow processor.
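
The performance measures named in Weeks 6 and 8 have standard textbook closed forms, sketched below in Python for orientation. The function and parameter names are chosen for this example and are not taken from the course material; `serial_fraction` is the fraction of work that cannot be parallelized and `p` is the number of processors.

```python
def amdahl_speedup(serial_fraction, p):
    """Amdahl's law: speedup for a fixed problem size on p processors."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)


def gustafson_barsis_speedup(serial_fraction, p):
    """Gustafson-Barsis' law: scaled speedup, where serial_fraction is
    the serial share of the time measured on the parallel run."""
    return p - serial_fraction * (p - 1)


def efficiency(speedup, p):
    """Efficiency: speedup obtained per processor."""
    return speedup / p


def karp_flatt(speedup, p):
    """Karp-Flatt metric: experimentally determined serial fraction,
    e = (1/S - 1/p) / (1 - 1/p), defined for p > 1."""
    return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p)


if __name__ == "__main__":
    s = amdahl_speedup(0.1, 10)
    print(s, efficiency(s, 10), karp_flatt(s, 10))
    print(gustafson_barsis_speedup(0.1, 10))
```

Feeding a speedup predicted by Amdahl's law into the Karp-Flatt metric recovers the serial fraction that was assumed. With measured speedups, a Karp-Flatt value that grows with `p` suggests that parallel overhead, rather than inherently serial work, is limiting performance.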