Course Description

In parallel computing, the current research emphasis is on multi-core processing, but a variety of alternative array-type processors have demonstrated significant potential in different applications. Recent state-of-the-art coprocessors, such as Google's Pixel Visual Core (PVC) and Tensor Processing Unit (TPU), leverage these architectures to drastically improve the performance-to-power ratio, especially for mobile applications. The difficulty with these alternatives is the limited range of applications that map readily onto them. This course introduces several parallel architectures and then focuses on Single Instruction-Multiple Data (SIMD) massively parallel processor arrays, which offer order-of-magnitude improvements in performance and efficiency over standard multi-core processors but also present a significant programming challenge. The course explores SIMD programming through a hands-on component with weekly programming assignments on a SIMD simulator. This gives students first-hand experience of the power of such machines and of how completely one must rethink the algorithms for known problems when making them SIMD-specific. The course concludes with lectures on another exciting architecture: the dataflow (or Multiple Instruction-Single Data, MISD) machine. These lectures are taught by the CEO and founder of Maxeler Technologies, a company specializing in MISD/dataflow machines, and include step-by-step hands-on tutorials and a programming assignment on the actual commercial development platform.

Course Staff


Andrea Di Blas

Andrea Di Blas is a computer architect and software engineer at Google, working on the computational architecture of future Google Pixel phones and on Ambient Computing machines. Before joining Google, he was briefly a software development engineer at Amazon, and before that a research scientist at Oracle Labs, working on massively parallel machines for big data processing. Earlier, he was an assistant adjunct professor at the School of Engineering at the University of California, Santa Cruz, where he taught computer architecture and parallel programming. His main interests are parallel programming models, parallel computing architectures, and combinatorial optimization problems on graphs. He received his M.S. in Electrical Engineering and his Ph.D. in Computer Engineering from Politecnico di Torino, Italy.

Oskar Mencer

Oskar Mencer is CEO and Founder of Maxeler Technologies and Professor of Practice at Imperial College London. Two decades ago, Oskar was a Member of Technical Staff at the Computing Sciences Center at Bell Labs in Murray Hill, leading the effort in "Stream Computing", and for a summer, a HIVIPS scholar at Hitachi Central Research Laboratories in Tokyo. He joined Bell Labs after receiving a PhD from Stanford University. Besides driving Maximum Performance Computing (MPC) at Maxeler, Oskar was Consulting Professor in Geophysics at Stanford. Oskar has been building computer hardware and software since he was 11 years old. At Maxeler Technologies he oversaw the delivery of production systems for mission-critical challenges to Chevron, ENI, JP Morgan, Citi, CME Group, Juniper, Amazon Web Services, and the supercomputer centers Juelich (Germany) and Daresbury Labs (UK).

Original course sponsor: Michael J. Flynn

Michael J. Flynn, professor of electrical engineering at Stanford University (now retired), is best known for the SISD, SIMD, MISD, MIMD classification of computer architectures, and for the first detailed discussion of superscalar design. He was founder and senior consultant to Palyn Associates, a leading computer design company; founder and vice president of American Supercomputers; a partner at Paragon Partners, a venture capital partnership; and chairman emeritus of Maxeler Technologies. Prof. Flynn received the IEEE/ACM Eckert-Mauchly and Harry Goode Memorial Awards in 1992 and 1995, respectively.


David Patterson

David Patterson was a Professor of Computer Science at the University of California at Berkeley, which he joined after graduating from UCLA. He retired after 40 years and became a Distinguished Engineer at Google in 2016, where he works on domain-specific computer architectures for machine learning. He is also Google’s representative to the RISC-V Foundation and a member of its Board of Directors; the foundation's goal is to make the free and open RISC-V instruction set architecture as popular for hardware as Linux is for operating systems.


  • Week 1: course introduction, Flynn’s Taxonomy, survey of parallel computer architectures, introduction to Kestrel architecture, ISA, and simulator, principles of SIMD programming, first SIMD program.
  • Week 2: debugging, SIMD conditionals and active set, comparisons, parallel sorting algorithms, sorting networks, parallel sorting on SIMD.
  • Week 3: multi-precision arithmetic, selection and comparison, SRAM addressing modes, controller scratch register and loop counter, problems on strings, string matching algorithms, edit distance and Smith-Waterman algorithms on SIMD.
  • Week 4: multiplication, multi-precision signed/unsigned multiplication, matrix multiplication on SIMD.
  • Week 5: parallel communication and global reduction, image filters, convolutional image filters, Gaussian filter on SIMD.
  • Week 6: analysis of parallel performance, speedup and efficiency, Amdahl's law, Gustafson-Barsis' law, synchronous implementation of the Mandelbrot set on SIMD.
  • Week 7: implementation of asynchronous problems on SIMD, the “SIMD Phase Programming Model” (SPPM), optimization in SPPM, asynchronous implementation of the Mandelbrot set on SIMD.
  • Week 8: analysis of parallel performance, Karp-Flatt metric, isoefficiency metric, invited talk.
  • Week 9: MISD and dataflow processing, first dataflow program.
  • Week 10: MISD and dataflow processing, convolutional image filters on a dataflow processor.
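
The performance metrics named in Weeks 6 and 8 can be previewed with a short sketch. This is not course material: it is a minimal Python illustration using the standard textbook formulas for Amdahl's law, Gustafson-Barsis' law, parallel efficiency, and the Karp-Flatt metric. The function and parameter names are assumptions chosen for this example, and the numbers in the usage section are made up.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Amdahl's law: speedup bound for a fixed problem size.

    serial_fraction is the fraction of the sequential run time
    that cannot be parallelized.
    """
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def gustafson_barsis_speedup(serial_fraction: float, processors: int) -> float:
    """Gustafson-Barsis' law: scaled speedup.

    Here serial_fraction is the fraction of the *parallel* run time
    spent in serial code, so the problem size grows with processors.
    """
    return processors - serial_fraction * (processors - 1)


def efficiency(speedup: float, processors: int) -> float:
    """Parallel efficiency: speedup per processor."""
    return speedup / processors


def karp_flatt(speedup: float, processors: int) -> float:
    """Karp-Flatt metric: experimentally determined serial fraction,
    computed from a measured speedup on `processors` > 1 processors."""
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


if __name__ == "__main__":
    p = 10    # hypothetical processor count
    f = 0.1   # hypothetical serial fraction
    s = amdahl_speedup(f, p)
    print("Amdahl speedup:          ", s)
    print("Efficiency:              ", efficiency(s, p))
    print("Gustafson-Barsis speedup:", gustafson_barsis_speedup(f, p))
    print("Karp-Flatt on that run:  ", karp_flatt(s, p))
```

Applying the Karp-Flatt metric to a speedup produced by Amdahl's law recovers the serial fraction that went in (about 0.1 here, up to floating-point rounding). On measured speedups, a serial fraction that grows with the processor count is usually read as a sign of parallel overhead rather than of inherently serial code.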