This offering of CS315B will be a course in advanced topics and new paradigms in programming supercomputers, with a focus on modern tasking runtimes. As supercomputers have grown much larger and more complex, tasking as emerged as one of the leading alternatives to current bulk synchronous programming models, with the promise of both higher performance and more productive software development.

The course will consist of lectures on supercomputer architectures, a survey of current standard and alternative programming models, and in-depth lectures on proposals for tasking. Students will be given the opportunity to program in Regent (regent-lang.org), a high-level language for the Legion tasking model. Regent provides a high level of programming abstraction while still enabling programmers to target and efficiently exploit massive supercomputers. There will be a number of lectures on and programming exercises in Regent, and there will also be a course project in which students will write a significant supercomputer application of their own choosing.

The course is open to both computer scientists and computational scientists who are interested in learning about new approaches to programming modern supercomputers. Because it is desirable to have such a mix of students, the course will not assume much background, though good programming skills will be needed to get the most out of the course.


  • Time: Tuesdays and Thursdays, 3:00pm - 4:20pm
  • Location: 200(Lane History Corner) - 202
  • Readings: There is no textbook, but there will be readings assigned from lecture notes and research papers.
  • Assignments: Four short programming assignments and a final project.
  • Exams: There will be no exams.

Tentative Schedule

Date Topic Readings Lecture Notes
9/27 (Tue) Course Introduction Lecture 1
9/29 (Thr) Bulk Synchronous and SPMD Programming Lecture 2
10/4 (Tue) Introduction to Tasking and Regent Programming Regent examples Lecture 3
10/6 (Thr) Structured Regions Regent examples Lecture 4
10/11 (Tue) Unstructured Regions Regent examples Lecture 5
10/13 (Thr) Circuit Example & Mapping Lecture 6
10/18 (Tue) Metaprogramming Regent examples
Terra paper
Lecture 7
10/20 (Thr) I/O Lecture 8
10/25 (Tue) Explicit Parallel Programming Lecture 9
10/27 (Thr) Sequoia Programming the Memory Hierarchy Lecture 10
11/1 (Tue) Charm++ Charm++ paper and tutorial Lecture 11
11/3 (Thr) StarPU Paper 1 and Paper 2 Lecture 12
11/8 (Tue) Project Meetings
11/10 (Thr) Chapel and X10 Chapel paper, X10 paper Lecture 13
11/15 (Tue) OpenMP Lecture 14
11/17 (Thr) Pro Regent Tips & Liszt-Regent Liszt paper Lecture 15
11/22 (Tue) Thanksgiving break
11/24 (Thr) Thanksgiving break
11/29 (Tue) Guest Lecture on Regent (by Elliott Slaughter) Regent paper Lecture 16
12/1 (Thr) Programming Systems for Big Data MapReduce, Spark, TensorFlow Lecture 17
12/6 (Tue) Project presentations
12/8 (Thr) Project presentations


Students are given access to the Certainty cluster for programming assignments. Once the access is granted, it is highly recommended students start assignment 0 early so that students get themselves familiar to the programming environment. Students can also run Regent programs locally on their own machines. For setting up and running Regent, please refer to regent-lang.org.

Title Due Code
Assignment 0: Running a Regent program on Certainty 10/11 (Tue)
Assignment 1: Edge Detection 10/18 (Tue) assignment1.tar.gz
Assignment 2: Parallel Edge Detection 10/25 (Tue) assignment2.tar.gz
Assignment 3: PageRank 11/01 (Tue) assignment3.tar.gz
Assignment 4: Parallel PageRank 11/08 (Tue) assignment4.tar.gz


New: Tips for Successful Projects

Important: students should send a short project proposal (about 2 paragraphs) to cs315b-aut1617-staff@lists.stanford.edu by Wednesday 11/2.

For their final projects, students are expected to write a significant Regent program that can scale up to tens or hundreds of nodes on Certainty. Students are welcome to come up with their own ideas, especially the ones related to their own research. For students who do not have a specific topic in mind, here is a list of some project ideas:

Compressible Navier-Stokes Code (density-based)

Create a code for the solution of the compressible Euler / Navier-Stokes equations, which are statements of mass, momentum, and energy conservation in a fluid, on uniform structured grids (2D or 3D). Use a second-order discretization in space (centered or upwind schemes for the convective term, e.g., JST or Roe, and a centered scheme for the viscous fluxes) and an explicit time integration scheme (e.g., classical fourth-order Runge-Kutta). For an example of this type of fluid solver (and some test cases), see the Soleil-X code.

Incompressible Navier-Stokes Code (pressure-based)

Port a code for solving the incompressible Navier-Stokes equations (Boussinesq approximation) for computing the flow within a thermally driven 2D cavity. The code uses structured, collocated grids with an implicit discretization (SIMPLE fractional step). Time integration is accomplished through a first- or second-order backward difference formula (BDF). An incomplete LU decomposition is used for solving the linear systems, including a pressure Poisson solve. Examples and details can be found in Computational Methods for Fluid Dynamics, Ferziger & Peric.

Supplementary materials

Conjugate Gradient / Krylov Linear Solver

Create a linear solver for symmetric, positive definite matrices using the Conjugate Gradient (CG) method. For example, the poisson equation with a known source term on a uniform 2D mesh can be solved. Other Krylov methods can be considered instead, such as GMRES or Bi-CGSTAB, which handle more general matrix types. Similarly, unstructured meshes could also be considered here, leading to sparse matrices that can be stored in Compressed Sparse Row (CSR) format.

Supplementary materials

Multigrid Poisson Solver

Solve the Poisson equation on a regular 2D grid with a multigrid method. Use a second-order finite difference discretization with Dirichlet boundary conditions. The discretization results in a linear system with a banded matrix structure, which can be solved with classical iterative methods alone or a geometric multigrid method that uses the classical iterative methods as smoothers on the various mesh levels.

Supplementary materials

Discrete Ordinates Solver

Solve the radiative transfer equation on a structured 2D grid using the discrete ordinates method. Implement the source iteration solver with full-domain sweeps. Angular discretization is accomplished using level symmetric quadrature sets or quadrature sets constructed on the fly. Spatial discretization will use the finite volume method with a first- or second-order accurate method (step or diamond differencing).

Supplementary materials

Fast Multipole Method

Implement a fast multipole algorithm in Regent. Fast multipole algorithm is a scalable algorithm for N-body simulation whose time complexity is O(N) and has a wide range of applications including electrostatics, fluid simulations, and astrophysics.

Supplementary materials

Parallel Fast Fourier Transform

Implement parallel fast Fourier transform in Regent. An efficient implementation of parallel FFT has many applications such as dark matter, plasma, and incompressible fluid simulations (to name just a few!). Oddly, the widely used implementations parallelize 3D FFTs in only one dimension, resulting in limited scalability. A 3D FFT implementation that parallelizes in two dimensions should be clean to express in Regent and an interesting computation to map to heterogeneous hardware.

Supplementary materials

Parallel Wavelet Compression

Implement parallel wavelet compression. Wavelet transforms are one of the most popular time-frequency-transformations and are widely used for data compression, especially image compression; notable applications include JPEG 2000 and DjVu. For good resolution on high-frequency terms, the wavelet compression shows a better compression performance for images that have transient signals.

"Two-threshold" Peak Finder

Implement a two-threshold peak finder algorithm for locating "peaks" in images. This algorithm is currently used for processing data produced by the Linac Coherent Light Source (LCLS) at SLAC. Peak finding is a data processing step that locates regions of charge in the image produced by a detector. The most common peak finding algorithm consists of two steps: First, it finds the initial peaks, the pixels whose intensity is above the higher threshold. Then, it searches for neighboring pixels within a given radius whose intensity is above the lower threshold.

Supplementary materials

Characterize the response of an imaging detector


Today’s X-ray free-electron lasers can generate pulses at 120 Hz, so the pixel-array imaging detectors need to also work at that speed. Furthermore since all the photons are detected in 40 fs, we cannot use the more accurate method of counting each photon on each pixel individually, rather we have to compromise and use the “integrating” approach: each pixel has independent circuitry to count electrons, and the sensor material (silicon) develops a negative charge that is proportional to the number of X-ray photons striking the pixel. It is thus absolutely critical for us to know the gain of the detector, that is, the conversion factor between electronic signal in “counts” and number of photons. (signal counts = gain x photon count, in our definition). Unfortunately, knowledge of the gain has been very difficult to acquire. It differs from pixel to pixel within the pixel array, and is a function of temperature and X-ray energy.

Basic method

To calibrate the gain field we use a “flood field” source: somehow we rig it up so that several photons will hit each pixel on each image. We don’t know the number of incident photons, and it is not even uniform over the image, but the flux is slowly varying in space, so we can assume that the true flux at pixel A is very similar to the true flux on each of the surrounding pixels. We are assuming Poisson statistics here — the variance in the photon count is equal to the number of photons. The signals we measure are probably in the range of 20 photons per pixel per image. As stated above the critical complication is that the gain varies from pixel to pixel, so for example, the true gain is probably in the range of 8 counts/photon to 30 counts/photon. So we need to determine a gain for each pixel, not a constant value over the whole detector. Also, critically, for each imaging event the detector electronics imparts a “common mode” offset, or DC-offset, such that a random integer number of counts is added to each pixel value. The common mode offset is a random value for each shot (for example it might have a Gaussian distribution with a mean of zero and a standard deviation of 10) but it is constant across all pixels within the shot. But we need to determine it’s unknown value for each shot. So the unknowns to determine are the gain for each pixel, and the common mode correction for each shot. How do we determine these unknowns from the image data?

Problem scope

The basic sensor is a 370 x 190 pixel array. The entire CSPAD detector consists of 32 of these sensor panels. Each has a separate common mode correction for each image, so they should be treated as separate detectors. Assume that we collect a lot of images, say 1000. In the analysis of the data, we should only accept signals that vary smoothly from pixel to pixel where we can assume that the true photon field is locally constant. In other words, Bragg spots and other edge-type features probably have to be rejected on each image.

Mini-Apps from National Labs

Port one of the mini-apps developed in the national labs. These mini-apps represent an interesting subset of requirements from the real scientific workloads, so it is worth to explore how well the Regent programming model can fit into those requirements. Here are some mini-apps that are interesting to port:

  • LULESH, an unstructured Lagrangian explicit shock hydrodynamics proxy application.
  • CoMD, a classical molecular dynamics proxy application.
  • CloverLeaf, a Lagrangian-Eulerian hydrodynamics benchmark.

Ray Tracer

Write a ray tracer in Regent. Choose on your own how much physical phenomena to capture in the ray tracer, but try to make it better than the accompanied reference in terms of image quality.

Supplementary materials


Alex Aiken, Instructor

Wonchan Lee, TA