For many high-performance applications, the alternative to the multicore rack is to add an accelerator to each multicore node. Several kinds of accelerator exist: GPGPUs, specialized processors (e.g., IBM's Cell), and FPGAs.
At Maxeler we've found that FPGA array technology wins out on performance for most relevant applications. Given the initial area-time-power disadvantage of the FPGA in, say, a custom-designed adder, this is a surprising result: the sheer magnitude of the available FPGA parallelism overcomes that initial disadvantage.
Using Maxeler's FPGA compiler toolkit, it is now feasible to transform a software application into a dataflow graph mapped to an unconstrained systolic array. The array structure can be matched to the application's structure and is not constrained to nearest-neighbor communication, because the FPGA provides a generalized interconnect.
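To make the idea of a dataflow graph laid out as an array of simple nodes more concrete, here is a minimal software sketch. It is not Maxeler's toolkit and not the seismic kernel from the talk: it simulates a small FIR filter as a chain of multiply-add nodes, and the name `systolic_fir` is hypothetical. Each input sample is sent to every node in the same tick, which is the kind of connection a strictly nearest-neighbor array would not allow.

```python
def systolic_fir(coeffs, samples):
    """Simulate a transposed-form FIR filter as a chain of multiply-add nodes.

    Node i holds coeffs[i] and one register. On every tick all nodes fire
    together: node i multiplies the current sample by its coefficient and adds
    the value its right-hand neighbor stored on the previous tick.
    """
    k = len(coeffs)
    regs = [0] * (k + 1)  # regs[k] is a constant-zero boundary value
    outputs = []
    for x in samples:
        # The comprehension reads only the previous tick's registers, so the
        # update behaves like k nodes operating in parallel.
        regs = [coeffs[i] * x + regs[i + 1] for i in range(k)] + [0]
        outputs.append(regs[0])  # node 0 emits one result per tick
    return outputs


# Feeding a unit impulse returns the coefficients, one per tick.
print(systolic_fir([1, 2, 3], [1, 0, 0, 0]))
```

Once the chain is full, it produces one result per tick no matter how many nodes it contains, which is why adding nodes adds throughput rather than latency per result.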
As an example, we consider modeling problems in seismic data processing. In a typical problem we realize a 2000-node systolic array on two FPGAs, with each node performing an operation every 4 ns.
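The figures above imply a peak aggregate rate, which the following sketch works out. It assumes every node completes one operation in every 4 ns period, so the result is an upper bound rather than a measured figure. The abstract does not say what kind of operation is meant, and the function name is hypothetical.

```python
NODES = 2000    # systolic-array nodes across the two FPGAs (from the abstract)
PERIOD_NS = 4   # each node performs one operation every 4 ns (from the abstract)


def aggregate_ops_per_second(nodes: int, period_ns: int) -> float:
    """Peak operations per second, assuming every node is busy in every period."""
    return nodes * 1e9 / period_ns


clock_mhz = 1e3 / PERIOD_NS  # a 4 ns period corresponds to 250 MHz
peak = aggregate_ops_per_second(NODES, PERIOD_NS)
print(f"{clock_mhz:.0f} MHz per node, {peak / 1e9:.0f} Gops/s peak")
```

Under that assumption the array peaks at 5 × 10^11 operations per second, even though each node runs at a modest 250 MHz.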
Erratum: Slide 42 (showing the various geophysics algorithms) is courtesy of Bob Clapp of Stanford Geophysics, not as indicated.
Download the slides from this presentation in PDF format. The correction above has been incorporated into the PDF version of the slides.
About the speaker:
Michael Flynn is now Professor Emeritus of EE at Stanford. He directed the Architecture and Arithmetic group in CSL for many years.
He is now Senior Adviser and Board Chairman at Maxeler.
Michael J Flynn