SFOtoHKGin5min

From FarmShare

San Francisco to Hong Kong in 5 minutes

Introduction

This is a follow-on article to CheapFlights, and while the metaphor may be showing stress cracks, please bear with me. In the previous article we made the most of a single-threaded, and hence single-core, program by taking advantage of the embarrassingly parallel nature of moving the camera. The time taken to render any given frame, however, was completely unchanged from running povray directly on a corn. To break the 15-minute barrier (for this particular scene file) we need to employ an HPC-specific technology: MPI.

MPI stands for Message Passing Interface and is typically used as a library from a compiled language (C/C++/Fortran) or an interpreted (byte-compiled, actually) language such as Python. In this example we will explore a parallel raytracer called Tachyon, which offers MPI as one of its parallel options. MPI gives Tachyon access to the distributed compute and memory of all cores participating in the job. It is up to the code in Tachyon, then, to decide how to break up the task of rendering a frame so that every core in the job can execute its portion and communicate its results back. Since this architectural decision has already been made in the case of Tachyon, we are free to run Tachyon jobs on the barley cluster using as few or as many cores as we like, provided the number fits within the cluster's limits. This number becomes the size of the Tachyon job we submit to the scheduler.
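To make the decomposition idea concrete, here is a minimal sketch in C of the general pattern an MPI renderer can follow: split the frame into horizontal bands, render one band per rank, and gather the finished bands on rank 0. This is not Tachyon's actual code; the image size, the shade() stand-in, and the assumption that the height divides evenly among the ranks are simplifications for illustration only.

/* Toy illustration of how an MPI program can divide one frame across ranks.
 * Each rank shades a band of scanlines, then rank 0 gathers the full image.
 * Build with an MPI compiler wrapper, e.g.:  mpicc -O2 mpi_bands.c -o mpi_bands
 * Run with e.g.:                             mpirun -np 8 ./mpi_bands
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define WIDTH  1024
#define HEIGHT 768

/* Stand-in for the per-pixel work a real raytracer would do. */
static unsigned char shade(int x, int y)
{
    return (unsigned char)((x ^ y) & 0xff);
}

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Split the image into nprocs horizontal bands. For brevity this assumes
     * HEIGHT divides evenly by nprocs; real code would use MPI_Gatherv. */
    int rows_per_rank = HEIGHT / nprocs;
    int y0 = rank * rows_per_rank;

    unsigned char *band = malloc((size_t)rows_per_rank * WIDTH);
    for (int y = 0; y < rows_per_rank; y++)
        for (int x = 0; x < WIDTH; x++)
            band[y * WIDTH + x] = shade(x, y0 + y);

    /* Rank 0 collects every band into the final frame buffer. */
    unsigned char *frame = NULL;
    if (rank == 0)
        frame = malloc((size_t)HEIGHT * WIDTH);
    MPI_Gather(band, rows_per_rank * WIDTH, MPI_UNSIGNED_CHAR,
               frame, rows_per_rank * WIDTH, MPI_UNSIGNED_CHAR,
               0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("assembled %dx%d image from %d ranks\n", WIDTH, HEIGHT, nprocs);

    free(band);
    free(frame);
    MPI_Finalize();
    return 0;
}

The same pattern scales to however many cores the scheduler grants the job, which is why the core count becomes a free parameter when we submit.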

Let's explore what kind of speedup Tachyon can achieve on the barley cluster. The cluster is connected with 10-gigabit Ethernet, which plays an important part, since the intermediate results are communicated among all of the nodes participating in the job.


Executive Summary

We explore the possibilities presented by a parallel raytracer (Tachyon) using OpenMPI (an MPI library). Given a sample scene file, a single-core job (similar to the POVray example) takes 8.8 minutes to render a single frame. Rendering the same scene file using 208 cores as an MPI job takes 3.4 seconds. This is a speedup of roughly 156 times, which is like reducing the flight to Hong Kong from 13 hours to 5 minutes (also a factor of 156).
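As a quick check on the arithmetic (with the timings rounded as quoted above): 8.8 minutes is about 528 seconds, and 528 s / 3.4 s ≈ 155, consistent with the quoted factor of about 156 given the rounding. Applying that factor to the flight, 13 hours is 780 minutes, and 780 / 156 = 5 minutes.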

Methodology

Assessing Cardinality
