The parallelization of the time loop of a partial differential
equation solver, or time-parallelism, has in general received little
attention in the literature compared to the parallelization of its
space loop. This is because time-parallelism, where the solution
is computed simultaneously at different time-instances, is harder
to achieve efficiently than space-parallelism, due to the inherently
sequential nature of the time-integration process. However,
time-parallelism can be of paramount importance to many applications.
These include those time-dependent problems where the underlying
computational models have either too few spatial degrees of freedom
(dof) to enable any significant amount of space-parallelism, or not
enough spatial dof to efficiently exploit a given large number of
processors. Problems in robotics and protein folding and, more
generally, time-dependent problems arising from reduced-order models
usually fall in the first category. Many structural dynamics problems,
even those with hundreds of thousands of dof, can fall in the second
category when massively parallel computations are desired.
Accelerating the time-to-solution of the first category of problems is
crucial for real-time or near-real-time applications, and
time-parallelism can be a key factor for achieving this objective.
With the advent of massively parallel systems with thousands of cores,
time-parallelism also provides an avenue for reducing the
time-to-solution of fixed-size problems of the second category below
the minimum attainable using space-parallelism alone on such systems.
To achieve the aforementioned objective, the Parallel-In-Time
Algorithm (or PITA) relies on classical domain decomposition principles
that are usually applied to the spatial component of a PDE. The
time-domain is first divided into time-slices to be processed
independently. This decomposition can also be seen as defining a
coarse time-grid, as opposed to the underlying fine time-grid, which is
chosen so as to allow convergence with the sought-after accuracy. Starting
with initial approximate seed values defined on the coarse time-grid,
the system evolution is then computed independently on each time-slice
using a classical time-stepping integrator. This solution exhibits
discontinuities, or jumps, at the time-slice boundaries if the initial
guess is not accurate enough. Applying a Newton-like approach to the
global time-dependent equation, a correction function is then computed
so that its values on the coarse time-grid are added to the current
seeds to improve their accuracy. The solution can thus be iteratively
refined until convergence, which can be monitored by the reduction of
the jumps.
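The iterative structure just described can be sketched in a Parareal-style form, to which PITA is closely related. The sketch below is illustrative only: the coarse and fine propagators `coarse` and `fine` are assumed stand-ins (a single large forward-Euler step versus many small ones, applied to the scalar model problem dy/dt = -y), not the actual PITA operators, and the names and parameters are hypothetical.

```python
import numpy as np

def fine(y, t0, t1, nsteps=100):
    """Fine propagator: many small forward-Euler steps of dy/dt = -y."""
    dt = (t1 - t0) / nsteps
    for _ in range(nsteps):
        y = y + dt * (-y)
    return y

def coarse(y, t0, t1):
    """Coarse propagator: a single forward-Euler step over the whole slice."""
    return y + (t1 - t0) * (-y)

def parareal(y0, T, slices, iters):
    t = np.linspace(0.0, T, slices + 1)
    # Initial approximate seeds on the coarse time-grid.
    y = np.empty(slices + 1)
    y[0] = y0
    for n in range(slices):
        y[n + 1] = coarse(y[n], t[n], t[n + 1])
    for _ in range(iters):
        # Parallel step: propagate each seed independently over its time-slice.
        f = [fine(y[n], t[n], t[n + 1]) for n in range(slices)]
        g_old = [coarse(y[n], t[n], t[n + 1]) for n in range(slices)]
        # Sequential correction sweep: the jump (f - g_old) updates the seeds.
        y_new = y.copy()
        for n in range(slices):
            y_new[n + 1] = coarse(y_new[n], t[n], t[n + 1]) + f[n] - g_old[n]
        y = y_new
    return t, y

t, y = parareal(y0=1.0, T=2.0, slices=8, iters=4)
```

In this toy setting the jumps at the slice boundaries shrink with each iteration, and the seeds converge toward the fine-grid solution, which here tracks exp(-t).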

While most steps of this iteration process are naturally parallel, the
key challenge is to perform the sequential correction computation
efficiently. For structural dynamics, the best strategy consists in
propagating only a relevant part of the jump on the fine time-grid
while retaining an acceptable computational efficiency. Typically,
speedup factors of 3 or more can be achieved for both linear and
nonlinear dynamics problems.

Figure 2: Second time-parallel strategy for large-scale systems and reduced-order models.
