
Abinit 7.10.4 non-MPI is available on FarmShare


The below info used:



Parallel Abinit is available on FarmShare. This install uses MPI and ACML. To use it, submit a parallel job to the barley cluster.

MPI example on barley cluster

Abinit comes with some sample input files. Here is an excerpt from $ABINITHOME/share/abinit-test/tutoparal/README_dfpt.txt which we will use.

 Second test : BaTiO3 slab (29 atoms), 
 computation of the phonon frequencies at qpt 0.0 0.375 0.0
 This test, with 29 atom, is quite slow, but scales very well.
 There is one preparatory step, before running the DFPT calculation.
 The preparatory step can be run on 16 processors at most with the current
 input file. It might use more processors as well, with the kgb parallelism
 (but the input file has to be modified).
 On 8 processors, the preparatory step is about three hours.
 It generates well-converged wavefunctions. For a quick trial,
 simply set nstep 1   instead of nstep 50 ,
 this will run in about 6 minutes.
 The test case itself is an underconverged calculation of the response with
 respect to one perturbation (atomic displacement). It is underconverged
 because nstep has been set to 10, while more than 30 are needed.
 Moreover, obtaining the interatomic force constants would need computing
 many more perturbations than the present one.
 In any case, the present test case run in about 45 minutes on a 8 core
 Since the number of k points to be kept for the present perturbation is is 8x8x1 with 4 symmetries,
 that is 16, and the number of bands is 120, the perfectly scalable part of the
 test case should have a maximum speed up of 1920.
 From tests for the 8 core case, on a total of 20200 secs, there
 were 305 secs for vtorho3:synchro (sequential) and
 260.460 for inwffil (sequential).
 The latter will not increase with a bigger value of nstep, and for more
 perturbations, while the former will increase proportionally.
 Hence, in the present status, for 8 cores, the sequential part is about 3%,
 leading to a maximum speed-up with respect to sequential, of about 240.
 For a larger test case (bigger nstep, more perturbations), the maximum speed up might
 be twice bigger.
 Preparatory step 1
 (mpirun ...)  abinit < tdfpt_03.files > tdfpt_03.log
 cp tdfpt_03.o_WFK tdfpt_04.i_WFK
 cp tdfpt_03.o_WFK tdfpt_04.i_WFQ
 Test case, step 2 (DFPT calculation)
 (mpirun ...)  abinit < tdfpt_04.files > tdfpt_04.log

The lines under "Preparatory step 1" and "Test case, step 2" translate to this job submission script:


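#!/bin/bash
#$ -cwd
#$ -S /bin/bash
#$ -N abinit
#$ -M bishopj@stanford.edu
#$ -m beas
#$ -R y
#$ -l mem_free=1G
#$ -pe orte 8

echo "Got $NSLOTS slots"
echo "jobid $JOB_ID"

# Build the MPI machinefile: one hostname per slot, taken from the
# host file that the scheduler provides in $PE_HOSTFILE.
tmphosts=$(mktemp)
awk '{ for (i=0; i < $2; ++i) { print $1} }' "$PE_HOSTFILE" > "$tmphosts"

# Print the working directory (set by -cwd).
pwd

echo ""
echo "nslots: $NSLOTS"
echo ""

module load abinit acml

# Preparatory step 1. The date calls print the timestamps used for the runtimes below.
date
mpirun -np $NSLOTS -machinefile "$tmphosts" -x LD_LIBRARY_PATH /farmshare/software/free/abinit/7.4.2/bin/abinit < tdfpt_03.files > tdfpt_03.log
date

cp tdfpt_03.o_WFK tdfpt_04.i_WFK
cp tdfpt_03.o_WFK tdfpt_04.i_WFQ

# Test case, step 2 (DFPT calculation)
date
mpirun -np $NSLOTS -machinefile "$tmphosts" -x LD_LIBRARY_PATH /farmshare/software/free/abinit/7.4.2/bin/abinit < tdfpt_04.files > tdfpt_04.log
date

rm -f "$tmphosts"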
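The awk line in the job script turns the scheduler's host file into an MPI machinefile. A minimal sketch of what it does, assuming each host-file line begins with a hostname followed by the number of slots granted on that host; the function name expand_hostfile and the host barley08 are made up for illustration:

```shell
# expand_hostfile: read host-file lines ("host slots queue range") on stdin
# and print each hostname once per slot, as the awk line in the job script does.
expand_hostfile() {
    awk '{ for (i = 0; i < $2; ++i) { print $1 } }'
}

# Illustrative two-node allocation (barley08 is a hypothetical host).
expand_hostfile <<'EOF'
barley07.Stanford.EDU 2 raring.q@barley07.Stanford.EDU UNDEFINED
barley08.Stanford.EDU 1 raring.q@barley08.Stanford.EDU UNDEFINED
EOF
```

This prints barley07.Stanford.EDU twice and barley08.Stanford.EDU once, which is the one-line-per-slot layout that mpirun -machinefile reads.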

Here is an example run:

$ module load abinit
$ mkdir abinittest
$ cd abinittest
$ cp -rp $ABINITHOME/share/abinit-test .
$ cd abinit-test/tutoparal/Input/

Save the job submission script above to abinit.submit in this directory.

Now we will submit the job:

$ qsub abinit.submit 
Your job 1143544 ("abinit") has been submitted
$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
1143544 0.39219 abinit     bishopj      r     10/11/2013 22:11:12 raring.q@barley07.Stanford.EDU     8

Wait a few minutes for the job to complete, then look at the job's output file:
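Instead of checking qstat by hand, the wait can be scripted. This is only a sketch: it assumes that qstat -j exits with a non-zero status once the job has left the queue, and the function name wait_for_job and the QSTAT override variable are hypothetical helpers, not part of the scheduler.

```shell
# wait_for_job JOBID [INTERVAL]: block until the scheduler no longer lists JOBID.
# Assumption: "qstat -j JOBID" fails (non-zero exit) once the job has finished.
# QSTAT is a hypothetical hook for substituting another command; it defaults to qstat.
wait_for_job() {
    local jobid=$1 interval=${2:-30}
    while "${QSTAT:-qstat}" -j "$jobid" > /dev/null 2>&1; do
        sleep "$interval"
    done
}

# Example (not run here): wait_for_job 1143544 && cat abinit.o1143544
```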

$ cat abinit.o1143544
Got 8 slots
jobid 1143544

nslots: 8

Fri Oct 11 22:11:27 PDT 2013
Fri Oct 11 22:11:36 PDT 2013
Fri Oct 11 22:11:50 PDT 2013
Fri Oct 11 22:24:13 PDT 2013

Scaling behavior

Looking at the runtimes for 4, 8, and 16 core jobs shows good scaling behavior. These are the times I observed:

  • 4 cores: 40 minutes
  • 8 cores: 13 minutes
  • 16 cores: 7 minutes
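To put numbers on the scaling, the runtimes listed above can be converted to speedup and parallel efficiency relative to the smallest run. A small sketch, where scaling_table is a made-up helper name and the input is "cores minutes" pairs with the baseline run first:

```shell
# scaling_table: read "cores minutes" pairs on stdin, baseline run first.
# Prints speedup (baseline time / time) and efficiency
# (speedup divided by the increase in core count) for each run.
scaling_table() {
    awk '
        NR == 1 { base_cores = $1; base_time = $2 }
        {
            speedup = base_time / $2
            efficiency = speedup / ($1 / base_cores)
            printf "%d cores: speedup %.2f, efficiency %.2f\n", $1, speedup, efficiency
        }'
}

# Runtimes from the list above, in minutes.
printf '%s\n' '4 40' '8 13' '16 7' | scaling_table
```

An efficiency above 1 means the run was faster than ideal scaling from the 4-core baseline would predict.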

To try this, change the 8 in -pe orte 8 to the desired number of cores in the job submission script. The job output for the 4, 8, and 16 core runs follows.
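Rather than editing the script by hand for each core count, copies can be generated with sed. A sketch under the assumption that the script is saved as abinit.submit and contains a single "#$ -pe orte N" line; make_variants and the abinit.N.submit naming are made up for illustration:

```shell
# make_variants SCRIPT N...: for each N, write a copy of SCRIPT whose
# "#$ -pe orte" line requests N slots. Copies are named like abinit.4.submit.
make_variants() {
    local src=$1 n
    shift
    for n in "$@"; do
        sed "s/^\(#[$] -pe orte \)[0-9][0-9]*/\1$n/" "$src" > "${src%.submit}.$n.submit"
    done
}

# Example (not run here): make_variants abinit.submit 4 8 16
```

Each generated file can then be submitted with qsub in the same way as the original script.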

Got 4 slots
jobid 1143471

nslots: 4

Fri Oct 11 23:03:12 PDT 2013
Fri Oct 11 23:22:40 PDT 2013
Fri Oct 11 23:22:53 PDT 2013
Fri Oct 11 23:43:49 PDT 2013

Got 8 slots
jobid 1143469

nslots: 8

Fri Oct 11 22:11:27 PDT 2013
Fri Oct 11 22:11:36 PDT 2013
Fri Oct 11 22:11:50 PDT 2013
Fri Oct 11 22:24:13 PDT 2013

Got 16 slots
jobid 1143470

nslots: 16

Fri Oct 11 22:39:00 PDT 2013
Fri Oct 11 22:39:08 PDT 2013
Fri Oct 11 22:39:21 PDT 2013
Fri Oct 11 22:46:08 PDT 2013
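The runtimes quoted earlier are the span between the first and last timestamp in each job's output. A minimal sketch of that arithmetic, assuming timestamp lines carry HH:MM:SS in their fourth field (as in the output above) and that a job runs for less than 24 hours; elapsed_seconds is a made-up helper name:

```shell
# elapsed_seconds: read job output on stdin and print the number of seconds
# between the first and the last timestamp line.
elapsed_seconds() {
    awk '
        # Timestamp lines look like "Fri Oct 11 22:11:27 PDT 2013".
        $4 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
            split($4, t, ":")
            s = t[1] * 3600 + t[2] * 60 + t[3]
            if (!seen) { first = s; seen = 1 }
            last = s
        }
        END {
            if (!seen) exit 1
            d = last - first
            if (d < 0) d += 86400   # assumes the run crossed midnight at most once
            print d
        }'
}

# The 8-slot output shown above: prints 766 (12 minutes 46 seconds).
elapsed_seconds <<'EOF'
Got 8 slots
jobid 1143469

nslots: 8

Fri Oct 11 22:11:27 PDT 2013
Fri Oct 11 22:11:36 PDT 2013
Fri Oct 11 22:11:50 PDT 2013
Fri Oct 11 22:24:13 PDT 2013
EOF
```

Applied to a saved output file, this would be elapsed_seconds < abinit.o1143544.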