R on Xgrid via Gridstuffer

This procedure was used to run redistricting simulations with an R package called BARD (Better Automated ReDistricting). Because the same code had to be run repeatedly for 2-200 seats and for all US states, a parallel computing environment was used. DISCLAIMER: This assumes that you have some control over the Xgrid agents. We did not deal with sending R or the necessary libraries over the grid; instead, we had all the necessary packages installed on all the agents.

1 Adapt the R script to run from the command line

To run the simulations "embarrassingly parallel", R is invoked in batch mode from the command line for each single simulation. An R script with the simulation code (redistrict_sim.R) is provided as input. In this particular example, the R script takes the location and name of a shapefile without extension (FL/FLpop) and the number of seats (2) as arguments. The following command would run one simulation with 2 seats from the command line prompt:

CODE:
  R CMD BATCH --no-save --no-restore '--args FL/FLpop 2' redistrict_sim.R

Here is the chunk of code in the R script (redistrict_sim.R) to deal with those arguments:

R:
  # ...
  # read in the arguments; this creates a character vector
  # set trailingOnly = TRUE to read only the arguments supplied after --args
  args <- commandArgs(trailingOnly = TRUE)

  # use the first argument to point to the location of the shapefile
  # and read it in with a BARD command
  library(BARD)
  fl.map <- importBardShape(args[1])

  # convert the second argument from character to integer and assign it
  seats <- as.integer(args[2])
  # now we do what we need to do
  # ...

Notes:

  • --args is space delimited, so a single argument cannot contain spaces.
  • R CMD BATCH is not a true batch command, as it does not run in the background unless
    you add an ampersand (&) at the end of the command.
  • --no-save means do not save the workspace when quitting R.
  • --no-restore means do not restore from prior sessions when launching R.
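
Because --args is space delimited, it can help to check the arguments before building the command. The following is a minimal sketch, not part of R or Gridstuffer: make_batch_cmd is a hypothetical helper name, and it only prints the command line rather than running R.

```shell
#!/bin/sh
# make_batch_cmd is a hypothetical helper (not part of R or Gridstuffer).
# It prints the R CMD BATCH command line for one simulation.
make_batch_cmd() {
    shapefile=$1
    seats=$2
    # --args is space delimited, so reject empty values and values with spaces.
    for value in "$shapefile" "$seats"; do
        case $value in
            *" "*|"")
                echo "argument must be non-empty and contain no spaces: '$value'" >&2
                return 1
                ;;
        esac
    done
    printf "R CMD BATCH --no-save --no-restore '--args %s %s' redistrict_sim.R\n" \
        "$shapefile" "$seats"
}

# Print the command for the Florida shapefile with 2 seats.
make_batch_cmd FL/FLpop 2
```

To run the printed command in the background, append an ampersand (&) to it, as mentioned in the notes.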

2 Submit the jobs via Gridstuffer

Gridstuffer (*) provides a graphical interface for conveniently submitting multiple jobs to the Xgrid and retrieving their results. It runs on Mac only. Gridstuffer takes a simple text file, called a MetaJob text file, that lists all the command line instructions, and it needs to be told where to send the results. To recap, what we need is:

  • a text file with the R code
  • shapefiles (or other input files)
  • Gridstuffer
  • a text file with the MetaJob instructions

(*) I am very grateful to Charles Parnot for his help with Gridstuffer in this project.

2.1 Set up a directory structure

This setup is for the local machine from which you send the jobs to the Xgrid. Here is how I set up my folders; you can organize or name them any other way.

- top folder: r_xgrid (contains redistrict_sim.R and R_Meta.txt)
--- subfolder results (empty)
--- subfolder input (contains a directory called FL with all the shapefiles)
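
The layout above can be created from the command line. This is a small sketch: make_layout is a hypothetical helper name, and the R script, the MetaJob text file and the shapefiles still have to be copied into place by hand.

```shell
#!/bin/sh
# make_layout is a hypothetical helper that creates the folder layout
# described above under the given top folder.
make_layout() {
    top=$1
    # results stays empty; input/FL will hold the Florida shapefiles.
    mkdir -p "$top/results" "$top/input/FL"
}

# Create the layout in the current directory.
make_layout r_xgrid
```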

2.2 Write the instructions for the meta job

The MetaJob (R_Meta.txt) for Gridstuffer contains:

CODE:
  -in /Users/cengel/Desktop/r_xgrid/input /usr/bin/R CMD BATCH --no-save --no-restore '--args FL/FLpop2 2' redistrict_sim.R
  /usr/bin/R CMD BATCH --no-save --no-restore '--args FL/FLpop2 3' redistrict_sim.R
  /usr/bin/R CMD BATCH --no-save --no-restore '--args FL/FLpop2 4' redistrict_sim.R
  ... etc.

Here is an easy way to create that text file from the command line (adapt to your settings):

CODE:
  # first line: carries the -in option with the input directory
  echo "-in /Users/cengel/Desktop/r_xgrid/input /usr/bin/R CMD BATCH --no-save --no-restore '--args FL/FLpop2 2' redistrict_sim.R" > R_Meta.txt
  # one more line per remaining number of seats (bash loop syntax)
  for (( i=3; i<=200; i++ ))
  do
      echo "/usr/bin/R CMD BATCH --no-save --no-restore '--args FL/FLpop2 $i' redistrict_sim.R" >> R_Meta.txt
  done
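
Since the same simulations were needed for other states as well, the loop above can be wrapped in a function that takes the input directory, shapefile and seat range as parameters. This is only a sketch under assumptions: write_metajob and its argument order are hypothetical, and it simply reproduces the MetaJob line format shown above.

```shell
#!/bin/sh
# write_metajob is a hypothetical helper that writes one MetaJob text file.
# Arguments: input directory, shapefile path without extension,
# first seat count, last seat count, output file.
write_metajob() {
    input_dir=$1
    shape=$2
    first=$3
    last=$4
    outfile=$5
    r_cmd="/usr/bin/R CMD BATCH --no-save --no-restore"

    # The first line carries the -in option, as in the MetaJob shown above.
    echo "-in $input_dir $r_cmd '--args $shape $first' redistrict_sim.R" > "$outfile"

    # One further line per remaining seat count.
    i=$(($first + 1))
    while [ "$i" -le "$last" ]; do
        echo "$r_cmd '--args $shape $i' redistrict_sim.R" >> "$outfile"
        i=$(($i + 1))
    done
}

# Example: Florida, 2 to 200 seats, same paths as above.
write_metajob /Users/cengel/Desktop/r_xgrid/input FL/FLpop2 2 200 R_Meta.txt
```

For the seat range 2 to 200 the resulting file has 199 lines, one per simulation.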

2.3 Instructions for Gridstuffer

Submit like this:

  1. Start Gridstuffer
  2. Connect to the controller
  3. Add a MetaJob and point to R_Meta.txt via GUI
  4. Point to the results directory via GUI
  5. On the Submission tab, set Limits on grids > Max pending jobs to 200 (adjust if you have more jobs)
  6. Click Run
  7. Wait until all your jobs show up as either running or pending under the Jobs tab (not the Commands tab)
  8. Shut down Gridstuffer or the computer if so desired and walk away.

Retrieve like this:

  1. Start Gridstuffer
  2. Connect to the controller
  3. Wait for the results to download to your local results directory.

3 About performance

As a ballpark figure, running one MetaJob for the state of Florida, with simulations for 2-199 seats, took a total processing time of 259.4 h. Our Xgrid had 28 CPUs available, which reduced the elapsed time to approximately 9.3 h. The figure below shows the processing time of each run by number of seats.
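
The 9.3 h figure is consistent with spreading the total processing time evenly over the available CPUs. A quick check, assuming an ideal even split with no scheduling or transfer overhead (ideal_wall_hours is a hypothetical helper name):

```shell
#!/bin/sh
# ideal_wall_hours is a hypothetical helper: total processing hours
# divided by the number of CPUs, rounded to one decimal place.
ideal_wall_hours() {
    awk -v total="$1" -v cpus="$2" 'BEGIN { printf "%.1f\n", total / cpus }'
}

# 259.4 h of processing spread over 28 CPUs
ideal_wall_hours 259.4 28   # prints 9.3
```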
