This assignment will walk you through the steps to run Regent on the class GCP cluster. After completing this assignment, you should be able to log in to the cluster, write a simple job script, and run Regent on multiple nodes.
You should have received an email with your username and password. Please log in to the cluster using the password provided and the following command:
ssh <your-user-name>@cs315b.regent-lang.org
You may want to change your password after logging in. You can do so with the passwd command.
Regent is already installed and ready to run on the cluster. However, if you would like to install it locally on your laptop for easier debugging and development, you can follow the instructions below. If you plan on doing all of your development on the cluster, you can skip this section.
Most of your Macs do not have Clang installed with the necessary header files, so you have to install Clang manually. Go to the LLVM download page and download the pre-built binary for Mac OS X. Clang 3.5.2 works best, but other versions should also work. Then uncompress the file and move the resulting directory wherever you want. Finally, add its bin subdirectory to your PATH. Here is one possible scenario:
curl http://llvm.org/releases/3.5.2/clang+llvm-3.5.2-x86_64-apple-darwin.tar.xz > clang.tar.xz
tar -Jxvf clang.tar.xz
mv clang+llvm-3.5.2-x86_64-apple-darwin ~/clang-3.5.2
export PATH=$PATH:~/clang-3.5.2/bin # you might add this line to your .bashrc if you want to build Regent multiple times
Once you have installed Clang, you can follow the same instructions to build Regent:
git clone -b master https://github.com/StanfordLegion/legion.git
cd legion/language
./install.py
If you recently updated to macOS Sierra and see this error message when you run ./regent.py examples/circuit.rg:
<buffer>:4:10: fatal error: 'stdio.h' file not found
#include <stdio.h>
^
compilation of included c code failed
stack traceback:
src/terralib.lua:3386: in function 'includecstring'
...
then you have to re-install the command-line tools with xcode-select --install.
The easiest way is to follow this quickstart: https://github.com/StanfordLegion/legion/blob/master/language/README.md. If this goes wrong, you will probably need to install Clang manually. You can find a pre-built binary and the source code at the LLVM download page. Clang 3.5.2 works best, but other versions should also work.
There should be an examples folder in your home directory. It contains all of the examples from lecture 3, as well as a more complicated example called circuit.rg. The circuit example is the same one you'll find in language/examples if you clone Legion. You do not have to worry about what this code does right now; we will just use it to demonstrate how to run jobs.
First, try running regent examples/circuit.rg. This runs the Regent compiler on examples/circuit.rg and then executes the program. Running Regent like this executes the program on the login node, so it is useful for debugging but not for larger runs. Next, we'll look at how to run the program on the compute nodes. Here is an example PBS script to run the circuit example:
#!/bin/bash -l
#PBS -l nodes=1
#PBS -l walltime=00:05:00
#PBS -d .
regent examples/circuit.rg
For this assignment, you can simply copy and paste these commands into a script file, say run_circuit.sh. However, you should become comfortable with writing PBS scripts for future assignments. You can find the complete list of options here: https://linux.die.net/man/1/qsub-torque.
Now, you can submit the job script with this command:
qsub ./run_circuit.sh
Once the job has finished, you will get two output files, run_circuit.sh.o* and run_circuit.sh.e* (the asterisks are replaced with your job id), which record the standard output and standard error streams, respectively. If your job was successful, run_circuit.sh.e* should be empty and run_circuit.sh.o* should contain output like the following:
circuit settings: loops=5 pieces=4 nodes/piece=4 wires/piece=8 pct_in_piece=80 seed=12345
Circuit memory usage:
Nodes : 16 * 16 bytes = 256 bytes
Wires : 32 * 120 bytes = 3840 bytes
Total 4096 bytes
Starting main simulation loop
...
SUCCESS!
ELAPSED TIME = 0.041 s
GFLOPS = 3.763 GFLOPS
simulation complete - destroying regions
Otherwise, you can probably find the reason for the failure in one of the output files.
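The success check described above can be scripted. The following is a minimal sketch, not part of the course tooling: the function name check_job_output is hypothetical, and it assumes only what is stated above, namely that a successful run leaves the error file empty and prints a SUCCESS! line to the output file.

```shell
#!/bin/bash
# check_job_output OUT_FILE ERR_FILE   (hypothetical helper)
# Returns 0 only if ERR_FILE is empty (or missing) and OUT_FILE
# contains the string "SUCCESS!"; otherwise reports why and returns 1.
check_job_output() {
    local out_file="$1" err_file="$2"
    # -s is true when the file exists and has a size greater than zero.
    if [ -s "$err_file" ]; then
        echo "stderr file $err_file is not empty" >&2
        return 1
    fi
    if ! grep -q 'SUCCESS!' "$out_file"; then
        echo "no SUCCESS! line found in $out_file" >&2
        return 1
    fi
    echo "job looks successful"
}
```

For example, with a placeholder job id of 123, you would call check_job_output run_circuit.sh.o123 run_circuit.sh.e123 after the job has finished.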
The regent command reads the number of nodes from the PBS script and sets up Regent to run the multi-node execution correctly. All you have to do is change the node count in the script. See the example below for running on 2 nodes.
#!/bin/bash -l
#PBS -l nodes=2
#PBS -l walltime=00:05:00
#PBS -d .
regent examples/circuit.rg
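Since only the node count changes between these scripts, you may find it convenient to generate them. The following is a sketch under that assumption; the function name make_job_script is hypothetical, and the script body is copied from the examples in this document.

```shell
#!/bin/bash
# make_job_script NODES [WALLTIME]   (hypothetical helper)
# Prints a PBS job script for the circuit example to standard output.
# WALLTIME defaults to the five minutes used in the examples.
make_job_script() {
    local nodes="$1" walltime="${2:-00:05:00}"
    cat <<EOF
#!/bin/bash -l
#PBS -l nodes=${nodes}
#PBS -l walltime=${walltime}
#PBS -d .
regent examples/circuit.rg
EOF
}
```

For instance, make_job_script 2 > run_circuit.sh would write the two-node script, which you then submit with qsub as before.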
When you run a Regent program, the runtime is configured with one CPU and 512MB of system memory by default. However, this is often not sufficient because of these reasons:
If the default configuration is not enough for your program, you can change the machine configuration by passing command-line flags, as in the following example command:
regent.py examples/circuit.rg -ll:cpu 2 -ll:csize 1024
In this command, the runtime is configured with two CPUs (-ll:cpu 2) and 1024MB of system memory (-ll:csize 1024).
If you run Regent on multiple nodes, the command-line flags configure each node individually, not the set of nodes as a whole. Suppose you wrote this PBS script:
#!/bin/bash -l
#PBS -l nodes=2
#PBS -l walltime=00:05:00
#PBS -d .
regent examples/circuit.rg -ll:cpu 2
This script will launch your program on two nodes, each of which is configured with two CPUs. Therefore, there will be four CPUs in total on which you can launch your tasks.
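The arithmetic generalizes directly: the total is the node count multiplied by the per-node -ll:cpu value. A tiny sketch (the function name total_cpus is hypothetical) that you could use when sizing a run:

```shell
#!/bin/bash
# total_cpus NODES CPUS_PER_NODE   (hypothetical helper)
# Prints the number of CPUs available across the whole job, given the
# PBS node count and the value passed to -ll:cpu on each node.
total_cpus() {
    local nodes="$1" cpus_per_node="$2"
    echo $(( nodes * cpus_per_node ))
}

total_cpus 2 2   # prints 4, matching the script above
```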
For the complete list of flags for the machine configuration, please refer to this link: http://legion.stanford.edu/profiling/#machine-configuration.
Legion Prof and Legion Spy are two important tools for visualizing a Regent program's execution. Legion Prof gives you a profile of the run, and Legion Spy renders the data dependence structure of the program. To use these tools, you first collect the logging output and then pass it to the post-processing scripts. Here are the commands to enable logging:
./regent.py (your regent program) -lg:prof <number-of-nodes> -lg:prof_logfile prof_%.gz
./regent.py (your regent program) -lg:spy -logfile spy_%.log
If the filename contains % (e.g. prof_%.gz above), it will be replaced with the node id (e.g. prof_0.gz, prof_1.gz, ...). Once you have the log files, you can pass them to the scripts legion_prof.py or legion_spy.py (they are in your path).
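To make the % substitution concrete, here is a small sketch that lists the filenames you should expect for a given pattern and node count. The function name expand_logfile_pattern is hypothetical; it only mimics the naming rule described above and does not call the runtime.

```shell
#!/bin/bash
# expand_logfile_pattern PATTERN NODES   (hypothetical helper)
# Prints one filename per node, replacing every % in PATTERN with the
# node id, with ids counted from 0 as in the examples above.
expand_logfile_pattern() {
    local pattern="$1" nodes="$2" i
    for (( i = 0; i < nodes; i++ )); do
        # ${var//%/x} replaces all literal % characters in var with x.
        echo "${pattern//%/$i}"
    done
}

expand_logfile_pattern 'prof_%.gz' 2   # prints prof_0.gz and prof_1.gz
```

Assuming the post-processing scripts accept the log files as positional arguments, a shell glob such as prof_*.gz would then pick up all of the per-node files at once.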
These links will give you more information about Legion Prof and Legion Spy:
Regent and Legion have other features and options that this document does not cover. Here are some useful links to find more about Regent and Legion:
Send the following two files to :
run_script.sh.o*
) from a job running the circuit example on two nodes. You can use the script given above. The submission will be examined only to check that you actually followed the steps in this document; it will not be graded otherwise.