Rye-GPU
Nvidia GPU
FarmShare has GPUs on two systems, rye01 and rye02. You can use these two systems as you would a corn system:
ssh rye01.stanford.edu
or
ssh rye02.stanford.edu
hardware
rye01 and rye02 are Intel CPU systems with 8 GPUs each:
- 8 core (2x E5620)
- 48GB RAM
- 250GB local disk
- 4x C2070 and 4x GTX 480 (rye01)
- 8x GTX 480 (rye02)
FarmShare would like to extend special thanks to Jon Pilat, Brian Tempero and Margot Gerritsen for their support.
software
CUDA 5.5 is installed along with the toolkit samples (in /usr/local/cuda).
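You can also compile your own CUDA code against the installed toolkit; module load cuda should put nvcc on your PATH. Below is a minimal vector-add sketch to test that the compiler and a GPU are working. The file name vecadd.cu and the sizes are illustrative; this is not one of the shipped samples.

// vecadd.cu - minimal vector-add sketch (illustrative, not a toolkit sample)
// build with:  nvcc vecadd.cu -o vecadd
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// each thread adds one pair of elements
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1 << 20;              // one million elements
    size_t bytes = n * sizeof(float);

    float *ha = (float *)malloc(bytes);
    float *hb = (float *)malloc(bytes);
    float *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    float *da, *db, *dc;
    cudaMalloc((void **)&da, bytes);
    cudaMalloc((void **)&db, bytes);
    cudaMalloc((void **)&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // one thread per element, 256 threads per block
    vecAdd<<<(n + 255) / 256, 256>>>(da, db, dc, n);
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);

    printf("c[0] = %f (expect 3.0)\n", hc[0]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}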
example 1
An easy first thing to try is to log in, load the cuda and cudasamples modules, and run deviceQuery to see what kind of CUDA device you have. Then run a matrix multiply sample that ships with the CUDA toolkit.
$ ssh rye01.stanford.edu
rye01.stanford.edu - Ubuntu 13.04, amd64
8-core Xeon E5620 @ 2.40GHz (FT72-B7015, empty); 47.16GB RAM, 10GB swap
Puppet environment: rec_master; kernel 3.8.0-30-generic (x86_64)

--*-*- Stanford University Research Computing -*-*--

[FarmShare ASCII-art banner]

http://farmshare.stanford.edu

###
##
# new to Ubuntu 13.04 Farmshare?
# follow this link to get started:
# https://www.stanford.edu/group/farmshare/cgi-bin/wiki/index.php/Ubuntu13TransitionGuide
##
###

Last login: Sun Sep 15 20:52:27 2013 from scorn.stanford.edu
your cuda device is: CUDA_VISIBLE_DEVICES=6
device last used: unused
bishopj@rye01:~$ module load cuda
bishopj@rye01:~$ module load cudasamples
bishopj@rye01:~$ deviceQuery
/usr/local/cuda/samples/bin/x86_64/linux/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Tesla C2070"
  CUDA Driver Version / Runtime Version          5.5 / 5.5
  CUDA Capability Major/Minor version number:    2.0
  Total amount of global memory:                 5375 MBytes (5636554752 bytes)
  (14) Multiprocessors, ( 32) CUDA Cores/MP:     448 CUDA Cores
  GPU Clock rate:                                1147 MHz (1.15 GHz)
  Memory Clock rate:                             1494 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 786432 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65535), 3D=(2048, 2048, 2048)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z):  (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z):  (65535, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Enabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           132 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.5, CUDA Runtime Version = 5.5, NumDevs = 1, Device0 = Tesla C2070
Result = PASS
bishopj@rye01:~$ matrixMulCUBLAS
[Matrix Multiply CUBLAS] - Starting...
GPU Device 0: "Tesla C2070" with compute capability 2.0

MatrixA(320,640), MatrixB(320,640), MatrixC(320,640)
Computing result using CUBLAS...done.
Performance= 507.76 GFlop/s, Time= 0.258 msec, Size= 131072000 Ops
Computing result using host CPU...done.
Comparing CUBLAS Matrix Multiply with CPU results: PASS
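deviceQuery is a thin wrapper around the CUDA runtime API. If you want the same information from inside your own program (for example, to confirm which card CUDA_VISIBLE_DEVICES has handed you), a minimal sketch using cudaGetDeviceCount and cudaGetDeviceProperties follows. The file name devinfo.cu is illustrative, and only a few of the fields deviceQuery reports are shown.

// devinfo.cu - print a few of the fields deviceQuery reports (illustrative sketch)
// build with:  nvcc devinfo.cu -o devinfo
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);
    printf("Detected %d CUDA capable device(s)\n", count);

    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("Device %d: \"%s\"\n", d, prop.name);
        printf("  Compute capability: %d.%d\n", prop.major, prop.minor);
        printf("  Global memory:      %zu MBytes\n", prop.totalGlobalMem >> 20);
        printf("  Multiprocessors:    %d\n", prop.multiProcessorCount);
        printf("  ECC enabled:        %s\n", prop.ECCEnabled ? "Yes" : "No");
    }
    return 0;
}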
example 2 - matlab
We can also log in, load the cuda module, and run the same deviceQuery and matrix multiply examples from MATLAB.
This example has its own page; see MatlabGPUDemo1.
example 3 - R
An example of using a CUDA-enabled R library to run hierarchical linear regressions.
example 4 - PyMOL
- log in to rye02.stanford.edu
- start a VNC session as described in FarmVNC
- search for pymol in the menu and then click on it
- from the File menu, open /farmshare/software/examples/pymol/4GD3.pdb
- click the box to full-size the model, then press the down arrow to start the animation
- optionally, click S -> surface (in the panel at the upper right) to draw the surface rendering, which shows up better here