Rye-GPU
Latest revision as of 16:59, 31 May 2015
Nvidia GPU
FarmShare provides GPUs on two systems, rye01 and rye02. You can use these two systems as you would a corn system:
ssh rye01.stanford.edu
or
ssh rye02.stanford.edu
hardware and software
rye01 and rye02 are Intel CPU systems with the following configuration:
rye01:
- 8 core (2x E5620) cpu
- 48GB ram
- 250GB local disk
- 6x C2070
- Ubuntu 13.10
- CUDA 6.0
rye02:
- 8 core (2x E5620) cpu
- 48GB ram
- 250GB local disk
- 8x GTX 480
- Ubuntu 14.04
- CUDA 7.0
FarmShare would like to extend special thanks to Jon Pilat, Brian Tempero and Margot Gerritsen for their support.
software
CUDA is installed along with the toolkit samples (in /usr/local/cuda).
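To confirm the install on whichever host you are logged in to, a quick check along these lines may help. This is only a sketch: `check_cuda_install` is a made-up helper name, not something FarmShare provides, and it assumes `nvcc` sits under `bin/` in the toolkit directory, as in a standard CUDA layout.

```shell
# Sketch: report whether a CUDA toolkit directory looks usable.
# check_cuda_install is a hypothetical helper, not part of FarmShare.
check_cuda_install() {
    cuda_root="${1:-/usr/local/cuda}"   # default path taken from this page
    if [ ! -d "$cuda_root" ]; then
        echo "missing: $cuda_root"
        return 1
    fi
    echo "found: $cuda_root"
    # nvcc normally sits in bin/ (an assumption about the toolkit layout)
    if [ -x "$cuda_root/bin/nvcc" ]; then
        "$cuda_root/bin/nvcc" --version
    else
        echo "nvcc not found under $cuda_root/bin"
    fi
}

# Example call (on rye01 or rye02):
#   check_cuda_install /usr/local/cuda
```

The function takes the directory as an argument so the same check can be pointed at a different toolkit location if the install path ever changes.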
example 1
An easy first thing to try is to log in, load the cuda and cudasamples modules, and run deviceQuery to see what kind of CUDA device you have been assigned. Then run matrixMulCUBLAS, a matrix multiply sample that comes with the CUDA toolkit.
Welcome to Ubuntu 13.10 (GNU/Linux 3.11.0-20-generic x86_64)

Linux rye01.stanford.edu x86_64 GNU/Linux
rye01.stanford.edu - Ubuntu 13.10, amd64
8-core Xeon E5620 @ 2.40GHz (FT72-B7015, empty); 47.16GB RAM, 10GB swap
Puppet environment: rec_master; kernel 3.11.0-20-generic (x86_64)

--*-*- Stanford University Research Computing -*-*--
 _____                    ____  _
|  ___|_ _ _ __ _ __ ___ / ___|| |__   __ _ _ __ ___
| |_ / _` | '__| '_ ` _ \\___ \| '_ \ / _` | '__/ _ \
|  _| (_| | |  | | | | | |___) | | | | (_| | | |  __/
|_|  \__,_|_|  |_| |_| |_|____/|_| |_|\__,_|_|  \___|
http://farmshare.stanford.edu

###
##
# welcome to corn-new
# please report any problems to research-computing-support@stanford.edu
#
# new features: ubuntu 13.10, matlab2013b, matlab2014a, intel c/c++/fortran compilers, cuda 6.0
# Check out https://www.stanford.edu/group/farmshare/cgi-bin/wiki/index.php/Ubuntu1310
#
##
###
Last login: Sun May 11 14:31:56 2014 from c-24-130-183-161.hsd1.ca.comcast.net
your cuda device is: CUDA_VISIBLE_DEVICES=5
device last used: Tue May 13 09:54:05 2014
bishopj@rye01:~$ module load cuda cudasamples
bishopj@rye01:~$ deviceQuery
deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Tesla C2070"
  CUDA Driver Version / Runtime Version          6.0 / 6.0
  CUDA Capability Major/Minor version number:    2.0
  Total amount of global memory:                 5375 MBytes (5636554752 bytes)
  (14) Multiprocessors, ( 32) CUDA Cores/MP:     448 CUDA Cores
  GPU Clock rate:                                1147 MHz (1.15 GHz)
  Memory Clock rate:                             1494 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 786432 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65535), 3D=(2048, 2048, 2048)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z): (65535, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Enabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           131 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.0, CUDA Runtime Version = 6.0, NumDevs = 1, Device0 = Tesla C2070
Result = PASS
bishopj@rye01:~$ matrixMulCUBLAS
[Matrix Multiply CUBLAS] - Starting...
GPU Device 0: "Tesla C2070" with compute capability 2.0

MatrixA(320,640), MatrixB(320,640), MatrixC(320,640)
Computing result using CUBLAS...done.
Performance= 492.46 GFlop/s, Time= 0.266 msec, Size= 131072000 Ops
Computing result using host CPU...done.
Comparing CUBLAS Matrix Multiply with CPU results: PASS
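As a sanity check, the headline numbers in the transcript above can be reproduced with a little arithmetic. The sketch below assumes the sample counts a dense matrix multiply as 2*m*n*k floating-point operations with dimensions 320, 640 and 320; that reading is an inference from the printed matrix sizes (it reproduces the reported operation count exactly), not something the sample states. The variable names are illustrative.

```shell
# Figures copied from the transcript; m, n, k are an inferred reading
# of the printed matrix sizes (assumption, see the note above).
m=320; n=640; k=320
time_msec=0.266            # "Time= 0.266 msec"

# A dense matrix multiply costs about 2*m*n*k floating-point operations.
ops=$((2 * m * n * k))
echo "ops: $ops"           # matches "Size= 131072000 Ops"

# GFlop/s = operations / seconds / 1e9
gflops=$(awk -v ops="$ops" -v t="$time_msec" \
    'BEGIN { printf "%.1f", ops / (t / 1000) / 1e9 }')
echo "GFlop/s: $gflops"

# CUDA cores = multiprocessors * cores per multiprocessor
cores=$((14 * 32))
echo "CUDA cores: $cores"  # matches "448 CUDA Cores"
```

The computed rate comes out at roughly 492.8 GFlop/s rather than the reported 492.46, which is consistent with the transcript printing the elapsed time rounded to three decimal places.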
example 2 - smokeParticles cuda sample program
- log in to rye01 or rye02
- start a VNC session as described in FarmVNC
- module load cuda cudasamples
- cd /usr/local/cuda/samples/bin/x86_64/linux/release
- smokeParticles
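The last three steps can be wrapped in a small helper so that a missing sample produces a clear message instead of a "command not found" error. This is a sketch only: `run_sample` is a hypothetical name chosen for illustration, and the commented usage assumes you are already inside a VNC session on one of the rye hosts.

```shell
# Sketch: run one of the prebuilt CUDA samples from its release directory.
# run_sample is a hypothetical helper name used only for illustration.
run_sample() {
    sample_dir="$1"
    sample_name="$2"
    if [ ! -x "$sample_dir/$sample_name" ]; then
        echo "sample not found: $sample_dir/$sample_name" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    (cd "$sample_dir" && "./$sample_name")
}

# On rye01 or rye02, inside a VNC session:
#   module load cuda cudasamples
#   run_sample /usr/local/cuda/samples/bin/x86_64/linux/release smokeParticles
```

Passing the directory and sample name as arguments means the same helper works for any other binary in the samples directory.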
example 3 - matlab
You can also log in, load the cuda module, and run the same deviceQuery and matrix multiply samples from MATLAB.
This example has its own page: see MatlabGPUDemo1.
example 4 - R
An example of using a CUDA-enabled R library to do hierarchical linear regressions.
example 5 - PyMOL
- log in to rye02.stanford.edu
- start a VNC session as described in FarmVNC
- search the menu for pymol and click on it
- from the File menu, open /farmshare/software/examples/pymol/4GD3.pdb
- click the box to full-size the model, then press the down arrow to start the animation
- optionally, click S -> surface (in the upper right) to draw the surface version, which shows up better