Rye-GPU

From FarmShare

Revision as of 15:59, 31 May 2015 by Chekh (Talk | contribs)

Nvidia GPU

FarmShare provides GPUs on two systems, rye01 and rye02. You can log in to these two systems as you would a corn system:

ssh rye01.stanford.edu

or

ssh rye02.stanford.edu

hardware and software

rye01 and rye02 are Intel CPU systems with the following configuration:

rye01:

  • 8 core (2x E5620) cpu
  • 48GB ram
  • 250GB local disk
  • 6x C2070
  • Ubuntu 13.10
  • CUDA 6.0

rye02:

  • 8 core (2x E5620) cpu
  • 48GB ram
  • 250GB local disk
  • 8x GTX 480
  • Ubuntu 14.04
  • CUDA 7.0

FarmShare would like to extend special thanks to Jon Pilat, Brian Tempero and Margot Gerritsen for their support.

software

CUDA is installed along with the toolkit samples (in /usr/local/cuda).

example 1

An easy first thing to try is to log in, load the cuda and cudasamples modules, and run deviceQuery to see what kind of CUDA device you have been assigned. Then we run a matrix multiply sample (matrixMulCUBLAS) that comes with the CUDA toolkit.

Welcome to Ubuntu 13.10 (GNU/Linux 3.11.0-20-generic x86_64)
Linux rye01.stanford.edu x86_64 GNU/Linux
rye01.stanford.edu - Ubuntu 13.10, amd64
8-core Xeon E5620 @ 2.40GHz (FT72-B7015, empty); 47.16GB RAM, 10GB swap
Puppet environment: rec_master; kernel 3.11.0-20-generic (x86_64)
 --*-*- Stanford University Research Computing -*-*--

  _____                    ____  _
 |  ___|_ _ _ __ _ __ ___ / ___|| |__   __ _ _ __ ___
 | |_ / _` | '__| '_ ` _ \\___ \| '_ \ / _` | '__/ _ \
 |  _| (_| | |  | | | | | |___) | | | | (_| | | |  __/
 |_|  \__,_|_|  |_| |_| |_|____/|_| |_|\__,_|_|  \___|


    http://farmshare.stanford.edu

###
##
# welcome to corn-new
# please report any problems to research-computing-support@stanford.edu
#
# new features:  ubuntu 13.10, matlab2013b, matlab2014a, intel c/c++/fortran compilers, cuda 6.0
# Check out https://www.stanford.edu/group/farmshare/cgi-bin/wiki/index.php/Ubuntu1310
#
##
###

Last login: Sun May 11 14:31:56 2014 from c-24-130-183-161.hsd1.ca.comcast.net

your cuda device is:
CUDA_VISIBLE_DEVICES=5
device last used: Tue May 13 09:54:05 2014

bishopj@rye01:~$ module load cuda cudasamples
bishopj@rye01:~$ deviceQuery
deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Tesla C2070"
  CUDA Driver Version / Runtime Version          6.0 / 6.0
  CUDA Capability Major/Minor version number:    2.0
  Total amount of global memory:                 5375 MBytes (5636554752 bytes)
  (14) Multiprocessors, ( 32) CUDA Cores/MP:     448 CUDA Cores
  GPU Clock rate:                                1147 MHz (1.15 GHz)
  Memory Clock rate:                             1494 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 786432 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65535), 3D=(2048, 2048, 2048)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (65535, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Enabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           131 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.0, CUDA Runtime Version = 6.0, NumDevs = 1, Device0 = Tesla C2070
Result = PASS

bishopj@rye01:~$ matrixMulCUBLAS 
[Matrix Multiply CUBLAS] - Starting...
GPU Device 0: "Tesla C2070" with compute capability 2.0

MatrixA(320,640), MatrixB(320,640), MatrixC(320,640)
Computing result using CUBLAS...done.
Performance= 492.46 GFlop/s, Time= 0.266 msec, Size= 131072000 Ops
Computing result using host CPU...done.
Comparing CUBLAS Matrix Multiply with CPU results: PASS
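
A few figures in the session above can be cross-checked by hand. The login banner reports CUDA_VISIBLE_DEVICES=5, yet deviceQuery detects a single card and calls it Device 0: CUDA only exposes the GPUs listed in that variable and renumbers them starting from 0. The 448 CUDA cores are 14 multiprocessors times 32 cores each, and the reported 131072000 operations match 2*m*n*k for dimensions 320, 640 and 320. The Python sketch below reproduces these numbers. The function names are hypothetical, and the assignment of the printed sizes to m, n and k is an assumption, since the sample output does not spell it out.

```python
import os


def visible_devices(env=None):
    """Return the GPU identifiers listed in CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset, in which case CUDA
    exposes every GPU in the machine. Identifiers are kept as strings.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return [item.strip() for item in value.split(",") if item.strip()]


def cuda_cores(multiprocessors, cores_per_mp):
    """Total CUDA cores, as deviceQuery reports them."""
    return multiprocessors * cores_per_mp


def gemm_ops(m, n, k):
    """Operation count of an (m x k) by (k x n) matrix product:
    one multiply and one add per inner-product term."""
    return 2 * m * n * k


def gflops(ops, msec):
    """Throughput in GFlop/s for `ops` operations taking `msec` milliseconds."""
    return ops / (msec * 1e-3) / 1e9


if __name__ == "__main__":
    # Values copied from the session above; m, n, k mapping is assumed.
    print(visible_devices({"CUDA_VISIBLE_DEVICES": "5"}))
    print(cuda_cores(14, 32))
    ops = gemm_ops(320, 640, 320)
    print(ops)
    print(round(gflops(ops, 0.266), 2))
```

With the rounded time of 0.266 msec this gives about 492.75 GFlop/s, within 0.1% of the 492.46 GFlop/s the sample prints. The small gap is consistent with the time being displayed to only three decimal places.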

example 2 - smokeParticles cuda sample program

[Image: Smokeparticlesstill1.small.png, a still frame from the smokeParticles sample]
  • log in to rye01 or rye02
  • run FarmVNC
  • module load cuda cudasamples
  • cd /usr/local/cuda/samples/bin/x86_64/linux/release
  • smokeParticles

example 3 - matlab

We can also log in, load the cuda module, and run the same deviceQuery and matrix multiply samples in MATLAB.

This example has its own page: see MatlabGPUDemo1.

example 4 - R

An example of using a CUDA-enabled R library to do hierarchical linear regressions.

example 5 - PyMOL

[Images: Pymolgiffy2.gif and Pymolgiffy.gif, animations of the PyMOL session]
  • log in to rye02.stanford.edu
  • start a VNC session as described in FarmVNC
  • search the menu for pymol and click on it
  • from the File menu, open /farmshare/software/examples/pymol/4GD3.pdb
  • click the box to show the model at full size, then press the down arrow to start the animation
  • optionally, click S -> surface (in the upper right) to draw the surface version, which shows up better here