Rye-GPU


Revision as of 22:38, 15 September 2013 by Bishopj

Nvidia GPU

FarmShare provides GPUs on two systems, rye01 and rye02. You can log in to either one as you would a corn system:

ssh rye01.stanford.edu

or

ssh rye02.stanford.edu

hardware

rye01 and rye02 are Intel CPU systems with 8 GPUs each.

  • 8 cores (2x Xeon E5620)
  • 48GB RAM
  • 250GB local disk

FarmShare would like to extend special thanks to Jon Pilat, Brian Tempero and Margot Gerritsen for their support.


example 1

An easy first thing to try is to log in, load the cuda module, and run deviceQuery to see what kind of CUDA device you have been given. Then we run the matrixMulCUBLAS matrix multiply sample that comes with the CUDA toolkit.
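
The login banner in this example reports a CUDA_VISIBLE_DEVICES value, which suggests that each session is pointed at one of the GPUs through that environment variable. Before running the samples it can help to confirm what the variable holds. The following is a minimal sketch, assuming a POSIX shell; the function name show_cuda_device is hypothetical and is not a FarmShare tool.

```shell
#!/bin/sh
# Hypothetical helper (not a FarmShare tool): report which GPU(s) the
# CUDA runtime will see, based on the CUDA_VISIBLE_DEVICES variable.
show_cuda_device() {
    if [ -z "${CUDA_VISIBLE_DEVICES+x}" ]; then
        # Variable not set at all: CUDA exposes every GPU in the system.
        echo "CUDA_VISIBLE_DEVICES is unset: all GPUs are visible"
    elif [ -z "$CUDA_VISIBLE_DEVICES" ]; then
        # Variable set to the empty string: CUDA exposes no GPU.
        echo "CUDA_VISIBLE_DEVICES is empty: no GPU is visible"
    else
        echo "visible GPU index(es): $CUDA_VISIBLE_DEVICES"
    fi
}

show_cuda_device
```

Whatever physical GPU the variable selects, CUDA programs see it renumbered as device 0, which is why deviceQuery reports a single "Device 0".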

$ ssh rye01.stanford.edu
rye01.stanford.edu - Ubuntu 13.04, amd64
8-core Xeon E5620 @ 2.40GHz (FT72-B7015, empty); 47.16GB RAM, 10GB swap
Puppet environment: rec_master; kernel 3.8.0-30-generic (x86_64)
 --*-*- Stanford University Research Computing -*-*--

  _____                    ____  _
 |  ___|_ _ _ __ _ __ ___ / ___|| |__   __ _ _ __ ___
 | |_ / _` | '__| '_ ` _ \\___ \| '_ \ / _` | '__/ _ \
 |  _| (_| | |  | | | | | |___) | | | | (_| | | |  __/
 |_|  \__,_|_|  |_| |_| |_|____/|_| |_|\__,_|_|  \___|


    http://farmshare.stanford.edu

###
##
# new to Ubuntu 13.04 Farmshare?
# follow this link to get started:
# https://www.stanford.edu/group/farmshare/cgi-bin/wiki/index.php/Ubuntu13TransitionGuide
##
###

Last login: Sun Sep 15 20:52:27 2013 from scorn.stanford.edu

your cuda device is:
CUDA_VISIBLE_DEVICES=6
device last used: unused


bishopj@rye01:~$ module load cuda
bishopj@rye01:~$ /usr/local/cuda/samples/bin/x86_64/linux/release/deviceQuery
/usr/local/cuda/samples/bin/x86_64/linux/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Tesla C2070"
  CUDA Driver Version / Runtime Version          5.5 / 5.5
  CUDA Capability Major/Minor version number:    2.0
  Total amount of global memory:                 5375 MBytes (5636554752 bytes)
  (14) Multiprocessors, ( 32) CUDA Cores/MP:     448 CUDA Cores
  GPU Clock rate:                                1147 MHz (1.15 GHz)
  Memory Clock rate:                             1494 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 786432 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65535), 3D=(2048, 2048, 2048)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (65535, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Enabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           132 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.5, CUDA Runtime Version = 5.5, NumDevs = 1, Device0 = Tesla C2070
Result = PASS

bishopj@rye01:~$ /usr/local/cuda/samples/bin/x86_64/linux/release/matrixMulCUBLAS 
[Matrix Multiply CUBLAS] - Starting...
GPU Device 0: "Tesla C2070" with compute capability 2.0

MatrixA(320,640), MatrixB(320,640), MatrixC(320,640)
Computing result using CUBLAS...done.
Performance= 507.76 GFlop/s, Time= 0.258 msec, Size= 131072000 Ops
Computing result using host CPU...done.
Comparing CUBLAS Matrix Multiply with CPU results: PASS
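
The performance line can be checked by hand: throughput is the operation count divided by the elapsed time. The sketch below redoes that arithmetic with the numbers printed above. The function names gflops and matmul_ops are hypothetical, and the dimensions 640, 320 and 320 are an assumption chosen to match the reported operation count, since the printed matrix shapes alone do not determine them.

```shell
#!/bin/sh
# Hypothetical helpers for checking the matrixMulCUBLAS performance line.

# gflops OPS MILLISECONDS: throughput in GFlop/s.
gflops() {
    awk -v ops="$1" -v ms="$2" \
        'BEGIN { printf "%.2f\n", ops / (ms * 1e-3) / 1e9 }'
}

# matmul_ops M K N: an (M x K) by (K x N) product costs about 2*M*K*N
# floating-point operations (one multiply and one add per term).
matmul_ops() {
    echo $(( 2 * $1 * $2 * $3 ))
}

matmul_ops 640 320 320     # prints 131072000, the reported "Size"
gflops 131072000 0.258     # prints 508.03
```

The result is slightly above the reported 507.76 GFlop/s, which is consistent with the sample printing the time rounded to three decimal places.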

example 2

We can also log in, load the cuda module, and run the same deviceQuery and matrix multiply samples from MATLAB.

This example has its own page: see MatlabGPUDemo1.
