Rye-GPU

From FarmShare

Nvidia GPU

FarmShare has GPUs available on two systems, rye01 and rye02. You can use these two systems just as you would a corn system:

ssh rye01.stanford.edu

or

ssh rye02.stanford.edu

hardware and software

rye01 and rye02 are Intel CPU systems with the following configurations:

rye01:

  • 8-core CPU (2x Xeon E5620)
  • 48GB RAM
  • 250GB local disk
  • 6x Tesla C2070
  • Ubuntu 13.10
  • CUDA 6.0

rye02:

  • 8-core CPU (2x Xeon E5620)
  • 48GB RAM
  • 250GB local disk
  • 8x GTX 480
  • Ubuntu 14.04
  • CUDA 7.0

FarmShare would like to extend special thanks to Jon Pilat, Brian Tempero and Margot Gerritsen for their support.

software

CUDA is installed along with the toolkit samples (in /usr/local/cuda).
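
To see which prebuilt sample binaries are available, you can load the modules and list the samples release directory (a minimal sketch; the module names and directory path are the ones used in the examples below):

$ module load cuda cudasamples
$ ls /usr/local/cuda/samples/bin/x86_64/linux/release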

example 1

An easy first thing to try is to log in, load the cuda and cudasamples modules, and run deviceQuery to see what kind of CUDA device your session has been assigned. Then we run a matrix multiply sample that comes with the CUDA toolkit.

Welcome to Ubuntu 13.10 (GNU/Linux 3.11.0-20-generic x86_64)
Linux rye01.stanford.edu x86_64 GNU/Linux
rye01.stanford.edu - Ubuntu 13.10, amd64
8-core Xeon E5620 @ 2.40GHz (FT72-B7015, empty); 47.16GB RAM, 10GB swap
Puppet environment: rec_master; kernel 3.11.0-20-generic (x86_64)
 --*-*- Stanford University Research Computing -*-*--

  _____                    ____  _
 |  ___|_ _ _ __ _ __ ___ / ___|| |__   __ _ _ __ ___
 | |_ / _` | '__| '_ ` _ \\___ \| '_ \ / _` | '__/ _ \
 |  _| (_| | |  | | | | | |___) | | | | (_| | | |  __/
 |_|  \__,_|_|  |_| |_| |_|____/|_| |_|\__,_|_|  \___|


    http://farmshare.stanford.edu

###
##
# welcome to corn-new
# please report any problems to research-computing-support@stanford.edu
#
# new features:  ubuntu 13.10, matlab2013b, matlab2014a, intel c/c++/fortran compilers, cuda 6.0
# Check out https://www.stanford.edu/group/farmshare/cgi-bin/wiki/index.php/Ubuntu1310
#
##
###

Last login: Sun May 11 14:31:56 2014 from c-24-130-183-161.hsd1.ca.comcast.net

your cuda device is:
CUDA_VISIBLE_DEVICES=5
device last used: Tue May 13 09:54:05 2014

bishopj@rye01:~$ module load cuda cudasamples
bishopj@rye01:~$ deviceQuery
deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Tesla C2070"
  CUDA Driver Version / Runtime Version          6.0 / 6.0
  CUDA Capability Major/Minor version number:    2.0
  Total amount of global memory:                 5375 MBytes (5636554752 bytes)
  (14) Multiprocessors, ( 32) CUDA Cores/MP:     448 CUDA Cores
  GPU Clock rate:                                1147 MHz (1.15 GHz)
  Memory Clock rate:                             1494 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 786432 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65535), 3D=(2048, 2048, 2048)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (65535, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Enabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           131 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.0, CUDA Runtime Version = 6.0, NumDevs = 1, Device0 = Tesla C2070
Result = PASS

bishopj@rye01:~$ matrixMulCUBLAS 
[Matrix Multiply CUBLAS] - Starting...
GPU Device 0: "Tesla C2070" with compute capability 2.0

MatrixA(320,640), MatrixB(320,640), MatrixC(320,640)
Computing result using CUBLAS...done.
Performance= 492.46 GFlop/s, Time= 0.266 msec, Size= 131072000 Ops
Computing result using host CPU...done.
Comparing CUBLAS Matrix Multiply with CPU results: PASS
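
A note on the output above: the login environment assigns one GPU to your session via CUDA_VISIBLE_DEVICES (shown in the banner), which is why deviceQuery reports only a single device and numbers it 0 even though the node has several GPUs. A small sketch of checking this yourself (the value shown will differ per session):

$ echo $CUDA_VISIBLE_DEVICES    # GPU assigned to this login (5 in the transcript above)
$ deviceQuery | grep Detected   # only that GPU is visible, and it appears as device 0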

example 2 - smokeParticles CUDA sample program

[Image: smokeparticlesstill1.small.png, a still frame from the smokeParticles demo]
  • log in to rye01 or rye02
  • run FarmVNC
  • module load cuda cudasamples
  • cd /usr/local/cuda/samples/bin/x86_64/linux/release
  • smokeParticles (the same steps are collected into a shell sketch below)
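
The same steps as a single terminal session inside the VNC desktop (a sketch; smokeParticles opens a graphical window, so it needs the FarmVNC display rather than a plain ssh session):

$ module load cuda cudasamples
$ cd /usr/local/cuda/samples/bin/x86_64/linux/release
$ ./smokeParticles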

example 3 - matlab

We can also log in, load the cuda module, and run the same deviceQuery and matrix multiply example from MATLAB.

It has its own page: see MatlabGPUDemo1.

example 4 - R

Example of using a CUDA-enabled R library to do Hierarchical Linear Regressions.

example 5 - PyMOL

[Images: pymolgiffy2.gif and pymolgiffy.gif, animated PyMOL renderings]
  • log in to rye02.stanford.edu
  • start a VNC session as described in FarmVNC
  • search for pymol in the menu and click on it (or launch it from a terminal; see the sketch below)
  • from the File menu, open /farmshare/software/examples/pymol/4GD3.pdb
  • click the box to full-size the model, then press the down arrow to start the animation
  • optionally, click S -> surface (in the upper right) to draw the surface representation, which shows up better here
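
If you prefer to launch PyMOL from a terminal inside the VNC session instead of the menu, a roughly equivalent sketch (assuming the pymol command is on the PATH there; the remaining GUI steps above still apply):

$ pymol /farmshare/software/examples/pymol/4GD3.pdb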