MatlabGPUDemo1

From FarmShare

Latest revision as of 23:00, 4 June 2014

Matlab GPU demos

GPU devices in Matlab are supported by the Parallel Computing Toolbox. No special setup is required: Matlab discovers and uses CUDA devices automatically.
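To check from inside Matlab which CUDA device it discovered, a minimal sketch like the one below can help. It assumes the Parallel Computing Toolbox is installed. gpuDeviceCount, gpuDevice, and getenv are standard Matlab functions; the property names used (Index, Name, ComputeCapability, FreeMemory, TotalMemory) are the ones R2013a prints further down this page, and newer releases may rename some of them. Reading CUDA_VISIBLE_DEVICES assumes the FarmShare login script exports it to the environment; the login banner below only shows that it prints it.

```matlab
% Minimal sketch: report the CUDA device(s) Matlab can see.
% Assumes the Parallel Computing Toolbox is installed.
visible = getenv('CUDA_VISIBLE_DEVICES');   % assumed exported at login; '' if unset
fprintf('CUDA_VISIBLE_DEVICES = "%s"\n', visible);

numDevices = gpuDeviceCount;                % number of CUDA devices Matlab discovered
fprintf('Matlab sees %d CUDA device(s).\n', numDevices);

if numDevices > 0
    dev = gpuDevice;                        % currently selected device
    fprintf('Using device %d: %s, compute capability %s\n', ...
            dev.Index, dev.Name, dev.ComputeCapability);
    fprintf('Free memory: %.2f GB of %.2f GB\n', ...
            dev.FreeMemory/1e9, dev.TotalMemory/1e9);
end
```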

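Resources:

* Product information: http://www.mathworks.com/products/parallel-computing/index.html
* List of examples: http://www.mathworks.com/products/parallel-computing/examples.html?s_tid=brdcrb
* Matlab functions with GPU support: http://www.mathworks.com/help/distcomp/using-gpuarray.html#bsloua3-1
* Example scripts: http://www.mathworks.com/help/distcomp/examples/index.html#gpu
* Matlab File Exchange (GPUBench): http://www.mathworks.com/matlabcentral/fileexchange/34080-gpubench
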
In this example we run the "Benchmarking A\b on the GPU" example from the official MathWorks documentation: http://www.mathworks.com/help/distcomp/examples/benchmarking-a-b-on-the-gpu.html
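The benchmark reports its results in gigaflops (billions of floating-point operations per second). A worked version of that conversion follows, assuming the usual operation count for solving an n-by-n system by LU factorization (the exact count used inside the demo is an assumption here), with t the measured solve time in seconds. The 37.67 Gflops figure is the GPU result for the 1024-by-1024 single-precision matrix in the transcript on this page.

```latex
\begin{aligned}
\text{Gflops} &= \frac{\tfrac{2}{3}n^{3} + \tfrac{3}{2}n^{2}}{t \cdot 10^{9}} \\
n = 1024:\quad \tfrac{2}{3}n^{3} + \tfrac{3}{2}n^{2} &\approx 7.17 \times 10^{8} \\
t_{\text{GPU}} &\approx \frac{7.17 \times 10^{8}}{37.67 \times 10^{9}} \approx 0.019\ \text{s}
\end{aligned}
```

By the same estimate, the CPU figure of 5.57 Gflops at that size corresponds to roughly 0.13 s.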

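Matlab commands used below. paralleldemo_gpu_devices prints the CUDA device(s) Matlab finds; paralleldemo_gpu_backslash runs the A\b benchmark, with its argument (.75) limiting how large the test matrices get: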

paralleldemo_gpu_devices
paralleldemo_gpu_backslash(.75);
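As a rough, hypothetical illustration of what an A\b benchmark measures (not the demo's own source), the sketch below times a single solve on the CPU and on the GPU. It assumes the Parallel Computing Toolbox and one CUDA device; the matrix size n = 2048 and the flop estimate 2/3 n^3 + 3/2 n^2 are illustrative assumptions, not values read from the demo. gpuArray, gather, and wait on a device object are toolbox functions.

```matlab
% Hypothetical sketch of one A\b timing, loosely modeled on the demo (not its source).
% Assumes the Parallel Computing Toolbox and at least one CUDA device.
n = 2048;                                        % illustrative matrix size
A = rand(n, n, 'single') + 100*eye(n, 'single'); % diagonally dominant, well conditioned
b = rand(n, 1, 'single');
flop = 2/3*n^3 + 3/2*n^2;                        % assumed flop count for an LU-based solve

tic;
x = A\b;                                         % solve on the CPU
tCpu = toc;

dev = gpuDevice;                                 % currently selected CUDA device
Ag = gpuArray(A);                                % copy inputs to GPU memory before timing
bg = gpuArray(b);
wait(dev);                                       % make sure the copies have finished
tic;
xg = Ag\bg;                                      % same solve on the GPU
wait(dev);                                       % GPU work may run asynchronously; wait before toc
tGpu = toc;

fprintf('CPU: %.1f Gflops, GPU: %.1f Gflops\n', flop/tCpu/1e9, flop/tGpu/1e9);
xBack = gather(xg);                              % copy the GPU result back to host memory
fprintf('max |x_cpu - x_gpu| = %g\n', max(abs(x - xBack)));
```

A single timing includes first-call overhead, so running the solve once untimed and keeping the best of several repetitions gives steadier numbers.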

example output - CLI version

Here we log in to rye01, load the matlab module, and start Matlab. We then run paralleldemo_gpu_devices to print the CUDA device that Matlab discovered, followed by the A\b demo.

$ ssh rye01.stanford.edu
rye01.stanford.edu - Ubuntu 13.04, amd64
8-core Xeon E5620 @ 2.40GHz (FT72-B7015, empty); 47.16GB RAM, 10GB swap
Puppet environment: rec_master; kernel 3.8.0-30-generic (x86_64)
 --*-*- Stanford University Research Computing -*-*--

  _____                    ____  _
 |  ___|_ _ _ __ _ __ ___ / ___|| |__   __ _ _ __ ___
 | |_ / _` | '__| '_ ` _ \\___ \| '_ \ / _` | '__/ _ \
 |  _| (_| | |  | | | | | |___) | | | | (_| | | |  __/
 |_|  \__,_|_|  |_| |_| |_|____/|_| |_|\__,_|_|  \___|


    http://farmshare.stanford.edu

###
##
# new to Ubuntu 13.04 Farmshare?
# follow this link to get started:
# https://www.stanford.edu/group/farmshare/cgi-bin/wiki/index.php/Ubuntu13TransitionGuide
##
###

Last login: Sun Sep 15 22:01:08 2013 from scorn.stanford.edu

your cuda device is:
CUDA_VISIBLE_DEVICES=0
device last used: Sun Sep 15 21:25:34 2013

bishopj@rye01:~$ module load matlab
bishopj@rye01:~$ matlab -nodesktop
Warning: No display specified.  You will not be able to display graphics on the screen.
Warning: No window system found.  Java option 'MWT' ignored.

                                                           < M A T L A B (R) >
                                                 Copyright 1984-2013 The MathWorks, Inc.
                                                   R2013a (8.1.0.604) 64-bit (glnxa64)
                                                            February 15, 2013

No window system found.  Java option 'MWT' ignored.
 
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
 
>> paralleldemo_gpu_devices

numDevices =

     1


origDevice = 

  CUDADevice with properties:

                      Name: 'GeForce GTX 480'
                     Index: 1
         ComputeCapability: '2.0'
            SupportsDouble: 1
             DriverVersion: 5.5000
            ToolkitVersion: 5
        MaxThreadsPerBlock: 1024
          MaxShmemPerBlock: 49152
        MaxThreadBlockSize: [1024 1024 64]
               MaxGridSize: [65535 65535 65535]
                 SIMDWidth: 32
               TotalMemory: 1.6103e+09
                FreeMemory: 1.5101e+09
       MultiprocessorCount: 15
              ClockRateKHz: 1401000
               ComputeMode: 'Default'
      GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 0
          CanMapHostMemory: 1
           DeviceSupported: 1
            DeviceSelected: 1


device = 

  CUDADevice with properties:

                      Name: 'GeForce GTX 480'
                     Index: 1
         ComputeCapability: '2.0'
            SupportsDouble: 1
             DriverVersion: 5.5000
            ToolkitVersion: 5
        MaxThreadsPerBlock: 1024
          MaxShmemPerBlock: 49152
        MaxThreadBlockSize: [1024 1024 64]
               MaxGridSize: [65535 65535 65535]
                 SIMDWidth: 32
               TotalMemory: 1.6103e+09
                FreeMemory: 1.5101e+09
       MultiprocessorCount: 15
              ClockRateKHz: 1401000
               ComputeMode: 'Default'
      GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 0
          CanMapHostMemory: 1
           DeviceSupported: 1
            DeviceSelected: 1

>> paralleldemo_gpu_backslash(.75);
Starting benchmarks with 13 different single-precision matrices of sizes
ranging from 1024-by-1024 to 13312-by-13312.
Creating a matrix of size 1024-by-1024.
Gigaflops on CPU: 5.566165
Gigaflops on GPU: 37.670697
Creating a matrix of size 2048-by-2048.
Gigaflops on CPU: 33.638140
Gigaflops on GPU: 143.898457
Creating a matrix of size 3072-by-3072.
Gigaflops on CPU: 40.107724
Gigaflops on GPU: 223.183271
Creating a matrix of size 4096-by-4096.
Gigaflops on CPU: 55.753796
Gigaflops on GPU: 327.146632
Creating a matrix of size 5120-by-5120.
Gigaflops on CPU: 54.888358
Gigaflops on GPU: 292.626007
Creating a matrix of size 6144-by-6144.
Gigaflops on CPU: 72.191110
Gigaflops on GPU: 452.020228
Creating a matrix of size 7168-by-7168.
Gigaflops on CPU: 80.896917
Gigaflops on GPU: 498.172535
Creating a matrix of size 8192-by-8192.
Gigaflops on CPU: 84.840500
Gigaflops on GPU: 506.676184
Creating a matrix of size 9216-by-9216.
Gigaflops on CPU: 68.652257
Gigaflops on GPU: 533.858153
Creating a matrix of size 10240-by-10240.
Gigaflops on CPU: 73.660056
Gigaflops on GPU: 541.269779
Creating a matrix of size 11264-by-11264.
Gigaflops on CPU: 93.310377
Gigaflops on GPU: 560.362334
Creating a matrix of size 12288-by-12288.
Gigaflops on CPU: 89.056557
Gigaflops on GPU: 558.393444
Creating a matrix of size 13312-by-13312.
Gigaflops on CPU: 102.489253
Gigaflops on GPU: 574.326117
Starting benchmarks with 9 different double-precision matrices of sizes
ranging from 1024-by-1024 to 9216-by-9216.
Creating a matrix of size 1024-by-1024.
Gigaflops on CPU: 14.504665
Gigaflops on GPU: 24.855377
Creating a matrix of size 2048-by-2048.
Gigaflops on CPU: 19.376792
Gigaflops on GPU: 74.501813
Creating a matrix of size 3072-by-3072.
Gigaflops on CPU: 29.208044
Gigaflops on GPU: 106.253927
Creating a matrix of size 4096-by-4096.
Gigaflops on CPU: 35.060889
Gigaflops on GPU: 121.734819
Creating a matrix of size 5120-by-5120.
Gigaflops on CPU: 40.079125
Gigaflops on GPU: 133.176539
Creating a matrix of size 6144-by-6144.
Gigaflops on CPU: 43.513209
Gigaflops on GPU: 139.033109
Creating a matrix of size 7168-by-7168.
Gigaflops on CPU: 45.878316
Gigaflops on GPU: 146.538608
Creating a matrix of size 8192-by-8192.
Gigaflops on CPU: 48.424626
Gigaflops on GPU: 147.271608
Creating a matrix of size 9216-by-9216.
Gigaflops on CPU: 45.145666
Gigaflops on GPU: 151.482486
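To read the transcript as speedups rather than raw Gflops, divide the GPU figure by the CPU figure at each matrix size. The sketch below does this for both runs using only the numbers printed above (GeForce GTX 480 on rye01); the variable names are illustrative.

```matlab
% Speedups from the GeForce GTX 480 run above (Gflops values copied from the transcript).
sizesSingle = 1024:1024:13312;                 % 13 single-precision sizes
cpuSingle = [5.566165 33.638140 40.107724 55.753796 54.888358 72.191110 80.896917 ...
             84.840500 68.652257 73.660056 93.310377 89.056557 102.489253];
gpuSingle = [37.670697 143.898457 223.183271 327.146632 292.626007 452.020228 498.172535 ...
             506.676184 533.858153 541.269779 560.362334 558.393444 574.326117];

sizesDouble = 1024:1024:9216;                  % 9 double-precision sizes
cpuDouble = [14.504665 19.376792 29.208044 35.060889 40.079125 43.513209 45.878316 ...
             48.424626 45.145666];
gpuDouble = [24.855377 74.501813 106.253927 121.734819 133.176539 139.033109 146.538608 ...
             147.271608 151.482486];

speedupSingle = gpuSingle ./ cpuSingle;        % element-wise GPU/CPU ratio
speedupDouble = gpuDouble ./ cpuDouble;

fprintf('single precision\n%8s %8s\n', 'n', 'speedup');
fprintf('%8d %8.2f\n', [sizesSingle; speedupSingle]);
fprintf('double precision\n%8s %8s\n', 'n', 'speedup');
fprintf('%8d %8.2f\n', [sizesDouble; speedupDouble]);
```

For this run the single-precision speedup at 13312-by-13312 comes out at about 5.60 and the double-precision speedup at 9216-by-9216 at about 3.36.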


example output - GUI version

If you run the above commands in a VNC session on rye02, they run the same way and the demo also pops up a graph of the speedups.
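In the CLI session, Matlab warns that no display is available, so no graph appears. A minimal sketch for that case draws the double-precision speedups on an invisible figure and writes them to a PNG file instead. It assumes your Matlab release can render and print figures when started without a display; the file name speedup_double.png is hypothetical, and the speedups are the GPU/CPU Gflops ratios from the CLI run on this page, rounded to two decimals.

```matlab
% Hypothetical sketch: save a speedup plot to a file when no display is available.
% Speedups are GPU/CPU Gflops ratios from the double-precision CLI run, rounded.
n       = 1024:1024:9216;
speedup = [1.71 3.84 3.64 3.47 3.32 3.20 3.19 3.04 3.36];

fig = figure('Visible', 'off');            % off-screen figure, nothing is shown
plot(n, speedup, 'o-');
xlabel('Matrix size n (n-by-n)');
ylabel('GPU speedup over CPU');
title('A\b speedup, double precision', 'Interpreter', 'none');
grid on;
print(fig, '-dpng', 'speedup_double.png'); % assumed to work without a display
close(fig);
```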

[Image: runmatlabcudarye021.png, graph of the GPU speedups from a run on rye02]
