MatlabGPUDemo1

From FarmShare

Revision as of 11:24, 28 November 2013 by Bishopj (Talk | contribs)

Matlab GPU demos

GPU devices in Matlab are supported by the Parallel Computing Toolbox. No special setup is required: Matlab discovers and uses CUDA devices automatically.
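As a quick check of that automatic discovery, the following minimal sketch asks Matlab how many CUDA devices it can see and prints a few properties of the selected one. It uses only `gpuDeviceCount` and `gpuDevice` from the Parallel Computing Toolbox; the variable names and the guard for a missing toolbox are illustrative assumptions, not part of the FarmShare setup.

```matlab
% Sketch: confirm that Matlab sees a CUDA device before running GPU code.
% The guard lets the script run (and report 0) where the toolbox is absent.
hasToolbox = exist('gpuDeviceCount') > 0;
if hasToolbox
    numDevices = gpuDeviceCount;   % number of CUDA devices Matlab discovered
else
    numDevices = 0;
end
fprintf('CUDA devices visible to Matlab: %d\n', numDevices);

if numDevices > 0
    d = gpuDevice;                 % currently selected device
    fprintf('Using %s (compute capability %s, %.2f GB total memory)\n', ...
        d.Name, d.ComputeCapability, d.TotalMemory / 2^30);
end
```

On a node with one visible device, the count should be 1, matching the numDevices value that paralleldemo_gpu_devices prints.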

In this example we will run the "Benchmarking A\b on the GPU" demo, found here: [Benchmarking A\b on the GPU]

Matlab commands used below:

paralleldemo_gpu_devices
paralleldemo_gpu_backslash(.75);
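For orientation, here is a rough hand-written sketch of the kind of measurement the A\b demo performs: solve one linear system on the CPU and, if a GPU is available, on the GPU, then convert the timings to gigaflops. This is not the demo's own code. The matrix size, the variable names, and the leading-order operation count (2/3)n^3 are assumptions chosen for illustration. The .75 argument in the command above appears to cap the matrix memory in GB, which is consistent with the largest matrix sizes reported in the output, but treat that reading as an assumption too.

```matlab
% Sketch: time a single A\b solve on the CPU and (if present) on the GPU.
n = 1024;                                         % arbitrary illustrative size
A = rand(n, 'single') + 100 * eye(n, 'single');   % well-conditioned test matrix
b = rand(n, 1, 'single');

tic;
xCPU = A \ b;
tCPU = toc;

flops = (2/3) * n^3;          % leading-order cost of an LU-based solve (assumed)
fprintf('Gigaflops on CPU: %f\n', flops / tCPU / 1e9);

hasGpu = exist('gpuDeviceCount') > 0 && gpuDeviceCount > 0;
if hasGpu
    gA = gpuArray(A);         % copy the inputs to GPU memory
    gb = gpuArray(b);
    tic;
    gx = gA \ gb;             % same backslash operator, executed on the GPU
    wait(gpuDevice);          % block until the GPU has finished
    tGPU = toc;
    xGPU = gather(gx);        % copy the result back to host memory
    fprintf('Gigaflops on GPU: %f\n', flops / tGPU / 1e9);
    fprintf('Max abs difference CPU vs GPU: %g\n', max(abs(xCPU - xGPU)));
end
```

A single timed run like this includes one-time GPU start-up overhead, so the numbers it prints will be noisier and lower than those from the demo.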

Example output - CLI version

Here we launch Matlab and run paralleldemo_gpu_devices to print the CUDA device that Matlab discovered. Then we run the A\b demo.

$ ssh rye01.stanford.edu
rye01.stanford.edu - Ubuntu 13.04, amd64
8-core Xeon E5620 @ 2.40GHz (FT72-B7015, empty); 47.16GB RAM, 10GB swap
Puppet environment: rec_master; kernel 3.8.0-30-generic (x86_64)
 --*-*- Stanford University Research Computing -*-*--

  _____                    ____  _
 |  ___|_ _ _ __ _ __ ___ / ___|| |__   __ _ _ __ ___
 | |_ / _` | '__| '_ ` _ \\___ \| '_ \ / _` | '__/ _ \
 |  _| (_| | |  | | | | | |___) | | | | (_| | | |  __/
 |_|  \__,_|_|  |_| |_| |_|____/|_| |_|\__,_|_|  \___|


    http://farmshare.stanford.edu

###
##
# new to Ubuntu 13.04 Farmshare?
# follow this link to get started:
# https://www.stanford.edu/group/farmshare/cgi-bin/wiki/index.php/Ubuntu13TransitionGuide
##
###

Last login: Sun Sep 15 22:01:08 2013 from scorn.stanford.edu

your cuda device is:
CUDA_VISIBLE_DEVICES=0
device last used: Sun Sep 15 21:25:34 2013

bishopj@rye01:~$ module load matlab
bishopj@rye01:~$ matlab -nodesktop
Warning: No display specified.  You will not be able to display graphics on the screen.
Warning: No window system found.  Java option 'MWT' ignored.

                                                           < M A T L A B (R) >
                                                 Copyright 1984-2013 The MathWorks, Inc.
                                                   R2013a (8.1.0.604) 64-bit (glnxa64)
                                                            February 15, 2013

No window system found.  Java option 'MWT' ignored.
 
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
 
>> paralleldemo_gpu_devices

numDevices =

     1


origDevice = 

  CUDADevice with properties:

                      Name: 'GeForce GTX 480'
                     Index: 1
         ComputeCapability: '2.0'
            SupportsDouble: 1
             DriverVersion: 5.5000
            ToolkitVersion: 5
        MaxThreadsPerBlock: 1024
          MaxShmemPerBlock: 49152
        MaxThreadBlockSize: [1024 1024 64]
               MaxGridSize: [65535 65535 65535]
                 SIMDWidth: 32
               TotalMemory: 1.6103e+09
                FreeMemory: 1.5101e+09
       MultiprocessorCount: 15
              ClockRateKHz: 1401000
               ComputeMode: 'Default'
      GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 0
          CanMapHostMemory: 1
           DeviceSupported: 1
            DeviceSelected: 1


device = 

  CUDADevice with properties:

                      Name: 'GeForce GTX 480'
                     Index: 1
         ComputeCapability: '2.0'
            SupportsDouble: 1
             DriverVersion: 5.5000
            ToolkitVersion: 5
        MaxThreadsPerBlock: 1024
          MaxShmemPerBlock: 49152
        MaxThreadBlockSize: [1024 1024 64]
               MaxGridSize: [65535 65535 65535]
                 SIMDWidth: 32
               TotalMemory: 1.6103e+09
                FreeMemory: 1.5101e+09
       MultiprocessorCount: 15
              ClockRateKHz: 1401000
               ComputeMode: 'Default'
      GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 0
          CanMapHostMemory: 1
           DeviceSupported: 1
            DeviceSelected: 1

>> paralleldemo_gpu_backslash(.75);
Starting benchmarks with 13 different single-precision matrices of sizes
ranging from 1024-by-1024 to 13312-by-13312.
Creating a matrix of size 1024-by-1024.
Gigaflops on CPU: 5.566165
Gigaflops on GPU: 37.670697
Creating a matrix of size 2048-by-2048.
Gigaflops on CPU: 33.638140
Gigaflops on GPU: 143.898457
Creating a matrix of size 3072-by-3072.
Gigaflops on CPU: 40.107724
Gigaflops on GPU: 223.183271
Creating a matrix of size 4096-by-4096.
Gigaflops on CPU: 55.753796
Gigaflops on GPU: 327.146632
Creating a matrix of size 5120-by-5120.
Gigaflops on CPU: 54.888358
Gigaflops on GPU: 292.626007
Creating a matrix of size 6144-by-6144.
Gigaflops on CPU: 72.191110
Gigaflops on GPU: 452.020228
Creating a matrix of size 7168-by-7168.
Gigaflops on CPU: 80.896917
Gigaflops on GPU: 498.172535
Creating a matrix of size 8192-by-8192.
Gigaflops on CPU: 84.840500
Gigaflops on GPU: 506.676184
Creating a matrix of size 9216-by-9216.
Gigaflops on CPU: 68.652257
Gigaflops on GPU: 533.858153
Creating a matrix of size 10240-by-10240.
Gigaflops on CPU: 73.660056
Gigaflops on GPU: 541.269779
Creating a matrix of size 11264-by-11264.
Gigaflops on CPU: 93.310377
Gigaflops on GPU: 560.362334
Creating a matrix of size 12288-by-12288.
Gigaflops on CPU: 89.056557
Gigaflops on GPU: 558.393444
Creating a matrix of size 13312-by-13312.
Gigaflops on CPU: 102.489253
Gigaflops on GPU: 574.326117
Starting benchmarks with 9 different double-precision matrices of sizes
ranging from 1024-by-1024 to 9216-by-9216.
Creating a matrix of size 1024-by-1024.
Gigaflops on CPU: 14.504665
Gigaflops on GPU: 24.855377
Creating a matrix of size 2048-by-2048.
Gigaflops on CPU: 19.376792
Gigaflops on GPU: 74.501813
Creating a matrix of size 3072-by-3072.
Gigaflops on CPU: 29.208044
Gigaflops on GPU: 106.253927
Creating a matrix of size 4096-by-4096.
Gigaflops on CPU: 35.060889
Gigaflops on GPU: 121.734819
Creating a matrix of size 5120-by-5120.
Gigaflops on CPU: 40.079125
Gigaflops on GPU: 133.176539
Creating a matrix of size 6144-by-6144.
Gigaflops on CPU: 43.513209
Gigaflops on GPU: 139.033109
Creating a matrix of size 7168-by-7168.
Gigaflops on CPU: 45.878316
Gigaflops on GPU: 146.538608
Creating a matrix of size 8192-by-8192.
Gigaflops on CPU: 48.424626
Gigaflops on GPU: 147.271608
Creating a matrix of size 9216-by-9216.
Gigaflops on CPU: 45.145666
Gigaflops on GPU: 151.482486
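The speedup implied by the output above is simply the GPU gigaflops divided by the CPU gigaflops at the same matrix size. A small sketch of that arithmetic, using a handful of single-precision rows and the last double-precision row copied from the log (the variable names are illustrative):

```matlab
% Sketch: GPU-over-CPU speedup computed from values in the transcript.
sizesSingle = [1024 4096 8192 13312];
cpuSingle   = [5.566165 55.753796 84.840500 102.489253];    % Gigaflops on CPU
gpuSingle   = [37.670697 327.146632 506.676184 574.326117]; % Gigaflops on GPU
speedupSingle = gpuSingle ./ cpuSingle;

% Largest double-precision case (9216-by-9216).
cpuDouble = 45.145666;
gpuDouble = 151.482486;
speedupDouble = gpuDouble / cpuDouble;

for k = 1:numel(sizesSingle)
    fprintf('single %5d-by-%-5d speedup: %.2f\n', ...
        sizesSingle(k), sizesSingle(k), speedupSingle(k));
end
fprintf('double  9216-by-9216  speedup: %.2f\n', speedupDouble);
```

For the largest matrices in this run, the GPU comes out roughly 5.6 times faster than the CPU in single precision and roughly 3.4 times faster in double precision.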


Example output - GUI version

If you run the above commands in a VNC session on rye02, the demo also displays a graph of the speedups.

[Image: Runmatlabcudarye021.png]
