MatlabGPUDemo1

From FarmShare


Revision as of 12:49, 8 September 2013

Matlab GPU demos

GPU devices in Matlab are supported by the Parallel Computing Toolbox. No special setup is required: Matlab discovers and uses CUDA devices automatically.
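To confirm that a device was discovered before running anything heavier, a quick check along the following lines may help. This is a minimal sketch, assuming the Parallel Computing Toolbox is on the path; the variable names are illustrative only.

```matlab
% Minimal check that Matlab can see a CUDA device.
% Assumes the Parallel Computing Toolbox is installed.
n = gpuDeviceCount;              % number of CUDA devices visible to Matlab
if n > 0
    d = gpuDevice;               % query the currently selected device
    fprintf('Found %d device(s); using %s (compute capability %s)\n', ...
            n, d.Name, d.ComputeCapability);
else
    disp('No supported GPU found; computations will stay on the CPU.');
end
```

On a node without a GPU the script only prints the fallback message, so it is safe to run anywhere the toolbox is available.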

Information can be found here: http://www.mathworks.com/products/parallel-computing/index.html

For a list of examples, see: http://www.mathworks.com/products/parallel-computing/examples.html?s_tid=brdcrb

Here we run the "Benchmarking A\b on the GPU" example, found here: http://www.mathworks.com/help/distcomp/examples/benchmarking-a-b-on-the-gpu.html?prodcode=DM&language=en
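The idea behind the benchmark can be sketched for a single matrix size as follows. This is not the MathWorks demo code. It is a rough illustration that assumes the usual operation count of 2/3*n^3 + 3/2*n^2 for a dense solve, uses an arbitrary size n, and loads the diagonal (a choice made here, not in the source) so the system stays well conditioned.

```matlab
% Sketch: compare A\b throughput on CPU and GPU for one matrix size.
n = 2048;                                        % illustrative size only
A = rand(n, 'single') + 100*eye(n, 'single');    % diagonally loaded test matrix
b = rand(n, 1, 'single');
flop = 2/3*n^3 + 3/2*n^2;                        % assumed operation count for a dense solve

% Time the solve on the CPU.
tic; x = A\b; tCPU = toc;
fprintf('Gigaflops on CPU: %f\n', flop/tCPU/1e9);

% Time the same solve on the GPU, if one is available.
if gpuDeviceCount > 0
    gd = gpuDevice;
    Ag = gpuArray(A);                            % copy the data to the device
    bg = gpuArray(b);
    tic; xg = Ag\bg; wait(gd); tGPU = toc;       % wait() so the timing covers the whole solve
    fprintf('Gigaflops on GPU: %f\n', flop/tGPU/1e9);
    relErr = norm(x - gather(xg)) / norm(x);     % CPU and GPU results should agree closely
    fprintf('Relative difference CPU vs GPU: %g\n', relErr);
end
```

A single timed run like this is noisy; repeating the solve a few times and keeping the best time gives steadier numbers.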


Example output

Here we launch Matlab and run paralleldemo_gpu_devices to print the CUDA device that Matlab discovered. Then we run the A\b demo, paralleldemo_gpu_backslash.

$ module load matlab
$ matlab -nodesktop
Warning: No display specified.  You will not be able to display graphics on the screen.
Warning: No window system found.  Java option 'MWT' ignored.

                            < M A T L A B (R) >
                  Copyright 1984-2013 The MathWorks, Inc.
                    R2013a (8.1.0.604) 64-bit (glnxa64)
                             February 15, 2013

No window system found.  Java option 'MWT' ignored.
 
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
 
>> paralleldemo_gpu_devices

numDevices =

     1


origDevice = 

  CUDADevice with properties:

                      Name: 'Tesla C2070'
                     Index: 1
         ComputeCapability: '2.0'
            SupportsDouble: 1
             DriverVersion: 5.5000
            ToolkitVersion: 5
        MaxThreadsPerBlock: 1024
          MaxShmemPerBlock: 49152
        MaxThreadBlockSize: [1024 1024 64]
               MaxGridSize: [65535 65535 65535]
                 SIMDWidth: 32
               TotalMemory: 5.6366e+09
                FreeMemory: 5.5344e+09
       MultiprocessorCount: 14
              ClockRateKHz: 1147000
               ComputeMode: 'Default'
      GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 0
          CanMapHostMemory: 1
           DeviceSupported: 1
            DeviceSelected: 1


device = 

  CUDADevice with properties:

                      Name: 'Tesla C2070'
                     Index: 1
         ComputeCapability: '2.0'
            SupportsDouble: 1
             DriverVersion: 5.5000
            ToolkitVersion: 5
        MaxThreadsPerBlock: 1024
          MaxShmemPerBlock: 49152
        MaxThreadBlockSize: [1024 1024 64]
               MaxGridSize: [65535 65535 65535]
                 SIMDWidth: 32
               TotalMemory: 5.6366e+09
                FreeMemory: 5.5344e+09
       MultiprocessorCount: 14
              ClockRateKHz: 1147000
               ComputeMode: 'Default'
      GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 0
          CanMapHostMemory: 1
           DeviceSupported: 1
            DeviceSelected: 1

>> paralleldemo_gpu_backslash(.75);
Starting benchmarks with 13 different single-precision matrices of sizes
ranging from 1024-by-1024 to 13312-by-13312.
Creating a matrix of size 1024-by-1024.
Gigaflops on CPU: 34.472190
Gigaflops on GPU: 56.288799
Creating a matrix of size 2048-by-2048.
Gigaflops on CPU: 49.891778
Gigaflops on GPU: 106.760173
Creating a matrix of size 3072-by-3072.
Gigaflops on CPU: 64.997307
Gigaflops on GPU: 197.257665
Creating a matrix of size 4096-by-4096.
Gigaflops on CPU: 70.944260
Gigaflops on GPU: 266.873255
Creating a matrix of size 5120-by-5120.
Gigaflops on CPU: 84.640804
Gigaflops on GPU: 319.151358
Creating a matrix of size 6144-by-6144.
Gigaflops on CPU: 92.799236
Gigaflops on GPU: 355.467871
Creating a matrix of size 7168-by-7168.
Gigaflops on CPU: 98.141367
Gigaflops on GPU: 388.194551
Creating a matrix of size 8192-by-8192.
Gigaflops on CPU: 102.462204
Gigaflops on GPU: 405.167131
Creating a matrix of size 9216-by-9216.
Gigaflops on CPU: 98.400070
Gigaflops on GPU: 419.867571
Creating a matrix of size 10240-by-10240.
Gigaflops on CPU: 96.734765
Gigaflops on GPU: 434.993371
Creating a matrix of size 11264-by-11264.
Gigaflops on CPU: 112.294056
Gigaflops on GPU: 439.164558
Creating a matrix of size 12288-by-12288.
Gigaflops on CPU: 115.434767
Gigaflops on GPU: 440.911860
Creating a matrix of size 13312-by-13312.
Gigaflops on CPU: 115.826290
Gigaflops on GPU: 460.198654
Starting benchmarks with 9 different double-precision matrices of sizes
ranging from 1024-by-1024 to 9216-by-9216.
Creating a matrix of size 1024-by-1024.
Gigaflops on CPU: 14.479196
Gigaflops on GPU: 21.906035
Creating a matrix of size 2048-by-2048.
Gigaflops on CPU: 27.758668
Gigaflops on GPU: 70.264055
Creating a matrix of size 3072-by-3072.
Gigaflops on CPU: 35.325472
Gigaflops on GPU: 110.924771
Creating a matrix of size 4096-by-4096.
Gigaflops on CPU: 41.316066
Gigaflops on GPU: 151.816138
Creating a matrix of size 5120-by-5120.
Gigaflops on CPU: 47.203079
Gigaflops on GPU: 182.013352
Creating a matrix of size 6144-by-6144.
Gigaflops on CPU: 50.618165
Gigaflops on GPU: 203.495957
Creating a matrix of size 7168-by-7168.
Gigaflops on CPU: 53.713014
Gigaflops on GPU: 220.657206
Creating a matrix of size 8192-by-8192.
Gigaflops on CPU: 54.993392
Gigaflops on GPU: 225.368964
Creating a matrix of size 9216-by-9216.
Gigaflops on CPU: 56.978938
Gigaflops on GPU: 237.973215
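
To put the numbers above in perspective, the GPU-to-CPU ratio at the largest matrix size in each run can be computed directly from the printed values. The snippet below only restates figures from the output above; the variable names are illustrative.

```matlab
% Speedup at the largest size in each run, using values copied from the output above.
cpuSingle = 115.826290;  gpuSingle = 460.198654;   % 13312-by-13312, single precision
cpuDouble =  56.978938;  gpuDouble = 237.973215;   % 9216-by-9216, double precision

speedupSingle = gpuSingle / cpuSingle;
speedupDouble = gpuDouble / cpuDouble;
fprintf('Single-precision speedup: %.2fx\n', speedupSingle);
fprintf('Double-precision speedup: %.2fx\n', speedupDouble);
```

In both cases the Tesla C2070 comes out roughly four times faster than the CPU on this node at the largest size tested.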