# MatlabGPUDemo1

### From FarmShare

## Latest revision as of 22:00, 4 June 2014

## Matlab GPU demos

GPU devices in Matlab are supported by the Parallel Computing Toolbox. No special setup is required: Matlab discovers and uses CUDA devices automatically.
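To confirm that Matlab has found a device before running anything heavier, a short check along these lines may help. It is a minimal sketch using the toolbox functions `gpuDeviceCount` and `gpuDevice`; the printed message format is only illustrative.

```matlab
% Minimal sketch: report which CUDA device (if any) Matlab has discovered.
n = gpuDeviceCount;                 % number of CUDA devices visible to Matlab
fprintf('CUDA devices found: %d\n', n);

if n > 0
    d = gpuDevice;                  % currently selected device (selects the default one if needed)
    fprintf('Using %s (compute capability %s, %.2f GB total memory)\n', ...
        d.Name, d.ComputeCapability, d.TotalMemory / 1024^3);
else
    disp('No CUDA device available; GPU functions will not run on this host.');
end
```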

Resources:

- Information can be found here: http://www.mathworks.com/products/parallel-computing/index.html
- For a list of examples, see: http://www.mathworks.com/products/parallel-computing/examples.html?s_tid=brdcrb
- These Matlab functions have GPU support: http://www.mathworks.com/help/distcomp/using-gpuarray.html#bsloua3-1
- Example scripts: http://www.mathworks.com/help/distcomp/examples/index.html#gpu
- GPUBench on the Matlab File Exchange: http://www.mathworks.com/matlabcentral/fileexchange/34080-gpubench

In this example we run the "Benchmarking A\b on the GPU" demo from the official documentation: http://www.mathworks.com/help/distcomp/examples/benchmarking-a-b-on-the-gpu.html

Matlab commands used below:

    paralleldemo_gpu_devices
    paralleldemo_gpu_backslash(.75);
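To give a feel for what the benchmark measures, here is a rough sketch of a single CPU-versus-GPU timing of `A\b`. It is not the demo's own code: the matrix size `n`, the diagonal shift used to keep the system well conditioned, and the (2/3)n^3 operation count are assumptions made for illustration.

```matlab
% Minimal sketch: time x = A\b once on the CPU and once on the GPU.
n = 2048;                                      % illustrative size (assumption)
A = rand(n, 'single') + 100*eye(n, 'single');  % diagonal shift keeps A well conditioned (assumption)
b = rand(n, 1, 'single');

% CPU solve
tic;
xCpu = A \ b;
tCpu = toc;

% GPU solve: copy the data to the device first so that only the solve is timed
gpu = gpuDevice;
Ag  = gpuArray(A);
bg  = gpuArray(b);
tic;
xGpu = Ag \ bg;
wait(gpu);                                     % block until the GPU has finished
tGpu = toc;

% Approximate operation count of an LU-based dense solve (leading term only)
flop = (2/3) * n^3;
gflopsCpu = flop / tCpu / 1e9;
gflopsGpu = flop / tGpu / 1e9;
fprintf('Gigaflops on CPU: %f\n', gflopsCpu);
fprintf('Gigaflops on GPU: %f\n', gflopsGpu);

% Bring the GPU result back to host memory and compare it with the CPU result
relDiff = norm(gather(xGpu) - xCpu) / norm(xCpu);
fprintf('Relative difference between solutions: %g\n', relDiff);
```

The first GPU operation in a session usually includes one-time start-up overhead, so a single timing like this can understate GPU throughput; repeating the solve and taking the best time gives a fairer figure.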

### example output - CLI version

Here we launch Matlab, run paralleldemo_gpu_devices to print the CUDA device that Matlab discovered, and then run the A\b demo.

    $ ssh rye01.stanford.edu
    rye01.stanford.edu - Ubuntu 13.04, amd64
    8-core Xeon E5620 @ 2.40GHz (FT72-B7015, empty); 47.16GB RAM, 10GB swap
    Puppet environment: rec_master; kernel 3.8.0-30-generic (x86_64)

     --*-*- Stanford University Research Computing -*-*--
     _____                    ____  _
    |  ___|_ _ _ __ _ __ ___ / ___|| |__   __ _ _ __ ___
    | |_ / _` | '__| '_ ` _ \\___ \| '_ \ / _` | '__/ _ \
    |  _| (_| | |  | | | | | |___) | | | | (_| | | |  __/
    |_|  \__,_|_|  |_| |_| |_|____/|_| |_|\__,_|_|  \___|

    http://farmshare.stanford.edu

    ###
    ##
    # new to Ubuntu 13.04 Farmshare?
    # follow this link to get started:
    # https://www.stanford.edu/group/farmshare/cgi-bin/wiki/index.php/Ubuntu13TransitionGuide
    ##
    ###

    Last login: Sun Sep 15 22:01:08 2013 from scorn.stanford.edu
    your cuda device is: CUDA_VISIBLE_DEVICES=0
    device last used: Sun Sep 15 21:25:34 2013
    bishopj@rye01:~$ module load matlab
    bishopj@rye01:~$ matlab -nodesktop
    Warning: No display specified. You will not be able to display graphics on the screen.
    Warning: No window system found. Java option 'MWT' ignored.

                            < M A T L A B (R) >
                  Copyright 1984-2013 The MathWorks, Inc.
                    R2013a (8.1.0.604) 64-bit (glnxa64)
                             February 15, 2013

    No window system found. Java option 'MWT' ignored.

    To get started, type one of these: helpwin, helpdesk, or demo.
    For product information, visit www.mathworks.com.

    >> paralleldemo_gpu_devices

    numDevices =

         1

    origDevice =

      CUDADevice with properties:

                          Name: 'GeForce GTX 480'
                         Index: 1
             ComputeCapability: '2.0'
                SupportsDouble: 1
                 DriverVersion: 5.5000
                ToolkitVersion: 5
            MaxThreadsPerBlock: 1024
              MaxShmemPerBlock: 49152
            MaxThreadBlockSize: [1024 1024 64]
                   MaxGridSize: [65535 65535 65535]
                     SIMDWidth: 32
                   TotalMemory: 1.6103e+09
                    FreeMemory: 1.5101e+09
           MultiprocessorCount: 15
                  ClockRateKHz: 1401000
                   ComputeMode: 'Default'
          GPUOverlapsTransfers: 1
        KernelExecutionTimeout: 0
              CanMapHostMemory: 1
               DeviceSupported: 1
                DeviceSelected: 1

    device =

      CUDADevice with properties:

                          Name: 'GeForce GTX 480'
                         Index: 1
             ComputeCapability: '2.0'
                SupportsDouble: 1
                 DriverVersion: 5.5000
                ToolkitVersion: 5
            MaxThreadsPerBlock: 1024
              MaxShmemPerBlock: 49152
            MaxThreadBlockSize: [1024 1024 64]
                   MaxGridSize: [65535 65535 65535]
                     SIMDWidth: 32
                   TotalMemory: 1.6103e+09
                    FreeMemory: 1.5101e+09
           MultiprocessorCount: 15
                  ClockRateKHz: 1401000
                   ComputeMode: 'Default'
          GPUOverlapsTransfers: 1
        KernelExecutionTimeout: 0
              CanMapHostMemory: 1
               DeviceSupported: 1
                DeviceSelected: 1

    >> paralleldemo_gpu_backslash(.75);
    Starting benchmarks with 13 different single-precision matrices of sizes
    ranging from 1024-by-1024 to 13312-by-13312.
    Creating a matrix of size 1024-by-1024.
    Gigaflops on CPU: 5.566165
    Gigaflops on GPU: 37.670697
    Creating a matrix of size 2048-by-2048.
    Gigaflops on CPU: 33.638140
    Gigaflops on GPU: 143.898457
    Creating a matrix of size 3072-by-3072.
    Gigaflops on CPU: 40.107724
    Gigaflops on GPU: 223.183271
    Creating a matrix of size 4096-by-4096.
    Gigaflops on CPU: 55.753796
    Gigaflops on GPU: 327.146632
    Creating a matrix of size 5120-by-5120.
    Gigaflops on CPU: 54.888358
    Gigaflops on GPU: 292.626007
    Creating a matrix of size 6144-by-6144.
    Gigaflops on CPU: 72.191110
    Gigaflops on GPU: 452.020228
    Creating a matrix of size 7168-by-7168.
    Gigaflops on CPU: 80.896917
    Gigaflops on GPU: 498.172535
    Creating a matrix of size 8192-by-8192.
    Gigaflops on CPU: 84.840500
    Gigaflops on GPU: 506.676184
    Creating a matrix of size 9216-by-9216.
    Gigaflops on CPU: 68.652257
    Gigaflops on GPU: 533.858153
    Creating a matrix of size 10240-by-10240.
    Gigaflops on CPU: 73.660056
    Gigaflops on GPU: 541.269779
    Creating a matrix of size 11264-by-11264.
    Gigaflops on CPU: 93.310377
    Gigaflops on GPU: 560.362334
    Creating a matrix of size 12288-by-12288.
    Gigaflops on CPU: 89.056557
    Gigaflops on GPU: 558.393444
    Creating a matrix of size 13312-by-13312.
    Gigaflops on CPU: 102.489253
    Gigaflops on GPU: 574.326117
    Starting benchmarks with 9 different double-precision matrices of sizes
    ranging from 1024-by-1024 to 9216-by-9216.
    Creating a matrix of size 1024-by-1024.
    Gigaflops on CPU: 14.504665
    Gigaflops on GPU: 24.855377
    Creating a matrix of size 2048-by-2048.
    Gigaflops on CPU: 19.376792
    Gigaflops on GPU: 74.501813
    Creating a matrix of size 3072-by-3072.
    Gigaflops on CPU: 29.208044
    Gigaflops on GPU: 106.253927
    Creating a matrix of size 4096-by-4096.
    Gigaflops on CPU: 35.060889
    Gigaflops on GPU: 121.734819
    Creating a matrix of size 5120-by-5120.
    Gigaflops on CPU: 40.079125
    Gigaflops on GPU: 133.176539
    Creating a matrix of size 6144-by-6144.
    Gigaflops on CPU: 43.513209
    Gigaflops on GPU: 139.033109
    Creating a matrix of size 7168-by-7168.
    Gigaflops on CPU: 45.878316
    Gigaflops on GPU: 146.538608
    Creating a matrix of size 8192-by-8192.
    Gigaflops on CPU: 48.424626
    Gigaflops on GPU: 147.271608
    Creating a matrix of size 9216-by-9216.
    Gigaflops on CPU: 45.145666
    Gigaflops on GPU: 151.482486
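The CLI run prints only raw gigaflops, so the GPU-over-CPU speedup has to be worked out by dividing the two figures for each matrix size. A small sketch of that arithmetic follows, using the double-precision numbers copied from the output above; the variable names are only illustrative.

```matlab
% Minimal sketch: GPU-over-CPU speedup for the double-precision runs above.
sizes     = [1024 2048 3072 4096 5120 6144 7168 8192 9216];
gflopsCpu = [14.504665 19.376792 29.208044 35.060889 40.079125 ...
             43.513209 45.878316 48.424626 45.145666];
gflopsGpu = [24.855377 74.501813 106.253927 121.734819 133.176539 ...
             139.033109 146.538608 147.271608 151.482486];

speedup = gflopsGpu ./ gflopsCpu;   % element-wise ratio, one value per matrix size

for k = 1:numel(sizes)
    fprintf('%5d-by-%-5d  speedup %.2f\n', sizes(k), sizes(k), speedup(k));
end
```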

### example output - GUI version

If you run the commands above in a VNC session on rye02, Matlab also displays a graph of the speedups.