MatlabGPUDemo1
Matlab GPU demos
GPU devices in Matlab are supported by the Parallel Computing Toolbox. No special setup is required: Matlab discovers and uses CUDA devices automatically.
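To confirm that discovery worked before running anything heavier, a quick check along these lines can help. This is a minimal sketch, assuming the Parallel Computing Toolbox is installed; gpuDeviceCount and gpuDevice are toolbox functions, and the message text is only illustrative.

```matlab
% Check whether Matlab can see a CUDA device (needs Parallel Computing Toolbox).
n = gpuDeviceCount;
if n == 0
    disp('No supported CUDA device found.');
else
    d = gpuDevice;   % selects and returns the default device
    fprintf('Found %d device(s); using "%s" (compute capability %s)\n', ...
            n, d.Name, d.ComputeCapability);
    fprintf('Total memory: %.2f GB\n', d.TotalMemory / 1e9);
end
```

If the count is zero on a GPU host, checking the CUDA_VISIBLE_DEVICES value printed at login is a reasonable first step.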
Resources:
- Information can be found here: http://www.mathworks.com/products/parallel-computing/index.html
- For a list of examples, see: http://www.mathworks.com/products/parallel-computing/examples.html?s_tid=brdcrb
- These Matlab functions have GPU support: http://www.mathworks.com/help/distcomp/using-gpuarray.html#bsloua3-1
- Example scripts: http://www.mathworks.com/help/distcomp/examples/index.html#gpu
- GPUBench on the Matlab File Exchange: http://www.mathworks.com/matlabcentral/fileexchange/34080-gpubench
In this example we run the "Benchmarking A\b on the GPU" demo from the official MathWorks documentation.
Matlab commands used below:
paralleldemo_gpu_devices
paralleldemo_gpu_backslash(.75);
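For readers who want to see roughly what such a benchmark measures, the following is a minimal sketch, not the code of the MathWorks demo itself. It assumes the Parallel Computing Toolbox is available, picks an illustrative matrix size N, and uses the common operation-count estimate 2/3*N^3 + 3/2*N^2 for an LU-based solve; the variable names are hypothetical.

```matlab
% Hypothetical mini-benchmark: time x = A\b on the CPU and, if present, on the GPU.
N = 2048;                                      % matrix size (illustrative choice)
A = rand(N, 'single') + 100*eye(N, 'single');  % well-conditioned test matrix
b = rand(N, 1, 'single');
flop = 2/3*N^3 + 3/2*N^2;                      % approximate operation count (assumption)

tic; x = A\b; tCPU = toc;
fprintf('Gigaflops on CPU: %f\n', flop/tCPU/1e9);

if gpuDeviceCount > 0
    gd = gpuDevice;
    gA = gpuArray(A);  gb = gpuArray(b);       % copy the inputs to the device
    tic; gx = gA\gb; wait(gd); tGPU = toc;     % wait() so the timing covers the whole solve
    fprintf('Gigaflops on GPU: %f\n', flop/tGPU/1e9);
    relErr = norm(gather(gx) - x) / norm(x);   % bring the result back and compare
    fprintf('Relative difference CPU vs GPU: %g\n', relErr);
end
```

A single tic/toc run is noisy, so numbers from this sketch will not match the demo's figures; it only illustrates the gpuArray, solve, gather pattern.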
example output - CLI version
Here we launch Matlab and run paralleldemo_gpu_devices to print the CUDA device that Matlab discovered. Then we run the A\b demo.
$ ssh rye01.stanford.edu
rye01.stanford.edu - Ubuntu 13.04, amd64
8-core Xeon E5620 @ 2.40GHz (FT72-B7015, empty); 47.16GB RAM, 10GB swap
Puppet environment: rec_master; kernel 3.8.0-30-generic (x86_64)
--*-*- Stanford University Research Computing -*-*--
 _____                    ____  _
|  ___|_ _ _ __ _ __ ___ / ___|| |__   __ _ _ __ ___
| |_ / _` | '__| '_ ` _ \\___ \| '_ \ / _` | '__/ _ \
|  _| (_| | |  | | | | | |___) | | | | (_| | | |  __/
|_|  \__,_|_|  |_| |_| |_|____/|_| |_|\__,_|_|  \___|
http://farmshare.stanford.edu
###
##
# new to Ubuntu 13.04 Farmshare?
# follow this link to get started:
# https://www.stanford.edu/group/farmshare/cgi-bin/wiki/index.php/Ubuntu13TransitionGuide
##
###
Last login: Sun Sep 15 22:01:08 2013 from scorn.stanford.edu
your cuda device is: CUDA_VISIBLE_DEVICES=0
device last used: Sun Sep 15 21:25:34 2013
bishopj@rye01:~$ module load matlab
bishopj@rye01:~$ matlab -nodesktop
Warning: No display specified. You will not be able to display graphics on the screen.
Warning: No window system found. Java option 'MWT' ignored.
< M A T L A B (R) >
Copyright 1984-2013 The MathWorks, Inc.
R2013a (8.1.0.604) 64-bit (glnxa64)
February 15, 2013
No window system found. Java option 'MWT' ignored.
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

>> paralleldemo_gpu_devices
numDevices =
     1
origDevice =
  CUDADevice with properties:
    Name: 'GeForce GTX 480'
    Index: 1
    ComputeCapability: '2.0'
    SupportsDouble: 1
    DriverVersion: 5.5000
    ToolkitVersion: 5
    MaxThreadsPerBlock: 1024
    MaxShmemPerBlock: 49152
    MaxThreadBlockSize: [1024 1024 64]
    MaxGridSize: [65535 65535 65535]
    SIMDWidth: 32
    TotalMemory: 1.6103e+09
    FreeMemory: 1.5101e+09
    MultiprocessorCount: 15
    ClockRateKHz: 1401000
    ComputeMode: 'Default'
    GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 0
    CanMapHostMemory: 1
    DeviceSupported: 1
    DeviceSelected: 1
device =
  CUDADevice with properties:
    Name: 'GeForce GTX 480'
    Index: 1
    ComputeCapability: '2.0'
    SupportsDouble: 1
    DriverVersion: 5.5000
    ToolkitVersion: 5
    MaxThreadsPerBlock: 1024
    MaxShmemPerBlock: 49152
    MaxThreadBlockSize: [1024 1024 64]
    MaxGridSize: [65535 65535 65535]
    SIMDWidth: 32
    TotalMemory: 1.6103e+09
    FreeMemory: 1.5101e+09
    MultiprocessorCount: 15
    ClockRateKHz: 1401000
    ComputeMode: 'Default'
    GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 0
    CanMapHostMemory: 1
    DeviceSupported: 1
    DeviceSelected: 1

>> paralleldemo_gpu_backslash(.75);
Starting benchmarks with 13 different single-precision matrices of sizes
ranging from 1024-by-1024 to 13312-by-13312.
Creating a matrix of size 1024-by-1024.
Gigaflops on CPU: 5.566165
Gigaflops on GPU: 37.670697
Creating a matrix of size 2048-by-2048.
Gigaflops on CPU: 33.638140
Gigaflops on GPU: 143.898457
Creating a matrix of size 3072-by-3072.
Gigaflops on CPU: 40.107724
Gigaflops on GPU: 223.183271
Creating a matrix of size 4096-by-4096.
Gigaflops on CPU: 55.753796
Gigaflops on GPU: 327.146632
Creating a matrix of size 5120-by-5120.
Gigaflops on CPU: 54.888358
Gigaflops on GPU: 292.626007
Creating a matrix of size 6144-by-6144.
Gigaflops on CPU: 72.191110
Gigaflops on GPU: 452.020228
Creating a matrix of size 7168-by-7168.
Gigaflops on CPU: 80.896917
Gigaflops on GPU: 498.172535
Creating a matrix of size 8192-by-8192.
Gigaflops on CPU: 84.840500
Gigaflops on GPU: 506.676184
Creating a matrix of size 9216-by-9216.
Gigaflops on CPU: 68.652257
Gigaflops on GPU: 533.858153
Creating a matrix of size 10240-by-10240.
Gigaflops on CPU: 73.660056
Gigaflops on GPU: 541.269779
Creating a matrix of size 11264-by-11264.
Gigaflops on CPU: 93.310377
Gigaflops on GPU: 560.362334
Creating a matrix of size 12288-by-12288.
Gigaflops on CPU: 89.056557
Gigaflops on GPU: 558.393444
Creating a matrix of size 13312-by-13312.
Gigaflops on CPU: 102.489253
Gigaflops on GPU: 574.326117
Starting benchmarks with 9 different double-precision matrices of sizes
ranging from 1024-by-1024 to 9216-by-9216.
Creating a matrix of size 1024-by-1024.
Gigaflops on CPU: 14.504665
Gigaflops on GPU: 24.855377
Creating a matrix of size 2048-by-2048.
Gigaflops on CPU: 19.376792
Gigaflops on GPU: 74.501813
Creating a matrix of size 3072-by-3072.
Gigaflops on CPU: 29.208044
Gigaflops on GPU: 106.253927
Creating a matrix of size 4096-by-4096.
Gigaflops on CPU: 35.060889
Gigaflops on GPU: 121.734819
Creating a matrix of size 5120-by-5120.
Gigaflops on CPU: 40.079125
Gigaflops on GPU: 133.176539
Creating a matrix of size 6144-by-6144.
Gigaflops on CPU: 43.513209
Gigaflops on GPU: 139.033109
Creating a matrix of size 7168-by-7168.
Gigaflops on CPU: 45.878316
Gigaflops on GPU: 146.538608
Creating a matrix of size 8192-by-8192.
Gigaflops on CPU: 48.424626
Gigaflops on GPU: 147.271608
Creating a matrix of size 9216-by-9216.
Gigaflops on CPU: 45.145666
Gigaflops on GPU: 151.482486
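The GPU-over-CPU speedup can be read off the figures in the run above. As a small worked example, the following takes the largest matrix in each precision and divides the GPU rate by the CPU rate; the variable names are illustrative, and the numbers are copied from the output rather than measured again.

```matlab
% Gigaflops copied from the run above (largest matrix in each precision).
cpuSingle = 102.489253;  gpuSingle = 574.326117;   % 13312-by-13312, single
cpuDouble = 45.145666;   gpuDouble = 151.482486;   % 9216-by-9216, double

speedupSingle = gpuSingle / cpuSingle;
speedupDouble = gpuDouble / cpuDouble;
fprintf('Single precision speedup: %.1fx\n', speedupSingle);   % prints 5.6x
fprintf('Double precision speedup: %.1fx\n', speedupDouble);   % prints 3.4x
```

On this GTX 480 the gap is noticeably wider in single precision than in double precision.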
example output - GUI version
If you run the above commands in a VNC session on rye02, the demo runs the same way and also displays a graph of the speedups.