MatlabGPUDemo1
From FarmShare
Revision as of 14:13, 8 September 2013
Matlab GPU demos
GPU devices in Matlab are supported by the Parallel Computing Toolbox. No special setup is required: Matlab discovers and uses CUDA devices automatically.
Information about the toolbox can be found here: http://www.mathworks.com/products/parallel-computing/index.html
For a list of examples, see: http://www.mathworks.com/products/parallel-computing/examples.html?s_tid=brdcrb
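To make the automatic discovery concrete, the following is a minimal sketch of how one might confirm that Matlab sees a CUDA device before running GPU code. It assumes the Parallel Computing Toolbox is installed and licensed; the variable names and the small test matrix are illustrative only.

```matlab
% Minimal sketch: check which CUDA devices Matlab has discovered.
% Assumes the Parallel Computing Toolbox is available.
numDevices = gpuDeviceCount;          % number of CUDA devices Matlab detected
fprintf('CUDA devices found: %d\n', numDevices);

if numDevices > 0
    device = gpuDevice;               % the currently selected device
    fprintf('Using %s (compute capability %s, %.1f GB total memory)\n', ...
        device.Name, device.ComputeCapability, device.TotalMemory/1e9);

    x = gpuArray(rand(4, 'single'));  % copy a small matrix to the GPU
    y = gather(x * 2);                % multiply on the GPU, copy the result back
    disp(class(y));                   % gather returns an ordinary (non-GPU) array
else
    disp('No supported GPU found.');
end
```

If no device is reported, GPU functions such as gpuArray will raise an error, so guarding on gpuDeviceCount as above is one way to keep a script usable on CPU-only nodes.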
Here, we will run the "Benchmarking A\b on the GPU" example, found here: http://www.mathworks.com/help/distcomp/examples/benchmarking-a-b-on-the-gpu.html?prodcode=DM&language=en
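For readers who want to see roughly what such a benchmark measures, here is a simplified sketch of timing A\b on the CPU and on the GPU and converting the times to gigaflops. It is not the demo's actual source: the matrix size n is a hypothetical value, the operation count is the usual estimate for an LU-based solve, and a single tic/toc measurement is used, so the numbers it prints will be noisier than the demo's.

```matlab
% Simplified sketch of an A\b benchmark (not the MathWorks demo itself).
n = 2048;                                      % hypothetical matrix size
A = rand(n, 'single') + 100*eye(n, 'single');  % diagonally dominant, well conditioned
b = rand(n, 1, 'single');

% Approximate floating-point operation count for solving an n-by-n system.
flop = 2/3*n^3 + 3/2*n^2;

% Time the solve on the CPU.
tic;
x = A \ b;
cpuTime = toc;
fprintf('Gigaflops on CPU: %f\n', flop/cpuTime/1e9);

% Time the solve on the GPU, if one is available.
if gpuDeviceCount > 0
    gpu = gpuDevice;
    Ag = gpuArray(A);                          % transfer the data before timing
    bg = gpuArray(b);
    tic;
    xg = Ag \ bg;
    wait(gpu);                                 % GPU work is asynchronous; block until it finishes
    gpuTime = toc;
    fprintf('Gigaflops on GPU: %f\n', flop/gpuTime/1e9);
end
```

The call to wait matters: without it, toc may return before the GPU has finished, which would overstate GPU performance.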
Example output
Here we launch Matlab and run paralleldemo_gpu_devices to print the CUDA device discovered by Matlab. Then we run the A\b demo, paralleldemo_gpu_backslash.
$ module load matlab
$ matlab -nodesktop
Warning: No display specified. You will not be able to display graphics on the screen.
Warning: No window system found. Java option 'MWT' ignored.

< M A T L A B (R) >
Copyright 1984-2013 The MathWorks, Inc.
R2013a (8.1.0.604) 64-bit (glnxa64)
February 15, 2013

No window system found. Java option 'MWT' ignored.

To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

>> paralleldemo_gpu_devices

numDevices =
     1

origDevice =
  CUDADevice with properties:
    Name: 'Tesla C2070'
    Index: 1
    ComputeCapability: '2.0'
    SupportsDouble: 1
    DriverVersion: 5.5000
    ToolkitVersion: 5
    MaxThreadsPerBlock: 1024
    MaxShmemPerBlock: 49152
    MaxThreadBlockSize: [1024 1024 64]
    MaxGridSize: [65535 65535 65535]
    SIMDWidth: 32
    TotalMemory: 5.6366e+09
    FreeMemory: 5.5344e+09
    MultiprocessorCount: 14
    ClockRateKHz: 1147000
    ComputeMode: 'Default'
    GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 0
    CanMapHostMemory: 1
    DeviceSupported: 1
    DeviceSelected: 1

device =
  CUDADevice with properties:
    Name: 'Tesla C2070'
    Index: 1
    ComputeCapability: '2.0'
    SupportsDouble: 1
    DriverVersion: 5.5000
    ToolkitVersion: 5
    MaxThreadsPerBlock: 1024
    MaxShmemPerBlock: 49152
    MaxThreadBlockSize: [1024 1024 64]
    MaxGridSize: [65535 65535 65535]
    SIMDWidth: 32
    TotalMemory: 5.6366e+09
    FreeMemory: 5.5344e+09
    MultiprocessorCount: 14
    ClockRateKHz: 1147000
    ComputeMode: 'Default'
    GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 0
    CanMapHostMemory: 1
    DeviceSupported: 1
    DeviceSelected: 1

>> paralleldemo_gpu_backslash(.75);
Starting benchmarks with 13 different single-precision matrices of sizes ranging from 1024-by-1024 to 13312-by-13312.
Creating a matrix of size 1024-by-1024.
Gigaflops on CPU: 34.472190
Gigaflops on GPU: 56.288799
Creating a matrix of size 2048-by-2048.
Gigaflops on CPU: 49.891778
Gigaflops on GPU: 106.760173
Creating a matrix of size 3072-by-3072.
Gigaflops on CPU: 64.997307
Gigaflops on GPU: 197.257665
Creating a matrix of size 4096-by-4096.
Gigaflops on CPU: 70.944260
Gigaflops on GPU: 266.873255
Creating a matrix of size 5120-by-5120.
Gigaflops on CPU: 84.640804
Gigaflops on GPU: 319.151358
Creating a matrix of size 6144-by-6144.
Gigaflops on CPU: 92.799236
Gigaflops on GPU: 355.467871
Creating a matrix of size 7168-by-7168.
Gigaflops on CPU: 98.141367
Gigaflops on GPU: 388.194551
Creating a matrix of size 8192-by-8192.
Gigaflops on CPU: 102.462204
Gigaflops on GPU: 405.167131
Creating a matrix of size 9216-by-9216.
Gigaflops on CPU: 98.400070
Gigaflops on GPU: 419.867571
Creating a matrix of size 10240-by-10240.
Gigaflops on CPU: 96.734765
Gigaflops on GPU: 434.993371
Creating a matrix of size 11264-by-11264.
Gigaflops on CPU: 112.294056
Gigaflops on GPU: 439.164558
Creating a matrix of size 12288-by-12288.
Gigaflops on CPU: 115.434767
Gigaflops on GPU: 440.911860
Creating a matrix of size 13312-by-13312.
Gigaflops on CPU: 115.826290
Gigaflops on GPU: 460.198654
Starting benchmarks with 9 different double-precision matrices of sizes ranging from 1024-by-1024 to 9216-by-9216.
Creating a matrix of size 1024-by-1024.
Gigaflops on CPU: 14.479196
Gigaflops on GPU: 21.906035
Creating a matrix of size 2048-by-2048.
Gigaflops on CPU: 27.758668
Gigaflops on GPU: 70.264055
Creating a matrix of size 3072-by-3072.
Gigaflops on CPU: 35.325472
Gigaflops on GPU: 110.924771
Creating a matrix of size 4096-by-4096.
Gigaflops on CPU: 41.316066
Gigaflops on GPU: 151.816138
Creating a matrix of size 5120-by-5120.
Gigaflops on CPU: 47.203079
Gigaflops on GPU: 182.013352
Creating a matrix of size 6144-by-6144.
Gigaflops on CPU: 50.618165
Gigaflops on GPU: 203.495957
Creating a matrix of size 7168-by-7168.
Gigaflops on CPU: 53.713014
Gigaflops on GPU: 220.657206
Creating a matrix of size 8192-by-8192.
Gigaflops on CPU: 54.993392
Gigaflops on GPU: 225.368964
Creating a matrix of size 9216-by-9216.
Gigaflops on CPU: 56.978938
Gigaflops on GPU: 237.973215
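As a rough aid to reading the output above, the sketch below reproduces the matrix-size ranges and computes the GPU-over-CPU ratio at the largest size in each precision. It rests on an assumption not stated on this page: that the argument .75 is a cap, in GB, on the memory used by one matrix. That assumption is at least consistent with the 13 single-precision and 9 double-precision sizes reported. The gigaflops figures are copied from the output; all variable names are illustrative.

```matlab
% Assumption: the demo's argument is a per-matrix memory cap in GB.
maxMemory = 0.75;
maxSizeSingle = floor(sqrt(maxMemory*1024^3/4));   % 4 bytes per single element
maxSizeDouble = floor(sqrt(maxMemory*1024^3/8));   % 8 bytes per double element
sizesSingle = 1024:1024:maxSizeSingle;             % sizes in steps of 1024
sizesDouble = 1024:1024:maxSizeDouble;
fprintf('%d single-precision sizes, largest %d\n', numel(sizesSingle), sizesSingle(end));
fprintf('%d double-precision sizes, largest %d\n', numel(sizesDouble), sizesDouble(end));

% GPU/CPU ratio at the largest matrix size, using the figures reported above.
speedupSingle = 460.198654 / 115.826290;
speedupDouble = 237.973215 / 56.978938;
fprintf('GPU speedup: %.2fx (single), %.2fx (double)\n', speedupSingle, speedupDouble);
```

On this Tesla C2070 the GPU comes out roughly four times faster than the CPU at the largest sizes in both precisions, while at 1024-by-1024 the advantage is well under 2x.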