MATLAB now has over 1,000 functions that Just Work on NVIDIA GPUs
GPU support in MATLAB started in R2010b
Back in R2010b, the first GPU-enabled functions were made available in MATLAB via Parallel Computing Toolbox. The idea was then, as it is now, to overload existing MATLAB functions so that they accept the gpuArray type. If you gave a gpuArray to a function, it would automatically run on the GPU without the user having to do anything else. That is, if you had MATLAB code like this:
y = fft(x); % Compute the fast Fourier transform of x on the CPU
then all you need to do to run it on the GPU is use gpuArray and gather to take care of the transfer to and from the GPU. Other than that, the MATLAB code is the same:
gpuX = gpuArray(x); % Transfer the array x to the GPU
gpuY = fft(gpuX); % fft is now performed on the GPU
y = gather(gpuY); % Gather the result from the GPU
R2010b provided support for 123 such functions, most of which were element-wise math functions such as sin and exp, along with arithmetic operators such as + and -. There were also some more interesting things such as fft, mtimes (matrix-matrix multiplication) and the all-important mldivide, the full name for backslash, possibly the most famous operator in MATLAB.
It was a great start, but there were also some awkward omissions, the most glaring of which were subsref and subsasgn. This meant that indexing of gpuArrays was not supported. Imagine trying to do anything useful in MATLAB without ever indexing anything! My colleagues tell me that this made for an interesting trip to the SC 2010 supercomputing conference, where they introduced MATLAB's GPU functionality for the first time.
As of R2024b, 1195 MATLAB functions have gpuArray support
14 years later, things have changed drastically. As you browse the MATLAB documentation, you'll note that many functions have an Extended Capabilities section. Here's that section for fft in MATLAB R2024b, where you can see not just one but two entries related to GPUs: GPU Code Generation and GPU Arrays. We'll talk about GPU Code Generation another time.
Today, we are focused on GPU Arrays. In R2024b, 1195 functions across 14 toolboxes support gpuArray, which is a lot!
The chart below shows this growth over time.
Working with gpuArray in this way can be extremely effective. One public case study I can point to is from NASA's Langley Research Center, who tell us that it took them 30 minutes to get their MATLAB algorithm working on the GPU, resulting in a 40x speedup.
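If you want to see what kind of speedup you get on your own machine, here is a minimal sketch using timeit and gputimeit. The problem size is an arbitrary choice of mine, and the ratio you see will depend heavily on your CPU, your GPU and the size and type of your data, so treat it as a starting point rather than a benchmark.

```matlab
% Sketch: compare CPU and GPU timings for fft.
% The array length below is an arbitrary assumption; try sizes from your own workload.
x = rand(2^22, 1);                  % Test data in main memory
gpuX = gpuArray(x);                 % Same data transferred to the GPU

tCpu = timeit(@() fft(x));          % Time the CPU version
tGpu = gputimeit(@() fft(gpuX));    % gputimeit waits for the GPU to finish before stopping the clock

fprintf('CPU: %.4f s, GPU: %.4f s, ratio: %.1fx\n', tCpu, tGpu, tCpu/tGpu);
```

Note that this times only the computation. If your workflow moves data to and from the GPU on every call, include gpuArray and gather inside the timed function as well.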
What does gpuArray support really mean?
When you start digging down into the details, what we mean by 'support for gpuArray' is both really simple and full of complications. At the simple end of the spectrum, it means that the function in question can accept gpuArrays and do something with them. The complexity starts to creep in when we ask what that 'something' is.
Of the 1195 functions with gpuArray support, 729 of them are supported with no limitations. Anything you can do on the CPU with a normal MATLAB array, you can do on the GPU with a gpuArray. This can make it incredibly easy to switch from using the CPU to the GPU for quite a lot of MATLAB code.
The other 466 functions have some kind of restriction or caveat. The fft function is one of these. Expand the GPU Arrays section in Extended Capabilities and you'll see what they are.
Many of these caveats are minor. I don't consider the detail for fft to be a big deal, although if you disagree, do let me know and tell me why. Other restrictions might be more problematic. The restriction for the lu function, for example, is that it doesn't accept sparse gpuArrays. The inv function doesn't accept sparse gpuArrays either, and you also have to be more careful when dealing with badly scaled or nearly singular matrices, since the gpuArray version of inv won't warn you when you have them whereas the standard version will. Even mtimes, the full name of MATLAB's matrix-matrix multiplication operator, has a restriction: it doesn't support int64 on the GPU.
You may reasonably ask "Why do these restrictions and caveats exist?". Indeed, the reviewer of this blog post asked exactly that! Often, it's for performance reasons. Take fft, for example: on the GPU, its output is always complex, even when every imaginary part is zero. To know that the imaginary part of the result is all zeros, we'd have to run another GPU kernel to check all the values. We would then need to wait for this operation to finish (i.e. synchronize the device) to see what the answer is. If the imaginary parts are all zero, we would then need to run yet another kernel to de-interlace the array and drop the imaginary part. On a GPU, this could result in a significant slowdown, and yet most of the time the imaginary parts are non-zero. That is, everything would be made much slower for a convenience that's rarely needed. Of course, some restrictions are there simply because lifting them would take a lot of effort and we haven't seen the demand yet. As always, your feedback is essential here!
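In the rare case where you do want a real result back from fft on the GPU (the caveat being that the gpuArray version returns complex output even when the imaginary parts are all zero), you can do the check yourself. The following is only a sketch, and the input is a toy example I picked because its transform has no imaginary content; it pays exactly the extra kernel and synchronization cost described above, which is why it isn't done for you.

```matlab
% Sketch: drop the imaginary part of a GPU fft result only when it is entirely zero.
x = gpuArray([1 0 0 0]);        % Toy input: a unit impulse
y = fft(x);                     % Result lives on the GPU

% Checking the imaginary part runs extra GPU work, and gather forces a synchronization
if ~gather(any(imag(y(:))))
    y = real(y);                % Keep only the real part; y is still a gpuArray
end

disp(isreal(y))
```

For most code it is simpler to leave the result complex and only call real at the point where a real array is actually required.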
At the other end of the scale we have around 140-150 functions that accept a GPU array but don't actually run on the GPU! These are essentially convenience functions such as plot that have been modified so that they Just Work when you send them a gpuArray. This means that, for example, you can do
x = gpuArray.linspace(-pi,pi,1000); % Construct x directly on the GPU
y = sin(x); % Compute sin(x) on the GPU
plot(x,y); % produce the plot
Instead of
x = gpuArray.linspace(-pi,pi,1000); % Construct x directly on the GPU
y = sin(x); % Compute sin(x) on the GPU
CPUx = gather(x); % Bring x from GPU into main memory
CPUy = gather(y); % Bring y from GPU into main memory
plot(CPUx,CPUy); % Do the plot
When you send gpuArrays to plot, the gather operations are done for you behind the scenes. So, plot supports gpuArray even though the plot operation is not done by the GPU.
Pagewise backslash - the 1000th function to support gpuArray
While gathering the numbers for this post, a few of us got fixated on what the 1000th function might be. It turns out to be slightly tricky to figure this out, but we are reasonably confident that the 1000th MATLAB function to support gpuArray is pagemldivide, the pagewise version of mldivide, known more commonly as the backslash operator. That the 1000th gpuArray function is directly related to the most iconic of MATLAB functions, and one of the first functions that ever received gpuArray support, seems rather fitting.
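For anyone who hasn't met the pagewise functions before, here is a small sketch of pagemldivide solving a batch of linear systems on the GPU. The sizes and the diagonal shift are my own choices for illustration; the shift just makes every page diagonally dominant so that each system is safely solvable.

```matlab
% Sketch: solve 10 independent 3x3 systems A(:,:,k) * X(:,:,k) = B(:,:,k) in one call.
A = rand(3, 3, 10, 'gpuArray') + 3*eye(3);  % 10 pages of 3x3 matrices, shifted to be diagonally dominant
B = rand(3, 1, 10, 'gpuArray');             % One right-hand side per page

X = pagemldivide(A, B);                     % Page k of X is A(:,:,k) \ B(:,:,k)

% Check the solution page by page using pagemtimes
R = pagemtimes(A, X) - B;
maxResidual = gather(max(abs(R(:))));
fprintf('Largest residual across all pages: %g\n', maxResidual);
```

The alternative is a for loop that applies backslash to one page at a time, which works but launches many small operations instead of one batched one.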
Moving beyond gpuArray
gpuArray is the easiest way of getting many applications ported to the GPU, but there's more to GPUs in MATLAB than gpuArray. Here are some pointers to where you might go next:
- arrayfun - Compiles MATLAB functions to native GPU code
- Run CUDA or PTX Code on GPU - Hand write your own GPU kernels
- GPU Coder - Generates CUDA Code from MATLAB code and Simulink models
A great example that compares gpuArray with arrayfun and hand-written CUDA MEX functions is Illustrating Three Approaches to GPU Computing: The Mandelbrot Set.
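To give a flavor of the first of these options, here is a minimal sketch of arrayfun applied to a gpuArray. The function being applied is one I made up for illustration. When arrayfun receives gpuArray inputs, the scalar function is compiled and run element by element on the GPU; the vectorized gpuArray version is computed alongside it so that the two can be compared.

```matlab
% Sketch: apply a scalar function to every element of a gpuArray with arrayfun.
% dampedWave is a hypothetical example function, not something from a toolbox.
dampedWave = @(t) exp(-abs(t)) .* sin(5*t);

x = gpuArray.linspace(-pi, pi, 1000);   % Construct x directly on the GPU
yFun = arrayfun(dampedWave, x);         % Element-wise version via arrayfun on the GPU
yVec = exp(-abs(x)) .* sin(5*x);        % Same computation as ordinary vectorized gpuArray code

maxDiff = gather(max(abs(yFun - yVec)));
fprintf('Largest difference between the two versions: %g\n', maxDiff);
```

For a function this simple, the vectorized version is perfectly good. arrayfun tends to pay off when the per-element computation involves many steps, loops or branches that would otherwise need a series of separate array operations.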
What functions do you need to be supported on the GPU?
There are several drivers behind this growth of GPU-supported functions in MATLAB. In the early days, it was just a case of taking care of the obvious things: matrix operations, Fourier transforms, element-wise operations and so on. Over time, we started ensuring that various workflows were taken care of, such as Monte Carlo simulations, deep learning, image processing or more specialized things such as LDPC Link Simulation from Communications Toolbox.
These days we still have some functions in the 'obvious' category: new MATLAB functions that obviously could do with a gpuArray version. The createArray function, introduced in R2024a, is an example of such a case. The greatest driver, however, concerning what additional functions get gpuArray support is user requests.
So, if you have a workflow that you think would benefit from additional gpuArray support, get in touch and let us know.