Deep Learning training in MATLAB has just gotten faster on Apple Silicon Macs.
MATLAB R2025a has some great news for Apple Silicon Mac users: Apple Accelerate is now the default BLAS. Since Apple Accelerate makes use of various hardware features in Apple Silicon CPUs, operations such as matrix-matrix and matrix-vector multiplication are much faster. So I wondered: would this lead to a measurable improvement in Deep Learning training?
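By the way, you can check which BLAS a MATLAB session is using directly from MATLAB:

% Report the BLAS library this session is linked against
version('-blas')

In R2025a on an Apple Silicon Mac, the output should mention Apple Accelerate.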
I had some code lying around for training a simple CNN on the CIFAR-10 dataset, so I thought I'd give it a try.
I train for one epoch on all 50,000 images in the cifar10Train set, with some data augmentation going on too. Training for just one epoch means the model is far from fully trained, but that's not the point; I just want to see how quickly I can get through the epoch.
Here are the times for running my benchmark training script three times in a row on my 10-core M2 MacBook Pro.
- R2024b: 96.2s, 70.7s, 70.5s
- R2025a: 67.9s, 46.6s, 46.7s
The first run is always slower because that's when various one-time optimizations are worked out and applied, but it's clear that R2025a is quite a bit faster in all cases.
You may think to yourself "Well, is Mike sure it's the BLAS? Maybe other things are faster in R2025a and are causing this?" It turns out that you can switch to Apple Accelerate in R2024b, so let's do that.
- R2024b (after switching to Apple Accelerate): 65.3s, 47.7s, 47.4s
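How do you switch? MATLAB selects its BLAS and LAPACK via the BLAS_VERSION and LAPACK_VERSION environment variables, and since (as I understand it) the libraries are only loaded on first use, you can set these at the top of a fresh session. The sketch below shows the idea; the library filenames are assumptions from memory, so verify them against the contents of fullfile(matlabroot,"bin","maca64") in your installation.

% Run at the very start of a fresh R2024b session, before any linear
% algebra executes, because the BLAS is loaded on first use.
% NOTE: the filenames below are assumptions -- check the maca64
% folder of your MATLAB installation for the exact names.
setenv('BLAS_VERSION','libmwblas_accelerate.dylib')
setenv('LAPACK_VERSION','libmwlapack_accelerate.dylib')
version('-blas')   % confirm the switch took effect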
These numbers are close enough to the R2025a ones and, I think, tell the story. It's the BLAS!
What to expect for larger datasets
CIFAR-10 images are only 32x32 pixels, so this is a pretty small dataset and the resulting matrix operations aren't all that large. When I looked at the speed of simpler matrix operations, I found that Apple Accelerate gives more of a speed-up with larger matrices. As such, I expect production datasets to see even more benefit from this change.
Let me know if you have any you can try!
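If you want to see the size effect for yourself, here is a minimal sketch that times a dense matrix-matrix multiply at a few sizes; compare the numbers before and after switching BLAS (or between R2024b and R2025a).

% Time A*B at several sizes to see how the speed-up grows with
% problem size
for n = [500 1000 2000 4000]
    A = rand(n);
    B = rand(n);
    t = timeit(@() A*B);   % timeit averages over several runs
    fprintf('n = %4d: %.4f seconds\n', n, t)
end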
How to run my example
Use the DownloadCIFAR10 script in the Deep Learning Tutorial Series on File Exchange to create the cifar10Train folder. Then run the following script:
% Load the training images; labels come from the folder names
rootFolder = "./cifar10Train";
imds = imageDatastore(rootFolder,IncludeSubfolders=true,LabelSource="foldernames");
classNames = categories(imds.Labels);
% Define the model
layers = [
    imageInputLayer([32 32 3])
    convolution2dLayer(3,32,Padding="same")
    batchNormalizationLayer()
    reluLayer()
    convolution2dLayer(3,32,Padding="same")
    batchNormalizationLayer()
    reluLayer()
    maxPooling2dLayer(3,Stride=2,Padding="same")
    convolution2dLayer(3,64,Padding="same")
    batchNormalizationLayer()
    reluLayer()
    convolution2dLayer(3,64,Padding="same")
    batchNormalizationLayer()
    reluLayer()
    maxPooling2dLayer(3,Stride=2,Padding="same")
    convolution2dLayer(3,128,Padding="same")
    batchNormalizationLayer()
    reluLayer()
    convolution2dLayer(3,128,Padding="same")
    batchNormalizationLayer()
    reluLayer()
    maxPooling2dLayer(3,Stride=2,Padding="same")
    fullyConnectedLayer(128)
    reluLayer()
    fullyConnectedLayer(10)
    softmaxLayer()
    ];
% Define training options: SGDM, one epoch, CPU execution
opts = trainingOptions("sgdm", ...
    InitialLearnRate=0.001, ...
    MaxEpochs=1, ...
    MiniBatchSize=200, ...
    ExecutionEnvironment="cpu", ...
    Metrics="accuracy", ...
    Verbose=true);
% Augmentation: random horizontal flips and small translations
aug = imageDataAugmenter(RandXReflection=true,RandXTranslation=[-2 2]);
ads = augmentedImageDatastore(layers(1).InputSize,imds,DataAugmentation=aug);
% Train the network and time one epoch
fprintf("Train using trainnet\n")
tic
[net, info] = trainnet(ads, layers, "crossentropy", opts);
toc