{"id":4309,"date":"2026-06-12T09:42:27","date_gmt":"2026-06-12T13:42:27","guid":{"rendered":"https:\/\/blogs.mathworks.com\/matlab\/?p=4309"},"modified":"2026-06-12T09:42:27","modified_gmt":"2026-06-12T13:42:27","slug":"direct-communication-between-multiple-gpus-using-matlabs-spmd-functions","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/matlab\/2026\/06\/12\/direct-communication-between-multiple-gpus-using-matlabs-spmd-functions\/","title":{"rendered":"Direct Communication between Multiple GPUs using MATLAB&#8217;s spmd Functions"},"content":{"rendered":"<div class = rtcContent><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span class=\"_richTextNode\" style=' font-style: italic;'>Today's guest blogger is <\/span><a href = \"https:\/\/www.linkedin.com\/in\/weinan-chen-30bb3b50\/\"><span class=\"_richTextNode\" style=' font-style: italic;'>Weinan Chen<\/span><\/a><span class=\"_richTextNode\" style=' font-style: italic;'>. Weinan is an Application Engineer on the Parallel Computing team at MathWorks. Additional thanks to Linda Koletsou Soulti, Joss Knight, and the Parallel Computing development team for their help writing up this article<\/span><\/div><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span class=\"_richTextNode\" style=' font-weight: bold; font-style: italic;'>TL;DR: <\/span><span class=\"_richTextNode\" style=' font-weight: bold; font-style: italic; font-family: monospace;'>spmd<\/span><span class=\"_richTextNode\" style=' font-weight: bold; font-style: italic;'> lets MATLAB users efficiently coordinate and communicate across multiple workers. Behind the scenes, GPUs can exchange data directly with each other, skipping the CPU entirely. Researchers from TU Dresden used this technology to implement a fast multi-GPU FFT.<\/span><\/div><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span class=\"_richTextNode\">When it comes to running multi-GPU simulations in MATLAB, a good starting point (depending on the task) is to use simple parallel constructs to distribute work across multiple GPUs. For example, one can parallelize deep learning training with a <\/span><a href = \"https:\/\/www.mathworks.com\/help\/deeplearning\/ug\/scale-up-deep-learning-in-parallel-on-gpus-and-in-the-cloud.html\"><span class=\"_richTextNode\" style=' text-decoration: underline;'>flip of a switch<\/span><\/a><span class=\"_richTextNode\"> or run data-parallel for-loops using <\/span><a href = \"https:\/\/www.mathworks.com\/help\/parallel-computing\/run-matlab-functions-on-multiple-gpus.html\"><span class=\"_richTextNode\" style=' text-decoration: underline;'>parfor across multiple GPUs<\/span><\/a><span class=\"_richTextNode\">.<\/span><\/div><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span class=\"_richTextNode\">But what if our computational task doesn't resemble either example? For instance, if our multi-GPU application requires frequent exchange of information during the computation, what options do we have?<\/span><\/div><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span class=\"_richTextNode\">In this case, <\/span><a href = \"https:\/\/www.mathworks.com\/help\/parallel-computing\/spmd.html\"><span class=\"_richTextNode\" style=' text-decoration: underline;'>the spmd family of MATLAB functions<\/span><\/a><span class=\"_richTextNode\"> is the right choice. For those familiar with the Message Passing Interface (MPI), the syntax of <\/span><span class=\"_richTextNode\" style=' font-family: monospace;'>spmd<\/span><span class=\"_richTextNode\"> closely resembles MPI programming. Essentially, this construct gives users the flexibility to tackle advanced distributed-memory tasks (and it also works with <\/span><a href = \"https:\/\/www.mathworks.com\/help\/parallel-computing\/parallel.threadpool.html\"><span class=\"_richTextNode\" style=' text-decoration: underline;'>shared-memory programming models<\/span><\/a><span class=\"_richTextNode\"> in MATLAB).<\/span><\/div><h2  style = 'margin: 20px 10px 5px 4px; padding: 0px; line-height: 25px; min-height: 0px; white-space: pre; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 20px; font-weight: 700; text-align: left; '><span class=\"_richTextNode\">Inter-worker communication on the CPUs<\/span><\/h2><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span class=\"_richTextNode\">In this post, I\u2019ll focus on the performance aspects of using <\/span><span class=\"_richTextNode\" style=' font-family: monospace;'>spmd<\/span><span class=\"_richTextNode\"> for inter-worker communication. Imagine our application needs to concatenate data across workers using an all-reduce-style operation, where . For example:<\/span><\/div><div style=\"background-color: #F5F5F5; margin: 10px 0 10px 0;\"><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 1px solid rgb(217, 217, 217); border-bottom: 0px none rgb(33, 33, 33); border-radius: 4px 4px 0px 0px; padding: 6px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span style=\"color: rgb(14, 0, 255);\">function <\/span><span >spmdCatCPU(numWorkers)<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '>&nbsp;<\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span style=\"color: rgb(14, 0, 255);\">if <\/span><span >isempty(gcp(<\/span><span style=\"color: rgb(167, 9, 245);\">\"nocreate\"<\/span><span >))<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >    parpool(<\/span><span style=\"color: rgb(167, 9, 245);\">\"Processes\"<\/span><span >, numWorkers);<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span style=\"color: rgb(14, 0, 255);\">end<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '>&nbsp;<\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span style=\"color: rgb(14, 0, 255);\">spmd<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >    x = rand(10000, 2000);<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >    tic<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >    <\/span><span style=\"color: rgb(14, 0, 255);\">for <\/span><span >i = 1:6<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >        z = spmdCat(x);<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >    <\/span><span style=\"color: rgb(14, 0, 255);\">end<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >    y = toc;<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span style=\"color: rgb(14, 0, 255);\">end<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '>&nbsp;<\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >y = gather(y);<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '>&nbsp;<\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span style=\"color: rgb(0, 128, 19);\">% Turn y from cell array to numeric array<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >y = mean(cell2mat(y));<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '>&nbsp;<\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span style=\"color: rgb(0, 128, 19);\">% Display the mean execution time<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >disp([<\/span><span style=\"color: rgb(167, 9, 245);\">'Mean execution time: '<\/span><span >, num2str(y), <\/span><span style=\"color: rgb(167, 9, 245);\">' seconds'<\/span><span >])<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '>&nbsp;<\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 1px solid rgb(217, 217, 217); border-radius: 0px 0px 4px 4px; padding: 0px 45px 4px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span style=\"color: rgb(14, 0, 255);\">end<\/span><\/span><\/div><\/div><\/div><div style=\"background-color: #F5F5F5; margin: 10px 0 10px 0;\"><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 1px solid rgb(217, 217, 217); border-bottom: 0px none rgb(33, 33, 33); border-radius: 4px 4px 0px 0px; padding: 6px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >numCPUWorkers = 4;<\/span><\/span><\/div><\/div><div class=\"inlineWrapper outputs\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 1px solid rgb(217, 217, 217); border-radius: 0px; padding: 0px 45px 4px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >spmdCatCPU(numCPUWorkers)<\/span><\/span><\/div><div  style = 'color: rgb(33, 33, 33); padding: 10px 0px 6px 17px; background: rgb(255, 255, 255) none repeat scroll 0% 0% \/ auto padding-box border-box; font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; overflow-x: hidden; line-height: 17.234px; '><div class=\"inlineElement eoOutputWrapper disableDefaultGestureHandling embeddedOutputsTextElement\" uid=\"CC0FBF33\" prevent-scroll=\"true\" data-testid=\"output_0\" style=\"width: 1289.33px; white-space: pre; font-style: normal; color: rgb(33, 33, 33); font-size: 12px;\" tabindex=\"-1\"><div class=\"textElement eoOutputContent\" tabindex=\"-1\" data-previous-available-width=\"1252\" data-previous-scroll-height=\"17\" data-hashorizontaloverflow=\"false\" role=\"article\" aria-roledescription=\"Use Browse Mode to explore \" aria-description=\"text output \" style=\"max-height: 261px; white-space: pre; font-style: normal; color: rgb(33, 33, 33); font-size: 12px;\">Mean execution time: 11.168 seconds<\/div><\/div><\/div><\/div><\/div><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span class=\"_richTextNode\">In the code snippet above, each MATLAB worker initializes a random matrix <\/span><span class=\"_richTextNode\" style=' font-family: monospace;'>x<\/span><span class=\"_richTextNode\">. We then concatenate all <\/span><span class=\"_richTextNode\" style=' font-family: monospace;'>x<\/span><span class=\"_richTextNode\"> values across workers and broadcast the result, <\/span><span class=\"_richTextNode\" style=' font-family: monospace;'>z<\/span><span class=\"_richTextNode\">, back to every worker. Here, the <\/span><a href = \"https:\/\/www.mathworks.com\/help\/parallel-computing\/spmdcat.html\"><span class=\"_richTextNode\">spmdCat<\/span><\/a><span class=\"_richTextNode\"> function performs both the computation (concatenation) and the communication in a single call.<\/span><\/div><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span class=\"_richTextNode\">So how long does this take? Running 4 MATLAB workers on <\/span><span class=\"_richTextNode\" style=' font-weight: bold;'>4 AMD EPYC 7R32 CPUs<\/span><span class=\"_richTextNode\">, the script takes <\/span><span class=\"_richTextNode\" style=' font-weight: bold;'>11.17 seconds<\/span><span class=\"_richTextNode\">.<\/span><\/div><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span class=\"_richTextNode\">We can use the MATLAB built-in <\/span><a href = \"https:\/\/www.mathworks.com\/help\/parallel-computing\/mpiprofile.html\"><span class=\"_richTextNode\" style=' font-weight: bold; text-decoration: underline;'>mpiprofile<\/span><\/a><span class=\"_richTextNode\"> function to investigate where that time is spent. In this case, communication <\/span><span class=\"_richTextNode\">between some worker pairs takes the majority of the time (around 10 seconds).<\/span><\/div><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><img class = \"imageNode\" src = \"http:\/\/blogs.mathworks.com\/matlab\/files\/2026\/06\/MATLAB_SPMD_gpu_1.png\" width = \"367\" height = \"295\" alt = \"\" style = \"vertical-align: baseline; width: 367px; height: 295px;\"><\/img><\/div><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span class=\"_richTextNode\" style=' font-style: italic;'>Figure 1. The communication heat map between CPU workers generated by mpiprofile.<\/span><\/div><h2  style = 'margin: 20px 10px 5px 4px; padding: 0px; line-height: 25px; min-height: 0px; white-space: pre; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 20px; font-weight: 700; text-align: left; '><span class=\"_richTextNode\">Inter-worker communication on the GPUs<\/span><\/h2><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span class=\"_richTextNode\">Now, what if we perform the same operation across multiple GPUs? The only change required is to make <\/span><span class=\"_richTextNode\" style=' font-family: monospace;'>x<\/span><span class=\"_richTextNode\"> a <\/span><span class=\"_richTextNode\" style=' font-family: monospace;'>gpuArray<\/span><span class=\"_richTextNode\">, as follows. <\/span><\/div><div style=\"background-color: #F5F5F5; margin: 10px 0 10px 0;\"><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 1px solid rgb(217, 217, 217); border-bottom: 0px none rgb(33, 33, 33); border-radius: 4px 4px 0px 0px; padding: 6px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span style=\"color: rgb(14, 0, 255);\">function <\/span><span >spmdCatGPU(numWorkers)<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '>&nbsp;<\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span style=\"color: rgb(14, 0, 255);\">if <\/span><span >isempty(gcp(<\/span><span style=\"color: rgb(167, 9, 245);\">\"nocreate\"<\/span><span >))<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >    parpool(<\/span><span style=\"color: rgb(167, 9, 245);\">\"Processes\"<\/span><span >, numWorkers);<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span style=\"color: rgb(14, 0, 255);\">end<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '>&nbsp;<\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span style=\"color: rgb(14, 0, 255);\">spmd<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >    x = rand(10000, 2000, <\/span><span style=\"color: rgb(167, 9, 245);\">'gpuArray'<\/span><span >);<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >    tic  <\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >    <\/span><span style=\"color: rgb(14, 0, 255);\">for <\/span><span >i = 1:6<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >        z = spmdCat(x);<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >    <\/span><span style=\"color: rgb(14, 0, 255);\">end<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >    y = toc;<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span style=\"color: rgb(14, 0, 255);\">end<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '>&nbsp;<\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >y = gather(y);<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '>&nbsp;<\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span style=\"color: rgb(0, 128, 19);\">% Turn y from cell array to numeric array<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >y = mean(cell2mat(y));<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '>&nbsp;<\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span style=\"color: rgb(0, 128, 19);\">% Display the mean execution time<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >disp([<\/span><span style=\"color: rgb(167, 9, 245);\">'Mean execution time: '<\/span><span >, num2str(y), <\/span><span style=\"color: rgb(167, 9, 245);\">' seconds'<\/span><span >])<\/span><\/span><\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 0px none rgb(33, 33, 33); border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '>&nbsp;<\/div><\/div><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 1px solid rgb(217, 217, 217); border-radius: 0px 0px 4px 4px; padding: 0px 45px 4px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span style=\"color: rgb(14, 0, 255);\">end<\/span><\/span><\/div><\/div><\/div><div style=\"background-color: #F5F5F5; margin: 10px 0 10px 0;\"><div class=\"inlineWrapper\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 1px solid rgb(217, 217, 217); border-bottom: 0px none rgb(33, 33, 33); border-radius: 4px 4px 0px 0px; padding: 6px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >numGPUWorkers = 4;<\/span><\/span><\/div><\/div><div class=\"inlineWrapper outputs\"><div  style = 'border-left: 1px solid rgb(217, 217, 217); border-right: 1px solid rgb(217, 217, 217); border-top: 0px none rgb(33, 33, 33); border-bottom: 1px solid rgb(217, 217, 217); border-radius: 0px; padding: 0px 45px 4px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: rgb(33, 33, 33); font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace, Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; '><span style=\"white-space: pre\"><span >spmdCatGPU(numGPUWorkers)<\/span><\/span><\/div><div  style = 'color: rgb(33, 33, 33); padding: 10px 0px 6px 17px; background: rgb(255, 255, 255) none repeat scroll 0% 0% \/ auto padding-box border-box; font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace; font-size: 14px; overflow-x: hidden; line-height: 17.234px; '><div class=\"inlineElement eoOutputWrapper disableDefaultGestureHandling embeddedOutputsTextElement\" uid=\"8D8F8606\" prevent-scroll=\"true\" data-testid=\"output_1\" tabindex=\"-1\" style=\"width: 1289.33px; white-space: pre; font-style: normal; color: rgb(33, 33, 33); font-size: 12px;\"><div class=\"textElement eoOutputContent\" tabindex=\"-1\" data-previous-available-width=\"1252\" data-previous-scroll-height=\"17\" data-hashorizontaloverflow=\"false\" role=\"article\" aria-roledescription=\"Use Browse Mode to explore \" aria-description=\"text output \" style=\"max-height: 261px; white-space: pre; font-style: normal; color: rgb(33, 33, 33); font-size: 12px;\">Mean execution time: 1.2486 seconds<\/div><\/div><\/div><\/div><\/div><div  style = 'margin: 10px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span class=\"_richTextNode\">How long do we think this version will take? Intuitively, we might expect it to be slower than the CPU version: after all, GPUs are often viewed as co-processors, and moving data across multiple GPUs sounds like it would require staging via host memory. In practice, the GPU version is significantly faster. It completes in <\/span><span class=\"_richTextNode\" style=' font-weight: bold;'>1.24 seconds on 4 NVIDIA A10G GPUs<\/span><span class=\"_richTextNode\"> over the same PCI-e connection\u2014almost <\/span><span class=\"_richTextNode\" style=' font-weight: bold;'>9.0\u00d7 faster<\/span><span class=\"_richTextNode\"> than running the same code on <\/span><span class=\"_richTextNode\" style=' font-weight: bold;'>4 AMD EPYC 7R32 CPUs<\/span><span class=\"_richTextNode\">.<\/span><\/div><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span class=\"_richTextNode\">Running this script through <\/span><span class=\"_richTextNode\" style=' font-family: monospace;'>mpiprofile<\/span><span class=\"_richTextNode\"> reveals that transferring the same amount of data across multiple GPUs is much faster.<\/span><\/div><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><img class = \"imageNode\" src = \"http:\/\/blogs.mathworks.com\/matlab\/files\/2026\/06\/MATLAB_SPMD_gpu_2.png\" width = \"377\" height = \"304\" alt = \"\" style = \"vertical-align: baseline; width: 377px; height: 304px;\"><\/img><\/div><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span class=\"_richTextNode\" style=' font-style: italic;'>Figure <\/span><span class=\"_richTextNode\" style=' font-style: italic;'>2<\/span><span class=\"_richTextNode\" style=' font-style: italic;'>. The communication heat map between GPU workers generated by mpiprofile.<\/span><\/div><h2  style = 'margin: 20px 10px 5px 4px; padding: 0px; line-height: 25px; min-height: 0px; white-space: pre; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 20px; font-weight: 700; text-align: left; '><span class=\"_richTextNode\">Why is inter-worker communication on the GPUs faster?<\/span><\/h2><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span class=\"_richTextNode\">Now, back to our original question, how is multi-GPU communication faster than its multi-CPU counterpart?<\/span><\/div><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span class=\"_richTextNode\">The answer is that our Parallel Computing Toolbox developers have employed a technique called <\/span><a href = \"https:\/\/docs.nvidia.com\/cuda\/cuda-programming-guide\/04-special-topics\/inter-process-communication.html\"><span class=\"_richTextNode\" style=' font-weight: bold; text-decoration: underline;'>NVIDIA CUDA Inter-Process Communication (IPC)<\/span><\/a><span class=\"_richTextNode\" style=' font-weight: bold;'> <\/span><span class=\"_richTextNode\">when implementing <\/span><span class=\"_richTextNode\" style=' font-family: monospace;'>spmd<\/span><span class=\"_richTextNode\"> function for GPU communications. CUDA IPC makes it possible to share GPU allocations between processes via portable handles, so that data can be accessed without staging through host memory. <\/span><span class=\"_richTextNode\">Multi-GPU <\/span><span class=\"_richTextNode\" style=' font-family: monospace;'>spmd<\/span><span class=\"_richTextNode\"> workflows in MATLAB can benefit from these GPU-aware mechanisms, which contribute to the speedup we saw earlier. One limitation is that this applies to <\/span><span class=\"_richTextNode\" style=' font-weight: bold;'>process pools<\/span><span class=\"_richTextNode\"> (not thread-based pools).<\/span><\/div><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span class=\"_richTextNode\">As a result, when MATLAB runs multi-GPU code, it automatically benefits from these optimizations to accelerate simulations.<\/span><\/div><h2  style = 'margin: 20px 10px 5px 4px; padding: 0px; line-height: 25px; min-height: 0px; white-space: pre; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 20px; font-weight: 700; text-align: left; '><span class=\"_richTextNode\">Implementing a multi-GPU Fast Fourier Transform<\/span><\/h2><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span class=\"_richTextNode\">One example of how this technology has helped MATLAB speed up multi-GPU calculations is discussed in this <\/span><a href = \"https:\/\/arxiv.org\/pdf\/2603.26818\"><span class=\"_richTextNode\" style=' text-decoration: underline;'>paper<\/span><\/a><span class=\"_richTextNode\"> (<\/span><a href = \"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/183518-matlabmultigpufft\"><span class=\"_richTextNode\">code<\/span><\/a><span class=\"_richTextNode\">).   By using the <\/span><span class=\"_richTextNode\" style=' font-family: monospace;'>spmd<\/span><span class=\"_richTextNode\"> family of functions (<\/span><span class=\"_richTextNode\" style=' font-family: monospace;'>spmdCat<\/span><span class=\"_richTextNode\">, <\/span><span class=\"_richTextNode\" style=' font-family: monospace;'>spmdSend<\/span><span class=\"_richTextNode\">, <\/span><span class=\"_richTextNode\" style=' font-family: monospace;'>spmdReceive<\/span><span class=\"_richTextNode\">, etc.) to implement a multi-GPU <\/span><span class=\"_richTextNode\" style=' font-family: monospace;'>fft<\/span><span class=\"_richTextNode\">, the authors achieved a maximum speedup of approximately <\/span><span class=\"_richTextNode\" style=' font-weight: bold;'>60\u00d7<\/span><span class=\"_richTextNode\"> when running on <\/span><span class=\"_richTextNode\" style=' font-weight: bold;'>4 NVIDIA H100 GPUs<\/span><span class=\"_richTextNode\">, compared to <\/span><span class=\"_richTextNode\" style=' font-weight: bold;'>100 CPUs<\/span><span class=\"_richTextNode\">. <\/span><\/div><div  style = 'margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre; color: rgb(33, 33, 33); font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left; '><span class=\"_richTextNode\">If you are working with<\/span><span class=\"_richTextNode\"> advanced <\/span><span class=\"_richTextNode\">data-communication patterns in your multi-GPU MATLAB applications, don\u2019t hesitate to give the <\/span><span class=\"_richTextNode\" style=' font-family: monospace;'>spmd<\/span><span class=\"_richTextNode\"> functions a try!<\/span><\/div>\r\n<\/div><script type=\"text\/javascript\">var css = '._richTextNode {white-space: break-spaces;} \/* Copyright 2014-2025 The MathWorks, Inc. *\/\/* Styling that is common to warnings and errors is in diagnosticOutput.css *\/.embeddedOutputsErrorElement {    min-height: 18px;    max-height: 550px;} .embeddedOutputsErrorElement .diagnosticMessage-errorType {    overflow: auto;} .embeddedOutputsErrorElement.inlineElement {} .embeddedOutputsErrorElement.rightPaneElement {} \/* Copyright 2014-2025 The MathWorks, Inc. *\/\/* Styling that is common to warnings and errors is in diagnosticOutput.css *\/.embeddedOutputsWarningElement {    min-height: 18px;    max-height: 550px;} .embeddedOutputsWarningElement .diagnosticMessage-warningType {    overflow: auto;} .embeddedOutputsWarningElement.inlineElement {} .embeddedOutputsWarningElement.rightPaneElement {} \/* Copyright 2015-2025 The MathWorks, Inc. *\/\/* In this file, styles are not scoped to rtcContainer since they could be in the Dojo Tooltip *\/.diagnosticMessage-wrapper {    \/* Since this selector may be used outside RTC Container without RTC semantic variables, a    default value is used. An example here is the App Designers alert tooltip. *\/    font-family: var(--rtc-output-font-family, Menlo, Monaco, Consolas, \"Courier New\", monospace);    font-size: var(--rtc-output-font-size, 12px);} .diagnosticMessage-wrapper.diagnosticMessage-warningType {    \/*This fallback value will be used for appdesigner warnings*\/    color: var(--rtc-warning-output-color, var(--mw-color-matlabWarning));} .diagnosticMessage-wrapper.diagnosticMessage-warningType a {    \/*This fallback value will be used for appdesigner warnings*\/    color: var(--rtc-warning-output-color, var(--mw-color-matlabWarning));    text-decoration: underline;} .rtcThemeDefaultOverride .diagnosticMessage-wrapper.diagnosticMessage-warningType,.rtcThemeDefaultOverride .diagnosticMessage-wrapper.diagnosticMessage-warningType a {    color: var(--mw-color-matlabWarning) !important;} .diagnosticMessage-wrapper.diagnosticMessage-errorType {    \/*This fallback value will be used in appdesigner error tooltip text*\/    color: var(--rtc-error-output-color, var(--mw-color-matlabErrors));} .diagnosticMessage-wrapper.diagnosticMessage-errorType a {    \/*This fallback value will be used in appdesigner error tooltip text*\/    color: var(--rtc-error-output-color, var(--mw-color-matlabErrors));    text-decoration: underline;} .rtcThemeDefaultOverride .diagnosticMessage-wrapper.diagnosticMessage-errorType,.rtcThemeDefaultOverride .diagnosticMessage-wrapper.diagnosticMessage-errorType a {    color: var(--mw-color-matlabErrors) !important;} .diagnosticMessage-wrapper .diagnosticMessage-messagePart,.diagnosticMessage-wrapper .diagnosticMessage-causePart {    white-space: pre-wrap;} .diagnosticMessage-wrapper .diagnosticMessage-stackPart {    white-space: pre;} \/* Copyright 2014-2025 The MathWorks, Inc. *\/.embeddedOutputsTextElement,.embeddedOutputsVariableStringElement {    white-space: pre;    word-wrap:  initial;    min-height: 18px;    max-height: 550px;} .embeddedOutputsTextElement .textElement,.embeddedOutputsVariableStringElement .textElement {    overflow: auto;} .textElement,.rtcDataTipElement .textElement {    padding-top: 2px;} .embeddedOutputsTextElement.inlineElement,.embeddedOutputsVariableStringElement.inlineElement {} .inlineElement .textElement {} .embeddedOutputsTextElement.rightPaneElement,.embeddedOutputsVariableStringElement.rightPaneElement {    min-height: 16px;} .rightPaneElement .textElement {    padding-top: 2px;    padding-left: 9px;}'; var head = document.head || document.getElementsByTagName('head')[0], style = document.createElement('style'); head.appendChild(style); style.type = 'text\/css'; if (style.styleSheet){ style.styleSheet.cssText = css; } else { style.appendChild(document.createTextNode(css)); }<\/script>","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img src=\"https:\/\/blogs.mathworks.com\/matlab\/files\/2026\/06\/MATLAB_SPMD_gpu_2.png\" class=\"img-responsive attachment-post-thumbnail size-post-thumbnail wp-post-image\" alt=\"\" decoding=\"async\" loading=\"lazy\" \/><\/div><p>Today's guest blogger is Weinan Chen. Weinan is an Application Engineer on the Parallel Computing team at MathWorks. Additional thanks to Linda Koletsou Soulti, Joss Knight, and the Parallel... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/matlab\/2026\/06\/12\/direct-communication-between-multiple-gpus-using-matlabs-spmd-functions\/\">read more >><\/a><\/p>","protected":false},"author":176,"featured_media":4307,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[23,42,136,14],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/matlab\/wp-json\/wp\/v2\/posts\/4309"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/matlab\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/matlab\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/matlab\/wp-json\/wp\/v2\/users\/176"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/matlab\/wp-json\/wp\/v2\/comments?post=4309"}],"version-history":[{"count":1,"href":"https:\/\/blogs.mathworks.com\/matlab\/wp-json\/wp\/v2\/posts\/4309\/revisions"}],"predecessor-version":[{"id":4310,"href":"https:\/\/blogs.mathworks.com\/matlab\/wp-json\/wp\/v2\/posts\/4309\/revisions\/4310"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/matlab\/wp-json\/wp\/v2\/media\/4307"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/matlab\/wp-json\/wp\/v2\/media?parent=4309"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/matlab\/wp-json\/wp\/v2\/categories?post=4309"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/matlab\/wp-json\/wp\/v2\/tags?post=4309"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}