{"id":19285,"date":"2026-05-22T09:19:14","date_gmt":"2026-05-22T13:19:14","guid":{"rendered":"https:\/\/blogs.mathworks.com\/deep-learning\/?p=19285"},"modified":"2026-05-22T09:19:14","modified_gmt":"2026-05-22T13:19:14","slug":"from-pytorch-litert-to-c-c-and-cuda-source-code","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/deep-learning\/2026\/05\/22\/from-pytorch-litert-to-c-c-and-cuda-source-code\/","title":{"rendered":"From PyTorch &#038; LiteRT to C, C++, and CUDA source code"},"content":{"rendered":"<div class=\"rtcContent\">\r\n<div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><\/div>\r\n<div>\r\n<h6><\/h6>\r\n<table style=\"background-color: #e2f0ff;\">\r\n<tbody>\r\n<tr>\r\n<td style=\"vertical-align: middle; padding: 10px;\"><strong>Guest writer: <a href=\"https:\/\/www.linkedin.com\/in\/christoph-stockhammer\/\" target=\"_blank\" rel=\"noopener\">Christoph Stockhammer<\/a><\/strong>\r\n<h6><\/h6>\r\n<span style=\"font-weight: bold;\">Christoph Stockhammer <\/span>is an application engineer at MathWorks, focusing on AI use cases. 
Christoph holds a Master's in Mathematics from the Technical University of Munich.<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<h6><\/h6>\r\n<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><\/div>\r\n<\/div>\r\n<div><\/div>\r\n<div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">If you have ever tried to deploy a PyTorch model on embedded hardware, you know the story: the model itself is only half the battle. The real challenge starts when you need a runtime, shared libraries, hardware-specific builds, and a toolchain that somehow still fits on your target device.<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">Starting with MATLAB R2026a, there is an alternative route. Instead of shipping a runtime to the device, you can now generate <strong>standalone C\/C++ (and CUDA&#174;) source code<\/strong> directly from <strong>PyTorch ExportedProgram<\/strong> and <strong>LiteRT<\/strong> models. No interpreter. No inference engine. No import process into MATLAB. 
Just readable and portable source code that you compile with your existing toolchain.<\/div>\r\n<h2 style=\"margin: 3px 10px 5px 4px; padding: 0px; line-height: 25px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 20px; font-weight: bold; text-align: left;\">Why standalone code generation?<\/h2>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">Embedded targets--from microcontrollers to devices like Raspberry Pi or NVIDIA&#174; Jetson&#8482;--care deeply about predictability. Memory usage, startup time, and binary size often matter more than flexibility. Runtime-based solutions such as LiteRT (for Microcontrollers) or ONNX Runtime do a great job, but they still bring dependencies and abstraction layers that cost processing power and memory beyond what standalone source code needs.<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">Standalone code generation removes that layer entirely. The generated code contains only what your model needs: loops, math, and data. 
That makes it easier to analyze, debug, certify, and integrate with existing embedded software.<\/div>\r\n<h2 style=\"margin: 3px 10px 5px 4px; padding: 0px; line-height: 25px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 20px; font-weight: bold; text-align: left;\">Under the hood: MLIR as the secret sauce<\/h2>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">The key technology behind this workflow is <strong>MLIR (Multi-Level Intermediate Representation)<\/strong>. MATLAB Coder lowers PyTorch ExportedProgram and LiteRT models into <a href=\"https:\/\/mlir.llvm.org\/\" target=\"_blank\" rel=\"noopener\">MLIR<\/a>, applies a series of graph- and hardware-aware optimizations, then emits C\/C++ or CUDA source code.<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">Because the optimizations happen at the IR level, the generated code can take advantage of:<\/div>\r\n<ul style=\"margin: 10px 0px 20px; padding-left: 0px; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-size: 14px;\">\r\n \t<li style=\"margin-left: 56px; line-height: 21px; min-height: 0px; text-align: left; white-space: pre-wrap;\">Operator fusion to reduce memory traffic<\/li>\r\n \t<li style=\"margin-left: 56px; line-height: 21px; min-height: 0px; text-align: left; white-space: pre-wrap;\">Parallel execution using OpenMP on multicore CPUs<\/li>\r\n \t<li style=\"margin-left: 56px; line-height: 21px; 
min-height: 0px; text-align: left; white-space: pre-wrap;\">Vectorization (for example, ARM&#174; Neon or Intel&#174; AVX)<\/li>\r\n \t<li style=\"margin-left: 56px; line-height: 21px; min-height: 0px; text-align: left; white-space: pre-wrap;\">Hardware-specific kernels where available<\/li>\r\n<\/ul>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">The result is code that can be both portable <em>and<\/em> efficient.<\/div>\r\n<\/div>\r\n<div><\/div>\r\n<div>\r\n<h2 style=\"margin: 3px 10px 5px 4px; padding: 0px; line-height: 25px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 20px; font-weight: bold; text-align: left;\">PyTorch and LiteRT, supported directly<\/h2>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">The workflow supports two input formats:<\/div>\r\n<ul style=\"margin: 10px 0px 20px; padding-left: 0px; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-size: 14px;\">\r\n \t<li style=\"margin-left: 56px; line-height: 21px; min-height: 0px; text-align: left; white-space: pre-wrap;\"><strong>PyTorch ExportedProgram<\/strong>, for a clean and stable representation of PyTorch models<\/li>\r\n \t<li style=\"margin-left: 56px; line-height: 21px; min-height: 0px; text-align: left; white-space: pre-wrap;\"><strong>LiteRT<\/strong>, whether it originates from TensorFlow or is converted from PyTorch models<\/li>\r\n<\/ul>\r\n<h2 style=\"margin: 3px 10px 5px 4px; padding: 
0px; line-height: 25px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 20px; font-weight: bold; text-align: left;\">A classic example: multi-layer perceptron (MLP)<\/h2>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">Feedforward networks based on dense layers are one of the most popular network architectures and they are ideal for introductory examples. So let's look at one:<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">Defining such a network only takes a few lines of code in torch:<\/div>\r\n<pre style=\"background-color: #f5f5f5; padding: 10px; border-radius: 4px; font-family: monospace; font-size: 13px; overflow-x: auto;\"><code>self.net = nn.Sequential(\r\n    nn.Linear(in_features, hidden1),\r\n    nn.ReLU(),\r\n    nn.Linear(hidden1, hidden2),\r\n    nn.ReLU(),\r\n    nn.Linear(hidden2, out_features)\r\n)<\/code><\/pre>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">Usually, we would train the model with actual data, but for our purposes it suffices to just write the model to disk with the original (randomly initialized) weights and biases:<\/div>\r\n<pre style=\"background-color: #f5f5f5; padding: 10px; border-radius: 4px; font-family: monospace; 
font-size: 13px; overflow-x: auto;\"><code>example_inputs = (torch.randn(batch_size, in_features),)\r\nexported_program = torch.export.export(model, example_inputs)\r\nout_file = \"three_layer_mlp.pt2\"\r\ntorch.export.save(exported_program, out_file)<\/code><\/pre>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">This will produce a file \"three_layer_mlp.pt2\" and we can visualize it in tools such as <a href=\"https:\/\/netron.app\/\" target=\"_blank\" rel=\"noopener\">Netron<\/a>. As expected, we see three dense layers (titled \"linear\") with relu activations in between:<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><img decoding=\"async\" class=\"alignnone wp-image-19299\" style=\"vertical-align: baseline; max-width: 200px; height: auto;\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2026\/05\/RLU_upload.png\" alt=\"Netron visualization of three-layer MLP showing linear layers with relu activations\" width=\"200\" \/><\/div>\r\n<\/div>\r\n<div><\/div>\r\n<div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">Now, we can go ahead and load this model in MATLAB as well:<\/div>\r\n<pre style=\"background-color: #f5f5f5; padding: 10px; border-radius: 4px; font-family: monospace; font-size: 13px; overflow-x: auto; color: #333333;\">>> mlp = 
loadPyTorchExportedProgram('three_layer_mlp.pt2')<\/pre>\r\n<div style=\"background-color: #ffffff; padding: 10px; border: none; font-family: monospace; font-size: 13px; overflow-x: auto; color: #555555 !important; white-space: pre;\">Loading the model. This may take a few minutes.\r\n\r\nmlp =\r\n\r\n  PyTorchExportedProgram contained in three_layer_mlp.pt2:\r\n\r\n    Input Specifications\r\n    ______________________________________\r\n\r\n    Input  Name    Size       Type\r\n    _____  _____   ________   ________\r\n\r\n    1      \"in1\"   \"1 x 16\"   \"single\"\r\n\r\n    Output Specifications\r\n    _______________________________________\r\n\r\n    Output  Name     Size      Type\r\n    ______  ______   _______   ________\r\n\r\n    1       \"out1\"   \"1 x 8\"   \"single\"<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">This looks good: We see the correct number of input features (16) and output features (8). It is advisable to also run some inference tests from within MATLAB. 
For example, I can have the model predict just with some random data:<\/div>\r\n<pre style=\"background-color: #f5f5f5; padding: 10px; border-radius: 4px; font-family: monospace; font-size: 13px; overflow-x: auto; color: #333333;\">>> invoke(mlp, randn(1,16,'single'))<\/pre>\r\n<div style=\"background-color: #ffffff; padding: 10px; border: none; font-family: monospace; font-size: 13px; overflow-x: auto; color: #555555 !important; white-space: pre;\">ans =\r\n\r\n  1&#215;8 single row vector\r\n\r\n   -0.0442   -0.2474    0.3094    0.1759   -0.0768    0.0293    0.1398   -0.0767<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">We now want to move ahead and generate C source code from the model's prediction function. To this end, we just need to put the above commands into a MATLAB function:<\/div>\r\n<pre style=\"background-color: #f5f5f5; padding: 10px; border-radius: 4px; font-family: monospace; font-size: 13px; overflow-x: auto;\"><code>function predictions = predictModel(inputFeatures)\r\n    mlp = loadPyTorchExportedProgram('three_layer_mlp.pt2');\r\n    predictions = invoke(mlp, inputFeatures);\r\nend<\/code><\/pre>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">As a final step, we generate C source code files with a single MATLAB command:<\/div>\r\n<pre style=\"background-color: #f5f5f5; padding: 10px; border-radius: 4px; font-family: monospace; font-size: 13px; overflow-x: auto; color: #333333;\">>> codegen -c predictModel.m -args {zeros(1,16,'single')}<\/pre>\r\n<div style=\"margin: 2px 10px 9px 
4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">My laptop has an Intel i7 processor. To improve performance, the code generator detects my processor configuration automatically (I could also override this manually with a different configuration). Using this information, the code generator uses AVX instructions (which my processor supports) for the matrix-vector multiplications at the heart of the dense layers. So rather than naive for-loops, I get something like the following in the generated C code:<\/div>\r\n<pre style=\"background-color: #f5f5f5; padding: 10px; border-radius: 4px; font-family: monospace; font-size: 13px; overflow-x: auto;\"><code>c = _mm256_add_ps(c, _mm256_mul_ps(_mm256_loadu_ps(&amp;A[idxA]), b));<\/code><\/pre>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">Here _mm256_mul_ps is the Intel intrinsic for packed single-precision (FP32) multiplication: it multiplies eight floats at once, which can lead to much better performance than a scalar loop. 
The above is just one example of how the generated code is optimized to the specific hardware it is intended to be executed on.<\/div>\r\n<\/div>\r\n<div><\/div>\r\n<div>\r\n<h2 style=\"margin: 3px 10px 5px 4px; padding: 0px; line-height: 25px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 20px; font-weight: bold; text-align: left;\">More than just inference<\/h2>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">In real applications, neural networks almost never run in isolation. Preprocessing, postprocessing, other control logic and more are almost always part of the story, too. One nice thing about using automated code generation is that you can generate code for the <em>entire application<\/em>--signal processing, feature extraction, and neural network inference--in one deterministic step.<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">Example applications, including the MATLAB code and Simulink models, ship with this capability. 
For example, the <a href=\"https:\/\/www.mathworks.com\/help\/coder\/ug\/monocular-depth-estimation-using-depth-anything-v2-pytorch-model.html\" target=\"_blank\" rel=\"noopener\">monocular depth estimation using Depth Anything V2 PyTorch Model<\/a> example estimates depth from a single camera image, a capability used in applications such as autonomous driving and navigation.<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><img decoding=\"async\" class=\"alignnone wp-image-19295\" style=\"vertical-align: baseline; max-width: 800px; height: auto;\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2026\/05\/Depth-Anything-V2-output.png\" alt=\"Monocular depth estimation example showing original image and depth map side by side\" width=\"800\" \/><\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">The <a href=\"https:\/\/www.mathworks.com\/help\/coder\/ug\/segmentation-and-object-detection-using-yolo-v11-litert-model.html\" target=\"_blank\" rel=\"noopener\">segmentation and object detection using YOLO v11 LiteRT model<\/a> example shows how to identify and outline objects in an image, all without relying on NVIDIA cuDNN or TensorRT&#8482; libraries.<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><img decoding=\"async\" loading=\"lazy\" width=\"560\" height=\"337\" 
class=\"alignnone size-full wp-image-19297\" style=\"vertical-align: baseline;\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2026\/05\/YOLOv11-output.png\" alt=\"YOLO v11 segmentation and object detection example\" \/><\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">Finally, the <a href=\"https:\/\/www.mathworks.com\/help\/coder\/ug\/predict-battery-state-of-charge-using-litert-model.html\" target=\"_blank\" rel=\"noopener\">predict battery state of charge using LiteRT model<\/a> example shows the workflow to deploy an AI model that predicts the battery state of charge (SOC), a key metric for energy management systems in electric vehicles and other battery-powered devices.<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><img decoding=\"async\" loading=\"lazy\" width=\"616\" height=\"371\" class=\"alignnone size-full wp-image-19294\" style=\"vertical-align: baseline;\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2026\/05\/BSoC-output.png\" alt=\"Predicted BSOC vs Ground truth chart\" \/><\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">In this day and age, one could also consider Generative AI tools when looking for C or C++ source. 
One major distinction of the workflow we described above is that it is <em>deterministic<\/em> and <em>traceable<\/em>. If you generate the C sources ten times with the same configuration, you get the exact same C sources ten times. There is no dependency on backend large language models, context, or prompts.<\/div>\r\n<\/div>\r\n<div><\/div>\r\n<div>\r\n<h2 style=\"margin: 3px 10px 5px 4px; padding: 0px; line-height: 25px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 20px; font-weight: bold; text-align: left;\">A practical alternative to runtimes<\/h2>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">Our benchmarks show that the generated code delivers performance comparable to runtime-based approaches. Just as importantly, the code is human-readable, configurable, and re-entrant as needed.<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">Here is a chart comparing the performance of the automatically generated code with LiteRT for several popular network architectures on Raspberry Pi 4. In the initial release, our focus is on supporting as many networks and layers as possible, because our users work with many different types of networks. 
Many optimizations are planned and we anticipate matching and exceeding the performance of the interpreter in subsequent releases:<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><img decoding=\"async\" class=\"alignnone wp-image-19296\" style=\"vertical-align: baseline; max-width: 800px; height: auto;\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2026\/05\/Raspberry-Pi-4-benchmark-R2026a.png\" alt=\"Performance comparison chart: Generated Code vs LiteRT C++ interpreter on Raspberry Pi 4 ARM Cortex-A for various network architectures\" width=\"800\" \/><\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif, Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">In summary: If you are looking for an AI deployment workflow that prioritizes transparency, portability, and tight integration with embedded systems, standalone code generation from PyTorch and LiteRT using MATLAB Coder is well worth a look.<\/div>\r\n<\/div>\r\n<\/div>\r\n","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2026\/05\/RLU_upload.png\" onError=\"this.style.display ='none';\" \/><\/div><p>\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\nGuest writer: Christoph Stockhammer\r\n\r\nChristoph Stockhammer is an application engineer at MathWorks, focusing on AI use cases. Christoph holds a Master's in Mathematics from the... 
<a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/deep-learning\/2026\/05\/22\/from-pytorch-litert-to-c-c-and-cuda-source-code\/\">read more >><\/a><\/p>","protected":false},"author":156,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[9],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/19285"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/users\/156"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/comments?post=19285"}],"version-history":[{"count":13,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/19285\/revisions"}],"predecessor-version":[{"id":19303,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/19285\/revisions\/19303"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/media?parent=19285"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/categories?post=19285"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/tags?post=19285"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}