{"id":1219,"date":"2023-05-15T13:30:08","date_gmt":"2023-05-15T17:30:08","guid":{"rendered":"https:\/\/blogs.mathworks.com\/matlab\/?p=1219"},"modified":"2023-05-15T13:30:08","modified_gmt":"2023-05-15T17:30:08","slug":"from-hpc-consultancy-to-a-faster-fzero-function-in-matlab-r2023a","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/matlab\/2023\/05\/15\/from-hpc-consultancy-to-a-faster-fzero-function-in-matlab-r2023a\/","title":{"rendered":"From HPC consultancy to a faster fzero function in MATLAB R2023a"},"content":{"rendered":"<div class=\"rtcContent\">\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">Sometime in 2021, I was doing some High Performance Computing (HPC) consultancy with a university in the south of England. This involved various things such as getting the code to scale across multiple nodes, reducing memory requirements and speed optimisation. It was a big success and I got code that originally couldn't run at all on their HPC system to not only run but scale quite nicely. Then I threw in a 6x speed increase as a bonus by optimising the user's code.<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">I eventually got stuck with MATLAB's <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/fzero.html\"><span style=\"font-family: monospace;\">fzero<\/span><\/a> function. The user's code was calling it billions of times on rather simple functions which revealed overheads that nobody had reported before.<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">I had a chat with development who did some fixes and<span style=\"font-weight: bold;\"> in R2022a, <\/span><span style=\"font-weight: bold; font-family: monospace;\">fzero<\/span><span style=\"font-weight: bold;\"> was made around 3x faster than before<\/span> for simple cases like this. This was nice but a member of the core math team, Bobby Cheng, wanted to investigate further. He made a few more optimisations and also discovered that any function using <span style=\"font-family: monospace;\">varargin<\/span> had some additional overheads that could be resolved via the JIT compiler. The team behind JIT compilation took care of this and now many functions using <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/varargin.html\"><span style=\"font-family: monospace;\">varargin<\/span><\/a> have been made a little faster.<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\">These fixes made <span style=\"font-family: monospace;\">fzero<\/span> even faster on the simple type of function that was being used in my HPC case. The release notes always compare against the previous MATLAB version but to show the cumulative gains, I prefer to compare this benchmark against R2021b which is when I first noticed issues with this function.<\/div>\r\n<div style=\"background-color: #f5f5f5; margin: 10px 0 10px 0;\">\r\n<div class=\"inlineWrapper\">\r\n<div style=\"border-left: 1px solid #d9d9d9; border-right: 1px solid #d9d9d9; border-top: 1px solid #d9d9d9; border-bottom: 0px none #212121; border-radius: 4px 4px 0px 0px; padding: 6px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: #212121; font-family: Menlo, Monaco, Consolas, 'Courier New', monospace; font-size: 14px;\"><span style=\"white-space: pre;\">N = 1e6;<\/span><\/div>\r\n<\/div>\r\n<div class=\"inlineWrapper\">\r\n<div style=\"border-left: 1px solid #d9d9d9; border-right: 1px solid #d9d9d9; border-top: 0px none #212121; border-bottom: 0px none #212121; border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: #212121; font-family: Menlo, Monaco, Consolas, 'Courier New', monospace; font-size: 14px;\"><span style=\"white-space: pre;\">rng <span style=\"color: #a709f5;\">default<\/span><\/span><\/div>\r\n<\/div>\r\n<div class=\"inlineWrapper\">\r\n<div style=\"border-left: 1px solid #d9d9d9; border-right: 1px solid #d9d9d9; border-top: 0px none #212121; border-bottom: 0px none #212121; border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: #212121; font-family: Menlo, Monaco, Consolas, 'Courier New', monospace; font-size: 14px;\"><span style=\"white-space: pre;\">levels = 1.5*rand(N,1);<\/span><\/div>\r\n<\/div>\r\n<div class=\"inlineWrapper\">\r\n<div style=\"border-left: 1px solid #d9d9d9; border-right: 1px solid #d9d9d9; border-top: 0px none #212121; border-bottom: 0px none #212121; border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: #212121; font-family: Menlo, Monaco, Consolas, 'Courier New', monospace; font-size: 14px;\"><span style=\"white-space: pre;\">out = zeros(N,1);<\/span><\/div>\r\n<\/div>\r\n<div class=\"inlineWrapper\">\r\n<div style=\"border-left: 1px solid #d9d9d9; border-right: 1px solid #d9d9d9; border-top: 0px none #212121; border-bottom: 0px none #212121; border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: #212121; font-family: Menlo, Monaco, Consolas, 'Courier New', monospace; font-size: 14px;\"><span style=\"white-space: pre;\">tic<\/span><\/div>\r\n<\/div>\r\n<div class=\"inlineWrapper\">\r\n<div style=\"border-left: 1px solid #d9d9d9; border-right: 1px solid #d9d9d9; border-top: 0px none #212121; border-bottom: 0px none #212121; border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: #212121; font-family: Menlo, Monaco, Consolas, 'Courier New', monospace; font-size: 14px;\"><span style=\"white-space: pre;\"><span style=\"color: #0e00ff;\">for <\/span>i = 1:N<\/span><\/div>\r\n<\/div>\r\n<div class=\"inlineWrapper\">\r\n<div style=\"border-left: 1px solid #d9d9d9; border-right: 1px solid #d9d9d9; border-top: 0px none #212121; border-bottom: 0px none #212121; border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: #212121; font-family: Menlo, Monaco, Consolas, 'Courier New', monospace; font-size: 14px;\"><span style=\"white-space: pre;\"> out(i) = fzero(@(x)fzeroFun(x,levels(i)),[0 2]);<\/span><\/div>\r\n<\/div>\r\n<div class=\"inlineWrapper\">\r\n<div style=\"border-left: 1px solid #d9d9d9; border-right: 1px solid #d9d9d9; border-top: 0px none #212121; border-bottom: 0px none #212121; border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: #212121; font-family: Menlo, Monaco, Consolas, 'Courier New', monospace; font-size: 14px;\"><span style=\"white-space: pre;\"><span style=\"color: #0e00ff;\">end<\/span><\/span><\/div>\r\n<\/div>\r\n<div class=\"inlineWrapper outputs\">\r\n<div style=\"border-left: 1px solid #d9d9d9; border-right: 1px solid #d9d9d9; border-top: 0px none #212121; border-bottom: 1px solid #d9d9d9; border-radius: 0px; padding: 0px 45px 4px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: #212121; font-family: Menlo, Monaco, Consolas, 'Courier New', monospace; font-size: 14px;\"><span style=\"white-space: pre;\">toc<\/span><\/div>\r\n<div style=\"color: #212121; padding: 10px 0px 6px 17px; background: #ffffff none repeat scroll 0% 0% \/ auto padding-box border-box; font-family: Menlo, Monaco, Consolas, 'Courier New', monospace; font-size: 14px; overflow-x: hidden; line-height: 17.234px;\">\r\n<div class=\"inlineElement eoOutputWrapper embeddedOutputsTextElement\" style=\"width: 1146px; white-space: pre; font-style: normal; color: #212121; font-size: 12px;\" data-testid=\"output_0\">\r\n<div class=\"textElement eoOutputContent\" style=\"max-height: 261px; white-space: pre; font-style: normal; color: #212121; font-size: 12px;\" data-previous-available-width=\"1116\" data-previous-scroll-height=\"17\" data-hashorizontaloverflow=\"false\">Elapsed time is 1.669698 seconds.<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<ul style=\"margin: 10px 0px 20px; padding-left: 0px; font-family: Helvetica, Arial, sans-serif; font-size: 14px;\">\r\n \t<li style=\"margin-left: 56px; line-height: 21px; min-height: 0px; text-align: left; white-space: pre-wrap;\">R2021b: 28.52 seconds<\/li>\r\n \t<li style=\"margin-left: 56px; line-height: 21px; min-height: 0px; text-align: left; white-space: pre-wrap;\">R2022a: 10.25 seconds<\/li>\r\n \t<li style=\"margin-left: 56px; line-height: 21px; min-height: 0px; text-align: left; white-space: pre-wrap;\">R2023a: 1.67 seconds<\/li>\r\n<\/ul>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><span style=\"font-weight: bold;\">That's almost a 17x speedup across a few releases and users do not need to change their code at all!<\/span> The cumulative effect of all of these optimisations really adds up over time.<\/div>\r\n<div style=\"margin: 2px 10px 9px 4px; padding: 0px; line-height: 21px; min-height: 0px; white-space: pre-wrap; color: #212121; font-family: Helvetica, Arial, sans-serif; font-style: normal; font-size: 14px; font-weight: 400; text-align: left;\"><span style=\"font-weight: bold;\">Function definition<\/span><\/div>\r\n<div style=\"background-color: #f5f5f5; margin: 10px 0 10px 0;\">\r\n<div class=\"inlineWrapper\">\r\n<div style=\"border-left: 1px solid #d9d9d9; border-right: 1px solid #d9d9d9; border-top: 1px solid #d9d9d9; border-bottom: 0px none #212121; border-radius: 4px 4px 0px 0px; padding: 6px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: #212121; font-family: Menlo, Monaco, Consolas, 'Courier New', monospace; font-size: 14px;\"><span style=\"white-space: pre;\"><span style=\"color: #0e00ff;\">function <\/span>u = fzeroFun(x,lv)<\/span><\/div>\r\n<\/div>\r\n<div class=\"inlineWrapper\">\r\n<div style=\"border-left: 1px solid #d9d9d9; border-right: 1px solid #d9d9d9; border-top: 0px none #212121; border-bottom: 0px none #212121; border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: #212121; font-family: Menlo, Monaco, Consolas, 'Courier New', monospace; font-size: 14px;\"><span style=\"white-space: pre;\"><span style=\"color: #008013;\">% The function to be solved by fzero<\/span><\/span><\/div>\r\n<\/div>\r\n<div class=\"inlineWrapper\">\r\n<div style=\"border-left: 1px solid #d9d9d9; border-right: 1px solid #d9d9d9; border-top: 0px none #212121; border-bottom: 0px none #212121; border-radius: 0px; padding: 0px 45px 0px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: #212121; font-family: Menlo, Monaco, Consolas, 'Courier New', monospace; font-size: 14px;\"><span style=\"white-space: pre;\">u = x*sin(x) - lv;<\/span><\/div>\r\n<\/div>\r\n<div class=\"inlineWrapper\">\r\n<div style=\"border-left: 1px solid #d9d9d9; border-right: 1px solid #d9d9d9; border-top: 0px none #212121; border-bottom: 1px solid #d9d9d9; border-radius: 0px 0px 4px 4px; padding: 0px 45px 4px 13px; line-height: 18.004px; min-height: 0px; white-space: nowrap; color: #212121; font-family: Menlo, Monaco, Consolas, 'Courier New', monospace; font-size: 14px;\"><span style=\"white-space: pre;\"><span style=\"color: #0e00ff;\">end<\/span><\/span><\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<script type=\"text\/javascript\">var css = '\/* Styling that is common to warnings and errors is in diagnosticOutput.css *\/.embeddedOutputsErrorElement {    min-height: 18px;    max-height: 550px;} .embeddedOutputsErrorElement .diagnosticMessage-errorType {    overflow: auto;} .embeddedOutputsErrorElement.inlineElement {} .embeddedOutputsErrorElement.rightPaneElement {} \/* Styling that is common to warnings and errors is in diagnosticOutput.css *\/.embeddedOutputsWarningElement {    min-height: 18px;    max-height: 550px;} .embeddedOutputsWarningElement .diagnosticMessage-warningType {    overflow: auto;} .embeddedOutputsWarningElement.inlineElement {} .embeddedOutputsWarningElement.rightPaneElement {} \/* Copyright 2015-2022 The MathWorks, Inc. *\/\/* In this file, styles are not scoped to rtcContainer since they could be in the Dojo Tooltip *\/.diagnosticMessage-wrapper {    font-family: Menlo, Monaco, Consolas, \"Courier New\", monospace;    font-size: 12px;} .diagnosticMessage-wrapper.diagnosticMessage-warningType {    color: var(--mw-color-matlabWarning);} .diagnosticMessage-wrapper.diagnosticMessage-warningType a {    color: var(--mw-color-matlabWarning);    text-decoration: underline;} .rtcThemeDefaultOverride .diagnosticMessage-wrapper.diagnosticMessage-warningType,.rtcThemeDefaultOverride .diagnosticMessage-wrapper.diagnosticMessage-warningType a {    color: var(--mw-color-matlabWarning) !important;} .diagnosticMessage-wrapper.diagnosticMessage-errorType {    color: var(--mw-color-matlabErrors);} .diagnosticMessage-wrapper.diagnosticMessage-errorType a {    color: var(--mw-color-matlabErrors);    text-decoration: underline;} .rtcThemeDefaultOverride .diagnosticMessage-wrapper.diagnosticMessage-errorType,.rtcThemeDefaultOverride .diagnosticMessage-wrapper.diagnosticMessage-errorType a {    color: var(--mw-color-matlabErrors) !important;} .diagnosticMessage-wrapper .diagnosticMessage-messagePart,.diagnosticMessage-wrapper .diagnosticMessage-causePart {    white-space: pre-wrap;} .diagnosticMessage-wrapper .diagnosticMessage-stackPart {    white-space: pre;} .embeddedOutputsTextElement,.embeddedOutputsVariableStringElement {    white-space: pre;    word-wrap:  initial;    min-height: 18px;    max-height: 550px;} .embeddedOutputsTextElement .textElement,.embeddedOutputsVariableStringElement .textElement {    overflow: auto;} .textElement,.rtcDataTipElement .textElement {    padding-top: 2px;} .embeddedOutputsTextElement.inlineElement,.embeddedOutputsVariableStringElement.inlineElement {} .inlineElement .textElement {} .embeddedOutputsTextElement.rightPaneElement,.embeddedOutputsVariableStringElement.rightPaneElement {    min-height: 16px;} .rightPaneElement .textElement {    padding-top: 2px;    padding-left: 9px;}'; var head = document.head || document.getElementsByTagName('head')[0], style = document.createElement('style'); head.appendChild(style); style.type = 'text\/css'; if (style.styleSheet){ style.styleSheet.cssText = css; } else { style.appendChild(document.createTextNode(css)); }<\/script>","protected":false},"excerpt":{"rendered":"<p>\r\nSometime in 2021, I was doing some High Performance Computing (HPC) consultancy with a university in the south of England. This involved various things such as getting the code to scale across... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/matlab\/2023\/05\/15\/from-hpc-consultancy-to-a-faster-fzero-function-in-matlab-r2023a\/\">read more >><\/a><\/p>","protected":false},"author":176,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[42,20,14],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/matlab\/wp-json\/wp\/v2\/posts\/1219"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/matlab\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/matlab\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/matlab\/wp-json\/wp\/v2\/users\/176"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/matlab\/wp-json\/wp\/v2\/comments?post=1219"}],"version-history":[{"count":2,"href":"https:\/\/blogs.mathworks.com\/matlab\/wp-json\/wp\/v2\/posts\/1219\/revisions"}],"predecessor-version":[{"id":1228,"href":"https:\/\/blogs.mathworks.com\/matlab\/wp-json\/wp\/v2\/posts\/1219\/revisions\/1228"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/matlab\/wp-json\/wp\/v2\/media?parent=1219"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/matlab\/wp-json\/wp\/v2\/categories?post=1219"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/matlab\/wp-json\/wp\/v2\/tags?post=1219"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}