{"id":5917,"date":"2023-05-04T11:26:19","date_gmt":"2023-05-04T15:26:19","guid":{"rendered":"https:\/\/blogs.mathworks.com\/videos\/?p=5917"},"modified":"2023-05-04T11:26:19","modified_gmt":"2023-05-04T15:26:19","slug":"speeding-up-a-large-file-processing-job-with-parfor-on-a-cluster","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/videos\/2023\/05\/04\/speeding-up-a-large-file-processing-job-with-parfor-on-a-cluster\/","title":{"rendered":"Speeding-Up a Large File Processing Job with Parfor on a Cluster"},"content":{"rendered":"<p>This video uses a different recording style from my others. Rather than recording continuously while I work, I pause recording when my code changes are taking a long time to execute or I have some repetitive editing tasks. Pausing the recording effectively edits the video down to a shorter duration.<\/p>\n<p>This lets me show you real projects and problems that typically take many hours to solve, ones that involve lots of troubleshooting, investigating, debugging, trial and error, and thinking.<\/p>\n<p>I was working on this particular example for most of a day, but the resulting video is just 90 min, which, yes, is still too long. Feel free to play at a higher speed and skip around. Next time I will try to be more aggressive in my pausing.<\/p>\n<p>So, getting to the problem itself, I have some code that processes hundreds of large CSV files, which describe a graph of the connections between our website pages each day. It takes several minutes to load and analyze each file, and the total running time is several hours. 
So I want to try to speed it up.<\/p>\n<p>I plan to work on these aspects:<\/p>\n<ul>\n<li>Use the profiler to look for places I can speed up my serial code<\/li>\n<li>Use parfor on my local machine with 6 physical\/12 logical processors<\/li>\n<li>Make sure my filenames work on Windows and Linux<\/li>\n<li>Get it working on a 128-processor networked Linux cluster<\/li>\n<\/ul>\n<p>Features covered in this <a href=\"https:\/\/blogs.mathworks.com\/videos\/2015\/10\/29\/matlab-code-along-videos\/\">code-along<\/a> style video include:<\/p>\n<ul>\n<li><a href=\"https:\/\/www.mathworks.com\/help\/parallel-computing\/parfor.html\">parfor<\/a><\/li>\n<li><a href=\"https:\/\/www.mathworks.com\/help\/parallel-computing\/parallel.pool.dataqueue.html\">DataQueue<\/a><\/li>\n<li>Source Control<\/li>\n<\/ul>\n<p><div class=\"row\"><div class=\"col-xs-12 containing-block\"><div class=\"bc-outer-container add_margin_20\"><videoplayer><div class=\"video-js-container\"><video data-video-id=\"6326750252112\" data-video-category=\"blog\" data-autostart=\"false\" data-account=\"62009828001\" data-omniture-account=\"mathwgbl\" data-player=\"rJ9XCz2Sx\" data-embed=\"default\" id=\"mathworks-brightcove-player\" class=\"video-js\" controls><\/video><script src=\"\/\/players.brightcove.net\/62009828001\/rJ9XCz2Sx_default\/index.min.js\"><\/script><script>if (typeof(playerLoaded) === 'undefined') {var playerLoaded = false;}(function isVideojsDefined() {if (typeof(videojs) !== 'undefined') {videojs(\"mathworks-brightcove-player\").on('loadedmetadata', function() {playerLoaded = true;});} else {setTimeout(isVideojsDefined, 10);}})();<\/script><\/div><\/videoplayer><\/div><\/div><\/div><\/p>\n<p>Play the video in full-screen mode for a better viewing experience.<\/p>\n","protected":false},"excerpt":{"rendered":"<div class=\"thumbnail thumbnail_asset asset_overlay video\"><a 
href=\"https:\/\/blogs.mathworks.com\/videos\/2023\/05\/04\/speeding-up-a-large-file-processing-job-with-parfor-on-a-cluster\/?dir=autoplay\"><img decoding=\"async\" src=\"https:\/\/cf-images.us-east-1.prod.boltdns.net\/v1\/static\/62009828001\/a3014052-8798-4064-a579-c0d394ea642a\/16e941f2-63e2-4433-a7e8-b75f281e377e\/1280x720\/match\/image.jpg\" onError=\"this.style.display ='none';\"\/><\/p>\n<div class=\"overlay_container\">\n      <span class=\"icon-video icon_color_null\"><time class=\"video_length\">94:07<\/time><\/span>\n      <\/div>\n<p>      <\/a><\/div>\n<p>This video uses a different recording style from my others. Rather than recording continuously while I work, I pause recording when my code changes are taking a long time to execute or I have some&#8230; <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/videos\/2023\/05\/04\/speeding-up-a-large-file-processing-job-with-parfor-on-a-cluster\/\">read more >><\/a><\/p>\n","protected":false},"author":133,"featured_media":5941,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[27,37],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/videos\/wp-json\/wp\/v2\/posts\/5917"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/videos\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/videos\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/videos\/wp-json\/wp\/v2\/users\/133"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/videos\/wp-json\/wp\/v2\/comments?post=5917"}],"version-history":[{"count":10,"href":"https:\/\/blogs.mathworks.com\/videos\/wp-json\/wp\/v2\/posts\/5917\/revisions"}],"predecessor-version":[{"id":5950,"href":"https:\/\/blogs.mathworks.com\/videos\/wp-json\/wp\/v2\/posts\/5917\/revisions\/5950"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/videos\/wp-json\/wp\/v2\
/media\/5941"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/videos\/wp-json\/wp\/v2\/media?parent=5917"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/videos\/wp-json\/wp\/v2\/categories?post=5917"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/videos\/wp-json\/wp\/v2\/tags?post=5917"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}