Speeding-Up a Large File Processing Job with Parfor on a Cluster
This video uses a different recording style from my others. Rather than recording continuously while I work, I pause the recording when my code changes take a long time to execute or when I have repetitive editing tasks. Pausing effectively edits the video down to a shorter duration.
This lets me show you real projects and problems that typically take many hours to solve, ones that involve lots of troubleshooting, investigating, debugging, trial and error, and thinking.
I worked on this particular example for most of a day, but the resulting video is just 90 minutes, which, yes, is still too long. Feel free to play it at a higher speed and skip around. Next time I will try to be more aggressive with my pausing.
So, getting to the problem itself: I have some code that processes hundreds of large CSV files, each describing a graph of the connections between our website pages for one day. It takes several minutes to load and analyze each file, and the total running time is several hours, so I want to look at speeding it up.
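To make the setup concrete, here is a minimal sketch of the kind of serial job I am describing. The folder name, the column names, and the per-day analysis step are hypothetical stand-ins, not my actual code.

```matlab
% Serial version: process each day's CSV file one at a time.
dataFolder = "page_graphs";                      % hypothetical folder of daily CSV files
files = dir(fullfile(dataFolder, "*.csv"));

dailyResults = cell(numel(files), 1);
for k = 1:numel(files)
    fname = fullfile(files(k).folder, files(k).name);
    T = readtable(fname);                        % slow step: each file is large
    G = digraph(T.SourcePage, T.TargetPage);     % hypothetical column names for page links
    dailyResults{k} = centrality(G, "pagerank"); % hypothetical per-day analysis
end
```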
I plan to work on these aspects:
- Use the profiler to look for places I can speed up my serial code.
- Use parfor on my local machine, which has 6 physical cores (12 logical).
- Make sure my filenames work on both Windows and Linux.
- Get it working on a 128-processor networked Linux cluster (a rough sketch of these steps follows this list).
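Here is a rough sketch of the direction those steps take, assuming Parallel Computing Toolbox is available. The driver function name `processAllFiles`, the cluster profile name `"myHPCCluster"`, and the folder and column names carried over from the sketch above are all hypothetical.

```matlab
% Profile the serial code first to find where the time actually goes.
% profile on; processAllFiles(); profile viewer  % hypothetical driver function

% Open a pool: locally, one worker per physical core is a reasonable start;
% on the cluster, use its profile instead ("myHPCCluster" is hypothetical).
% parpool("Processes", 6);
% parpool("myHPCCluster", 128);

dataFolder = "page_graphs";                      % build paths with fullfile so the
files = dir(fullfile(dataFolder, "*.csv"));      % same code runs on Windows and Linux

dailyResults = cell(numel(files), 1);
parfor k = 1:numel(files)
    fname = fullfile(files(k).folder, files(k).name);
    T = readtable(fname);
    G = digraph(T.SourcePage, T.TargetPage);     % hypothetical column names
    dailyResults{k} = centrality(G, "pagerank");
end
```

Because each file is processed independently, the loop body needs no communication between iterations, which is exactly the pattern parfor handles well.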
Features covered in this code-along style video include:
Play the video in full screen mode for a better viewing experience.