The MATLAB Blog

Practical Advice for People on the Leading Edge

Parallel computing in MATLAB: Have you tried ThreadPools yet?

Give ThreadPool a try

If you have some parallel MATLAB code and want to try something that may make it go faster, I suggest opening your parallel pool as follows before running your code.
parpool("Threads")
The result will probably be one of the following:
  • Your code goes faster than it did before
  • It's pretty much the same speed as it was before
  • You get an error message
Whatever the result, you may be wondering what's happening here. That's what this post is all about.

The default parallel pool in MATLAB is a ProcessPool

Before you start using parallel constructs such as parfor and parfeval from Parallel Computing Toolbox, you need to create a pool of workers and one way of doing this is to use the parpool function.
defaultPool = parpool() % This opens a parallel pool using the default profile
Starting parallel pool (parpool) using the 'Processes' profile ...
Connected to parallel pool with 8 workers.
defaultPool =
ProcessPool with properties:
    Connected: true
    NumWorkers: 8
    Busy: false
    Cluster: Processes (Local Cluster)
    AttachedFiles: {}
    AutoAddClientPath: true
    FileStore: [1x1 parallel.FileStore]
    ValueStore: [1x1 parallel.ValueStore]
    IdleTimeout: 30 minutes (30 minutes remaining)
    SpmdEnabled: true
Out of the box, the default type of parallel pool in MATLAB is a ProcessPool and we can see that this is the type of environment that has been created for me above. In a ProcessPool, each worker has its own process, so the workers are completely independent and isolated from each other. Each has its own memory, for example, and cannot see any other worker's memory.
A way of visualizing a ProcessPool is shown below
process_workers.png
Back in R2020a, a new type of parallel pool was made available in MATLAB where each worker is a thread instead of a process. Naturally enough, one of these is referred to as a ThreadPool. Workers in a ThreadPool are much less isolated: they exist in the same process, which means, among other things, that they share the same memory.
thread_workers.png
Let's close down our existing ProcessPool and open up a ThreadPool
delete(defaultPool) % Close down existing parallel pool
Parallel pool using the 'Processes' profile is shutting down.
threadPool = parpool("Threads")
Starting parallel pool (parpool) using the 'Threads' profile ...
Connected to parallel pool with 8 workers.
threadPool =
ThreadPool with properties:
    NumWorkers: 8
    Busy: false
    FileStore: [1x1 parallel.FileStore]
    ValueStore: [1x1 parallel.ValueStore]
From the output of parpool alone, you can see that there are differences between ThreadPools and ProcessPools. A ThreadPool doesn't have an IdleTimeout, for example. Open one up, walk away from MATLAB for as long as you like, and when you return the ThreadPool will still be open. A ProcessPool, on the other hand, will close down if you do nothing with it for 30 minutes.
Let's close down our ThreadPool and look at some of the other differences between these pool types.
delete(threadPool) % Close down the parallel pool
Parallel pool using the 'Threads' profile is shutting down.
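If you are not sure which kind of pool a session currently has open, you can inspect the pool object's class. The following is a minimal sketch rather than an official recipe: it assumes Parallel Computing Toolbox is installed, and the variable name poolKind and its labels are purely illustrative.

```matlab
% Sketch: report what kind of parallel pool (if any) is currently open.
% gcp("nocreate") returns an empty array when no pool exists and never starts one.
pool = gcp("nocreate");

if isempty(pool)
    poolKind = "none";
elseif isa(pool, "parallel.ThreadPool")
    poolKind = "threads";      % workers are threads sharing one process
elseif isa(pool, "parallel.ProcessPool")
    poolKind = "processes";    % workers are separate local processes
else
    poolKind = "other";        % for example, a pool running on a cluster
end

fprintf("Current pool type: %s\n", poolKind);
```

Checking the class rather than a property such as IdleTimeout avoids relying on which properties each pool type happens to expose.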

ThreadPools start up faster than ProcessPools

Threads are more lightweight than processes and one consequence of this is that it is quicker to start a ThreadPool than a ProcessPool.
tic
processPool = parpool("Processes")
Starting parallel pool (parpool) using the 'Processes' profile ...
Connected to parallel pool with 8 workers.
processPool =
ProcessPool with properties:
    Connected: true
    NumWorkers: 8
    Busy: false
    Cluster: Processes (Local Cluster)
    AttachedFiles: {}
    AutoAddClientPath: true
    FileStore: [1x1 parallel.FileStore]
    ValueStore: [1x1 parallel.ValueStore]
    IdleTimeout: 30 minutes (30 minutes remaining)
    SpmdEnabled: true
processTime = toc
processTime = 28.7807
delete(processPool)
Parallel pool using the 'Processes' profile is shutting down.
You may wonder why I am always deleting pools before moving on. This is because having more than one parallel pool open at the same time in a MATLAB session is not supported. Let's now time how long it takes to create a ThreadPool.
tic
threadPool = parpool("Threads")
Starting parallel pool (parpool) using the 'Threads' profile ...
Connected to parallel pool with 8 workers.
threadPool =
ThreadPool with properties:
    NumWorkers: 8
    Busy: false
    FileStore: [1x1 parallel.FileStore]
    ValueStore: [1x1 parallel.ValueStore]
threadTime = toc
threadTime = 0.0992
delete(threadPool)
Parallel pool using the 'Threads' profile is shutting down.
Let's plot the results.
barh([processTime,threadTime])
yticklabels(["Processes","Threads"]);
xlabel('Pool startup time (s)')
fprintf("It is %.2f times faster to create a Threads pool than a Processes pool\n",processTime/threadTime)
It is 290.20 times faster to create a Threads pool than a Processes pool
Your mileage may vary but I am sure that your overall result will be the same as mine: It is significantly faster to create a ThreadPool than a ProcessPool.
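Since only one pool can be open at a time, switching pool types always involves closing the current pool first. Here is a small sketch of that pattern, timing the startup as above. It assumes Parallel Computing Toolbox is available; the variable names poolType and startupSeconds are my own choices for illustration.

```matlab
% Sketch: close whatever pool is open, then open the requested type and time it.
poolType = "Threads";            % change to "Processes" to compare

delete(gcp("nocreate"));         % closes the current pool; does nothing if none is open

startTimer = tic;
pool = parpool(poolType);
startupSeconds = toc(startTimer);

fprintf("%s pool with %d workers started in %.2f seconds\n", ...
    poolType, pool.NumWorkers, startupSeconds);

delete(pool);                    % leave the session without an open pool
```

Using gcp("nocreate") rather than a stored variable means the snippet works even if the pool was opened elsewhere, for instance automatically by an earlier parfor.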

Code run on a Threads pool is often faster, primarily because of shared memory

All of the workers in a ProcessPool have their own, independent piece of memory. If you want a worker to see something from elsewhere, either from the main MATLAB process or from another worker, the data needs to be sent to it. You often don't realize this because convenience constructs such as parfor do the work for you automatically, but you can see it with ticBytes/tocBytes, which measure the data transferred between the client and the workers.
Consider this function
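function [t,out] = poolExample(n)
    % Time a parfor loop that reads from one large shared array.
    data = rand(n);        % n-by-n matrix of doubles, created on the client
    out = zeros(1,n);      % preallocate the sliced output
    timer = tic();
    parfor i = 1:n
        % Every iteration reads a row and a column of data, so each
        % worker needs access to the whole array.
        out(i) = data(i,:) * data(:,i);
    end
    t = toc(timer);        % time for the parallel loop only
end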
We open a ProcessPool
processPool = parpool("Processes");
Starting parallel pool (parpool) using the 'Processes' profile ... Connected to parallel pool with 8 workers.
and run the code
ticBytes(processPool)
[processPoolTime,result] = poolExample(2000);
tocBytes(processPool)
             BytesSentToWorkers    BytesReceivedFromWorkers
             __________________    ________________________
    1        3.201e+07             4357
    2        3.2012e+07            6318
    3        3.2007e+07            2105
    4        3.202e+07             15634
    5        3.2009e+07            3802
    6        3.2007e+07            2105
    7        3.2007e+07            1625
    8        3.2012e+07            6422
    Total    2.5608e+08            42368
The time taken to run the function on the ProcessPool is
fprintf("Parallel loop completed on Process pool in %.2f seconds\n",processPoolTime)
Parallel loop completed on Process pool in 0.35 seconds
delete(processPool)
Parallel pool using the 'Processes' profile is shutting down.
In this example, the entire array data is sent from the client MATLAB process to every worker. The array occupies 2000*2000*8 = 3.2e+07 bytes and I have 8 workers, so there are about 2.56e+08 bytes, roughly a quarter of a gigabyte, of data movement. Such data movement takes time. The tragedy is that these copies are all identical and are never changed in the body of the loop, so all of that time is wasted effort.
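The arithmetic behind those numbers can be checked in a few lines. This is just a back-of-the-envelope sketch: the variable names are illustrative, and it assumes the array holds double-precision values at 8 bytes each, as rand produces.

```matlab
% Sketch: estimate how much data a ProcessPool must copy for poolExample(2000).
n = 2000;               % data is an n-by-n matrix
numWorkers = 8;         % pool size used in this post
bytesPerDouble = 8;     % rand returns double-precision values

bytesPerCopy = n * n * bytesPerDouble;     % 32,000,000 bytes per worker
totalBytes = bytesPerCopy * numWorkers;    % 256,000,000 bytes in total

fprintf("Per worker: %d bytes, total: %d bytes\n", bytesPerCopy, totalBytes);
```

The totals reported by tocBytes are slightly larger than this estimate, which is consistent with a small amount of extra data accompanying the array itself.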
With a Threads pool, memory is shared between workers, which vastly reduces the movement of data between them. As such, parallel code run on a Threads pool is often faster than on a ProcessPool since you don't need to pay data movement costs.
I can show you that the code is indeed faster on a ThreadPool:
threadsPool = parpool("Threads");
Starting parallel pool (parpool) using the 'Threads' profile ... Connected to parallel pool with 8 workers.
[threadPoolTime,result] = poolExample(2000);
fprintf("Parallel loop completed on Threads pool in %.2f seconds\n",threadPoolTime);
Parallel loop completed on Threads pool in 0.03 seconds
fprintf("The loop ran %.2fx faster on a Threads pool\n",processPoolTime/threadPoolTime);
The loop ran 10.71x faster on a Threads pool
and I can tell you that the primary reason this code is faster on a ThreadPool is shared memory, but what I can't do is demonstrate this because ticBytes/tocBytes don't work on a ThreadPool:
ticBytes(threadsPool)
[threadPoolTime,result] = poolExample(2000);
tocBytes(threadsPool)
Warning: Thread-based pools do not support measuring bytes sent to or received from workers.
             BytesSentToWorkers    BytesReceivedFromWorkers
             __________________    ________________________
    Total    NaN                   NaN
which leads us to our next point.

There are some MATLAB features that don't work on ThreadPools, which is why they are not the default yet

I have many conversations with MATLAB users that go like this:
Them: "Can you help me make my parallel MATLAB code go faster, please?"
Me: "Sure. I'm a bit busy right now but in the meantime just try a ThreadPool"
Them: "Wow! It went faster. Why isn't this the default?"
and the reason why it isn't the default is that a ThreadPool doesn't yet cover all of the MATLAB language. This is something that we are actively working on of course and if you take a look at the release notes for parallel computing you'll see that, at the time of writing, most recent releases include some additional work related to ThreadPool.
Out of the box, we've decided that the default should be the pool type that supports the most stuff. If you disagree with this thinking then you can change the default pool type for your own MATLAB via Parallel Preferences.
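Because some code will raise an error on a ThreadPool, one pragmatic pattern is to try threads first and fall back to processes if that fails. The helper below is a hypothetical sketch, not a MathWorks function: the name runWithThreadFallback and the warning identifier are my own, and it assumes the work is wrapped in a function handle that returns a single output.

```matlab
function result = runWithThreadFallback(fcn)
    % Hypothetical helper (not part of MATLAB): run fcn on a ThreadPool and,
    % if that raises an error, retry once on a ProcessPool.
    delete(gcp("nocreate"));      % only one pool can be open at a time
    parpool("Threads");
    try
        result = fcn();
    catch err
        % Note: this catches any error, not only thread-support errors.
        warning("demo:threadFallback", ...
            "Run on thread pool failed (%s). Retrying on a process pool.", ...
            err.message);
        delete(gcp("nocreate"));
        parpool("Processes");
        result = fcn();
    end
end
```

Saved as runWithThreadFallback.m, it could be called as t = runWithThreadFallback(@() poolExample(2000)). Since a genuine bug in your code would also trigger the retry, read the warning message before concluding that thread support was the problem.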

A ThreadPool can't scale to more than one compute node

It's tempting to consider ProcessPool as legacy. Sooner or later, you think to yourself, all of the MATLAB language will be supported by ThreadPool, MathWorks will make it the default and we'll never look back. ThreadPool for ever!
However, there will always be one very important use case for ProcessPool and that is on multi-node High Performance Computing (HPC) systems. Since compute nodes on an HPC cluster are independent but connected machines, each with its own memory, you can only use a ProcessPool if you want to scale your code to run on multiple nodes. You can't use a shared memory ThreadPool because there is no shared memory between nodes.
Loosely speaking, I think of a ThreadPool as being 'OpenMP-like' and a ProcessPool as being 'MPI-like'. If that statement makes no sense at all to you, don't worry about it; just know that the key to scaling MATLAB code to multiple nodes and thousands of CPU cores is ProcessPools.

Resources and further reading
