# Using parfor Loops: Getting Up and Running

Posted by Loren Shure,

Today I’d like to introduce a guest blogger, Sarah Wait Zaranek, who is an application engineer here at The MathWorks. Sarah has previously written about speeding up code from a customer to get acceptable performance. She will again be writing about speeding up MATLAB applications, but this time her focus will be on using the parallel computing tools.

### Introduction

I wanted to write a post to help users better understand our parallel computing tools. In this post, I will focus on one of the more commonly used functions in these tools: the parfor-loop.

This post will focus on getting a parallel code using parfor up and running. Performance will not be addressed in this post. I will assume that the reader has a basic knowledge of the parfor-loop construct. Loren has a very nice introduction to using parfor in one of her previous posts. There are also some nice introductory videos.

Note for clarity: Since Loren's introductory post, the toolbox used for parallel computing has changed names from the Distributed Computing Toolbox to the Parallel Computing Toolbox. These are not two separate toolboxes.

### Method

In some cases, you may only need to change a for-loop to a parfor-loop to get your code running in parallel. However, in other cases you may need to slightly alter the code so that parfor can work. I decided to show a few examples highlighting the main challenges that one might encounter. I have separated these examples into four encompassing categories:

• Independence
• Globals and Transparency
• Classification
• Uniqueness

### Background on parfor-loops

In a parfor-loop (just like in a standard for-loop) a series of statements known as the loop body is iterated over a range of values. However, when using a parfor-loop the iterations are run not on the client MATLAB machine but in parallel on MATLAB workers.

Each worker has its own unique workspace. So, the data needed to do these calculations is sent from the client to workers, and the results are sent back to the client and pieced together. The cool thing about parfor is this data transfer is handled for the user. When MATLAB gets to the parfor-loop, it statically analyzes the body of the parfor-loop and determines what information goes to which worker and what variables will be returning to the client MATLAB. Understanding this concept will become important when understanding why particular constraints are placed on the use of parfor.

### Opening the matlabpool

Before looking at some examples, I will open up a matlabpool so I can run my loops in parallel. I will be opening up the matlabpool using my default local configuration (i.e. my workers will be running on the dual-core laptop machine where my MATLAB has been installed).

if matlabpool('size') == 0 % checking to see if my pool is already open
matlabpool open 2
end
Starting matlabpool using the 'local' configuration ... connected to 2 labs.


Note : The 'size' option was new in R2008b.

### Independence

The parfor-loop is designed for task-parallel types of problems where each iteration of the loop is independent of each other iteration. This is a critical requirement for using a parfor-loop. Let's see an example of when each iteration is not independent.

type dependentLoop.m
% Example of a dependent for-loop
a = zeros(1,10);

parfor it = 1:10
a(it) = someFunction(a(it-1));
end


Checking the above code using M-Lint (MATLAB's static code analyzer) gives a warning message that these iterations are dependent and will not work with the parfor construct. M-Lint can either be accessed via the editor or command line. In this case, I use the command line and have defined a simple function displayMlint so that the display is compact.

output = mlint('dependentLoop.m');
displayMlint(output)
The PARFOR loop cannot run due to
the way variable 'a' is used.

In a PARFOR loop, variable 'a' is
indexed in different ways,
potentially causing dependencies
between iterations.



Sometimes loops are intrinsically or unavoidably dependent, and therefore parfor is not a good fit for that type of calculation. However, in some cases it is possible to reformulate the body of the loop to eliminate the dependency or separate it from the main time-consuming calculation.
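As a minimal sketch of separating the dependency from the time-consuming part, suppose the expensive work in each iteration depends only on the loop index, while only a cheap running total depends on the previous iteration. The variable names and the stand-in computation below are hypothetical and chosen purely for illustration.

```matlab
% Hypothetical example: expensive, independent work per iteration,
% followed by a cheap dependent accumulation.
n = 10;
stepCost = zeros(1, n);

% Independent part: each iteration only touches stepCost(it).
parfor it = 1:n
    stepCost(it) = sum(sin((1:1000) * it));   % stand-in for slow work
end

% Dependent part: a(it) needs a(it-1), so it stays in a serial for-loop.
a = zeros(1, n);
a(1) = stepCost(1);
for it = 2:n
    a(it) = a(it-1) + stepCost(it);
end
```

The parfor-loop now carries the bulk of the work, and the serial loop that remains is short enough that running it on the client should cost very little.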

### Globals and Transparency

All variables within the body of a parfor-loop must be transparent. This means that all references to variables must occur in the text of the program. Since MATLAB is statically analyzing the loops to figure out what data goes to what worker and what data comes back, this seems like an understandable restriction.

Therefore, the following commands cannot be used within the body of a parfor-loop: evalc, eval, evalin, and assignin. load also cannot be used unless the output of load is assigned to a variable name. It is possible to use the above functions within a function called by parfor, because the function has its own workspace. I have found that this is often the easiest workaround for the transparency issue.
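To illustrate the rule about load, here is a small sketch. The data files and the variable name values are made up for the example, and the files are written to a temporary location, so this assumes local workers that can see the same file system as the client.

```matlab
% Hypothetical data files, created here so the sketch is self-contained.
nFiles = 3;
fileNames = cell(1, nFiles);
for k = 1:nFiles
    fileNames{k} = [tempname '.mat'];
    values = k * ones(1, 5);
    save(fileNames{k}, 'values');
end

totals = zeros(1, nFiles);
parfor k = 1:nFiles
    % Not transparent, so not allowed in a parfor body: load(fileNames{k})
    % Allowed: the loaded variables arrive as fields of a named output.
    s = load(fileNames{k});
    totals(k) = sum(s.values);
end

% Remove the temporary files.
for k = 1:nFiles
    delete(fileNames{k});
end
```

Because the loaded data is reached through s.values, every variable the loop body uses appears explicitly in the program text.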

Additionally, you cannot define global variables or persistent variables within the body of the parfor loop. I would also suggest being careful with the use of globals, since changes to global values on the workers are not automatically reflected in the global values on the client.

### Classification

A detailed description of the classification of variables in a parfor-loop is in the documentation. I think it is useful to view classification as representing the different ways a variable is passed between client and worker and the different ways it is used within the body of the parfor-loop.

Challenges with Classification

Challenges often arise when first converting for-loops to parfor-loops due to issues with this classification. A frequently seen issue is the conversion of nested for-loops, where sliced variables are not indexed appropriately.

Sliced variables are variables where each worker is calculating on a different part of that variable. Therefore, sliced variables are sliced or divided amongst the workers. Sliced variables are used to prevent unneeded data transfer from client to worker.
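A small sketch may help make the idea concrete. The variable names here are illustrative only; the comments describe how I would expect each variable to be treated.

```matlab
data = magic(8);          % sliced input: iteration k reads only data(k,:)
scale = 2;                % broadcast: the whole value is sent to every worker
rowTotals = zeros(8, 1);  % sliced output: iteration k writes only rowTotals(k)

parfor k = 1:8
    rowTotals(k) = scale * sum(data(k, :));
end
```

Each worker only needs the rows of data for the iterations it runs, which is what keeps the data transfer down.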

Using parfor with Nested for-Loops

The loop below is nested and encounters some of the restrictions placed on parfor for sliced variables.

type parforNestTry.m
A1 = zeros(10,10);

parfor ix = 1:10
for jx = 1:10
A1(ix, jx) = ix + jx;
end
end

output = mlint('parforNestTry.m');
displayMlint(output);
The PARFOR loop cannot run due to
the way variable 'A1' is used.

Valid indices for 'A1' are
restricted in PARFOR loops.



In this case, A1 is a sliced variable. For sliced variables, the restrictions are placed on the first-level variable indices. This allows parfor to easily distribute the right part of the variable to the right workers.

First-level indexing, in general, refers to indexing within the first set of parentheses or braces. This is explained in more detail in the same section as classification in the documentation.

One of these first-level indices must be the loop counter variable or the counter variable plus or minus a constant. Every other first-level index must be a constant, a non-loop counter variable, a colon, or an end.

In this case, A1 has a loop counter variable for both first-level indices (ix and jx).

The solution to this is to make sure a loop counter variable is only one of the indices of A1 and to make the other index a colon. To implement this, the results of the inner loop can be saved to a new variable and then that variable can be saved to the desired variable outside the nested loop.

A2 = zeros(10,10);

parfor ix = 1:10
myTemp = zeros(1,10);
for jx = 1:10
myTemp(jx) = ix + jx;
end
A2(ix,:) = myTemp;
end

You can also solve this issue by using cells. Since jx is now in the second level of indexing, it can be a loop counter variable.

A3 = cell(10,1);

parfor ix = 1:10
for jx = 1:10
A3{ix}(jx) = ix + jx;
end
end

A3 = cell2mat(A3);

I have found that both solutions have their benefits. While cells may be easier to implement in your code, they also result in A3 using more memory due to the additional memory requirements for cells. The call to cell2mat also adds additional processing time.

A similar technique can be used for several levels of nested for-loops.
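As a sketch of how the temporary-variable approach might extend to a deeper nest, the example below fills a three-dimensional array. The array name A4, the sizes, and the formula are made up for illustration.

```matlab
A4 = zeros(4, 3, 2);

parfor ix = 1:4
    slab = zeros(3, 2);            % temporary, rebuilt in every iteration
    for jx = 1:3
        for kx = 1:2
            slab(jx, kx) = ix + 10*jx + 100*kx;
        end
    end
    % Only ix appears as a loop counter in the first-level indices of A4.
    A4(ix, :, :) = reshape(slab, 1, 3, 2);
end
```

The inner loops only ever index the temporary, so the restrictions on sliced variables apply to a single assignment at the end of the parfor body.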

### Uniqueness

Doing Machine Specific Calculations

There is a way, while using parfor-loops, to determine which machine you are on and run machine-specific instructions within the loop. You might want to do this if different machines have data files in different directories and you need to get into the right directory. Do be careful if you make the code machine-specific, since it will be harder to port.

% Getting the machine host name

[~,hostname] = system('hostname');

% If the loop iterations are the same as the size of matlabpool, the
% command is run once per worker.

parfor ix = 1:matlabpool('size')
[~,hostnameID{ix}] = system('hostname');
end

% Can then do host/machine specific commands
hostnames = unique(hostnameID);
checkhost = hostnames(1);

parfor ix = 1:matlabpool('size')
[~,myhost] = system('hostname');
if strcmp(myhost,checkhost)
display('On Machine 1')
else
display('NOT on Machine 1')
end
end
On Machine 1
On Machine 1


In my case, since I am running locally, all of the workers are on the same machine.

Here's the same code running on a non-local cluster.

matlabpool close
matlabpool open speedy
parfor ix = 1:matlabpool('size')
[~,hostnameID{ix}] = system('hostname');
end

% Can then do host/machine specific commands
hostnames = unique(hostnameID);
checkhost = hostnames(1);

parfor ix = 1:matlabpool('size')
[~,myhost] = system('hostname');
if strcmp(myhost,checkhost)
display('On Machine 1')
else
display('NOT on Machine 1')
end
end
Sending a stop signal to all the labs ... stopped.
Starting matlabpool using the 'speedy' configuration ... connected to 16 labs.
On Machine 1
On Machine 1
On Machine 1
NOT on Machine 1
On Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1


Note: The ~ feature is new in R2009b and discussed as a new feature in one of Loren's previous blog posts.

Doing Worker Specific Calculations

I would suggest using the new spmd functionality to do worker specific calculations. For more information about spmd, check out the documentation.

Clean up

matlabpool close
Sending a stop signal to all the labs ... stopped.


Tell me about some of the ways you have used parfor-loops or feel free to post questions regarding non-performance related issues that haven't been addressed here. Post your questions and thoughts here.


Published with MATLAB® 7.9

### Note

Loren replied on : 1 of 81

Ellis-

You can’t control the order of the parfor at all. Why is there an efficiency advantage to a uniform random order of processing in your situation? arrayfun is not parallelized. parfor IS the mechanism for parallel for-loop constructs in which each iteration is independent.

–Loren

Ellis D. Cooper replied on : 2 of 81

It says somewhere about parfor that the order in which iterations are performed is not guaranteed. How random is that order? In my application I would very much like the order in which they are performed to be uniformly distributed. How can I get the efficiency advantages of parfor and also – in my case – the advantages of uniform random order of processing?

A related question is that the arrayfun function presumably is coded at a lower level as a for loop. Is there a way to parallelize arrayfun, or should I not use it and just rely on parfor (subject to the randomness requirement above)?

Ninad replied on : 3 of 81

I thought that following code should work fine

global test;

test.val=10;

parfor i=1:10
global_test_function(i);
end


________________________________________

function global_test_function(i)
global test;
fprintf('%d\n',i*test.val);


But it does not..

I get error

??? Error using ==> parallel_function at 594
Error in ==> global_test_function at 3
Attempt to reference field of non-structure array.

Error in ==> global_test_script at 7
parfor i=1:10

Can you please explain to me if/how I can get around this?

Ellis D. Cooper replied on : 4 of 81

Loren – Thank you for your replies.

In my application two or more different iterations of the loop could assign different numbers to a certain variable. Thus, the last one would be the only permanent result. That would be okay provided the last one is chosen uniformly at random from among the alternatives.

Okay, so I will use parfor instead of arrayfun, provided there is a neat way to achieve that required uniform randomness.

Ellis

Sarah Z replied on : 5 of 81

So I should have been more clear and complete in my statement about the use of globals. Good catch!

Global values are not updated or transferred between workers and client. However, within a function called by a parfor, you can define and use globals in each worker’s workspace.

A possible workaround for your code sample would be the following. In this example, each worker can use that global value in its respective workspace and therefore in all the functions it calls.


global test;

test.val=10;

parfor i=1:10
globalBridge(test, i);
end

------------------------------

function globalBridge(testVar, i)
% Sets all the necessary global variables before calling function
% that uses them.
global test
test = testVar;
global_test_function(i);
end



By using the variable test in the parfor body, it is clear that the variable test is needed on the workers. MATLAB will only transfer the value of test once to each worker per calling of the parfor loop.

If you call the parfor loop multiple times, the value of the variable test is always propagated from the client to the workers. Therefore, the workers will not accidentally get out of sync.

Cheers,
Sarah

Amar replied on : 6 of 81

Hello folks,

I am new to the parallel processing toolbox. I ran the following code and found that “parfor” is in fact taking more time to run than simply running “for”. Am I missing something terribly? I would appreciate your thoughts on this.

thanks
Amar

TestCode:

matlabpool open 4;
Starting matlabpool using the ‘local’ configuration … connected to 4 labs.

b=cell(length(10000),1);

>> tic; for i=1:100000, b{i}=rand(1,1000);
end;toc
Elapsed time is 1.619394 seconds.

>> tic; parfor i=1:100000, b{i}=rand(1,1000);
end;toc
Elapsed time is 5.625785 seconds.

Loren replied on : 7 of 81

Amar-

parfor is unlikely to be faster than for in your situation where there is little work being done in each pass through the loop. The savings happens when the work is significantly more than the overhead of the loop itself.

–Loren

Amar replied on : 8 of 81

Aha! That explains it. Thanks!

Sarah Z replied on : 9 of 81

Amar-

Loren is correct (as usual :)). Sometimes for large data and a short running loop – the time it takes to transfer the data overwhelms the gain you get by putting the loop in parallel. You can check this by running the loop on 1 worker with and without data transfer. See code sample below:


b = cell(10000,1);  % preallocate one cell per iteration

tic;
for i=1:10000
b{i}=rand(1,1000);
end;
toc

matlabpool open 1

tic;
parfor i=1:10000
b = rand(1,1000);
end;
toc

tic;
parfor i=1:10000
b{i} = rand(1,1000);
end;
toc

matlabpool close



My resulting times were as follows:
Elapsed time is 0.338584 seconds.
Elapsed time is 0.387303 seconds.
Elapsed time is 1.057302 seconds.

We see then that the data transfer is the issue. For a more realistic example, I would run a longer running calculation within the loop. A simple thing to do is to add a pause command.
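As a rough sketch of that idea, the loops below use pause as a stand-in for a longer-running calculation. The iteration count and pause length are arbitrary choices for illustration, and the timings will depend on your machine and the number of workers.

```matlab
n = 20;
workTime = 0.01;                 % hypothetical cost of one iteration, in seconds

serialOut = zeros(1, n);
tic;
for k = 1:n
    pause(workTime);             % stand-in for a real computation
    serialOut(k) = k^2;
end
tSerial = toc;

parallelOut = zeros(1, n);
tic;
parfor k = 1:n
    pause(workTime);             % same stand-in work, now spread over workers
    parallelOut(k) = k^2;
end
tParallel = toc;

fprintf('for: %.2f s, parfor: %.2f s\n', tSerial, tParallel);
```

Raising workTime makes the per-iteration work larger relative to the fixed overhead, which is the regime where parfor has a chance to pay off.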

Also, there is a nice section in the documentation regarding performance.

Cheers,
Sarah

Steve L replied on : 10 of 81

Ellis,

I just noticed your question in comment 4 didn’t appear to be addressed. You said that “In my application two or more different iterations of the loop could assign different numbers to a certain variable. Thus, the last one would be the only permanent result.”

That means your application is NOT suited for use with PARFOR, as Sarah called out in the Independence section of this blog posting, because the result of the loop would depend on the order in which the loop iterations were executed.

You could probably make your application suitable for PARFOR, by having each iteration assign a value to a particular element of an array or cell array and choosing one element at random from that array/cell array once the loop is finished, outside the PARFOR.
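A minimal sketch of that suggestion might look like the following. The names candidates and winner, and the stand-in value computed in each iteration, are hypothetical.

```matlab
n = 8;
candidates = zeros(1, n);

parfor k = 1:n
    candidates(k) = 10 * k;      % stand-in for the value iteration k would assign
end

% Pick one value uniformly at random, outside the parfor-loop.
winner = candidates(randi(n));
```

Each iteration writes only to its own element, so the loop stays independent, and the random choice happens once on the client.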

Martin replied on : 11 of 81

Is there any way to find the ID of the currently executing worker (a la “labindex”) within a parfor loop?

The reason I ask is that I have a parfor loop, but I only want one of the workers to use my GPU for calculations. I’d like the other workers to continue on the CPUs.

Anyone have any ideas how I might go about getting this to work?

Thanks!

Sarah Z replied on : 12 of 81

Martin.

Ideally, I would use the spmd construct for that. That is what it is really designed to do.

But if you really want to do it within a parfor loop, you can do the following:
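A minimal sketch of this approach, assuming the toolbox function getCurrentTask and the ID property of the task object it returns, could look like the code below. The loop range and the printed messages are illustrative only.

```matlab
% Assumes an open pool; getCurrentTask returns empty when run on the client.
parfor ii = 1:4
    t = getCurrentTask();
    if isempty(t)
        fprintf('Iteration %d ran on the client\n', ii);
    else
        fprintf('Iteration %d ran on the worker with task ID %d\n', ii, t.ID);
    end
end
```

A check such as t.ID == 1 could then be used to send work to the GPU from one worker only.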





Each worker will have a different task ID – one task will exist for each worker if you are using a matlabpool.

Cheers,
Sarah

Angie replied on : 13 of 81

another parfor problem, which is not mentioned here… how to call a function from inside the parfor loop? I know… “The body of a parfor-loop cannot make reference to a nested function. However, it can call a nested function by means of a function handle.” … but how to call it using the function handle? I have tried something like

fce=@test1;
parfor i=1:10
result(i)=feval(fce,i);
end


(where fce is my own function, in the same directory as the script which calls the function), but the error message “Undefined function or method ‘test1’ for input arguments of type ‘double’.” displays.
I think it is because the file dependencies are not set. How can I do it?

Sean replied on : 14 of 81

It seems parfor has issues with temporary variables that exist as structs:

k = [1 2 3];
parfor h = 1:3
z.var = k(h)
disp(z.var)
end


Returns the error “The variable z in a parfor cannot be classified.” In this simplistic example I feel like z should definitely be a temporary variable. If I remove the “.var” portion, the loop works perfectly. Is parfor incapable of segmenting structures?

Sarah Z replied on : 15 of 81

Sean,

You can use structures, but you can only access them in particular ways.

Here is a code tidbit explaining what you can and can’t do – and how to convert between those types of structures.


a.field1 = rand(10,1);
a.field2 = rand(10,1);

% Converting a into a structure you can use (b)
b = cell2struct(num2cell([a.field1 a.field2]), {'field1','field2'}, 2);

% parfor will not let you index into a's fields
% parfor ii = 1:10
%    a.field1(ii) = a.field1(ii)*rand;
%    a.field2(ii) = a.field2(ii)*rand;
% end

% But it will let you index into b if b is a structure
% array

parfor ii = 1:10
b(ii).field1 = b(ii).field1*rand;
b(ii).field2 = b(ii).field2*rand;
end

%% Convert b back into a
a.field1 = [b.field1]';
a.field2 = [b.field2]';



Note: b will use more memory than a – so be careful of that if your calculation is memory intensive.

Let me know if this example isn’t clear.

Cheers,
Sarah

Sarah Z replied on : 16 of 81

Angie.

Here is an example of using the nested function via a function handle. However, I believe you must explicitly pass data into your function handle. I don’t believe the benefit of the nested function seeing the calling function’s workspace exists when you call it within a parfor loop. I did not have to add any FileDependencies when running this on my local workers.


function testMain
% The body of a parfor-loop cannot make reference to a nested function.
% However, it can call a nested function by means of a function handle.

fce=@myNest;
result = zeros(1,10);

parfor ii=1:10
result(ii) = fce(ii);
end

disp(result)

function d = myNest(ii)
d = rand*ii;
end
end



Let me know if you have any follow up questions.

Cheers,
Sarah

Sarah Zaranek replied on : 17 of 81

Hi Angie.

I misspoke about having access to workspace variables. However, your example will still not see ii without explicitly passing it in. The following example will work as expected with respect to the variable a, since the current value of a is saved with the function handle.

function testMain
% The body of a parfor-loop cannot make reference to a nested function.
% However, it can call a nested function by means of a function handle.

a = 5;
fce=@myNest;
result = zeros(1,10);

parfor ii=1:10
result(ii) = fce();
end

disp(result)

function d = myNest
d = a*rand;
end
end



For your above example – the following should work and not give an error.


function testMain
% The body of a parfor-loop cannot make reference to a nested function.
% However, it can call a nested function by means of a function handle.

fce=@myNest;
result = zeros(1,10);

parfor ii=1:10
result(ii) = feval(fce,ii);
end

disp(result)

function d = myNest(ii)
d = rand*ii;
end
end



Cheers,
Sarah

Rakesh replied on : 18 of 81

I am having trouble with using the variables/matrices created in a for loop within a parfor loop.
for example, when the following script is executed, the matrix A is not available in my base workspace.

I understand that parfor is creating a workspace of variables to be used in the loop on every worker, but aren't the results supposed to be sent back to the base workspace when I run the script?

Rakesh replied on : 19 of 81

oops forgot to wrap the code

clc; clear all;
matlabpool open 4

parfor ix1 = 1:10
A = zeros(1,10);
for ix3 = 1:10
A(ix3) = ix3;
end
end

matlabpool close

Sean replied on : 20 of 81

Your example was very helpful, but up to a point. In both my example, and in my actual code, I’m not actually trying to index into a field. Note all I want to do is assign z.var = k, not z.var(x) = k. Is there some implicit indexing applied? In my real code there is an operation like

variable.field1 = result;
variable.field2 = parameter1;
variable.field3 = parameter2;


etc. This is a separate variable for each instance of the variable. I’ve tried creating the variable using struct() before the loop begins, and that didn’t help. It still says it can’t classify the variable. Oddly enough, if you create a function that does the exact same operations and call that function from inside the loop, it works just fine. Is this a bug or a quirk of MATLAB’s distributed toolbox?

Sean replied on : 21 of 81

To amend my previous comment, I found that it does indeed work if I instantiate the struct explicitly using struct() inside the parfor loop and all the assignments are contained in just that loop. I guess I just need to pay more attention to how I create variables when doing parallelization, since it requires so much more explicit variable contexts!
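A minimal sketch of the pattern Sean describes, based on his earlier example, might look like this. The output array out is an addition for illustration, so that something comes back from the loop.

```matlab
k = [1 2 3];
out = zeros(1, 3);

parfor h = 1:3
    z = struct('var', k(h));     % z is created as a whole inside the loop body
    out(h) = 2 * z.var;
end
```

Because z is assigned in full at the top of each iteration, it can be treated as a temporary that lives only within that iteration.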

Sarah Zaranek replied on : 22 of 81

Hello Rakesh,

In order to bring your data back to the client workspace, it either needs to be classified as a reduction variable or a sliced variable. The link above to the documentation on classification of variables can give you more in depth information on this (see blog section on classification). In a nutshell, you either need to be indexing your variable by the iterate variable for the parfor-loop or performing a reduction operation on that variable. The section on classification will have a master list of all supported reduction operations.
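As a small sketch of the two options, the loop below returns one sliced variable and one reduction variable. The names squares and total are made up for this example.

```matlab
n = 10;
squares = zeros(1, n);   % sliced: indexed by the parfor loop variable
total = 0;               % reduction: accumulated with +

parfor k = 1:n
    squares(k) = k^2;
    total = total + k^2;
end
```

Both squares and total are available on the client after the loop, whereas a variable that is simply overwritten in each iteration would not be.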

Using the code in your comment – I have adjusted it in two ways to bring back the data. In the version you sent me, only the values of A from the last iteration will be kept. I included one version where all values of A were kept – and one where only the last ones were kept.


%% In this case A will not be brought back

parfor ix1 = 1:10
A = zeros(1,10);
for ix3 = 1:10
A(ix3) = ix3;
end
end

clear ix3
%% In this case A will be brought back for each iteration

parfor ix1 = 1:10
A = zeros(1,10);
for ix3 = 1:10
A(ix3) = ix3;
end
Akept2(ix1,:) = A;
end

display(Akept2)
clear ix3

%% In this case A will only be brought back for last iteration
clear all

Akept3 = [];

parfor ix1 = 1:10
A = zeros(1,10);

for ix3 = 1:10
A(ix3) = ix3;
end

if ix1 < 10
A = [];
end

Akept3 = [Akept3 ; A];
end

display(Akept3)


Hopefully this is useful to you.
Cheers,
Sarah

Angie replied on : 23 of 81

Sarah Z,
thank you for your example about structures. I tried it, but what is the best solution when you have a structure whose arrays are of different sizes? Do I have to make an individual cell array for every field of the structure?

Sarah Z replied on : 24 of 81

Angie,

So, it probably depends on what you want to do (do you need to use them as structures or would you be happy enough indexing them as separate variables or as a cell?, etc).

Here is one example where the fields are modified so they are the same size. If you could give me an example – it might be easier to figure out what might be the best option for you.

% Filling cell arrays with empty elements so they are the same size:

a.field1 = rand(5,1);
a.field2 = rand(10,1);

a1 = num2cell(a.field1);
a2 = num2cell(a.field2);

a1{length(a2),1} = [];   % pad a1 with empty cells up to the length of a2

% Converting a into a structure you can use (b)
b = cell2struct([a1 a2], {'field1','field2'}, 2);

Cheers & Happy Converting,
Sarah

hedayat replied on : 25 of 81

I want to parallelize this code, but I can't.
“swarm” is a s*(n+1) matrix
for j=1:s
particle{j,2} = zeros(5,n);
particle{j,2}(1,:) = 0.9;
particle{j,2}(2,:) = 0.1;
particle{j,2}(3,:) = swarm(j,1:n); %
particle{j,2}(5,:) = swarm(j,1:n); % Set BestValue
b = particle{j,2}(5,:);
particle{j,3} = zeros(2,2);
[Rulequality,TP] = qq(x,y,b,class);
particle{j,3} = zeros(2,2);
particle{j,3}(1,:) = Rulequality;
particle{j,3}(2,:) = TP;
particle{j,4} = zeros(4,1);
tmp = repmat(swarm(j,1:n),s,1);
comp = (swarm==tmp);
comp = sum(comp,2);
[a b] = sort(comp);
particle{j,4} = b(end-4:end-1);
end

Sarah Z replied on : 26 of 81

Hello hedayat,

Sorry for the delay. I have been traveling spreading the joy of MATLAB :)

So the big issue is the restrictions for indexing a sliced variable that I discussed above. Here is the original code and how to convert it to one that works. I have simplified it, but hopefully the basic idea is clear!


%%  This doesn't work with parfor in terms of sliced variable indexing
particle = cell(10,4);
swarm = ones(10,10);
s = 10;
n = 5;

for j=1:s
particle{j,2} = zeros(5,n);
particle{j,2}(1,:) = 0.9;
particle{j,2}(2,:) = 0.1;
particle{j,2}(3,:) = swarm(j,1:n); %
particle{j,2}(5,:) = swarm(j,1:n); % Set BestValue
particle{j,3} = ones(2,2);
end

particleSafe = particle;  % saving it so we can compare later

%%  This does work in terms of indexing of sliced variables
% Swarm shows a message that it is a broadcast variable, that is fine

particle = cell(10,4);
swarm = ones(10,10);
s = 10;
n = 5;

% Converting particle to particle2 format
% We want a matrix of this format:
% particle2 =  repmat({cell(1,4)},[10 1]);
% This would be a 10 x 1 cell with every element holding
% a 1 x 4 cell

particle2 = mat2cell(particle,ones(10,1),4);

parfor j=1:s
particle2{j}{2} = zeros(5,n);
particle2{j}{2}(1,:) = 0.9;
particle2{j}{2}(2,:) = 0.1;
particle2{j}{2}(3,:) = swarm(j,1:n); %
particle2{j}{2}(5,:) = swarm(j,1:n); % Set BestValue
particle2{j}{3} = ones(2,2);
end

particle2 = vertcat(particle2{:});

isequal(particle2,particleSafe)



Let me know if you have any questions on this.

Cheers,
Sarah

John replied on : 27 of 81

I’ve found parfor-loops to be very convenient (since I discovered them). I was wondering if there’s a simple way to monitor the progress of a job. For instance, this doesn’t work:

iter = 0;

parfor k = 1:N
iter = iter + 1;
fprintf('Iteration %d of %d\n',iter, N);

%Do something

end %the parfor


Matlab complains that iter is being used illegally.

On one hand, I can see why this violates the parfor-restrictions, but OTOH, the fprintf has no side-effects, and is only to help the user see what’s going on. Any obvious workarounds?

Sarah Z replied on : 28 of 81

Hi John.

So, probably the easiest way to monitor your job is using the following tool on MATLAB central:

http://www.mathworks.com/matlabcentral/fileexchange/24594-parfor-progress-monitor

It was made by one of the developers, Edric.

You could also use persistent variables to keep track of the number of iterations done on a single worker – but there is no easy way to determine that across workers.

In your example, since the values of iter are only added up separately within each chunk of iterations and then summed together at the end of the parfor, there is no easy way to get that intermediate information.

Here is an example of how it could be done with persistent variables per worker:


function myTestFunc

iter = 1;
N = 20;

parfor k = 1:N
counter = myCounter(iter);
fprintf('Iteration Total %d of %d\n',counter, N);
%Do something
end %the parfor

end

function currentTotal = myCounter(iter)

persistent testCounter

if isempty(testCounter)
testCounter = iter;
else
testCounter = testCounter + 1;

end
currentTotal = testCounter;
end



Output would be something like this:

Iteration Total 1 of 20
Iteration Total 2 of 20
Iteration Total 3 of 20
Iteration Total 4 of 20
Iteration Total 5 of 20
Iteration Total 6 of 20
Iteration Total 7 of 20
Iteration Total 1 of 20
Iteration Total 2 of 20
Iteration Total 3 of 20
Iteration Total 4 of 20
Iteration Total 5 of 20
Iteration Total 6 of 20
Iteration Total 7 of 20
Iteration Total 8 of 20
Iteration Total 9 of 20
Iteration Total 8 of 20
Iteration Total 9 of 20
Iteration Total 10 of 20
Iteration Total 10 of 20

Hope this is useful.

Cheers,
Sarah

Tobias replied on : 29 of 81

Hi Sarah,

I am quite new to Matlab and just starting to explore it. What I’ve seen so far seems really nice!!

I'd like to ask how I can best modify my code (to run with parfor) if I have multiple nested functions.

Something like this:

for A = 1:1:36

for B = -1:-1:-40

for C = 0:1:35

for D = -0:-1:-25

for n = 1:100000
**calculation of X using A,B,C,D
end

Results(A,B,C,D) = X;

end
end
end
end

Thanks a lot!
Tobias

Sarah Z replied on : 30 of 81

Hi Tobias.

Two thoughts for you:

1. You can use the methods I showed above for nested for-loops. In my mind, the best option would be to use the cell workaround. I am assuming you want X to be saved for every value of n, even though in your example it wasn’t. For best performance, you want to put the parfor loop out as far as possible. To me it makes the most conceptual sense to put n in the outermost loop and make that the parfor loop. Here is how it would work for you:

parfor n = 1:100000

for A = 1:1:36

for B = -1:-1:-40

for C = 0:1:35

for D = -0:-1:-25

Results{n}(A,B,C,D) = X;

end
end
end
end
end


2. It might be possible to limit the number of internal loops using vectorization. Not only would this make your code easier to put in parallel, but it may very well speed up your serial code. An example of this can be found in my previous guest blog post here – http://blogs.mathworks.com/loren/2008/06/25/speeding-up-matlab-applications/
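As a small sketch of what vectorizing two of those loops could look like, the example below compares a double loop with an ndgrid-based version. The formula is a made-up stand-in for the real calculation, and the loop version indexes by position because negative values and zero are not valid subscripts.

```matlab
aVals = 1:36;
bVals = -1:-1:-40;

% Loop version: index by position rather than by value.
loopResult = zeros(numel(aVals), numel(bVals));
for ia = 1:numel(aVals)
    for ib = 1:numel(bVals)
        loopResult(ia, ib) = aVals(ia)^2 + 3*bVals(ib);
    end
end

% Vectorized version: build the full grid once, then compute elementwise.
[A, B] = ndgrid(aVals, bVals);
vecResult = A.^2 + 3*B;
```

With the inner loops gone, there are fewer nested levels left to reconcile with the sliced-variable indexing rules when a parfor-loop is added.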

Cheers,
Sarah

Sachin replied on : 31 of 81

Hello there,
I am trying to integrate some functions numerically using ‘parfor’. The following is my code but it is not working. It is showing the following error

??? Error using ==> parallel_function at 598
Error in ==> syms at 77
Attempt to add "x" to a static workspace.
See MATLAB Programming, Restrictions on Assigning to Variables for details.

Error in ==> main at 16
parfor i=1:16


clear all
clc
tic
r=0.4;
t=0.7;
L=4;
E=72000;
alp=3/2;
nu=0.33;
G=E/(2*(1+nu));
h=12.8;
i=1;
j=1;
Kxx=zeros(17,17);
K=1;
for i=1:16
r=0.2+i*0.05;
for j=1:16
t=0.2+j*0.05;
J=[r,t,L];
syms x;
M=myfun(x,J);
y1=M(1);
y2=M(2);
y3=M(3);
y4=M(4);
y5=M(5);
z1=M(6);
z2=M(7);
z3=M(8);
z4=M(9);
z5=M(10);
K=abs((((6/(E*h))*(I1+I2+I3+I4+I5))+(((alp*L)/(G*h))*(I6+I7+I8+I9+I10)))^-1);
Kxx(i,:)=K;
end

end
r=0.2:0.05:1;
t=0.2:0.05:1;
surf(r,t,Kxx')
toc

function [M] = myfun(x,J)
d1=((x.^2)/(J(2)+(2*J(1))-2*sqrt(J(1)^2-(J(1)-x)^2))^3);
d2=((x.^2)/(J(2)+(2*J(1))-2*sqrt(J(1)^2-(x-J(1))^2))^3);
d3=((x.^2)/(J(2)+(2*J(1))).^3);
d4=((x.^2)/(J(2)+(2*J(1))-2*sqrt(J(1)^2-(x-(J(3)-(2*J(1))))^2))^3);
d5=((x.^2)/(J(2)+(2*J(1))-2*sqrt(J(1)^2-(x-(J(3)-(J(1))))^2))^3);
c1=((x.^2)/(J(2)+(2*J(1))-2*sqrt(J(1)^2-(J(1)-x)^2)));
c2=((x.^2)/(J(2)+(2*J(1))-2*sqrt(J(1)^2-(x-J(1))^2)));
c3=((x.^2)/(J(2)+(2*J(1))));
c4=((x.^2)/(J(2)+(2*J(1))-2*sqrt(J(1)^2-(x-(J(3)-(2*J(1))))^2)));
c5=((x.^2)/(J(2)+(2*J(1))-2*sqrt(J(1)^2-(x-(J(3)-(J(1))))^2)));

M=[d1 d2 d3 d4 d5 c1 c2 c3 c4 c5];


Sarah Z replied on : 32 of 81

Sachin,

If you use the functional form of defining x, I believe it should work for you. syms x uses assignin, so you run into the transparency issues discussed above. Try the following instead –

x = sym('x','real');


I think that should fix things for you.
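A small sketch of how that could look inside a parfor loop, assuming the Symbolic Math Toolbox is available on the workers. The integrand here is a made-up placeholder, not your myfun.

```matlab
% Requires Symbolic Math Toolbox
vals = zeros(1, 4);
parfor k = 1:4
    x = sym('x', 'real');               % functional form, so nothing is added via assignin
    f = k * x^2;                        % hypothetical integrand
    vals(k) = double(int(f, x, 0, 1));  % definite integral over [0, 1]
end
```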

Cheers,
Sarah

Wanda replied on : 33 of 81

Hi Sarah,

I have the following codes for parallel computing.

global x

parfor j=1:N

paramean(j)=solode(j);

end
y=paramean*x;

function paramean=solode(j)
global x

tmax=7200; %max time

mu=0.00579827560572969*(1+0.2*x(j));
for iteration=1:iterationmax
wphases=wpha_gene(:,iteration);
para(iteration)=myfun(mu,wphases);
end
paramean=mean(para);

The error message is Attempted to access x(1); index out of bounds because numel(x)=0.

This is my first time using PARFOR. I cannot figure out what’s wrong. Could you please give me some help?

Thanks a lot!

Wanda

Sarah Zaranek replied on : 34 of 81

Hi Wanda.

So, the issue is that the worker doesn’t get passed your global variable; it just gets declared as global on the worker. Therefore, the worker sees it as an empty variable. You need MATLAB to know that you need that variable on each worker.

This is similar to another question asked earlier in this thread. If you scroll up to my response to Ninad (comment #5), you can see how to pass it to the workers and have it still act like a global. A note of caution: when I say it still acts like a global, it is global per worker, not across workers or across worker and client.

You can also avoid the global in this case and pass all of x to each worker by making it another input to your function solode. It seems you don’t even need all of x in this example, so I think you could just pass x(j) as an extra input to that function. Then x would be treated as a sliced variable, which limits your communication overhead between client and workers.
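A rough sketch of that last suggestion. The solodeSketch handle is a hypothetical, much simplified stand-in for solode that only reproduces the mu calculation; the example values of x are made up.

```matlab
N = 5;
x = linspace(-1, 1, N);            % example data; in the original code x was a global
paramean = zeros(1, N);

% Hypothetical simplified stand-in for solode: it receives only the element it needs
solodeSketch = @(xj) 0.00579827560572969 * (1 + 0.2 * xj);

parfor j = 1:N
    % x is indexed only by the loop variable, so it can be treated as sliced
    paramean(j) = feval(solodeSketch, x(j));
end
```

Calling the handle through feval avoids any ambiguity between calling a function handle and indexing into a variable inside the parfor body.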

Cheers,
Sarah

Josh Dillon replied on : 35 of 81

Hi Sarah. When using a parfor for parallel computing in a multi-core scenario, I have had problems with large matrices. The data is copied to each worker causing out-of-memory failure. Is the recommended solution to use distributed arrays? Doesn’t this require communication between workers and potentially result in a bottleneck, or am I missing something?

As another possible solution, I implemented a shared memory wrapper (FEX: 28572-sharedmatrix). It uses POSIX, which potentially limits its use, although I believe many, if not most, people use Cygwin when compiling Mex code. The advantage is that the data lives outside of Matlab and can be accessed by multiple processes (not just Matlab).

Anyway, I am curious what other approaches you can recommend or approaches people have taken when each worker needs read-only access to the same data.

Thanks,
Josh

Yenny Noa replied on : 36 of 81

Hi everyone,

I am using a parfor to calculate the same function on several points of large dimensions. The problem is that the function is generated dynamically. There are some parameters (function_id, function_instance, the problem dimension, etc.) that are used to differentiate between different set-ups of runs. Within a single run, I want to use the same instance of the function to calculate the values of my set of vectors, so I have to generate the function before starting the parfor execution. That is:

% this line initializes the function that can be used
% through 'fgeneric'.
fgeneric('initialize', ifun, iinstance);

Then, the execution of the parfor:

parfor ind = 1:setsize
results(ind,1) = feval('fgeneric', x(ind,:)');
end

Why does the function ‘fgeneric’ no longer exist when I execute the parfor?
Could anyone please tell me how I can transfer the same function handle ‘fgeneric’ to all the workers?

I am a Cuban student, so my English could not be right at all. Sorry for that.

Cheers,

Yenny

HaiChao Zhang replied on : 37 of 81

Hi there:
When I use parfor to simulate an LDPC code, I encounter a problem. At first, I created the LDPC encoder and decoder objects outside the parfor body, and an error appeared like this: “Not enough input variable”. But after I moved the LDPC encoder and decoder objects into the parfor body, my code ran successfully! I know that an object is different from common variables, but I don’t understand why this happens!

My code is below:

load H_180_R12.mat
[m n]=size(H);
k=n-m;

% lenc=fec.ldpcenc(H);
%
% ldec=fec.ldpcdec(H);
% ldec.DecisionType = 'hard decision';
% ldec.OutputFormat = 'whole codeword';
% ldec.NumIterations = 50;
% ldec.DoParityChecks='Yes';

matlabpool open local 2;
for i=1:1
PsdB(i)=2+1*(i-1);
%     PsdB=2;
Ps_dB=PsdB(i);

Ps=10^(Ps_dB/10);
fernum=0;

lenc=fec.ldpcenc(H);

ldec=fec.ldpcdec(H);
ldec.DecisionType = 'hard decision';
ldec.OutputFormat = 'whole codeword';
ldec.NumIterations = 50;
ldec.DoParityChecks='Yes';

msg=randint(1,k);
codeword=encode(lenc,msg);
s_bpsk=-sign(codeword-0.5);
noise=wgn(1,n,1,'linear');
recv=sqrt(Ps).*s_bpsk+noise;

llr=(2*sqrt(Ps)).*recv;
codeword_dec=decode(ldec,llr);
ber=sum(xor(codeword,codeword_dec));
if ber~=0
fernum=fernum+1;
end
end
end
matlabpool close;
figure(1);semilogy(PsdB,FER,'-*');grid on;
xlabel('Ps');ylabel('BER');
title('H\_CodeLength=180\_Rate=1/2 FER performance');

Yenny Noa replied on : 38 of 81

Hi Sarah and Loren,

Could you please tell me why the following code gives me this error:

??? Error using ==> parallel_function at 598
Error in ==> fgeneric at 356
fgeneric has not been initialized. Please do: fgeneric('initialize', FUNC_ID,
INSTANCE_ID, DATAPATH) first, where FUNC_ID is the number of the chosen test function.

Error in ==> parPSO at 9
parfor ind = 1:3
Error in ==> testparPSO at 16
solution = parPSO(FUN, DIM);
Error in ==> run at 74
evalin('caller',[script ';']);

The code is as follows:

matlabpool firstConfig 2

addpath('.');  		% should point to fgeneric.m etc.

FUN.ifun = 1;
FUN.iinstance = 1;
DIM = 2;

tic
solution = parPSO(FUN, DIM);
toc

matlabpool close

% this is the parPSO function definition

function gbest = parPSO(FUN, DIM)

fgeneric('initialize', FUN.ifun, FUN.iinstance,    FUN.datapath, FUN.opt);

x = 2 * 5 * rand(3,DIM) - 5;

fgen = @fgeneric;

parfor ind = 1:3
gbest(1,ind) = feval(fgen, x(ind,:)');
end

fgeneric('finalize');

end



I did everything you explained before, but nothing works; it always gives me the same error. I don’t know what to do.
Thanks,
Yenny

Sarah Z replied on : 39 of 81

Hi Yenny,

I was out traveling, so this is my first chance to respond to your question.

Since I don’t know exactly what fgeneric does, this is only a guess on my part. Since it doesn’t obviously return a variable to the main MATLAB workspace, my guess is that it is either changing the state/value of an object or using a global or persistent variable.

In many cases these values would not transfer from your client to your workers. The first step would be to initialize within the parfor loop and see if that makes a difference. If so, then we can alter the code so that it initializes using fgeneric only once per worker.

Cheers,
Sarah

Sarah Z replied on : 40 of 81

HaiChao Zhang,

Hello, if MATLAB can’t serialize the variable to transfer it to the workers, it ends up getting passed as an empty value. I believe this is exactly what you are seeing. This Wikipedia entry probably does a much better job of explaining serialization than I can ( http://en.wikipedia.org/wiki/Serialization ). I see this issue with older objects in MATLAB (occasionally, but rarely) and with objects that come from 3rd party tools.

The workaround is to do exactly what you did, which is to initialize within the worker instead of in the client. You can modify it so that you only need to initialize once per worker, if that suits your case. See the code example below for how to do that. It loads a file for the variable it creates, but you could do something else for your variable creation.


function myTestFunc

N = 20;

parfor ii = 1:N
% Do something with the data here
end

end

end

end
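One common way to get once-per-worker initialization is a persistent variable inside a helper function. The following is only a sketch under that assumption: the function names are hypothetical, and the squared-numbers array stands in for whatever expensive load or object construction you need.

```matlab
function total = onceperWorkerSketch
% Hypothetical example; save as onceperWorkerSketch.m
N = 20;
total = 0;
parfor ii = 1:N
    data = getWorkerData();      % cheap after the first call on each worker
    total = total + data(ii);    % reduction variable
end
end

function data = getWorkerData()
% A persistent variable survives between calls within one MATLAB process,
% so each worker runs the expensive branch only once.
persistent cached
if isempty(cached)
    cached = (1:20).^2;          % stand-in for an expensive load or object creation
end
data = cached;
end
```

Each worker keeps its own copy of the persistent variable, so nothing is shared between workers or with the client.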



Cheers,
Sarah

Yenny Noa replied on : 41 of 81

Hi Sarah,

Thanks for your answer. I had already tried what you are suggesting; that is, I moved the initialization of the ‘fgeneric’ function inside the parfor loop, and it works.

The problem is that I generally make 100000 function evaluations on each run of the algorithm, so this solution is not feasible for me because it would imply 100000 initializations of fgeneric.

What should I do to initialize fgeneric only once per worker?

Cheers,

Yenny

Sarah Z replied on : 42 of 81

Hi Yenny.

I am not sure exactly what fgeneric is doing (defining a global variable, a persistent variable, the state of an object). However, if you want to run something once per worker, the easiest way would be to do something like the following:


numWorkers = matlabpool('size');

parfor ii = 1:numWorkers
fgeneric('initialize', FUN.ifun, FUN.iinstance,    FUN.datapath, FUN.opt);
end



Depending on what exactly fgeneric is doing, this may solve your problem.
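One caveat with the loop above: parfor does not promise that each worker receives exactly one of the numWorkers iterations, so a worker could be skipped. An spmd block runs its body once on every worker in the pool. A sketch, where initFcn is a hypothetical stand-in for the fgeneric('initialize', ...) call:

```matlab
% Hypothetical stand-in for fgeneric('initialize', ...)
initFcn = @() true;

spmd
    ok = feval(initFcn);   % executed once on each worker in the pool
end
```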

Cheers,
Sarah

Josh Dillon replied on : 43 of 81

Hi Sarah–still wondering on my question (Sept 10). This issue actually comes up a lot in our group’s work. We do large-scale machine learning and end up doing a lot of (iterative) computation over large data-sets (sparse matrices) which are too big to have in memory more than once. Under this scenario, keeping the data distributed rapidly makes the computation IO bound–despite being on the same 8core machine. Even if this is not the case, it makes implementation substantially more difficult to require one to assess which parts of the data can be carved up for “local” computation and in many circumstances it is simply not possible.

This seems like an obvious problem/deficiency in the parallel computing toolbox so I would very much like to know if I’m missing some recommended approach for dealing with this issue.

Thanks,

Josh

Sarah Z replied on : 44 of 81

Hi Josh.

Sorry for my delay in responding. The Parallel Computing Toolbox doesn’t have built-in shared memory capabilities. I have let the development team know about your request; I have come across other customers interested in this capability.

For your case, you could perhaps use memmapfile and rely on the OS to provide shared memory access to data on disk.
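A minimal sketch of the memmapfile idea, assuming the data can be written once to a flat binary file of doubles. The file name, array size, and chunking are made up for illustration.

```matlab
% Write some example data to a binary file once, on the client
fname = [tempname '.bin'];
A = (1:1000)';
fid = fopen(fname, 'w');
fwrite(fid, A, 'double');
fclose(fid);

nChunks  = 4;
chunkLen = numel(A) / nChunks;
partial  = zeros(1, nChunks);
parfor k = 1:nChunks
    % Each iteration maps the file instead of receiving a copy of A
    m = memmapfile(fname, 'Format', 'double');
    idx = (k-1)*chunkLen + (1:chunkLen);
    partial(k) = sum(m.Data(idx));
end
total = sum(partial);
% Remove the temporary file with delete(fname) once nothing maps it any more
```

Only the file name is sent to the workers; the operating system decides how much of the mapped file is actually held in memory.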

Cheers,
Sarah

Josh Dillon replied on : 45 of 81

Hi Sarah–thanks for the response!

My mex program “sharedmatrix” is essentially this–although I think it’s slightly better than a memory mapped file as POSIX provides additional functionality. Anyway, good to know that writing this program wasn’t a waste of time!

Thanks for passing on the request too!

Cheers,
Josh

Shinliang replied on : 46 of 81

Hi,

I have been trying to use parfor for code as shown below. I understand that my indexing is not sliced, but clearly the iterations are independent of each other? Why then does parfor not allow this to run?

node = xxxx;
parfor e=1:TNELMS
hx_elem=hx(node(e,1:6));
Gx=somefunction(hx_elem);
G_globalx(node(e,1:6))=G_globalx(node(e,1:6))+Gx;
end

Basawaraj replied on : 47 of 81

Hi
I was running the following code:

for x = 1:100
d_x = d_3x(x,:);
for y = 1:10
d_y = d_3y(y,:);
parfor z = 1:10
k1 = [ d_x ; d_y ; d_3z(z,:) ]; %Where d_x, d_y & d_3z(z,:) are 1x4 double
k1 = sum(k1);
[r t] = corr_spear(k1 , mod3d_3);
if abs(r) > abs(r3d_3(z,1))
r3d_3(z,1) = r;
k3d_3(z,1) = k1;
end
end
end
end


While running the above code, I get the following error:
??? Error using ==> parallel_function at 587
Error in ==> parallel_function>make_general_channel/channel_general at 864
Subscripted assignment dimension mismatch.

Any thoughts on the cause?
I did figure out that it was something to do with the “if” condition within the parfor loop, but am not sure what the issue is.

Sarah Z replied on : 48 of 81

Hi Shinliang,

There are two issues I see in your code. The first is simply a warning that it is treating node as a broadcast variable instead of a sliced variable. That is an easy fix, which I have described below.


TNELMS = 100;
node = ones(TNELMS, 10);

parfor e=1:TNELMS
hx_elem=hx(node(e,1:6));
Gx=somefunction(hx_elem);
G_globalx(node(e,1:6))=G_globalx(node(e,1:6))+Gx;
end

%% Workarounds ---

% If node has more than 6 columns - crop it and only act on the 6 columns,
% otherwise replace with a :

parfor e=1:TNELMS
hx_elem=hx(node(e,:));
Gx=somefunction(hx_elem);
end

% OR
nodec = node(:,1:6);

parfor e=1:TNELMS
hx_elem=hx(nodec(e,:));
Gx=somefunction(hx_elem);
end

node(:,1:6) = nodec;



The bigger issue is with the variable G_globalx. MATLAB has no way of knowing that you are indexing into it uniquely for each iteration. node(e,1:6) could return the same value in every iteration, breaking the independence of the loop iterations.

You must be able to index it as described in the blog post above. I can try to help you with that.
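One possible restructuring, as a sketch: store each element's contribution in a sliced array inside the parfor loop, then do the scatter-add serially afterwards with accumarray. The connectivity array, hx, and someFcn below are made-up stand-ins for your data and for somefunction.

```matlab
TNELMS = 100;
nNodes = 50;
% Example connectivity: each row holds 6 distinct node numbers
node = mod(bsxfun(@plus, (1:TNELMS)', 0:5), nNodes) + 1;
hx = rand(nNodes, 1);
someFcn = @(h) 2 * h;              % hypothetical stand-in for somefunction

GxAll = zeros(TNELMS, 6);          % one row of contributions per element
parfor e = 1:TNELMS
    hx_elem = hx(node(e, :));      % node is sliced, hx is broadcast
    GxAll(e, :) = reshape(feval(someFcn, hx_elem), 1, 6);
end

% Serial scatter-add: sums every contribution that lands on the same node
G_globalx = accumarray(node(:), GxAll(:), [nNodes 1]);
```

The expensive part (somefunction) stays inside the parfor loop, while the cheap accumulation into shared node positions happens once on the client.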

Cheers,
Sarah

Sarah Z replied on : 49 of 81

Basawaraj,

Looks like the code doesn’t work even with a for-loop. I believe the issue is that sum(k1) currently returns a 1 x 4 matrix, which cannot be assigned to the single element k3d_3(z,1). I am not sure if this fits what you are trying to do, but if you create a 1 x 12 matrix to sum instead of a 3 x 4 matrix, you get the correct dimensions. See the code example below. The code then works both with a for loop and with a parfor loop.



d_3x = rand(100,4);
d_3y = rand(10,4);
d_3z = rand(10,4);
r3d_3 = rand(10,4);
k3d_3 = rand(10,4);

for x = 1:100
d_x = d_3x(x,:);

for y = 1:10
d_y = d_3y(y,:);

parfor z = 1:10

k1 = [ d_x ,d_y , d_3z(z,:) ];
%Where d_x, d_y & d_3z(z,:) are 1x4 double
k1 = sum(k1);
%[r t] = corr_spear(k1 , mod3d_3);
% don't have function, so set r equal to
% random number
r = rand;
if abs(r) < abs(r3d_3(z,1))
r3d_3(z,1) = r;
k3d_3(z,1) = k1;
end

end
end
end



Cheers,
Sarah

Dhruv Singh replied on : 50 of 81

Hi Sarah,

First of all, let me start by saying this is a very terse and well put together blog and thanks for putting it together.. I got most of my questions answered in your explanations and the following questions…

I would like to know if we can make use of PBS job scheduling and exploit more than 8 processors but the distributed licensing restriction does not allow that. Is there any workaround?

Thanks,
Dhruv

sudeendra replied on : 52 of 81

HI,

I am using a parfor loop in my application. The input data file that I have is huge, on the order of GBs. The data is used in independent chunks across loop iterations. Inside the parfor loop I am using file operations like fseek and fread on that file.

I know that file operations are an overhead in this parfor, but is there any method to use them efficiently inside parfor?

Sarah Wait Zaranek replied on : 53 of 81

Hi Sudeendra.

You are doing what I would suggest: using something like fread, textscan, or memory mapping.

Is the read becoming a bottleneck for you?

The only other suggestion I have is that it might be easier for you to use something like spmd, load 1/4 of the data on each worker (if you can handle that in memory), and then work with it (I am assuming 4 workers here). It limits the number of loads, but it would require more memory.

Cheers,
Sarah

sudeendra replied on : 54 of 81

Hi,

I have my application with parfor running with 4 workers over more than 10000 frames, but after a few thousand frames (e.g. 2000 frames) one of the MATLAB worker sessions shut down.

The license I use is an independent license (it is not shared with any other user).

Can anyone explain why such errors occur and what they imply?

Below is the error i encounter:
============================================================

Error using ==> parallel_function at 598
The session that parfor is using has shut down

??? The client lost connection to lab 2.
This might be due to network problems, or the interactive matlabpool job might have errored. This is causing:
java.io.IOException: An operation on a socket could not be performed because the system lacked sufficient buffer
space or because a queue was full

============================================================

Loren replied on : 55 of 81

sudeendra,

Please use the link on the right of the blog to contact technical support. They should be able to help you get to the bottom of this.

–Loren

wdlang replied on : 56 of 81

the following code runs ok in Matlab 2007b

however, it breaks in Matlab 2010b

the error reported is that the variable fk0_list cannot be classified.

i cannot accept this. It is obvious that fk0_list is a sliced variable.

clear all; close all; clc; tic; format long

N=30;       Mi=50;
Mf1=100;    Mf2=150;
dt=20;
nMax=75;
tlist=0:dt:dt*nMax;
t1_list=[900, 1500, 2100, 2700];
weight=sqrt([2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43]);
fk0_list=zeros(length(t1_list),length(tlist));
disp(['N=',int2str(N),', Mi=',int2str(Mi),', Mf1=',int2str(Mf1),', Mf2=',int2str(Mf2)])

Hi=zeros(Mi,Mi);
for s=1:Mi-1
Hi(s,s+1)=-1;     Hi(s+1,s)=-1;
end
[Vi,Di]=eig(Hi);
Pi=zeros(Mf2,N);
Pi((Mf2-Mi)/2+1:Mf2/2+Mi/2,1:N)=Vi(1:Mi,1:N);

Vf1=zeros(Mf1,Mf1);
for s=1:Mf1
Vf1(:,s)=sqrt(2/(Mf1+1))*sin(pi*s*(1:Mf1)'/(Mf1+1));
end
dd1=-2*cos(pi*(1:Mf1)/(Mf1+1))';

Vf2=zeros(Mf2,Mf2);
for s=1:Mf2
Vf2(:,s)=sqrt(2/(Mf2+1))*sin(pi*s*(1:Mf2)'/(Mf2+1));
end
dd2=-2*cos(pi*(1:Mf2)/(Mf2+1))';

matlabpool

parfor s10=1:length(t1_list)
disp(['s10=',int2str(s10)])
holdtime=t1_list(s10);

P=Pi;
P((Mf2-Mf1)/2+1:Mf2/2+Mf1/2,1:N)=Vf1'*P((Mf2-Mf1)/2+1:Mf2/2+Mf1/2,1:N);
P((Mf2-Mf1)/2+1:Mf2/2+Mf1/2,1:N)=kron(exp(-i*holdtime*dd1),ones(1,N)).*P((Mf2-Mf1)/2+1:Mf2/2+Mf1/2,1:N);
P((Mf2-Mf1)/2+1:Mf2/2+Mf1/2,1:N)=Vf1*P((Mf2-Mf1)/2+1:Mf2/2+Mf1/2,1:N);

PP=Vf2'*P;
Pa=zeros(Mf2,N+1);
Pb=zeros(Mf2,N+1);
sDen=zeros(Mf2,Mf2);
for s=1:length(tlist)
t=tlist(s);
P=kron(exp(-i*t*dd2),ones(1,N)).*PP;
P=Vf2*P;

Pa(:,1:N)=P;
Pa(:,N+1)=zeros(Mf2,1);
Pa(1,N+1)=1;
for s1=1:Mf2-1
if s1>1
Pa(s1-1,1:N)=-Pa(s1-1,1:N);
Pa(:,N+1)=zeros(Mf2,1);
Pa(s1,N+1)=1;
end
Pa(s1,N+1)=1;

Pa2=Pa';
sDen(s1,s1)=det(Pa2*Pa);

Pb=Pa;
for s2=s1+1:Mf2
Pb(s2-1,1:N)=-Pb(s2-1,1:N);
Pb(:,N+1)=zeros(Mf2,1);
Pb(s2,N+1)=1;

sDen(s1,s2)=det(Pa2*Pb);
sDen(s2,s1)=sDen(s1,s2)';
end
end

Pa(Mf2-1,1:N)=-Pa(Mf2-1,1:N);
Pa(:,N+1)=zeros(Mf2,1);
Pa(Mf2,N+1)=1;
sDen(Mf2,Mf2)=det(Pa'*Pa);

fk0_list(s10,s)=(sum(sum(sDen))-sum(diag(sDen))+N)/Mf2;
end
end
matlabpool close

toc

Sarah Wait Zaranek replied on : 57 of 81

wdlang,

In versions of MATLAB prior to R2007b, parfor designated a more limited style of parfor-loop than what is available in MATLAB 7.5 and later. This old style was intended for use with codistributed arrays (such as inside an spmd statement or a parallel job), and has been replaced by a for-loop that uses drange to define its range.

The new parfor in R2007b used parentheses in defining its range to distinguish it from the old parfor. This was done only in that release so that existing customer code behaved as expected. However, it should have thrown a warning that you were using the old functionality. Once this change occurred (R2008a and on), parentheses were not needed.

If you put parentheses around your range, you will see the same error in R2007b and R2010b. You have two options:

1) Convert parfor to drange to get old parfor behavior
2) Make minor changes in your code (described in above blog post) to use the new, more powerful version of parfor.
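For option 2, a likely culprit is the assignment fk0_list(s10,s) inside the inner for-loop, whose range is defined by a function call, length(tlist). One pattern that usually satisfies the sliced-variable rules is to fill a temporary row inside the inner loop and assign it with a single statement. A reduced sketch, where the arithmetic is only a placeholder for the determinant calculation:

```matlab
t1_list = [900, 1500, 2100, 2700];
tlist   = 0:20:100;
nT      = length(tlist);                  % computed once, outside the parfor
fk0_list = zeros(length(t1_list), nT);

parfor s10 = 1:length(t1_list)
    holdtime = t1_list(s10);
    row = zeros(1, nT);                   % temporary variable, local to this iteration
    for s = 1:nT
        row(s) = holdtime + tlist(s);     % placeholder for the real calculation
    end
    fk0_list(s10, :) = row;               % single sliced assignment
end
```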

Kevin replied on : 58 of 81

Could you please show me how to slice the variable “data” in order to avoid unnecessary communication overhead? Many thanks for your kind help.

for i=1:num1
parfor j=1:num2,
idx=fold(j).idx;
data=myData(idx,:);
label=myLabel(idx);
% perform some processes below...
...
end
end
Kevin

Kevin replied on : 59 of 81

Sorry, I would like to say, slice the variable “myData” rather than “data”.

lucas replied on : 60 of 81

Hi

I’d like to ask how I can best modify my code (to run with parfor) if I have multiple nested functions

r=1;
for s=2:1:4
for t=3:1:8
if t>s
for u=4:1:10
if u>t
for v=5:1:12
if v>u
m=[r s t u v];
end
end
end
end
end
end
end

If I use parfor, I want to get the same result as with for, i.e. m=[1 2 3 4 5]…m=[1 4 8 10 12]
I don’t know how to display the m-matrix when I use parfor
Thanks for the help

Kevin replied on : 61 of 81

I think I have solved my question; the answer is just like this:

for i=1:num1
for j=1:num2,
data2{j}=myData(fold(j).idx,:);
label2{j}=myLabel(fold(j).idx);
end
parfor j=1:num2,
data=data2{j};
label=label2{j};
% perform some processes below…
…
end
end

Alex James replied on : 62 of 81

It took me a while to figure out how to get my problem (definition of elements in a large sparse matrix) working smoothly in parfor, but now I’m happily getting it done roughly nworkers times faster than before. Thanks Loren!

One caveat now though is: I have my big sparse matrix defined, but it takes me over 9 hours to run lsqnonneg on it! Is there any handy way to parallelize the lsqnonneg operations? ie maybe with codistribution somehow?

Thanks again!

Alex James replied on : 63 of 81

I think I solved my problem too. I realized we had both 2009a and 2010a installed and I was using the older of the two. Switching to 2010a and running again, it now seems to be using >100% of the CPU cycles in top (ie running on multiple threads?) where before in 2009a it only used up to 100%.

And here I was trying to figure out how to install ATLAS and TSNNLS to make life easier, joke’s on me! I guess that’s only parfor the course.

Muahahah.

Alex James replied on : 64 of 81

JEEZ! Down from over 9hrs to just about half an hour. Way to go new matlab.

lucas replied on : 65 of 81

Hi

Once again:)
I’d like to ask how I can best modify my code (to run with parfor) if I have multiple nested functions

r=1;
for s=2:1:4
for t=3:1:8
if t>s
for u=4:1:10
if u>t
for v=5:1:12
if v>u
m=[r s t u v];
end
end
end
end
end
end
end

If I use parfor, I want to get the same result as with for, i.e. m=[1 2 3 4 5]…m=[1 4 8 10 12]
I don’t know how to display the m-matrix when I use parfor
Thanks for the help

Sarah Zaranek replied on : 66 of 81

Alex and Kevin.

Looks like you solved your problems without me!

A few notes on your solutions:

Kevin – Exactly, indexing into your data stored as a cell array lets parfor figure out what data goes to which worker, allowing it to act like a sliced variable.

Alex – I am happy you saw such a jump in performance for your code. We are always updating our multithreaded options and working on performance. R2010a improved performance for sparse matrix indexing and extended multithreading (which has been around since R2007a) to more functions. Both of those may have helped you in terms of performance.

Cheers,
Sarah

Sarah Zaranek replied on : 67 of 81

Lucas –

Sorry for the delay, but I have been out of town. Here are my thoughts below.

A few suggestions:

1) I would put the interior code in a function for easier handling of the indices.

2) I would concatenate the resulting output from the subfunction to get a master list of all desired m values. You can either display them at the end or as part of the interior function.

3) Since this code is very basic and fast running, it will not be sped up with a parfor loop. Hopefully this is just part of a larger simulation (e.g. once you get m you run some long running function using it).

4) I probably should preallocate the mtotal variable in the subfunction for performance and as good programming practice, but since it is a small matrix, it really doesn’t affect performance that much. If you go on to use this code in production, I would do so.

Code example here:


function testCode

r=1;
ss = 2:1:4;
mtotal = [];

parfor ii = 1:length(ss);
s = ss(ii);
m = mSetUp(s,r);
if ~isempty(m)
mtotal = [mtotal;m];
end
end

display(mtotal)

function mtotal = mSetUp(s,r)
mtotal = [];

for t=3:1:8
if t>s
for u=4:1:10
if u>t
for v=5:1:12
if v>u
m=[r s t u v];
display(m)
mtotal = [mtotal;m];
end
end
end
end
end
end



Cheers,
Sarah

lucas replied on : 68 of 81

Thank you Sarah :). That’s right, this is just part of a larger simulation.

Saad replied on : 69 of 81

Dear Loren or Sarah

I hope you are well. I was using parfor on a quad core and I got the following error:

??? Error using ==> parallel_function at 594
The session that parfor is using has shut down

??? The client lost connection to lab 1.
This might be due to network problems, or the pmode parallel job might have
errored.

My code is the following:
grid1D=[-1, 0.5, 0.6,0.1, 0.8, 0.1 , 100, 1, 0.1; -0.5, 0.5, 0.6,0.1, 0.8, 0.1 , 100, 1, 0.1; 0, 0.5, 0.6,0.1, 0.8, 0.1 , 100, 1, 0.1; ...
0.4, 0.5, 0.6,0.1, 0.8, 0.1 , 100, 1, 0.1; 0.9, 0.5, 0.6,0.1, 0.8, 0.1 , 100, 1, 0.1; 1, 0.5, 0.6,0.1, 0.8, 0.1 , 100, 1, 0.1] ;

n=num2cell(grid1D,2);

matlabpool open local 4

parfor i=1:8

P=cal4(H, n{i});

S(i,:)= cal5(P,n{i});

end

[j1, idx]=min(S);

As you can see, the body of my parfor loop is not complicated, but I don’t understand why it crashes. I have another question, please: is it possible to write two parfor loops with different bodies in the same m-file?

I would really appreciate any comment you would make on this error message. Thanks so much.

Kind Regards

S

wdlang replied on : 70 of 81

I just found that my code is much slower using parfor. The overhead for parfor should be negligible here, so I guess the reason lies in the rand function I called. Below is my code.

==========

clear all; close all; clc; tic

Nx=100; Ny=100;

plist=0.05:0.05:0.95;

num_sample=20000;

matlabpool open 3

parfor sss=1:length(plist)

sss

p=plist(sss);

for s10=1:num_sample
pattern=(rand(Ny,Nx)<p);
pattern2=zeros(Ny,Nx);

new=zeros(2,10000);
new2=zeros(2,10000);
num_new=0;
num_new2=0;

flag=0;
found=0;
s=0;
while (found==0)&&(s0
num_new2=0;
for s1=1:num_new
x=new(1,s1);
y=new(2,s1);
if (x>1)&&(pattern(y,x-1)==1)&&(pattern2(y,x-1)~=flag)
num_new2=num_new2+1;
new2(1,num_new2)=x-1;
new2(2,num_new2)=y;
pattern2(y,x-1)=flag;
end
if (x1)&&(pattern(y-1,x)==1)&&(pattern2(y-1,x)~=flag)
num_new2=num_new2+1;
new2(1,num_new2)=x;
new2(2,num_new2)=y-1;
pattern2(y-1,x)=flag;
end
if (y 0
matlabpool close
end

toc

Sarah Wait Zaranek replied on : 71 of 81

wdlang,

I can give you some things to check when your performance isn’t what you expected with parfor.

I have found it is usually one of three things:

1. Very small problem: If your for-loop runs on the order of a couple of seconds, the overhead of transferring your data to the workers and back again just swamps the gain you get from using parfor.

The test would be just to time your serial code. Personally, I wouldn’t bother parallelizing anything that runs in less than 10 seconds.

2. Too much data transfer: Even if your problem is adequately sized, data transfer can still be an issue. If you are moving around large chunks of data, the transfer may still swamp your improved performance.

The test would be to run your problem on just 1 worker. If you see a sizeable difference in run time compared with the for-loop version, then you know it might be the overhead of sending your data to the worker. You can then test for this by writing a dummy function that takes the data in, does a pause, and then passes it out. That way you can get an accurate estimate of the time spent sending it to at least 1 worker.

3. Built-in multithreading: Many MATLAB functions are already multithreaded, so your for-loop version may already be using several cores, which leaves parfor less to gain.

The test would be to run MATLAB in the single-threaded mode (doc link: http://www.mathworks.com/help/releases/R2011b/techdoc/matlab_env/f8-4994.html#bq24t0c) and see if this changes your run time for the for-loop version at all.

My co-worker Jiro Doke and I also wrote a timing function for parfor called parTicToc that may help you diagnose your issues. It is located on the File Exchange here:
http://www.mathworks.com/matlabcentral/fileexchange/27472-partictoc
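
For a quick first check before reaching for a dedicated tool, you can time the same loop both ways with tic and toc. This is only a sketch; the work handle is a hypothetical stand-in for your loop body, and the timings you see depend entirely on your machine and pool.

```matlab
work = @(k) sum(sin((1:2e5) * k));   % hypothetical stand-in workload
n = 40;

tic
serialOut = zeros(1, n);
for k = 1:n
    serialOut(k) = work(k);
end
tSerial = toc;

tic
parOut = zeros(1, n);
parfor k = 1:n
    parOut(k) = feval(work, k);
end
tParallel = toc;

fprintf('serial %.3f s, parfor %.3f s\n', tSerial, tParallel);
```

If the serial time is already very short, the parfor version is unlikely to win, which matches the first point in the list.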

Cheers,
Sarah

Suresh replied on : 72 of 81

Hi,

I have been using parfor for some time on my quad-core PC for a pretty big optimization problem which generally takes hours to solve.
I found that sometimes my code works fine, but sometimes it breaks with this error:

??? Error using ==> parallel_function at 598
The session that parfor is using has shut down

??? The client lost connection to lab 4.
This might be due to network problems, or the interactive matlabpool job might have
errored.

Please suggest how to deal with this issue.

Regards,
Suresh

Sarah replied on : 73 of 81

Suresh.

It is hard to diagnose this without the actual code. Usually it is either a network issue or a data size limit issue. I have talked to support, and they encourage you to contact them to help you pinpoint the issue. Their number is listed below:

Monday-Friday
Hours: 08:30 – 20:00 ET
Tel: 508-647-7000, option 2

Cheers,
Sarah

Coen replied on : 74 of 81

Dear Sarah,

I have been a (happy) Matlab user for many years now but have run into a limitation (?) of the distributed computing toolbox. For my research, I require parallel execution of a very large number of matrix-vector multiplications of differing sizes in Real-Time. These matrices are constructed offline, in an extremely expensive operation, and saved in a static data structure (MatSet in the example below). This static data structure is used by all workers, and is not modified after creation.

When I run the code, which is equivalent to the code below, I find that the PARFOR loop takes 10x more time to complete than the FOR loop in Matlab 2009b. As I understand from your earlier explanations, this is because of the constant transfer of data (MatSet in this case) between workers. In my case, however, this data transfer is completely unnecessary!

My question is whether there is some way of loading a static dataset into the workspace of the workers so as to prevent unnecessary communication overhead between workers?

Coen

matlabpool(4);
Msize = 100; Nloop = 1000;
c1 = zeros(Msize, Nloop); c2 = zeros(Msize, Nloop);
% initialization loop (runs fine!)
MatSet = cell(Nloop, 1);
parfor i=1:Nloop
MatSet{i} = rand(Msize); % in real life this would contain an extremely expensive code operation
end
% real-time parallel loop (SLOW!)
tic;
parfor i=1:Nloop
c1(:,i) = MatSet{i} * rand(Msize, 1);
end
time1 = toc;
% real-time serial loop
tic;
for i=1:Nloop
c2(:,i) = MatSet{i} * rand(Msize, 1);
end
time2 = toc;
fprintf('Parallel time: %2.4f ms, Serial Time: %2.4f ms\n', 1000*time1, 1000*time2);


Felix replied on : 75 of 81

Hi,
I get this error message:

??? Error using ==> parallel_function at 598
Input argument “initRobotList” is undefined

I have no clue how this message could arise. I set up a simulation based on matlab-classes, that should run for 5 different parameter sets simultaneously.
In the beginning there is the parameter ‘initRobotList’ that is handed over to the constructor.
The instances of the class are created successfully for all 5 simulations. This variable is used nowhere else but in the constructor of the class, but after 5 min, and approx. 5000 simulation steps in each of the 5 simulations, there is this message.

What could that be?
Where should I start looking?
It is a really big simulation so it is impossible to comment to a minimal example and see if it’s working.

Thanks a lot and kind regards

Fx

Felix replied on : 76 of 81

Now it worked somehow. And how did I solve it? I did nothing! So I think I got this error, because I changed something in the sourcecode of the class and saved it, while the simulation tried to use it and got confused. Could that be?

Doesn’t matter, it works, thanks!

Barend replied on : 77 of 81

Dear Loren

I have not used the parallel computing features of MATLAB before, so I am quite new to implementing them. I might be asking a question that has already been addressed; apologies if this is the case. I have the following code:


topNodes    = [2 3 4 5 6 7];
bottomNodes = [1 2 3 4 5 6];
polarity    = [1 1 1 -1 -1 -1];
nodes       = max([topNodes bottomNodes]);
T           = zeros(nodes,numel(polarity));

for i = 1:numel(polarity)
    if polarity(i) < 0
        T(i,bottomNodes(i)) = -1;
        T(i,topNodes(i)) = 1;
    else
        T(i,bottomNodes(i)) = 1;
        T(i,topNodes(i)) = -1;
    end
end



Okay, so now if I use a parfor loop, this does not work. In many of my applications I have an array whose values are index values into a matrix, and for some reason this is a challenge for me when I try to implement a parfor loop.

Hope to hear from you soon.

Kind regards,

Barend

Sarah replied on : 78 of 81

This relates to the restrictions spelled out in the section of the blog post above titled "Classification".

In a parfor loop, when you index into a sliced variable, restrictions are placed on its first-level indices. This allows parfor to easily distribute the right part of the variable to the right workers. Exactly one of these first-level indices must be the loop counter variable, or the counter variable plus or minus a constant. Every other first-level index must be a constant, a non-loop counter variable, a colon, or an end. Your code indexes T with bottomNodes(i) and topNodes(i), which are none of these.

See a possible fix below.


topNodes    = [2 3 4 5 6 7];
bottomNodes = [1 2 3 4 5 6];
polarity    = [1 1 -1 1 1 1];
nodes       = max([topNodes bottomNodes]);
T           = zeros(numel(polarity),nodes);
parfor i = 1:numel(polarity)
    % Copy row i into a temporary so that T is only ever indexed as T(i,:)
    myData = T(i,:);
    if polarity(i) < 0
        myData(bottomNodes(i)) = -1;
        myData(topNodes(i)) = 1;
    else
        myData(bottomNodes(i)) = 1;
        myData(topNodes(i)) = -1;
    end
    % Write the whole row back as a single slice
    T(i,:) = myData;
end
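
The indexing restriction described above can also be seen in a smaller sketch. The names A, col, and row and the sizes used here are made up purely for illustration; the only point is that the sliced variable is indexed with the loop counter and a colon, while the data-dependent index is applied to a temporary.

```matlab
% Hypothetical data, chosen only to illustrate the slicing rule.
n   = 6;
A   = zeros(n, 3);
col = [1 2 3 1 2 3];   % data-dependent column to fill in each row

parfor i = 1:n
    % Allowed: the first-level indices of A are the loop counter and a
    % colon, so parfor can treat A as a sliced variable.
    row = A(i, :);      % temporary copy of this iteration's slice
    row(col(i)) = i;    % arbitrary indexing is fine on a temporary
    A(i, :) = row;      % write the slice back with the same index form
end

% Not allowed inside the parfor body:
%   A(i, col(i)) = i;
% because col(i) is not a constant, a plain non-loop variable, a colon,
% or end.
```

After the loop, row i of A should hold the value i in column col(i) and zeros elsewhere.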

Free screen recorder replied on : 79 of 81

“d_3x = rand(100,4);
d_3y = rand(10,4);
d_3z = rand(10,4);
r3d_3 = rand(10,4);
k3d_3 = rand(10,4);

for x = 1:100
d_x = d_3x(x,:);

for y = 1:10
d_y = d_3y(y,:);

parfor z = 1:10

k1 = [ d_x ,d_y , d_3z(z,:) ];
%Where d_x, d_y & d_3z(z,:) are 1×4 double
k1 = sum(k1);
%[r t] = corr_spear(k1 , mod3d_3);
% don’t have function, so set r equal to
% random number
r = rand;
if abs(r) < abs(r3d_3(z,1))
r3d_3(z,1) = r;
k3d_3(z,1) = k1;
end

end
end
end”

Sarah, this looks good! Thanks for your contribution!

Tatiana replied on : 80 of 81

Dear Loren

Could you please describe how to set up the ‘local’ configuration so that it allows the use of 8 workers (or any number more than 4) during parallel computations?

Thank you very much in advance!

Sarah Zaranek replied on : 81 of 81

@Tatiana

The maximum number of workers was raised from 4 to 8 (in R2009a) and from 8 to 12 (in R2011b). So, make sure you have at least R2009a.

By default, the maximum number of workers is usually set to the number of cores that the OS tells MATLAB you have. You can change this in the local configuration: go to the Parallel pull-down menu and choose to manage configurations. If you look at the properties of the local configuration, you can set the cluster size to 8. However, if you only have 4 cores, this will not help you in terms of speedup.
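
The same setting can probably also be changed from the command line. This is only a minimal sketch, and it assumes a later release that provides parcluster and parpool (these replaced the interface used with matlabpool in this thread) and in which the profile name 'local' is still accepted; the value 8 is simply the number from the question.

```matlab
% Assumption: a release with the parcluster/parpool interface.
c = parcluster('local');   % cluster object for the local profile
c.NumWorkers = 8;          % raise the worker limit on this object
saveProfile(c);            % optional: store the new limit in the profile
pool = parpool(c, 8);      % start a pool with 8 workers
```

As with the menu route, asking for more workers than there are physical cores is unlikely to make the loop run any faster.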

Cheers,
Sarah