Comparing repmat and bsxfun Performance37

Posted by Loren Shure,

I've been asked repeatedly about the performance comparison between two MATLAB functions, bsxfun and repmat. These two functions can each help with calculations in which two arrays are expected to have the same dimensions, but some of the input dimensions, instead of agreeing, may have the value 1. The simple example I use here is subtracting the columns means from a matrix.

Setup

First I set up the data.

m = 1e5;
n = 100;
A = rand(m,n);

Code I'm Tempted to Write

And now here's the code I'm tempted to write, safely tucked inside the confines of a try statement.

try
AZeroMean = A - mean(A);
catch ME
disp(ME.message);
end
Matrix dimensions must agree.


As you can see, MATLAB does not allow binary operators to work on arrays with different sizes (except when one of the inputs is a scalar value). There are at least two ways to remedy this.

• Store the intermediate calculation from mean(A) in a vector and then create a new array the same size as A with replicates of the row vector from the mean. You can do this with repmat or via indexing an appropriate number of times into the row of this vector.
• Call bsxfun with the appropriate two inputs and allow it to perform the equivalent singleton dimension expansion. The nice thing about this is there is no need for a large intermediate array the same size as A. A possible downside, especially since bsxfun is relatively new, is that the code doesn't, at first reading, appear as obvious.
• Timing repmat

Using the most excellent timeit utility that Steve Eddins posted to the file exchange, I now time the repmat calculations. First I create an anonymous function that does my calculation. Then I pass that function handle to timeit. timeit carefully warms up the function by running it enough so the times are not subject to first-time effects, figuring out how many times to run it to get meaningful results, and more.

frepmat = @() A - repmat(mean(A),size(A,1),1);
timeit(frepmat)
ans =
0.30964


Indexing with ones

repmat uses a variety of techniques for replicating an array, depending on the details of what's being replicated. One technique is to index into the array with ones in the dimension to replicate. Here's an illustative example with a vector.

q = [17 pi 42 exp(1)];
q5 = repmat(q,5,1)
q5 =
17       3.1416           42       2.7183
17       3.1416           42       2.7183
17       3.1416           42       2.7183
17       3.1416           42       2.7183
17       3.1416           42       2.7183


Timing Indexing

One thing I notice with the repmat solution is that I need to create the vector mean(A) for the function. I need to do the same thing without repmat and I want to be able to set up one function call for performing the calculation so I can use timeit. Since I can't index into the results of a function without assigning the output to a variable, I create an intermediate function meanones to help.

type meanones
function y = meanones(A)

mn = mean(A);
y = A - mn(ones(1,size(A,1)),:);



Now I'm ready to do the timing.

findex = @() meanones(A);
timeit(findex)
ans =
0.31389


Timing bsxfun

Next see the timing calculation done using bsxfun.

fbsxfun = @() bsxfun(@minus,A,mean(A));
timeit(fbsxfun)
ans =
0.20569


Punchline

In this example, bsxfun performs fastest. Now that you see bsxfun in action, can you think of uses for this function in your work? Let me know here.

Get the MATLAB code

Published with MATLAB® 7.6

Note

A B replied on : 1 of 37

I use bsxfun rather frequently (e.g. to shift a bunch of points by a constant vector), but it makes the code rather unreadable. I mean, compare
points = points + vector
with
points = bsxfun(@plus,points,vector)

Any ideas on how to improve the readability?

Loren replied on : 2 of 37

A B-

I recommend you place a comment in your code, perhaps one that says what you’ve written above

% points = points + vector


–Loren

Ken Eaton replied on : 3 of 37

For the longest time, I had thought that matrix operations tended to be faster than using repmat (and I had never really used bsxfun). However, I did some simple timing calculations recently and now I’m not so sure. I was curious how the following calculation would compare to the ones you already timed:

AZeroMean = A – ones(size(A,1),1)*mean(A);

-Ken

Loren replied on : 4 of 37

Ken-

It will depend on the size of your arrays, but generally you should find bsxfun faster as there is no creation of the intermediate array (which could be large). Memory allocation (and the matrix multiplication) can take appreciable time.

–Loren

OkinawaDolphin replied on : 5 of 37

I am going to use bsxfun for normalizing datasets, i. e. for operations like this:

NormalizedDataset = ScaleFactor .* (OriginalDataSet – Shift)

I just tried out bsxfun myself and I found that it is faster than using repmat, even though calculation time is below one milisecond in my example.

x = repmat(magic(10), 100, 1); % data set initialization
a = repmat(1000, 1, size(x, 2)); % scale factor
b = repmat(100, 1, size(x, 2)); % shift

Now the calculation times are compared:

1. repmat

tic;
b1 = repmat(b, size(x, 1), 1);
a1 = repmat(a, size(x, 1), 1);
y = a1 .* (x + b1);
toc

yields a calculation time from 0.550 ms to 0.735 ms.

2. bsxfun with two variables

tic;
x1 = bsxfun(@plus, x, b);
y1 = bsxfun(@times, x1, a);
toc

yields a calculation time from 0.183 ms to 0.0232 ms.

3. bsxfun with one variable

tic;
y1 = bsxfun(@plus, x, b);
y1 = bsxfun(@times, y1, a);
toc

yields a calculation time from 0.213 ms to 0.293 ms.

My Matlab version is R2007a with multithreading enabled, and the PC I used has a dual core processor so multithreading will take place.

How can we interprete these results?

We need only one array for temporary calculation results when using bsxfun. We need two when using repmat. For each temporary array, memory must be allocated. So bsxfun is faster due to less memory allocations.

Another factor is the calculation time for elementwise matrix addition and multiplication. When performing a normal matrix multiplication, the rows of one matrix are multiplied by the columns of the other one. This is of course time consuming. However, when normalizing a dataset, only elemetwise multiplications and additions take place. In Matlab, matrices are internally represented as vectors. These vectors are generated by concatenating column vectors. Therefore, an elementwise matrix multiplication is nothing else but an elementwise vector multiplication. The resulting matrix has exactly the same number of rows and columns as the matrices multiplied. The same is true for addition.

Question to Loren: Is bsxfun faster than the operators ‘.*’ and ‘+’? Also, why is the third example (using the result variable for the intermediate calculation, too) slower than the second one? Is the difference not significant?

Loren replied on : 6 of 37

+ and .* will be faster than using bsxfun IF you already have the arrays around to do the calculations on. If you do not, and they are large (yours are not very big), then you are likely going to be better off with bsxfun.

As to why your 3rd example takes longer, I think you need to do more careful measurements first to be sure. I recommend using timeit, from the FEX. It pre-runs code, determines how many times to run it to get significant times and is more robust at timing than a simple tic/toc.

–Loren

Jessee replied on : 7 of 37

Often I want to index the output of a function without assigning the output to a variable first (a problem you circumvented with the intermediate function “meanones”).

Could you explain why this is illegal in MATLAB?

Loren replied on : 8 of 37

Jessee-

It simply was never part of the initial language design to allow indexing directly into computed entities. It’s on the enhancement list to allow this sometime in the future.

–Loren

Duncan replied on : 9 of 37

Regarding why your third example is slower than your second example, the result is in fact somewhat expected. I believe you are timing your code in the MATLAB command window. In R2007a, in the command window, there is no in-place optimization, meaning memory is not re-used. In other words, your line of code:

y1 = bsxfun(@times, y1, a);

actually allocates memory for the output y1, store the result in the output variable, and then free the memory for the input y1. Using the same variable as both input and output does not save you any time.

If you put the code in an M-function, then in-place optimization takes place, and you will see time savings for re-using variables:

function myfunction

x = repmat(magic(10), 100, 1); % data set initialization
a = repmat(1000, 1, size(x, 2)); % scale factor
b = repmat(100, 1, size(x, 2)); % shift

tic;
x1 = bsxfun(@plus, x, b);
y1 = bsxfun(@times, x1, a);
toc;

tic;
y1 = bsxfun(@plus, x, b);
y1 = bsxfun(@times, y1, a);
toc;

You can see the difference more clearly if you use larger matrices.

Markus replied on : 10 of 37

Hi Loren,

which version is fastest also depends very much on the matrix dimensions. Look at my test function:


function timingtest(m)
n = 100;
A = rand(m,n);
N = min(1e6 / m, 1e5);
tic
for k=1:N
B = A - repmat(mean(A),size(A,1),1);
end
toc
tic
for k=1:N
mn = mean(A);
B	= A - mn(ones(1,size(A,1)),:);
end
toc
tic
for k=1:N
B = bsxfun(@minus,A,mean(A));
end
toc


Here the dimension m is an input argument. The results are as follows:

>> timingtest(1e5)
Elapsed time is 1.894369 seconds.
Elapsed time is 2.268081 seconds.
Elapsed time is 1.875516 seconds.
>> timingtest(10)
Elapsed time is 5.758173 seconds.
Elapsed time is 3.377169 seconds.
Elapsed time is 3.982497 seconds.
>> timingtest(1)
Elapsed time is 4.618704 seconds.
Elapsed time is 2.182044 seconds.
Elapsed time is 2.650870 seconds.

The smaller the matrix dimensions, the better the version with “ones”.

In my projects, I have replaced nearly every “repmat” by “indexing with ones” and saved a lot of processing time. And, by the way, I won a Matlab Contest by applying this tweak :-)

Regards
Markus

Loren replied on : 11 of 37

Markus-

Congratulations on winning! And a nice illustration of how the size matters. Small enough, and the intermediate arrays you create are not a big deal. Large enough, and bsxfun is the better way to go.

–Loren

OkinawaDolphin replied on : 12 of 37

It seems that neither R2007a nor R2007b have the function timeit, but I investigated computation time with a function similar to what Duncan wrote:

function myfunction

x = repmat(magic(10), 100, 1); % data set initialization
a = repmat(1000, 1, size(x, 2)); % scale factor
b = repmat(100, 1, size(x, 2)); % shift

tic;
for n = 1 : 10000
x1 = bsxfun(@plus, x, b);
y1 = bsxfun(@times, x1, a);
end;
toc;

tic;
for n = 1 : 10000
y1 = bsxfun(@plus, x, b);
y1 = bsxfun(@times, y1, a);
end;
toc;

end

Now it is significantly faster to use the result variable for the intermediate calculation, too.

Loren replied on : 13 of 37

timeit can be downloaded from the File Exchange. Steve Eddins is the author. It does not ship with MATLAB.

–Loren

OkinawaDolphin replied on : 14 of 37

Here are the two functions I just tested with timeit:

function myfunction1

x = repmat(magic(10), 100, 1); % data set initialization
a = repmat(1000, 1, size(x, 2)); % scale factor
b = repmat(100, 1, size(x, 2)); % shift

for n = 1 : 10000
x1 = bsxfun(@plus, x, b);
y1 = bsxfun(@times, x1, a);
end;

end

function myfunction2

x = repmat(magic(10), 100, 1); % data set initialization
a = repmat(1000, 1, size(x, 2)); % scale factor
b = repmat(100, 1, size(x, 2)); % shift

for n = 1 : 10000
y1 = bsxfun(@plus, x, b);
y1 = bsxfun(@times, y1, a);
end;

end

timeit(@myfunction1) yields 960 – 970 ms.
timeit(@myfunction2) yields around 690 ms.

In my opinion this means that using only one variable takes less time than two variables. Well, this is certainly not a scientific experiment. It is only a hint. But the result is consistent with the the concept of in-place optimization.

Gautam Vallabha replied on : 15 of 37

A B asks a good question about readability of BSXFUN code. One difficulty is the name BSXFUN. It is an accurate description (Binary Singleton eXpansion FUNction), but on first glance the acronym is unintuitive and forbidding. I wonder if having an alias for it would help:

———-
A = rand(3,3);
match_dimensions = @(varargin) bsxfun(varargin{:});
B = match_dimensions(@minus, A, mean(A))
———-

Or, because we are dealing with binary operators, we can try to mimic the infix notation:

———-
match_dimensions = @(a,op,b) bsxfun(op,a,b);
B = match_dimensions(A, @minus, mean(A))
———-

Gautam

Daniel replied on : 16 of 37

When this post came around, I read it, and didn’t understand bsxfun. Now I understand bsxfun and I realize that bsxfun is the solution to alot of grief I have had over the years. Countless times I have sworn over the fact that I cannot add an array to a matrix, and now I realize that this bsxfun everyone was talking about was the solution.

Nifty.

–DA

Jinhui replied on : 17 of 37

Loren,
I use bsxfun quite frequently and like it a lot. Indeed I found that many MATLAB toolbox functions are rewritten with bsxfun recently (say, repmat now uses bsxfun in sparse matrix case, spline interpolation also uses bsxfun, etc).

I have one question on the design of the function,though. Why is the function restricted to singleton expansion instead of more general integer-multiple expansion. What I mean can be illustrated more easily through an example.

% an example not work in current implementation
a = rand(4,6);
b = rand(2,6);
c = bsxfun(@plus,a,b); % does NOT work since it violates SINGLETON requirement
Conceptually bsxfun(@plus,a,b) can have a reasonable interpretation as “a + repmat(b,[2 1]);”, here 2 comes from size(a,1)/size(b,1). The well-defined nature here comes from mod(size(a,1),size(b,1)) == 0 so that mismatched dimension is an integer multiple of the other. Obviously singleton expansion is a special case of this.

If an extension of bsxfun implements this, it will eliminate the need for repmat for more cases, which seems to be in the spirit of design of bsxfun (although we need to find a different interpretation for letter “s” in bsxfun :-)).

From my experience of C MEX programming, the implementation of integer-multiple expansion should be similar to singleton expansion which, I guess, involves a different stride for pointers. Or am I missing something?

Jinhui

Loren replied on : 18 of 37

Jinhui-

MATLAB has a traditional for scalar expansion, hence the behavior of bsxfun. If we did allow matches like you suggest, it could mean different things. Replicate the 2 rows in order multiple times, i.e., 1 2 1 2 1 2, OR it could be replicate each “the right number of times”, i.e., 1 1 2 2. It didn’t seem worth the confusion since we didn’t know which scenario was more likely of real interest.

–Loren

Jinhui Bai replied on : 19 of 37

Loren,
Thanks for the answer on the design philosophy. At least now I know that it is not because of technical and efficiency reasons in implementation.

My interpretation literally comes from the behavior of repmat, which means [1 2 1 2] intead of [1 1 2 2]. But I agree that scalar expansion is an intuitive concept in linear algebra, but repmat is a MATLAB-specific concept. From the ease-of-use point of view, repmat-based interpretation may not be as clear as singleton expansion, especially for new users.

Jinhui

StephenLL replied on : 20 of 37

This may be asking too much but is it possible to go beyond binary expansion?

I would think that an arbitrary amount of expansion would be useful.

Something along the lines:

x = rand(10,1)
y = rand(1,10)
z = rand([1,1,5])

g = bsxfun(@betainc,x,y,z);

Thanks for listening.

Stephen

Loren replied on : 21 of 37

Stephen-

We’ll put this idea on our list to consider for future enhancements.

–Loren

Tom Clark replied on : 22 of 37

Hi Loren,

Sorry to resurrect an old blog – I just had a thought of a way to enhance readability for bsxfun.

What about creating a bsxfun type class? You could then overload all the operators (+ – etc) to use the ‘ugly’ call to bsxfun simply.

Let’s look at a simple example for which you could use bsxfun:


% Generates an error unless you use bsxfun, or repmat etc...
A = rand(10);
B = ones(10,1);
A = A - B;



But if we designated a class:


A = rand(10);
B = ones(10,1);
% Make a variable of type bsxclass.
% Values of B are kept in the private
% data of the class.
B_exp = bsxclass(B)
% B_exp is effectively an array which
% can be expanded along a singleton
% dimension
A = A - B_exp;



… so the class definition contains the horrible calls to bsxfun, and your code contains simple operator statements.

What does everyone think? Would such a class get used? Will there be a significant functional overhead caused by overloading the method / class invokation?

.. i.e. should I sit down and write it??!

Tom

Loren replied on : 23 of 37

Tom-

Perhaps easier to hide the bsxfun calls, but the need to convert and then operate doesn’t save the user steps. Interesting idea with interesting trade-offs.

My guess is that unless the arrays are large, you will pay a noticeable price (relatively) for the object.

–Loren

Tom Clark replied on : 24 of 37

Loren,

You’re probably right. I’ll put it on the backburner!

If anyone would like this idea implemented, drop me a message (t.clarkremovespam@cantab.net) and I’ll give it a shot.

Tom

Tim Wilson replied on : 25 of 37

Hi Loren,
I have just found the bsxfun and I like it a lot. But it has only half solved my problem.
I have a 4-D array and I would like to perform this operation

A(:,1,:,:) = A(:,1,:,:) + B


bsxfun will do the addition, what is the best way of assigning the values. The arrays are quite large and currently 1/3 of my computational time is taken up with the squeeze() function.

Any ideas?

Tim

Loren replied on : 26 of 37

Tim-

I don’t see |squeeze| in your code here so you need to tell us more. What are typical sizes for |A| and |B|? Are they full or sparse? Instead of |squeeze|, would |reshape| meet your needs? It should not be computationally intensive.

Are you sure |squeeze| is the culprit? Have you profiled your entire code?

–Loren

Steve Eddins replied on : 27 of 37

Tim—squeeze is implemented as a simple reshape operation, which should take essentially no time even for very large arrays. As Loren says, how did you conclude that squeeze is the culprit?

Cris Luengo replied on : 28 of 37

Hi Loren,

I was thinking, what is the reason for implementing the |bsxfun| function, and not making the singleton expansion the default behavior for all operators? MATLAB has always done singleton expansion in the case of a scalar value (i.e. all dimensions are singleton): |A+4| is the same as |A+repmat(4,size(A))|. Is there any conceivable problem that could arise if MATLAB were to always expand any singleton dimensions? I would love to be able to do |A-mean(A)|! I’ve been thinking about this for a while now, and I can’t come up with a scenario where this would backfire.

Cheers,
Cris.

Loren replied on : 29 of 37

Cris-

We are worried about compatibility and making debugging very difficult for users since more expressions will get answers when users didn’t mean to write what they wrote.

–Loren

Steve Eddins replied on : 30 of 37

Cris—Personally, I would like to see us make the change you recommend.

Loren replied on : 31 of 37

Cris-

I want to echo Steve’s comments. I am all for it as well. Just need to find the right path through.

–Loren

Cris Luengo replied on : 32 of 37

Loren,

I see what you mean, with bad code producing answers instead of error messages. Adding a long Nx1 array with a 1xN array would give a huge NxN matrix, instead of a simple “what are you doing?” message. Changing something as basic as this should definitely not be taken lightly! But at least it wouldn’t change the meaning of existing code.

I was pointed at a language called Yorick some days ago. It has some very crazy, but interesting, ideas on indexing: http://yorick.sourceforge.net/manual/yorick_37.php#SEC37
Note especially the “*” and “..” rubber indexing. Again something I would have used extensively if it had been available in MATLAB! :)

Best,
Cris.

Ben replied on : 33 of 37

Hi Loren,

I am trying to run the following code where V and MM are two cells.

Q = bsxfun(@mtimes, V, MM);


In Matlab document, it is said “The inputs must be of the following types: numeric, logical, char, struct, cell.” But I got error message:

Error using bsxfun
Operands must be numeric arrays.

Does bsxfun not support cell or I made something wrong? Thanks a lot for your help in advance.

Best,
Ben

Loren replied on : 34 of 37

Ben-

bsxfun does support cells, but the operator mtimes does not. The function being called by bsxfun would have to know how to work with cell arrays.

–Loren

Ben replied on : 35 of 37

Hi Loren,

I also tried using a self-defined function called mytimes.

Q = bsxfun(@mtimes, V, MM);


where the mytimes function is

Hi Loren,

I also tried using a self-defined function called mytimes.

function out = mytimes(v,m)

out = v * m;


I set breakpoint in mytimes function, but again got error message without call mytimes function.

Could you provide a example of using cells?

Thank you very much!
Ben

Ben replied on : 36 of 37

Loren,

I further tried

    x = 1:10;
y = 1:20;
z = bsxfun(@mytimes, x, y');


This time, there is no error message and I got to my breakpoint in the mytimes function.

Thank you,
Ben

Jason replied on : 37 of 37