# In-place Operations on Data

Posted by **Loren Shure**

Recently I mentioned that I would describe a relatively new optimization we have for MATLAB code that allows some function calls to operate on variables without allocating more memory for the result. This can be very beneficial when processing large datasets. I will describe and demonstrate this feature using R2007a.

### Make Data

Create a "large" dataset. This one works for me with 1GB RAM and 1.5GB of swap space running Windows XP and no other applications.

n = 38*2^20; x = randn(n,1);

Now call another function that will call either a regular function or an in-place function.

inplaceTest(x)

Let's look at the code for `inplaceTest`.

`type inplaceTest`

function inplaceTest(x)

% Call functions with either regular or in-place semantics.

%% Call a Regular Function with the Same Left-Hand Side

x = myfunc(x);

%% Call an In-place Function with the Same Left-Hand Side

x = myfuncIP(x);

%% Call a Regular Function with a Different Left-Hand Side

y = myfunc(x);

%% Call an In-place Function with the Same Left-Hand Side

% Note: if we changed this next call to assign its output to a new LHS, we get an error

x = myfuncIP(x);

It simply takes the data and calls each of the two functions twice.

Here's how a typical run of this code appears in the Windows Task Manager Performance tab for the four calculations. You can see when the CPU was busy, and you can match that with the times when memory was allocated, whether temporary or permanent.

You can see the four calls to the computational functions in the CPU Usage History, and you can see the amount of extra memory used in the Page File Usage History. When we call the in-place function using the same input and output variable names from the test function, no extra memory is allocated.

### Required Characteristics of In-place Behavior

Let's look at the code for the two functions `myfunc` and `myfuncIP` so we can see how they differ.

Here's the function `myfunc.m`

`type myfunc`

function y = myfunc(x)

y = sin(2*x.^2+3*x+4);

and here's the function `myfuncIP.m`

`type myfuncIP`

function x = myfuncIP(x)

x = sin(2*x.^2+3*x+4);

As you can see, the two functions differ. The in-place version, `myfuncIP`, has a return argument *with the same name* as one of its input arguments, in this case `x`. Also, the computation needs to put its results into `x` instead of `y`, since the return variable is now `x` rather than `y`.

The second important idea required to take advantage of in-place computation is that the in-place function must itself be
called from another function, as I've done with `inplaceTest`.

### Considerations

I'd like to pass on several tips and rules of thumb about in-place functions. This is not an ordered list.

- Because of MATLAB's JIT, some functions or parts of code already do some of their work in place. In these cases, you will see less benefit than you might expect, or none at all, from switching to in-place operations. For example, using the binary version of `max` (e.g., `A = max(A,B)`) doesn't gain anything, even inside a function.
- Benefits don't usually occur for the smallest arrays. It typically takes at least 1000 elements before you might see a difference.
- Keep your code natural. Don't write unnatural code just to take advantage of this. This is particularly true if your function is already computationally intensive; there is more speed gain for simpler functions, though memory might still be a concern. For example, `lu` is an order-`n^3` operation, while the memory/speed gain is only order `n^2`. So it's not worth doing contortions to write `[L A P] = lu(A)` (more naturally written `[L U P] = lu(A)`) unless you really need to conserve memory.
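For illustration, the memory-conserving `lu` pattern mentioned above can be sketched as a small solver. This is an illustrative sketch, not code from the post; the function name is made up:

```matlab
function x = solveConservingMemory(A, b)
% Memory-conserving (but less natural) pattern: overwrite A with the
% U factor so only one n-by-n matrix is live at a time.
% With [L,U,P] = lu(A), P*A = L*U, so A*x = b becomes L*U*x = P*b.
[L, A, P] = lu(A);      % A now holds U; the original A is gone
x = A \ (L \ (P*b));    % back-substitute: x = U \ (L \ (P*b))
end
```

As the bullet above says, prefer the natural `[L U P] = lu(A)` unless memory is truly tight.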

### Limitations

There are some limitations regarding when the in-place optimization is operative. These include:

- Not all built-in MATLAB functions currently obey in-place semantics. We tried to support the most important and obvious ones first, such as elementwise operations. There is no published list of these functions; the set will grow from release to release.
- There is no interface for the in-place operations via MEX-files.
- MATLAB does not yet recognize in-place possibilities when the variable names don't match between the input and output; for now, matching names are required as the cue.

### Feedback?

Do you plan to take advantage of this new feature in your code? Let me know here.


Published with MATLAB® 7.4

## 45 Comments

Loren,

Could you explain how this applies within the structure of a larger function? If I use intermediate calculations, does the in-place performance apply to those as well, or is it strictly the matching of input variables to output variables in the function declaration that cues MATLAB to do so? How does the speed efficiency compare when one considers the overhead of additional function calls versus the computational savings of performing in-place assignments?

Thanks,

Dan

Dan-

Within a larger function, this does nothing specific. It’s the function call boundaries that allow this optimization to occur right now. The function call overhead should be tiny compared to the memory savings for large arrays.

–Loren
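Loren's point about call boundaries can be sketched as follows. The function names here are hypothetical; the point is that only the helper call is a candidate for in-place optimization:

```matlab
function processLarge()
% Create the data inside the function so it isn't shared with the
% base workspace.
x = randn(10*2^20, 1);

% Intermediate expressions inside this function still use temporaries;
% the in-place optimization is not cued here.
x = sin(x) + 1;

% This call CAN be done in place: the same variable name appears on
% both sides of the assignment, and scaleIP reuses its input name
% for its output.
x = scaleIP(x);
end

function x = scaleIP(x)
% In-place candidate: the output argument has the same name as the input.
x = 2*x;
end
```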

the usual applies to this loren blog: very nicely written, very good points, excellent examples…

however, whilst there are good pointS – there are still no pointERS!

will we ever get these critters that this community’s been longing for ever since 1984…

us

This is a great addition to Matlab! I am planning to use it to simplify the coding of the addition of a sparse matrix to a large dense matrix (see my newsgroup article complaining about my failure to attempt this using ‘evalin’ [which was way too slow] at: http://newsreader.mathworks.com/WebX?14@43.71OCbY2f4a5@.ef5149c ).

Dear Loren,

this looks like an extremely useful feature. I often work with large three dimensional data sets (video data) and this could be invaluable for making efficient use of memory.

In which release was this optimisation added and where is it documented?

Thanks,

Alex

Alex-

Parts of this optimization started getting added with the JIT in R13, but most of it arrived in R2006b. The full effect is only available in R2007a currently. I think this is not yet in the documentation, so this post is probably your best source for the information. Stuart McGarrity may also soon have an article out in News & Notes covering this as part of a larger topic.

–Loren

How can I use the Task Manager with a dual-core CPU? I think you have to install a special program? Thanks for any information.

I use the task manager on my dual core machine by changing View->CPU History->One Graph, All CPUs

–Loren

Hi Loren, interesting fact. It would be great to get regular updates on the Matlab JIT, covering both positive and negative features.

My question is: I use “rogue” in-place in mex files a lot, using the fact that modifying input arguments in mex files in fact changes the data in the matlab environment. This is not officially supported though. Will this be changed? If not then I think users might appreciate it to document this feature and make it “official”.

Christoph

Christoph-

I don't know the plans, but I know we don't claim to support changing the input arguments. It isn't something you can depend on being available in the future.

–Loren

Very useful addition. Does it work with quad-core CPUs?

Trip-Trap,

The memory management is separate from the processors. If you turn on multithreading in R2007a, certain operations will benefit. See the documentation for more information on the details.

–Loren

I tested the example with the profiler and I used both a row vector with 10^6 elements and a rectangular matrix with 1000 * 1000 elements. Here are my results:

1. Vector

x = myfunc(x); 0.138 s

y = myfunc(x); 0.132 s

y = myfuncIP(x); 0.119 s

x = myfuncIP(x); 0.118 s

The inplace function is 10 – 20 ms faster than the regular function, which is not surprising. When the left-hand side is different from the output parameter, the regular function is faster and the in-place function is slower when compared to the case that left-hand side and output parameter are the same, even though the difference is small.

2. Matrix

x = myfunc(x); 0.138 s

y = myfuncIP(x); 0.125 s

y = myfunc(x); 0.123 s

x = myfuncIP(x); 0.119 s

This time, the regular function is a little bit faster than the in-place function when left-hand side and output parameter are different. When left-hand side and output parameter are the same, the in-place function is significantly faster than the regular function.

It seems that using an in-place function can sometimes speed up and sometimes slow down a program. How can this be explained? Are in-place functions only a means of saving memory or can they be used to speed up a program?

Timing is tricky. And it’s best to run things a few times first before doing the timing to ensure that all dlls, etc. are loaded. Also, in this case, you need to be sure everything is running from functions.

I wonder if your first set of timings reflects the first time running effect instead of something more profound. Or…

The profiler also can influence the times. And I am not sure if the profiler interferes with true inplace behavior. You’d have to watch your task manager at the same time to see if memory was really being used in place or not.

Also, when doing timings like these, I would recommend using larger arrays so the times can differ more profoundly.

I am unaware of cases where in-place is actually slower than regular. If the operation can't be done in place, it should be comparable to the calculation done without the in-place semantics.

–Loren
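A timing harness along the lines Loren describes might look like the sketch below; `myfunc` and `myfuncIP` are the functions from the post, repeated here so the example is self-contained:

```matlab
function timeInplace()
% Warm up first so JIT compilation and library loading don't pollute
% the measurements, then time with a large array, all from within a
% function (not a script or the command line).
n = 25*2^20;
x = randn(n, 1);
myfunc(x);      % warm-up
myfuncIP(x);    % warm-up

tic; y = myfunc(x);   t1 = toc;   % regular function, new LHS
tic; x = myfuncIP(x); t2 = toc;   % in-place function, same LHS
fprintf('regular: %.3f s, in-place: %.3f s\n', t1, t2);
end

function y = myfunc(x)
y = sin(2*x.^2 + 3*x + 4);
end

function x = myfuncIP(x)
x = sin(2*x.^2 + 3*x + 4);
end
```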

I ran the profiler several times for both the vector and the matrix, so the first running effect certainly did not influence the timing. An array with one million elements of type double is not enough for a comparison? I get an out of memory error when trying to use 36 million elements, so here are my results for a vector with 25 million elements:

x = myfunc(x); 3.650 s

y = myfuncIP(x); 3.130 s

y = myfunc(x); 3.104 s

x = myfuncIP(x); 3.030 s

These are the results for a rectangular matrix with 5000 * 5000 elements:

x = myfunc(x); 3.784 s

y = myfuncIP(x); 3.171 s

y = myfunc(x); 3.075 s

x = myfuncIP(x); 3.045 s

When I wrote my previous message, I overlooked that the output parameter of myfunc(x) is y and the output parameter of myfuncIP(x) is x. This means that the following comparisons should be done:

% different left-hand side

x = myfunc(x);

y = myfuncIP(x);

% same left-hand side

y = myfunc(x);

x = myfuncIP(x);

I don't know whether the profiler influences timing significantly. If it doesn't, I conclude that in-place functions are faster than regular functions and that using the same left-hand side is faster than using different left-hand sides. However, when the same left-hand side is used, an in-place function is not significantly faster than a regular one.

Does Matlab use some kind of return value optimization like C++ does?

Thanks for clarifying and rerunning code. MATLAB does look to see if the work can be done in place and if the user asked for it to be done in place (via the same variable on right and left hand sides). Only if the user requests the in place behavior AND the function is suitable does MATLAB do this optimization currently. There are some more optimizations that we can add over time.

–Loren

This functionality should be very useful. In my application the solution matrix is stored as a field of a structure. A function is applied to that field repeatedly to obtain new solutions. I don’t know how to do in-place operation for structures. I guess ‘pulling out’ and ‘putting back’ lots of data into structures may be quite expensive.

Hi Loren. I want to define a class of object which behaves almost exactly like Matlab's builtin 'struct', only with a few extra methods I'd like to add later. I want to inherit from a struct object so that I don't have to write many accessor or general utility functions. Why can't I use a struct as my parent object? In my constructor I do class(newobj, 'newobjname', stru), where stru is a builtin struct object; Matlab then complains that this parent object is not valid. Do you know if it's possible at all to inherit from Matlab builtin structures? Thanks!

Shakeham-

You cannot inherit from built-in datatypes in MATLAB. It’s been reported to development as an enhancement request.

–Loren

Thanks for the reply Loren! This prevents me from trying further in vain. I’d definitely keep an eye on the development of this feature. Since an object must be a struct first, I guess the only builtin object that can be inherited is the struct.

By the way, regarding my previous question: does the in-place operation work on structures? Let's say a struct is input to a function, some of its fields are modified, and the output name is the same as the input name to signal the in-place op. What will Matlab do?

Shakeham-

MATLAB only makes copies of struct fields if they change. I don’t think it will do them in-place currently. You could try modifying the code yourself to use structs and see what happens as you watch the task manager.

–Loren
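A sketch of the experiment Loren suggests; watch the Task Manager while it runs to see whether the field is updated in place. The function names are made up for this test, and as Loren suspects, the field may well NOT be handled in place:

```matlab
function structInplaceTest()
% Pass a struct through an in-place-style function and watch memory.
s.data = randn(38*2^20, 1);   % one large field
s = myStructFuncIP(s);        % same struct name on both sides
end

function s = myStructFuncIP(s)
% Output struct has the same name as the input struct, but the field
% assignment may still force a copy of s.data.
s.data = sin(2*s.data.^2 + 3*s.data + 4);
end
```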

I have some problems with the following that I saw was posted:

x = myfunc(x); 3.650 s

y = myfuncIP(x); 3.130 s

y = myfunc(x); 3.104 s

x = myfuncIP(x); 3.030 s

I feel that the following is better suited.

x = myfunc(x); 3.600 s

y = myfuncIP(x); 3.120 s

y = myfunc(x); 3.104 s

x = myfuncIP(x); 3.050 s

I have been playing with it for a while and I found that it seems to run a little smoother with those changes. I am not sure if it is because of personal taste or what. But I would suggest that you give those a try and tell me what you think. I am always open and thankful for feedback that I get from people on my coding. If I am in error about anything, or if someone finds a better way to get something done, I would always love to hear about it.

Hi,

Can I use this technique in the case of recursion? I wrote a matrix determinant function using the Symbolic Toolbox, computing the symbolic determinant recursively.

How do I go about using it?

regards,

Balavelan

Loren,

Thanks for pointing me to this blog entry; it is a better approach than manipulating caller variables. However, I’m having trouble getting it to do what I need. What I’m trying to write is something akin to the following:

function [x y] = myfuncIP2(x)

x = sin(2*x.^2+3*x+4);

y = x(1);

end

It looks to me like x is not handled in place unless it is the only return argument. Is this true? The only workaround I have to this is to set x.y = y and then have the caller pull it back out. But then if I have a big structure that needs to be handled in place, and it is passed all over in the code, then all of my functions are constrained to return their results by embedding them into this structure, which is ugly. Do you have any suggestions?

Thanks,

Jonathan

Jonathan-

To the best of my knowledge (and I just tried it with your example), you can have as many inputs and outputs as you like, and they don't need to be in the same order. However, you *must* call the function from another function for the in-place operation to have a chance. Perhaps you ran it from a script or the command line?

–Loren

I am calling it from another function, but have just noticed a bit more odd behavior. Here is what I’m currently using for testing:

function inplaceTest(x)

tic; x = myfuncIP(x); toc;

tic; x = myfuncIP(x); toc;

tic; x = myfuncIP(x); toc;

tic; [x y] = myfuncIP2(x); toc;

tic; [x y] = myfuncIP2(x); toc;

tic; [x y] = myfuncIP2(x); toc;

end

function x = myfuncIP(x)

x = 2*x;

end

function [x y] = myfuncIP2(x)

x = 2*x;

y = x(1);

end

I call it with:

inplaceTest(randn(38*2^20, 1))

and get this result:

Elapsed time is 0.316449 seconds.

Elapsed time is 0.152508 seconds.

Elapsed time is 0.152818 seconds.

Elapsed time is 0.338881 seconds.

Elapsed time is 0.398801 seconds.

Elapsed time is 0.399128 seconds.

These times are consistent over multiple runs; the first call to myfuncIP always takes twice as long as the next two, and always about the same amount of time as those to myfuncIP2. I am interpreting this to mean that for some reason the first call to myfuncIP is not in place, the next two are, and none of the final three calls are. I have tried this in more complicated examples, and have seen over an order of magnitude slowdown as a result of adding the extra parameter. Any ideas?

Thanks,

Jonathan

Jonathan-

The behavior you see is because the variable x has to come into inplaceTest, and then a copy is made so nothing bad happens to the copy in the base workspace. Instead, create the initial x inside inplaceTest so the variable creation is locked down as well. I urge you to run my example from the initial post exactly as I wrote it. If you are on Windows, watch the Task Manager and you will convince yourself there is no extra copying being done, even when you update it to use myfuncIP2.

–Loren
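Applying Loren's suggestion to Jonathan's test, the sketch below creates x inside the test function so the first call no longer pays for a protective copy:

```matlab
function inplaceTest2()
% Create x here rather than passing it in from the base workspace,
% so no defensive copy of a caller's variable is needed.
x = randn(38*2^20, 1);

tic; x = myfuncIP(x);        toc;  % first call can now run in place too
tic; [x, y] = myfuncIP2(x);  toc;
end

function x = myfuncIP(x)
x = 2*x;
end

function [x, y] = myfuncIP2(x)
x = 2*x;
y = x(1);
end
```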

Loren,

any prospects for cellfun supporting in-place operations, please?

For example, given your demo

function inplaceTest(x)

% Call functions with either regular or in-place semantics.

%% Call a Regular Function with the Same Left-Hand Side

x = myfunc(x);

%% Call an In-place Function with the Same Left-Hand Side

x = myfuncIP(x);

%% Call a Regular Function with a Different Left-Hand Side

y = myfunc(x);

if I modify the functions as follows

function y = myfunc(x)

y = cellfun(@uminus, x, 'UniformOutput', false);

function x = myfuncIP(x)

x = cellfun(@uminus, x, 'UniformOutput', false);

then memory profile it:

n = 7*2^20;

x = randn(n,1);

x = {x};

profile -memory on

inplaceTest(x)

profile viewer

looking under inplaceTest, you can notice Allocated Memory has the same large value (57358.45 Kb) for myfunc and myfuncIP.

In contrast, if I don’t use cellfun in the functions,

function y = myfunc(x)

y = -x;

function x = myfuncIP(x)

x = -x;

then after memory profiling it

n = 7*2^20;

x = randn(n,1);

profile -memory on

inplaceTest(x)

profile viewer

then you’ll see negligible Allocated Memory for the in-place call, as expected.

I have a class that uses cell arrays to store its data, and its operators are all implemented using cellfun; it’d be nice to be able to do in-place operations for that class.

Thanks,

Felipe.

PS: I looked at Allocated Memory because running times don’t seem to change much unless we hit the virtual memory wall.

Felipe-

Thanks for the suggestion. This feedback will help us out. I am hoping you’d be willing to enter it here: http://www.mathworks.com/support/service_requests/contact_support.do

–Loren

Sure — I’ve just submitted it, as an enhancement request.

Felipe.

Hello,

I would like to know how much memory I could conserve by using [L A P]=lu(A); (inplace LU).

Actually I have an equation system which runs out of memory when using x = A\b … If I used [L A P] = lu(A); x = A\(L\(P*b)); could I conserve some memory with this approach?

Is a true in-place x = A\b solution achievable in MATLAB at all?

Thanks for reply.

Tamas-

LU doesn’t have in-place optimization. But in general, you should only code that way when it’s natural or you are desperate regarding memory. Otherwise code ends up too hard to understand later.

–Loren

Thanks for reply Loren,

actually I have to solve _very_ large linear systems (matrix sizes of 6–8 GB), and it seems that when the code reaches the A\b operation, MATLAB allocates approximately twice the system size. I wanted to know if there is a way to avoid this, as it really matters whether one needs a computer with 16 or only 8 GB of memory. I know that iterative solvers use much less memory, but they converge very badly in my case.

Tamas-

If appropriate, sparse arrays MIGHT help. Parallel computing should help, if you have access to it.

–Loren

I thought about passing the matrix A and array b to a MEX-function which contains a call to the corresponding ZGESV LAPACK routine to solve my complex linear system. ZGESV solves the system in place. I don't know whether passing arrays to MEX-functions requires a duplicate of the array in memory or not …

Tamas-

I believe the API to do that is not documented or supported so could change in the future. But it is certainly something you could try.

–Loren

Loren,

I don’t understand why, in inplaceTest, you say “if we changed this next call to assign output to a new LHS, we get an error”. In your example you don’t run y = myfuncIP(x). I seem to be able (R2009b) to do that. It would be awkward if a function that can compute in-place always needs the output argument to be the same as the input argument! The beauty of these in-place functions is that you can just change the variable name of the output argument in your old functions and get a nice speed and memory gain, while keeping perfect backward compatibility. But could this lead to problems in R2007a if the function is called with a different LHS argument?

Cris-

Perhaps you have more memory or swap space than me. On my machine, I run out of memory at that point. MATLAB has always had SOME inplace memory optimizations. They’ve just gotten better and more sophisticated over time.

–Loren

Ah, so it’s not because of the version of MATLAB. Excellent, thanks!

I’m running through some functions I use often, making sure the output and input argument names are the same… :)

Cris-

Version does matter, but it only gets better in newer versions. There was a time when even with the same input and output names, copies were always made.

–Loren

the stupid thing about memory is like this:

[V,D]=eig(H)

D is a n*n diagonal matrix!

it is absolutely stupid!

D can be a n*1 vector, that would save a lot of memory!

Jiang-Ming-

It’s true you can represent the eigenvalues in a vector, but if you look at the help and how they get used, you often need them in a matrix to do various matrix multiplications and other operations.

–Loren
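Worth noting alongside Loren's reply: `eig` called with a single output already returns just the eigenvalues as a vector, and `diag` rebuilds the matrix form when needed:

```matlab
H = [2 1; 1 2];

d = eig(H);        % single output: n-by-1 vector of eigenvalues only
D = diag(d);       % rebuild the n-by-n diagonal matrix when needed

[V, D2] = eig(H);  % two outputs: D2 is the full diagonal matrix
```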

Hi Loren,

I understand how the in-place technique works, and it works fine given that I work within functions and with variables in Matlab (I checked that it works using the secret option of the format command :)).

The problem I face (maybe I'm failing to understand something) is with achieving the same behaviour with object properties. In my understanding, Matlab object properties are no different from normal variables from Matlab's perspective. But if I pass a property of an object to an in-place function, a copy is made and the operation is not done in place.

The following is the sample I tried

% DataWrapper.m

classdef DataWrapper < handle

properties

Data

end

methods

function obj = DataWrapper( data )

obj.Data = data;

end

end

end

% my_inplace_func

function in_data = my_inplace_func( in_data )

in_data = sin( in_data .* 2 );

end

% my_inplace_test with normal variables. No extra copies

% made. ( The nice case ).

function my_inplace_test()

data = rand( 2, 3);

data = my_inplace_func( data);

end

% my_inplace_test with object properties. Extra copy is

% being made here. Would be great if you tell me why so.

function my_inplace_test()

obj = DataWrapper( rand( 2, 3) );

obj.Data = my_inplace_func( obj.Data);

end

Or is this a bug or a feature to be implemented in future as part of the JIT?

Would be great if you can share some information on this, because I use large data arrays and this problem is really stopping me from moving to object-oriented programming.

Shankar-

We will consider this and some related ideas as optimizations in the future. Thanks for your thoughts.

One less beautiful thing you could do for now is to get the property into a temp variable, set the property to [], make the call, and set the property back, as in:
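A sketch of that workaround, using Shankar's DataWrapper class and my_inplace_func from the comment above (remember this must itself run inside a function for the in-place operation to apply):

```matlab
function property_inplace_workaround()
obj = DataWrapper(rand(2, 3));

tmp = obj.Data;              % pull the property into a temp variable
obj.Data = [];               % drop the property's reference so tmp
                             % holds the only copy of the data
tmp = my_inplace_func(tmp);  % now eligible for in-place operation
obj.Data = tmp;              % put the result back
end
```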

–Loren

I did try the workaround you suggested before. Sadly, I will have to teach the clients of my class about it, which is not what I want.

But I appreciate that you are willing to consider this. Would be great if this comes as part of Matlab soon.

Also I would like to add more to this, since the same improvement needs to be done with the field values of structures and cell arrays (copies are made in those cases as well, just like object properties).