Loren on the Art of MATLAB
January 18th, 2006
Inverting Logic in an Algorithm
A while ago, a colleague needed to use the sinc function to interpolate some channel data posted by the IEEE 802.15.3a working group. The files they supplied were large and the interpolation took a while. Seems like a job worthy inspecting the code and using the profiler (see references below). Here's what you'll find.
The sinc function used to contain this code, avoiding directly evaluating sinc(0):
y=ones(size(x));
i = find(x);
y(i)=sin(pi*x(i))./(pi*x(i));
The output is preloaded with ones and the function is only evaluated for nonzero input values. The problem with this is that the indexing in the third line is very expensive in terms of time.
Thinking about using the sinc function, it often gets used for signal processing waveform creation. So ask yourself, in a signal, how many of the time elements are 0 – and the answer is typically a very small number, usually either 0 or 1.
The following approach seems to run about 30 to 40% faster, based on a very unscientific study, and is what ships today in the Signal Processing Toolbox.
i=find(x==0);
x(i)= 1; % From LS: don't need this if /0 warning is off
y = sin(pi*x)./(pi*x);
y(i) = 1;
In this case, we look for the zero values (most likely a small number of them), replace them with ones, and then evaluate the expression on the whole input array. After that's complete, we replace the output values corresponding to input values of zero with ones.
Note the comment on the second line. If warnings for zero-division are off, then removing that line gives even more speed improvement. Why? Because a copy of the x array needs to be made if the second line is there since some values in x are being modified; the local copy no longer matches the array passed into the function. MATLAB will only use extra memory for an input array if it gets modified inside the function (see the documentation for more information).
Finally, an even shorter version, where we don't worry about the 0 division warning might look like this:
y = sin(pi*x)./(pi*x);
y(x==0) = 1;
In this version, we are performing the computation on the original array x, and then operating on the output, y using logical indexing to replace values where x = 0 with the value 1, taking advantage of MATLAB's scalar expansion.
Profiler References
Here are several pointers into the MATLAB documentation with information about the MATLAB Profiler.
11:21 UTC |
Posted in Indexing, Memory |
Permalink |
7 Comments »
You can follow any responses to this entry through the RSS 2.0 feed.
Both comments and pings are currently closed.
|
Skipping the 0 check could result in much worse penalty than copying the data array. On the Pentium IV architecture, FPU operations that result in NaN or Inf are very expensive. Divide-by-zero warnings being on or off doesn’t matter. For example:
warning(‘off’, ‘MATLAB:divideByZero’);
x = round(rand(1,1e7)); % Roughly equal number of zeros and ones
y = x + 1; % No zeros
tic; s = 1./x; toc
tic; s = 1./y; toc
The timing results on my machine come out to 2.2s for the first case, with 1/2 zeros, while the second case only takes 0.34 seconds.
Of course, your point in this was that the number of input elements that are exactly zero is very small. When I change the proportion to 10000 zeros (1%), then it’s only 0.38 vs the 0.34.
Just something to consider for other optimizations when there are many zeros. The same thing will apply to processing with NaNs.
To get around the sinc(0) = 1 problem, I add eps to the arguments in the numerator and denominator: sin(x + eps) ./ (x + eps). This gives negligible error and avoids the division by zero problem (unless x is -eps, which seems very unlikely). The following code:
% Compare sinc on a random vector
x = randn(1000000, 1);
y1 = sin(pi * x) ./ (pi * x); y1(x==0) = 1;
y2 = sin(pi .* x + eps) ./ (pi .* x + eps);
% Compare sinc on a zero vector
x = zeros(size(x));
y1 = sin(pi * x) ./ (pi * x); y1(x==0) = 1;
y2 = sin(pi .* x + eps) ./ (pi .* x + eps);
gives results within eps in the first case and identical results in the second. Using the profiler, we see execution times of 0.20/0.19 seconds for the random vector, and 0.42/0.05 seconds for the zero vector. Thus in both instances the ‘add eps’ version is faster. The results would also seem to bear out Peter’s comments above, that processing with NaNs is slow.
Hi Loren,
Is there anyway to globally cast the class of all variables to ‘single’ instead of ‘double’? For instance if I type a=10, I would like ‘a’ to be of class ‘single’ instead of ‘double’. Of course, I could do a=single(10), but I can avoid the additional casting step if I can set ‘single’ to be the default class.
Thanks,
Vijay
Vijay-
There is not a way to change the default. But you can do things like:
and write code that is agnostic with respect to what kind of floating point, e.g.,
–Loren
Hi Loren!
I had a piece of code
m = 2139; n = 10000; % a vector of sums S_stamp = single(rand(m, 1)*10); tic for i = 1:10 k = rand(m, n)>0.9; R = S_stamp(:, ones(1, n, 'int8')); A = zeros(m, n, 'single'); % I take sums, that meet the k condition A(k) = R(k); clear R k % than I sum them S.(['SS' int2str(i)]) = sum(A); clear A end toc % std([S.SS1 S.SS2 ..]After I read your article I did so:
I inverted k condition
z = single(0); tic for i = 1:10 k = rand(m, n)<0.9; R = S_stamp(:, ones(1, n, 'int8')); %I send zeros to the sums that now don't meet the initial k condition R(k) = z; clear k % than I sum the sums ans don't need to generate A = zeros ... THANK you % very much! S.(['SS' int2str(i)]) = sum(R); clear R end tocI got
Elapsed time is 23.300569 seconds.
Elapsed time is 22.413216 seconds.
It is a VERY good time improvement for my program! Thank you, Loren!
Please, Loren, can you help me ?
Than I decided to do so
for i = 1:10 k = rand(m, n)>0.9; R = S_stamp(:, ones(1, n, 'int8')); S.(['SS' int2str(i)]) = sum(R(k)); clear R endbut …
ans =
3
1
5
6
7
It seemed to me that R looses it’t structure
if I
I don’t get what I am looking for
Thank you very much for any tip!
Mykola
Mykola,
Logical indexing essentially deletes the zero elements and since there are not necessarily the same number left in each column, MATLAB returns the results in a single column vector. You should read more in the documentation about logical indexing. To get the result you want, you could instead try this:
–Loren
Loren, your code and your recommendations helped me a lot!
Thank you very much!
-Mykola