Should min and max Marry?38

Posted by Loren Shure,

Do you ever need both the minimum and maximum values on the same data? Sometimes I do, for example, when I want to tweak the limits of a 2-dimensional plot. So I was wondering whether that is a common task? I also wondered what the overhead is in calling min and max separately, instead of scanning through the data once, looking for both the lowest and highest values.

The Plan

I will explore this idea by creating my own simplified versions of max and min, working only for real vector inputs, not checking for NaNs, and not doing any error checking. Clearly, the code I show is not meant for production but to get to the heart of the algorithm.

Create Data

Here I create a very small test dataset.

n = 10;
x = rand(n,1)
x =
0.9145
0.1215
0.3765
0.7912
0.1848
0.0508
0.5562
0.4633
0.8539
0.0384


Simplified min function

This function is simple, as advertized.

type mymin
function xmin = mymin(x)
%MYMIN Minimum value of real vector input.
%   MYMIN is a simplified version of MIN that
%   finds the minimum value for
%   real-valued vector.  NaNs are not treated.
xmin = Inf;
for ind = 1:length(x)
if x(ind) < xmin
xmin = x(ind);
end
end



Test mymin Function

Now I compare the result from mymin to the built-in min function in MATLAB.

xmin = mymin(x);
xminML = min(x);
tf = isequal(xmin, xminML)
tf =
1


Test mymax Function

Do similar testing with mymax.

xmax = mymax(x);
xmaxML = max(x);
tf = isequal(xmax, xmaxML)
tf =
1


Larger Data

Now I want to do some timing so I will create a much larger array of data.

n = 3e7;
xL = rand(n,1);
tic
xLmin = mymin(xL);
timeMin = toc
tic
xLmax = mymax(xL);
timeMax = toc
timeMin =
0.7469
timeMax =
0.7491


Combined Function

Now let me show you my combined function myminmax that loops through the data the same way mymin and mymax did, but does both calculations in the loop together.

type myminmax
function [xmin, xmax] = myminmax(x)
%MYMINMAX Extreme values of real vector input.
%   MYMINMAX is a simplified version of MIN and
%   MAX combined and finds the
%   minimum  and maximum values for real-valued vector.
%   NaNs are not treated.
xmin = Inf;
xmax = -Inf;
for ind = 1:length(x)
if x(ind) < xmin
xmin = x(ind);
end
if x(ind) > xmax
xmax = x(ind);
end
end


First, let's check that we get the expected results.

[xminNew, xmaxNew] = myminmax(x);
tf = isequal([xminNew xmaxNew], [xmin xmax])
tf =
1


Time Combined Function

And now let's time the combined function.

tic
[xLminNew xLmaxNew] = myminmax(xL);
timeMinmax = toc
timeMinmax =
1.0614


Compare Total Times

To compare the times, let's look at the sum for the times calling the individual functions vs. calling the combined one.

t2 = timeMin+timeMax;
format long
disp([t2 timeMinmax])
Adding separate times     Combined Time
1.495988735998570   1.061442445897453


Reset format to default

format short

Worth It?

Is it worth having a combined function for min and max from a speed point of view? I don't know. It depends on many things, including how often it's needed, and if finding the min and max values is one of the dominant time consumers in the overall algorithm. What do you think? Let me know here.

Get the MATLAB code

Published with MATLAB® 7.6

Note

Jessee replied on : 1 of 38

If it’s a common procedure and there’s a more efficient way of doing it, I say marry them up. Perhaps a more generalized function would be appropriate, e.g.

[xmin,xmax,xfun] = statfun(x,’min’,’max’,’fun’)

where ‘fun’ could be any function that follows the form of your mymin function.

Yehuda S. replied on : 2 of 38

Very common indeed; my own function for that purpose is
[amin,amax]=extrema(a(:))

Gordon Saksena replied on : 3 of 38

Since R14, prctile() in the stats toolbox lets you compute multiple percentiles using one call to sort(). This helps a lot with large datasets. However, sort() is O(n log(n)) rather than O(n), and there are occasions to compute a lot more than two percentile values from the same data.

Tim Davis replied on : 4 of 38

Sounds plausible, but where do you stop? Do you also return the 2nd largest and 2nd smallest values? The mean? The mode? The median? The sum? Do you count how many Nan’s and/or Inf’s?

Since min and max are linear time functions, keeping them separate seems like a good idea to me (modularity vs efficiency with potential combinatorial explosion of things to do). You don’t lose much efficiency anyway.

Loren replied on : 5 of 38

FYI,

I haven’t tried it myself, but you might be interested to know that there is a submission on the File Exchange that has this functionality though not the speed.

–Loren

Cris Luengo replied on : 6 of 38

We have a function to compute both maximum and minimum at the same time in our image processing library. It makes a lot of sense: it saves time, it’s a simple, short function, and it’s used very often when displaying images.

By the way, since the minimum and the maximum are never the same value, you can get away with combining the two tests inside the loop:


if x(ind) < xmin
xmin = x(ind);
elseif x(ind) > xmax
xmax = x(ind);
end



This will return the correct values if you initialize both xmax and xmin to the first data value. Though I guess it doesn’t increase the speed all that much.

Loren replied on : 7 of 38

Tim-

I understand what you’re saying. I wasn’t trying to introduce “new” functionality in this case, just following what max and min did already (without the nan checking for laziness).

I *think* min/max are more matched and used together more often than the others you say, but we may hear otherwise.

–Loren

David Schwartz replied on : 8 of 38

I frequently use a similar function that I coded, but I prefer that it return a single array so that I can use the idiom diff(minmax(x)). It also needs to handle arrays, N-dim arrays, and NaN’s in the same manner as min and max.

-David-

Loren replied on : 9 of 38

David-

Thanks. I had been wondering whether to return 2 outputs or 1. I guess we could look at nargout and if it was 1, return your outputs, and otherwise, return 2 separately.

I had certainly intended a version that handled higher dimensions, but didn’t think the code would be as illustrative here.

If you have higher dimensions and a single output, do you interleave the mins and maxs for each dimension do you can still do your diff? I guess I don’t quite see the right design in my head just yet.

–Loren

Mansour replied on : 10 of 38

Hello,
Thanks for this article. To compare the result more accurately, I wrote the following code:

function TestSpeed()
for i=0:7
n=10 ^ i;
[timeMinMax,timeMyMinMax]=GetSpeed(n);
SpeedIncrease=((timeMinMax-timeMyMinMax)*100)/timeMinMax;
fprintf(‘n=10e%d \t Matlab Min and Max function time=%e \t MyMinMax=%e \tSpeed Increase =%f \n’,i,timeMinMax,timeMyMinMax,SpeedIncrease);
end

function [timeMinMax,timeMyMinMax]=GetSpeed(n)
xL = rand(n,1);
tic
xLmin = min(xL);
xLMax=max(xL);
timeMinMax = toc;
tic
xLMyMinMax = MyMinMax(xL);
timeMyMinMax = toc;
%fprintf(‘min and Max function time=%e \n Min and Max combined=%e \n’,timeMinMax,timeMyMinMax);

function [xmin xmax]=MyMinMax(x)
xmin = Inf;
xmax = -Inf;
for ind = 1:length(x)
if x(ind) xmax
xmax = x(ind);
end
end

and the result is:

n=10e0 Matlab Min and Max function time=1.173333e-005 MyMinMax=1.484826e-003 Speed Increase =-12554.761905
n=10e1 Matlab Min and Max function time=1.564445e-005 MyMinMax=1.116064e-003 Speed Increase =-7033.928571
n=10e2 Matlab Min and Max function time=1.955556e-005 MyMinMax=4.162540e-005 Speed Increase =-112.857143
n=10e3 Matlab Min and Max function time=8.325080e-005 MyMinMax=1.123048e-004 Speed Increase =-34.899329
n=10e4 Matlab Min and Max function time=7.509334e-004 MyMinMax=8.366985e-004 Speed Increase =-11.421131
n=10e5 Matlab Min and Max function time=7.564649e-003 MyMinMax=8.107455e-003 Speed Increase =-7.175567
n=10e6 Matlab Min and Max function time=3.503490e-002 MyMinMax=6.403859e-002 Speed Increase =-82.785127
n=10e7 Matlab Min and Max function time=3.547286e-001 MyMinMax=3.776955e-001 Speed Increase =-6.474495

apparently Matlab Min and Max are faster that my combined minmax but the difference in speed is notable whne the array size is small. When the size of arrays increases, the speed of two technique become similar ( I think one reason for this phenomena is the fact that Matlab compiles MyMinMax function when I called it but it doesn’t do this for Min and max. We can check it and I may do it at a later time). Interestingly the data for n=10e6 is not consistent with other data. I did another try and run the same program again( in hope that no other load on my windows system generates such inconsistency) and the result is follow:

n=10e0 Matlab Min and Max function time=1.313016e-005 MyMinMax=1.123048e-004 Speed Increase =-755.319149
n=10e1 Matlab Min and Max function time=1.592381e-005 MyMinMax=3.966985e-005 Speed Increase =-149.122807
n=10e2 Matlab Min and Max function time=1.843810e-005 MyMinMax=4.022858e-005 Speed Increase =-118.181818
n=10e3 Matlab Min and Max function time=8.325080e-005 MyMinMax=1.150984e-004 Speed Increase =-38.255034
n=10e4 Matlab Min and Max function time=7.520509e-004 MyMinMax=8.375366e-004 Speed Increase =-11.367013
n=10e5 Matlab Min and Max function time=8.232331e-003 MyMinMax=8.526223e-003 Speed Increase =-3.569974
n=10e6 Matlab Min and Max function time=3.553329e-002 MyMinMax=3.743828e-002 Speed Increase =-5.361144
n=10e7 Matlab Min and Max function time=3.784671e-001 MyMinMax=4.062687e-001 Speed Increase =-7.345838

It is completely different from the first run. Lets run it for the third time:

n=10e0 Matlab Min and Max function time=1.368889e-005 MyMinMax=1.128635e-004 Speed Increase =-724.489796
n=10e1 Matlab Min and Max function time=1.648254e-005 MyMinMax=3.911112e-005 Speed Increase =-137.288136
n=10e2 Matlab Min and Max function time=1.815873e-005 MyMinMax=3.994921e-005 Speed Increase =-120.000000
n=10e3 Matlab Min and Max function time=8.576509e-005 MyMinMax=1.145397e-004 Speed Increase =-33.550489
n=10e4 Matlab Min and Max function time=7.495366e-004 MyMinMax=8.520636e-004 Speed Increase =-13.678718
n=10e5 Matlab Min and Max function time=7.511010e-003 MyMinMax=8.101588e-003 Speed Increase =-7.862828
n=10e6 Matlab Min and Max function time=3.592496e-002 MyMinMax=3.764808e-002 Speed Increase =-4.796454
n=10e7 Matlab Min and Max function time=3.830791e-001 MyMinMax=3.967281e-001 Speed Increase =-3.562956
again different results. Why, do you know? ( I know it but I leave it for you to make some comments!)

Best Regards
Mansour

Cris Luengo replied on : 11 of 38

Mansour, your second and third run produce similar enough results. They are never going to be exactly the same because your input is random. The first time your MyMinMax was called it took an order of magnitude longer to compute, that’s the time it takes to load and compile the code.

Loren replied on : 12 of 38

Cris is right, Mansour. That’s partly why I (so cleverly) used all the functions I needed BEFORE I did the time comparison.

–Loren

Loren replied on : 13 of 38

Folks-

For really measuring performance, you might take a look at this file exchange entry by Bill McKeeman. It mentions first time costs of running functions among other tidbits that make measuring performance tricky.

–Loren

Mansour replied on : 14 of 38

yes, I know that they would not be the same as the input is random (I just wanted to have a subject for more chatting!) . But I am wondering why it took that much time to run the function, when n=10e1
n=10e1 Matlab Min and Max function time=1.564445e-005 MyMinMax=1.116064e-003 Speed Increase =-7033.928571

At this time myminmax function should be compiled and reused by Matlab.

Anyway, the result is: Always use Min and Max and don’t use your own MinMax function !
Am I right?

Regards
Mansour

Loren replied on : 15 of 38

Mansour-

The point was never whether to use the built in min and max functions. The point was to illustrate that asking for the min and max together can be faster than calling each one separately.

–Loren

Aaron replied on : 16 of 38

While I can’t say that I often want both the max and min of a vector, I have done it once or twice. A more useful addon for me, would be a max/min that could return the top/bottom x elments. I often find that I am sorting large vectors just to take the top 2 or 3 elements. It’s has yet to be a bottleneck issue so I haven’t written the mex code to do it proporly but I’ve considered it a few times.

Mike Boedigheimer replied on : 17 of 38

There is a need, and its almost provided by the RANGE function. It does the calculations, now just edit the code to return the desired values.

Martin Cohen replied on : 18 of 38

For computing both the min and max, I remember from my algorithms class that the number of comparisons can be reduced by 1/3 by, once you get started, using two consecutive values and comparing the min with the smaller and the max with the larger.

Something like this:

vmin = vmax = x[1];
for ( i=2 to n-1 by 2 )
{
if ( x[i] <= x[i+1] )
{
if ( x[i] vmax ) vmax = x[i+1];
}
else
{
if ( x[i] > vmax ) vmax = x[i];
if ( x[i+1] < vmin ) vmin = x[i+1];
}
// handle possible last entry

Note this does 3 comparisons per 2 values, or 1.5 comparisons per value.

martin cohen

naor replied on : 19 of 38

While we are considering new features of the min/max functions, I would request an optional syntax to return the index of min/max value. What’s wrong with [x i]=min(A) ? Only that you need the occasional extra line, and a temp var that you never use. Purely an aesthetic issue but still…

Graeme E. Smith replied on : 20 of 38

It strikes me that no one has considered the readability of the code, which for a large software project can be very important. Having separate min and max functions is a good way to ensure that the code has a logical structure that is easy to follow. But such structure does come at a cost: it is often not the most efficient way to write the code. I’ve worked on projects where code efficiency was regarded (sensibly in my mind) as secondary to readability and maintainability. However, in these situations we were well within the performance limits of the processor and relied on the optimizer in the compiler (it was Ada83 not Matlab) to improve code efficiency.

If I was in a situation where there were real performance problems then I’d be happy to have a combined function. However, under such circumstances why not use the mex compiler and write a C++ function which runs faster? Or better yet find someone who can programme machine code and get them to produce me a minimum number of instructions version of the minmax function.

Over all, I’d say merging function together starts to make Matlab a less desirable choice for programming. I use Matlab because the m-language is incredibly readable making it great for rapid aggressive prototyping projects. I accept the performance limitations compared with a language such as C++ because of this. If m-language was changed to make it less readable but more efficient then C++, which is likely to always be faster, would become the better choice.

Tim Davis replied on : 21 of 38

If this min/max function were implemented for real (in a C mexFunction), then for small vectors it would be no faster than calling min and max separately, because what really counts is memory traffic (the number of comparisons being cut by 1/3 is noise; processors are fast, memory is slow). The first access of x, a=min(x) would bring x into cache, and then b=max(x) would be “free”. So no win there.

The only win would be for x large enough to be bigger than cache, in which case [a b] = minmax(x) would be twice as fast as doing both a=min(x) and b=max(x) separately. [Note that I typed in that comment before doing the experiment below].

But the algorithm is already so fast that the extra speed (at most a factor of 2) wouldn’t make much difference to almost all applications. Where speed really matters, just write a mexFunction:

#include “mex.h”
/* [a b] = minmax (x), assumes x is a double vector. No error checking */
void mexFunction
(
int nargout,
mxArray *pargout [ ],
int nargin,
const mxArray *pargin [ ]
)
{
double a, b, y, *x ;
mwSignedIndex i, n ;
x = mxGetPr (pargin [0]) ;
n = mxGetNumberOfElements (pargin [0]) ;
a = (n == 0) ? 0 : x [0] ;
b = (n == 0) ? 0 : x [0] ;
for (i = 1 ; i a) a = y ;
else if (y < b) b = y ;
}
pargout [0] = mxCreateDoubleScalar (a) ;
pargout [1] = mxCreateDoubleScalar (b) ;
}

Compared with tic ; a=min(x) ; b=max(x) ; toc, it’s about 6 times faster. Of course, it doesn’t handle NaN’s like min and max do. It should only be twice as fast, of course (that is, the same time as just min or max itself). I’m sure the difference is that the MATLAB min and max have to do more than what I’ve got above (handle matrices, NaN’s, etc).

If you compare the above mexFunction with another one that just computes the min or the max, its run time is the same as computing both the min and the max at the same time. Which proves my point … it’s not the comparisons that matter, it’s the memory traffic.

Tim Davis replied on : 22 of 38

Oh, nuts … I tried the “less than sign pre gt than sign” thingy but it didn’t work, and the “less than sign” simply disappears so you can’t read my code, above.

martin cohen replied on : 23 of 38

And if you use the same technique, your mex function will take 1/4 (not 1/3 as I said earlier) less time for large arrays.

laz replied on : 24 of 38

I think it’s great that fast functions like minmax are being considered, but i think that there are other optimizations that matlab needs first.
For example, median is O(n log n), when average case O(n) is easy reworking of quicksort.
see http://en.wikipedia.org/wiki/Selection_algorithm
same method can be used to make prctile(x,p) O(n) for scalar p.

Another misuse of sort() is randperm which has an easier O(n) algorithm. in fact, using sort in randperm leads to (small) bias because rand can repeat numbers.
see http://en.wikipedia.org/wiki/Fisher-Yates_shuffle

Cun Zhang replied on : 25 of 38

Great Idea!

Tim Davis replied on : 26 of 38

Martin: I encourage you to give your method a try; it would be a real eye-opener for you. Try modifying my mexFunction with your pseudo-code. You’ll find that your analysis is incorrect.

Processors are 10 times faster than memory, and this is a memory-bound algorithm. That is, the time it takes is limited by the speed of getting data from memory, not the time that individual comparisons take. The comparisons here are free, in terms of performance.

There are computations where processor speed is the limiting factor. This is not one of them. For example, computing x=A\b is limited by processor speed, but min/max isn’t.

Tim Davis replied on : 27 of 38

Actually, there’s no need to try, if you look at my previous comment. Compare the performance of computing both the min and max at the same time versus computing just the min. These two functions take the same time, even though minmax.c is doing twice the number of comparisons.

Tim Davis replied on : 28 of 38

randperm can be done in O(n) time, as laz points out. I’ve got one in CSparse, called cs_randperm.

It uses a minor variant of the shuffle method laz cites on Wikipedia. Mine is not ideal because it uses rand()%(n-k), and that’s not so good because the lower bits from rand() are as “random” as the upper bits. I didn’t write it because I needed statistical purity, though. I use it in dmperm (the one in MATLAB) except that dmperm in MATLAB doesn’t have an option given to the user to use its randomized version.

Is the performance of randperm critical to your application?

Loren replied on : 29 of 38

Folks-

Much food for thought here and I thank you for it. I will mention just a couple of points.

I was trying to see how common the combined use of min and max are. I still don’t know if I have a feel for that. I was also trying to understand how big a portion of your overall algorithms were dominated by the calls to min and max vs. spending time doing lots of other stuff as well. My guess is that min and max, while perhaps taking some time, would not be dominant in most of your application profile reports. But again, I don’t know.

We do often have to decide in a design what to do for large for small data sizes because as Tim mentioned, algorithms can be dominated by computer architecture, etc.

As for readability, I am not sure that minAndMax, or whatever name might be used, we be much less readible than calling min and then max. Sometimes the function names do help out. Still I understand the desire for readability and applaud it totally.

Thanks again for chiming in!
–Loren

Kelly replied on : 30 of 38

I have my own version of a minmax function. I won’t chime in on the speed discussion, since I usually use this function for convenience rather than speed. My usual use is to equate axis limits (or colormap limits) of several plots. My minmax returns a single value for min and max regardless of the number of dimensions in the input data. This way, I can just pass all the y data, or all the xdata, that I plan to plot and figure out what my axis limits need to be, like so:

ax(1) = subplot(2,1,1);
plot(x1,y1);
ax(2) = subplot(2,1,2);
plot(x2, y2);
set(ax, ‘ylim’, minmax([y1 y2]));

It’s perhaps a trivial use, but I use it pretty often.

aslak grinsted replied on : 31 of 38

I’d really like to see a new “range” function giving me both min and max. In my case it is more for convenience than for speed. I really often find my self needing exactly that.

Mike N replied on : 32 of 38

I would like to see MATLAB in-lining function calls, in this case, with a bit of optimisation, it would be possible to do both calculations inside a single loop, saving the time, without requiring users to make their code less clear.

A more practical example would be inlining calls to functions coded in MATLAB – making it as efficient (say) to call a function within a loop as to execute the same program with the function hand coded inside the loop.

Daniel Armyr replied on : 33 of 38

Hi.
Today I was implementing an algorithm and I came to think of this post and the idea of “marrying” related functions.

In my work I regularly calculate the mean and variance of several hundreds of matrices, each with over a milion elements. Now, this usually runs faster than one might suspect and limited amouts of RAM are generally the big problem.

However, when calculating the variance, one gets the mean as a by-product. Would it be possible for var (or a variant of it) to output the mean as well as the variance? Would it be useful? It would definately be economical.

Sincerely
Daniel Armyr
Newly appointed “Head Matlab Expert” at his department.

Loren replied on : 34 of 38

Daniel-

Congratulations on your new position! Certainly mean and var could be combined? I don’t think it would do away with the need for mean, but could be more efficient IF people realized how to best use the new variance to get both. The question I have for all of these thoughts is

1) is that really a bottleneck in the code
2) will uses figure it out and find what they need

Thanks for the suggestion.
–Loren

Daniel Armyr replied on : 35 of 38

1) Due to the sheer bulk of data, memory allocation and shuffling is probably the tighter bottleneck, but the variance/mean does take visible time. At this time I have a java implementation to handle these calculations in the time-critical applications, so I can’t provide you with profiling output, unfortunately.

2) Well, if it were documented, I suppose some would find it. I know I looked there before posting here. But I am sure you have usability experts who can answer that question for you.

Loren replied on : 36 of 38

Daniel-

–Loren

John replied on : 37 of 38

Hi Loren,
Your function is pretty useful for me.
However, what I’m trying to figure out is how to find the first Max.100 values and the minimum 100 values among dataset.
any idea?

Loren replied on : 38 of 38

John-

I would sort the data and select the first and final 100 values.

–Loren