# Partial sorting in MATLAB

- Does MATLAB have partial sort functions?
- Will they be faster for his application?

I was recently working with a researcher who had to sort arrays of double precision numbers thousands of times. In his application, however, he didn't need the full sorted list. He only wanted the lowest N/2 elements of the array sorted. As such, he had two questions.

- Does MATLAB have partial sort functions?
- Will they be faster for his application?

The answer to the first question is easy. MATLAB has mink(x,k) and maxk(x,k) which find the lowest or highest sorted k elements from the array x respectively.

The answer to the second question "Will partial sorting be faster?" depends on k as you might expect. Let's investigate how

We create a 100,000 element array of doubles

x = rand(100000,1);

I'm going to sort this array 300 times to match how it's used in the user's application. Of course, in the real application, x varies every iteration but I want to isolate just the sort

tic

for i=1:300

y = sort(x);

res = y(1:10); % We only want the lowest 10

end

sortTime=toc

If all we want is the 10 smallest numbers in x sorted, we can alternatively use mink

tic

for i=1:300

res = mink(x,10);

end

partialsortTime=toc

fprintf("Partial sort is %fx faster than full sort for k=10",sortTime/partialsortTime)

So, as you might expect, partial sorting is very useful when you only want a small number of sorted elements from a large array

Next, I use the mink function to request the lowest k numbers where k varies from 100 to 15000, As before, I repeat each sort operation 300 times.

minkResults = zeros(150,1);

minkSize = zeros(150,1);

for j=1:150

tic

for i=1:300

y=mink(x,j*100);

end

minkResults(j)=toc;

minkSize(j)=j*100;

end

Plotting the results, we see that a partial sort is only faster than a full sort if we are requesting fewer than 4200 elements -- that's less than 5% of the full vector.

plot(minkSize,minkResults,'.')

yline(sortTime)

legend("Partial sort", "full sort")

xlabel("Number requested in partial sort")

ylabel("Time (s)")

title("Time taken to sort " + numel(x) + " elements 300 times")

crossing = minkSize(find(minkResults>sortTime,1,"first"));

fprintf("For x of size %d, mink(x,k) is faster than sort(x) only if k < %d}",numel(x),crossing)

You may be disappointed with this result because, for this example at least, partial sorting is only the better option if you want fewer than 5% of the total number of elements sorted. The primary reason for this is that MATLAB's sort function is both fast and multithreaded. It's pretty hard to beat!

Let's take a look at how sort scales with 100,000 elements. I'll repeat the computation 300 times to get a stable average.

% Scaling of sort with respect to threads

arraySize = 100000;

repeats = 300;

sortSpeedTest(arraySize,repeats)

The algorithm used in the mink function, on the other hand, does not lend itself to multithreading. As such, it is completely single threaded. Let's see this explicity using the same sized input matrix, requesting the lowest 10,000 numbers be sorted.

%Scaling of mink with respect to threads

k=10000;

minkSpeedTest(arraySize,repeats,k)

It's pretty clear that mink doesn't benefit from differing numbers of threads. Any differences we see are just measurement noise. If sort(x) were not mulithreaded, mink(x,k) would be faster here but multithreading helps a lot.

As we increase k, there will eventually come a point where even comparing single threaded sort with mink results in mink losing. On my machine, k=16000 does it

%Scaling of mink with respect to threads

k=16000;

minkSpeedTest(arraySize,repeats,k)

Slightly slower than a single threaded full sort! A possible future enhacement to MATLAB's mink function would be to switch to using sort internally for large k since, sooner or later, mink's algorithm just isn't the right choice.

You will probably have different results from me if you run these tests on your machine. As such, some information about my machine may be useful. Here, I use the excellent cpuinfo function by Ben Tordoff, available on the MATLAB File Exchange.

cpuinfo

In summary, partial sorting in MATLAB can be useful but only when used appropriately. Have you used mink or maxk in your work...or have you just discovered them now? What additional sorting functionality do you think MATLAB should have?

function result = sortSpeedTest(arraySize,repeats)

% Runs sort on an array of size arraySize using differing numbers of threads

format short

averageTime = zeros(4,1);

SpeedUp = zeros(4,1);

numThreads = [1;2;4;8];

x = rand(1,arraySize);

for testNum = 1:4

threads = numThreads(testNum);

maxNumCompThreads(threads);

tic

for i=1:repeats

y = sort(x);

end

averageTime(testNum) = toc/repeats;

SpeedUp(testNum) = averageTime(1)/averageTime(testNum);

end

result = table(numThreads,averageTime,SpeedUp);

end

function result = minkSpeedTest(arraySize,repeats,k)

% Runs mink on an array of size arraySize using differing numbers of threads

format short

averageTime = zeros(4,1);

SpeedUp = zeros(4,1);

numThreads = [1;2;4;8];

x = rand(1,arraySize);

for testNum = 1:4

threads = numThreads(testNum);

maxNumCompThreads(threads);

tic

for i=1:repeats

y = mink(x,k);

end

averageTime(testNum) = toc/repeats;

SpeedUp(testNum) = averageTime(1)/averageTime(testNum);

end

result = table(numThreads,averageTime,SpeedUp);

end

## 评论

要发表评论，请点击 此处 登录到您的 MathWorks 帐户或创建一个新帐户。