Loren on the Art of MATLAB

Constrained Sorting of Arrays 37

Posted by Loren Shure,

In a variety of applications, customers have requested a way to sort arrays subject to some constraints. One use case is when a row in an array represents a data record. Clearly, to sensibly sort data of this kind, you need to keep each row in tact. The sort function in MATLAB gives you one tool, the permutations, as the second output so you can do secondary array manipulations with it. With character arrays and cell arrays of strings, it made sense to do a lexical sort, which you can do with the function sortrows.

Contents

Regular Sorting

Suppose you have a numeric array A and want to sort the vytalues in each column. All you have to do is apply the functionjj sort.

A = [1 200 300;
2 200 200;
3 100 100;];
sort(A)
ans =
     1   100   100
     2   200   200
     3   200   300

To generate the permutations used in the sorting, use a second output.

[~, ind] = sort(A);

Sort by a Particular Column

I can also choose to sort all of A by a particular column. First sort that column, and then use those indices to sort the entire array.

[~, index2] = sort(A(:,2));
Asort2 = A(index2,:)
Asort2 =
     3   100   100
     1   200   300
     2   200   200

I can reorder another matrix B according to the sort order for column 2 generated by sorting A. You might want to do something akin to this if you reorder an array of eigenvalues and want to reorder the associated eigenvectors.

B = reshape(1:9,3,3)
B(index2,:)
B =
     1     4     7
     2     5     8
     3     6     9
ans =
     3     6     9
     1     4     7
     2     5     8

Use sortrows for an easier way to keep row data together upon sorting. I can do so while choosing which column to sort by.

Asortrows2 = sortrows(A,2)
Asortrows2 =
     3   100   100
     1   200   300
     2   200   200

This is the same result as that achieved earlier with sort and indexing combined.

isequal(Asort2, Asortrows2)
ans =
     1

Sort with Secondary Sort

Suppose I want to keep rows in tact, and sort an array according to values in a particular column, and break ties in sorting according to another column. I can do that with sortrows as well.

sortrows(A, [2, 1])
ans =
     3   100   100
     1   200   300
     2   200   200

Sorting Strings

I can use sortrows to sort a cell array of strings alphabetically.

str = {'one'; 'two'; 'three'; 'four'; 'five'}
strsort = sortrows(str)
str = 
    'one'
    'two'
    'three'
    'four'
    'five'
strsort = 
    'five'
    'four'
    'one'
    'three'
    'two'

Sorting a 2-D Array of Strings

Suppose I have a 2-D cell array of strings now. I can sort each column of strings, independent of each other (like sort does for numeric values), getting the sorting permutation in a second output if I wish.

xt = str';
xta  = [xt; xt(randperm(5)); xt(randperm(5))]
[xtasort,indexxta] = sortrows(xta)
xta = 
    'one'     'two'      'three'    'four'     'five'
    'five'    'four'     'one'      'three'    'two' 
    'five'    'three'    'one'      'four'     'two' 
xtasort = 
    'five'    'four'     'one'      'three'    'two' 
    'five'    'three'    'one'      'four'     'two' 
    'one'     'two'      'three'    'four'     'five'
indexxta =
     2
     3
     1

And I can sort the string rows according to contents of specific columns.

[xta23,indexxta23] = sortrows(xta,[2,3])
xta23 = 
    'five'    'four'     'one'      'three'    'two' 
    'five'    'three'    'one'      'four'     'two' 
    'one'     'two'      'three'    'four'     'five'
indexxta23 =
     2
     3
     1

Assorted Sorting?

In addition to what I've mentioned, you can choose to sort, via the function sort, in descending order in addition to ascending. What sort of sorting do you need to do with your data and algorithms? Do you use lexical sorting? How about dependent sorts, e.g., according to some specific columns, or according to the sort order of an array of data? Let me know here.


Get the MATLAB code

Published with MATLAB® 7.9

37 CommentsOldest to Newest

Getting the permutation indexes and then using them to reorder other columns / matrices is something I do every now and then, and so far it has been the only type of sorting I need.

It would be nice to be able to sort an array of objects of a custom class, provided the class implements lt() or gt(). Right now I find myself often do things like

keys = getKey(arrayOfObjects);   % need to implement getKey() 
[foo,order] = sort(keys);
arrayOfObjects = arrayOfObjects(order);

as compared to

arrayOfObjects = sort(arrayOfObjects);

Alex-

You could overload sort in the same way that you create any class method. That would get you the shorter syntax. You might put in an enhancement request (support link to the right) to make the bigger suggestion about supplying lt, gt and having sort just work for the class.

–Loren

Hi.
I have two matrices A and B. They have the same dimension and their elements are corresponding. For example, consider A to be the matrix of asset prices and B to be the corresponding matrix of returns. I want to sort A by every 12th column and have the corresponding element in B moved along when the elements in A are sorted. Please, how can I do this. Thank you.

Waleem,

Something like this:

[a,ind] = sort(a(:,12));
[A(ind,:),B(ind,:)]

If you want to do it every 12th column, you could use a for-loop to do the sort and copy chunk by chunk.

–Loren

Hi Loren,

Say I have data Y below. Column 1 indicates category and column 2 is data value within that category. I would now like to group values from column 2 according to numerical category and create 3 new variables.

Normally I would say

group1=Y(1:4,:);
group2=Y(5:8,:);
group3=Y(9:12,:);

However, if I have a large data set with unequal numbers, it would be efficient to have a function according to which I can group my data in one column according to values in another column.

Do you have any suggestions?

Martin

Y =
1.0000 9.1000
1.0000 9.1000
1.0000 9.1000
1.0000 9.8000
2.0000 8.1000
2.0000 9.2000
2.0000 10.0000
2.0000 10.4000
3.0000 11.5000
3.0000 11.6000
3.0000 11.7000
3.0000 11.8000

Martin-

You might be able to use some functionality from Statistics Toolbox like grpstats. Here’s something simple that might work in any case.

group1 = Y(Y(:,1)==1,:)
group2 = Y(Y(:,1)==2,:)

etc. If you have a lot more bins, you could use a for loop, but I wouldn’t put the values into different names then, I would place them in a struct with meaningful fieldnames or into a cellarray so unequal numbers can be collected.

–Loren

Hi Loren,

This works perfectly well – thank you very much! As you said I can then arrange my data in a struct:

data.group1=Y(Y(:,1)==1,:);
data.group2=Y(Y(:,1)==2,:); 
data.group3=Y(Y(:,1)==3,:);

data = 

    group1: [16x2 double]
    group2: [10x2 double]
    group3: [15x2 double] 

Best wishes,
Martin

Martin-

Glad that solution works for you. It’s much more scalable because of dynamic fieldname references than just naming the variables sequentially.

–Loren

Dear Loren,

I have the following problem. I have an array ‘surv’ consisting of 2 columns:

Y =

    1.0000   10.1081
    1.0000    7.3539
    1.0000   11.0171
    1.0000    4.7315
    2.0000    2.1726
    2.0000    5.1508
    2.0000    5.3562
    1.0000    4.9946
    ...       ...

I would not only like to group values of column 2 according to values 1 in column 1 as done in the above example:

data.group1=Y(Y(:,1)==1,:);

but I would like to combine that inclusion criterion with the additional criterion that values in column 2 are numerically above 5. In this case I am looking for a way to combine 2 sorting criteria.

Do you have any suggestions?

Very best wishes,
Martin

Martin-

Could you please post a TINY example of exactly what output you want based on a VERY SMALL input data set? Thanks.

–Loren

Hi Loren,

For example, from this nx2 input

input =

    1.0000    6.1081
    1.0000    2.3539
    1.0000    4.7315
    2.0000    2.1726
    2.0000    5.1508
    2.0000    4.3562

I would like the following nX2 output

output =

    1.0000    6.1081
    2.0000    2.1726
    2.0000    5.1508
    2.0000    4.3562

excluding those data rows where the value of the first column equals 1 AND the value of the second column is less than 5.000

Martin

Martin-

Try this:

exclude = (input(:,1) == 1 & input(:,2) < 5)
output = input;
output(exclude,:) = []

I build a logic vector for which rows to exclude from input. Make a copy of input into output. Delete excluded rows.

--Loren

Dear Loren,

How can I sort an array according to a particular column and eliminate duplicates?

For example:


A = [1 200 300;
1 200 300;
2 200 200;
3 100 100;];

sortrows(A,2)

does not eliminate duplicate rows (2 and 3).

How can I sort the matrix and eliminate the duplicate rows?

Thanks very much for your time,
Luis

Dear Loren,
[B, IX] = sort(A) gives A(IX) = B but I want to find IX such that B(IX) = A. Can I do this?

(A more detailed view of the problem:
I have a 5d array of reals which I want to sort (in descending order). Then I want to select the first N elements in my sorted array and identify where they were originally. How can I do this? At the moment I proceed as follows:

 % bins is 5d array, change to column vector for sort
counts = reshape(bins,1,numel(bins)); 
[index,desc_counts] = sort(counts,'descending');
selected = desc_counts(1:N);

but I don’t know how to find out where selected(1:N) was in bins.)

It would be useful if the help file noted that A(IX) = B *and* B(IX) = A for a column vector. This was not intuitively obvious! (Nor, consequently, did I achieve anything looking at the documentation, which was why I had posted in the first place. It would be greatly improved by the inclusion of two or so less trivial examples, one showing the complex sorting procedure as this is different to the logical operators.)

Francesca-

I think that in general, the B(IX)=A equation is not true. At least a simple example with 10 random number proved it to be false. I guess, I once came across the same problem of “inverse indexing”. What helped me out is the following line of code (using your variable naming from above):

[~, indexOfSelectedInCounts] = ismember(selected , counts)

Hi Loren,
I have 2 3D arrays: list(m,n,p) and pby(m,n,p). I sorted the pby using

for i = 1:m
   for j = 1:n
      [pby(i,m,:), ind] = sort(pby(i,m,:),2,'descend');
   end
end

now, I want to rearrange the list in the same order according to the sort order of pby. I think, I can do this by using the ind in the for loop but I want to do it at once if possible.
Please, suggest me.
Thank you.

Galib

Galib-

This sentence from the help for sort should be useful:

“[B,IX] = sort(A,…) also returns an array of indices IX, where size(IX) == size(A). If A is a vector, B = A(IX).” Just apply the index to your second array.

–Loren

Hi Loren,

Thanks for your reply. Actually, I did that already. I was just wondering, if there is any way to do that in a line.

Anyway, Thank you very much again.

Regards
Galib

Loren,

Any idea why ‘sort’ (and a couple of other basic functions like ‘max’) handle Complex numbers inconsistently from Reals? That is, for a set R of Reals, ‘sort’ is continuous; it’s even continuous on the Complexes, away from the Real line. But the result of

sort([-2 1 pi])

is NOT close to (lying within ‘eps’ or so of)

sort([-2+i*eps 1 pi])

.

The result is that otherwise innocuous floating-point expressions such as ‘-2+sqrt(tiny number)’ — which IS continuous — produce entirely different sorted results depending on whether the tiny number is positive or negative. Unless, of course, the expression with the negative part is combined in a way that loses enough precision: ‘sqrt(1e20-(1e20+eps))’. Again, the problem arises not from round-off (which is continuous) but from ‘sort’.

Joel-

It was done at the time based on how the different functions seemed to get used most and what the intent was. We do realize it is confusing.

–Loren

Hi Loren,
I want to sort an nx4 matrix first by column 1, then by column 2, keeping all elements in columns 3 and 4 together with their original rows.

For example,
Original matrix = [
1 3 5 6
1 2 7 7
1 7 5 5
3 4 5 5
2 8 7 6
2 5 6 6
]

Desired output = [
1 2 7 7
1 3 5 6
1 7 5 5
2 5 6 6
2 8 7 6
3 4 5 5
]

I have tried doing this using loops but the method I figured out was long and complicated. What’s the easiest way to do this?

Thank you in advance.

Cheers,

scyss

Scyss-

Don’t understand what you mean by keeping columns 3 and 4 with their original rows. That’s not what your output looks like. If I interpret correctly, using sortrows instead of sort should get what you want.
–Loren

Loren-

Yes, thank you! sortrows does what I want. Sorry, I meant ‘keep row integrity’, not to keep rows 3 and 4 in their original positions.

Cheers,

scyss

scyss-

You may want to use sortrows with 2 outputs on just columns 1 and 2 if you don’t also want to sort on columns 3 and 4. Then use the index (2nd output) to rearrange the original matrix.

–Loren

hi Loren
I have a question please.

I would like to sort rows of a huge 2D array based on the indices stored in another matrix. To reduce complexity, I should avoid any for loop in my code.

an example: Each Row of A should be sorted according to the indices in the corresponding row of ind.
A=[2 4 6; 8 3 9; 17 58 11];
ind=[2 1 3; 3 2 1; 1 3 2];
ans=[4 2 6;9 3 8;17 11 58];

Arash-

Why not use a for loop as long as you preallocate the output? How do you get ind? If it’s the result from some other calculation, it might be possible that that calculation can be tweaked to help out.

Also, working with rows is sometimes less efficient than columns in MATLAB. Perhaps you can turn it into a column-based operation?

–Loren

Loren

Thanks for your reply. Using for loop in this case, increases the complexity of the code.

Thanks for your comment on using column-based matrix analysis.

Hi Loren,
please excuse the long preamble, but I figured it might interest you, since it is related to lsqnonneg and I read on your blog that you implemented it in Matlab.

I have 250 000 data vectors with 50 values each that I want to fit to 5-10 models with lsqnonneg. Since lsqnonneg does not allow to solve for more than one data vector at a time, I need to do it in a loop which runs for about 50s (compared to 0.5s for x = C\d). I decided to try vectorizing the code inside lsqnonneg and managed to cut down computation time by half (not quite what I hoped, but still). However, much of the computation time is used for calls to mldivide.

% Compute intermediate solution using only variables in positive set
z(P) = C(:,P)\d;

I could not vectorize these calls as the models in the Positive set P change from one data vector to the next, so I put it in a loop. Simply put, variables z, P and d have an extra dimension which is the number of data vectors to fit. Basically, I loop over all data vectors for which optimization is not done. Now, I would like instead to loop over the unique combinations of columns of P and make a single call to mldivide for all data vectors that share the same positive set for a given iteration.

So, (finally) I need to get the unique columns of P along with the corresponding column indices in P and I must admit I can’t think of an efficient way to do it.

Thanks!
-David

David:

I don’t have any thoughts about finding unique sets of indices. However, if your problems (or right-hand side vectors, d) are similar, one possible way to speed up lsqnonneg is to use what is known as a “warm start”.

Essentially, this means solving one problem and then using the solution to the first problem as a start point (and the “active-set” information P, and Z) for the next problem. If they are similar problems, starting from the previous solution will skip a few to many iterations (and linear solves using “\”).

lsqnonneg doesn’t have a built-in facility for this, but since you are already modifying the file, it should be a simple change. What you need to do is solve a problem with the default initialization:

P = false(n,1);
Z = true(n,1);
x = nZeros;

After lsqnonneg finishes solving that problem, take the results (x, Z, and P) and use them as the initial point for the next problem (and next d vector) in place of the initial values above (zeros, etc.).

If the problems and their solutions are similar enough, this can significantly reduce the number of iterations.

+Steve

Marcelo-

I managed to vectorize most of lsqnonneg, including bundling calls to mldivide for all data vectors that share the same positive set. I do it by finding the unique columns of P, looping over them, and using find() to get the indices of all the data vectors with given column of P. However, this doesn’t seem too efficient to me, which is why I posted originally. When I finish cleaning up and commenting the code, I’ll post it on the file exchange.

Steve-

This sounds like a great idea, especially since I’m fitting curves of pixel intensity vs time and a lot of neighbouring pixels have very similar dynamics. I’ll try your method and compare it to my vectorized code. Thanks a lot!

That being said, I’d still like to know if there’s a more efficient way to find unique columns of a matrix and the column indices corresponding to each of them. What I’d like is :

input = [7 8 9; 1 2 3; 4 5 6; 1 2 3 ; 7 8 9];
% Some snippet of code

to produce the following output, whatever the format :

uniqueCols = [7 8 9; 1 2 3; 4 5 6]
indForCol1 = [1 5]
indForCol2 = [2 4]
indForCol3 = [3]

These postings are the author's and don't necessarily represent the opinions of MathWorks.