# Natural Order Sorting

Sean's pick this week is the suite of natural-order sorting tools by Stephen Cobeldick.

If you work with data, you will likely encounter many different file-naming schemes. For some of them, numeric order happens to match ASCII (character-code) order. Often it does not, and how to deal with that is a common question on MATLAB Answers.

When naming my own files, I typically zero-pad the numbers using the "%0xi" format in num2str, where x is the total number of digits. For example:

for ii = [1 2 17 495 3920]
% %04i - pad the number with leading zeros to a width of four digits.
filenameii = ['file' num2str(ii,'%04i') '.csv'];
disp(filenameii)
end
file0001.csv
file0002.csv
file0017.csv
file0495.csv
file3920.csv
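
As a quick check that zero-padding makes plain character-code sorting agree with numeric order, here is a minimal sketch using only the built-in sprintf and sort. The variable names are illustrative and not part of the original example.

```matlab
% Build zero-padded names from a deliberately shuffled list of numbers.
nums = [495 2 3920 17 1];
names = arrayfun(@(n) sprintf('file%04d.csv', n), nums, 'UniformOutput', false);

% sort compares character codes; with fixed-width numbers this
% coincides with numeric order.
sortedNames = sort(names);
disp(sortedNames.')
```

Note that this only holds while every number fits in the chosen width: with a width of four, a file numbered 10000 would again sort out of numeric order.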


Stephen's natural-order sorting tools help sort files or names that are not zero-padded like this. For example, let's look at the ASCII order of a few files using sortrows.

files = {'file1.csv','file111.csv','file21.csv','file211.csv'}.';
disp(sortrows(files))
    'file1.csv'
    'file111.csv'
    'file21.csv'
    'file211.csv'
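
To see where this ordering comes from, the following sketch compares two of the names character by character. It uses only built-in functions, and the variable names are chosen here for illustration.

```matlab
a = 'file111.csv';
b = 'file21.csv';

% Find the first position at which the two names differ.
n = min(numel(a), numel(b));
k = find(a(1:n) ~= b(1:n), 1);

% The character codes at that position decide the ASCII order.
fprintf('Position %d: ''%s'' (code %d) vs ''%s'' (code %d)\n', ...
    k, a(k), double(a(k)), b(k), double(b(k)));
```

The names first differ at the fifth character, where '1' has a smaller character code than '2', so file111.csv is placed ahead of file21.csv regardless of the digits that follow.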


Numerically, file111.csv should not come before file21.csv. Now let's use natsort to do this for us:

disp(natsort(files))
    'file1.csv'
    'file21.csv'
    'file111.csv'
    'file211.csv'
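
For intuition about what a natural-order sort does, here is a rough sketch using only built-in functions. It is not how natsort is implemented; it assumes every name shares the same prefix and contains exactly one number, which natsort does not require.

```matlab
files = {'file1.csv','file111.csv','file21.csv','file211.csv'}.';

% Pull the first run of digits out of each name and convert it to a number.
tok = regexp(files, '\d+', 'match', 'once');
nums = str2double(tok);

% Sort the numbers and reuse the index on the original names.
[~, idx] = sort(nums);
sortedFiles = files(idx);
disp(sortedFiles)
```

For names with several numbers, mixed prefixes, or decimal values, this shortcut breaks down, which is where a dedicated tool earns its keep.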


The other utilities that Stephen has provided allow more control over the sort and extend it from single elements of a cell array to full file paths. For example, how should I sort the following?

files = {'C:\Documents\Exp1\test1.csv','C:\Documents\Exp2\test1.csv','C:\Documents\Exp2\test2.csv','C:\Documents\Exp1\test2.csv'}.';
disp(files)
    'C:\Documents\Exp1\test1.csv'
    'C:\Documents\Exp2\test1.csv'
    'C:\Documents\Exp2\test2.csv'
    'C:\Documents\Exp1\test2.csv'


Sorted by experiment:

disp(natsortfiles(files))
    'C:\Documents\Exp1\test1.csv'
    'C:\Documents\Exp1\test2.csv'
    'C:\Documents\Exp2\test1.csv'
    'C:\Documents\Exp2\test2.csv'


To sort by test, we can split the file path into pieces and then use natsortrows on the pieces:

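% Split on file separators (escaped, because '\' is a regexp metacharacter).
% These are Windows paths, so this assumes filesep is '\'.
filepieces = regexp(files, regexptranslate('escape', filesep), 'split');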
filepieces = vertcat(filepieces{:});
disp(filepieces)
    'C:'    'Documents'    'Exp1'    'test1.csv'
    'C:'    'Documents'    'Exp2'    'test1.csv'
    'C:'    'Documents'    'Exp2'    'test2.csv'
    'C:'    'Documents'    'Exp1'    'test2.csv'

% Sort them by fourth column (test) then third column (experiment)
[~, idx] = natsortrows(filepieces,[4 3]);
disp(files(idx))
    'C:\Documents\Exp1\test1.csv'
    'C:\Documents\Exp2\test1.csv'
    'C:\Documents\Exp1\test2.csv'
    'C:\Documents\Exp2\test2.csv'
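
The same test-then-experiment ordering can be approximated with built-in functions when the paths follow one fixed pattern. The sketch below assumes every path ends in ExpN\testM.csv; the regular expression and variable names are illustrative and are not taken from natsortrows.

```matlab
files = {'C:\Documents\Exp1\test1.csv'; 'C:\Documents\Exp2\test1.csv'; ...
         'C:\Documents\Exp2\test2.csv'; 'C:\Documents\Exp1\test2.csv'};

% Capture the experiment and test numbers from each path.
tok = regexp(files, 'Exp(\d+)\\test(\d+)\.csv$', 'tokens', 'once');
tok = vertcat(tok{:});      % N-by-2 cell: {experiment, test}
keys = str2double(tok);     % N-by-2 numeric matrix

% Sort by column 2 (test) first, then by column 1 (experiment).
[~, idx] = sortrows(keys, [2 1]);
disp(files(idx))
```

This is tied to one naming pattern, whereas natsortrows works on the text pieces directly and does not need a hand-written expression per layout.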


These files provide excellent help and are well documented.

My only suggestion for Stephen would be to provide these files together in one File Exchange entry (or as a fourth entry). This is solely because I am lazy, and downloading and unpacking the separate zip files took an extra few minutes. It also matters functionally: natsortfiles and natsortrows both depend on natsort, so neither works on its own until natsort has been downloaded as well.
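
After unpacking the separate downloads, a short check like the following can confirm that all three functions are visible on the MATLAB path. It is a sketch that assumes the functions are installed as ordinary .m files, for which exist returns 2.

```matlab
required = {'natsort', 'natsortfiles', 'natsortrows'};

% exist(name, 'file') returns 2 when name resolves to a file on the path.
onPath = cellfun(@(f) exist(f, 'file') == 2, required);

if all(onPath)
    disp('All three natural-order sort functions were found.')
else
    fprintf('Missing from the path: %s\n', strjoin(required(~onPath), ', '));
end
```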

If given the choice, how do you choose to store your files? What challenges do you face when you receive files from others or from hardware? Let us know below.

Give it a try and let us know what you think here or leave a comment for Stephen.

Published with MATLAB® R2014b
