# Natural Order Sorting

Posted by Sean de Wolski

Sean's pick this week is the suite of natural-order sorting tools by Stephen Cobeldick.

If you work with data, you will likely encounter many different naming schemes for files. For some of these, the numerical sort order happens to match the ASCII sort order. But this is not always the case, and how to deal with it is a common question on MATLAB Answers.

When naming my own files, I typically zero-pad the numbers using the "%0xi" format in num2str, so that ASCII order and numerical order agree. For example:

for ii = [1 2 17 495 3920]
    % %04i - pad with leading zeros to a minimum width of four digits.
    filenameii = ['file' num2str(ii,'%04i') '.csv'];
    disp(filenameii)
end
file0001.csv
file0002.csv
file0017.csv
file0495.csv
file3920.csv


Stephen's natural-order sorting tools help sort files or names that do not necessarily have this setup. For example, let's look at the ASCII order of a few files using sortrows.

files = {'file1.csv','file111.csv','file21.csv','file211.csv'}.';
disp(sortrows(files))
    'file1.csv'
    'file111.csv'
    'file21.csv'
    'file211.csv'


Numerically, file111.csv should come after file21.csv, since 111 is greater than 21. Now let's use natsort to do this for us:

disp(natsort(files))
    'file1.csv'
    'file21.csv'
    'file111.csv'
    'file211.csv'
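
To make the idea concrete, here is a minimal sketch of numeric ordering using only built-in functions. It is not how natsort is implemented; it assumes every name shares the same prefix and contains a single run of digits, and the variable names are purely illustrative.

```matlab
files = {'file1.csv','file111.csv','file21.csv','file211.csv'}.';

% Extract the first run of digits from each name (still text at this point)
numText = regexp(files, '\d+', 'match', 'once');

% Convert the digit text to numbers and sort on the numeric value
[~, idx] = sort(str2double(numText));
sorted = files(idx);
disp(sorted)
```

This shortcut stops working as soon as the prefixes differ or a name contains several numbers, which are the cases natsort is meant to handle.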


The other utilities that Stephen has provided allow more control over this, and extend it from single elements of a cell array to whole file paths. For example, how should I sort the following?

files = {'C:\Documents\Exp1\test1.csv','C:\Documents\Exp2\test1.csv','C:\Documents\Exp2\test2.csv','C:\Documents\Exp1\test2.csv'}.';
disp(files)
    'C:\Documents\Exp1\test1.csv'
    'C:\Documents\Exp2\test1.csv'
    'C:\Documents\Exp2\test2.csv'
    'C:\Documents\Exp1\test2.csv'


Sorted by experiment:

disp(natsortfiles(files))
    'C:\Documents\Exp1\test1.csv'
    'C:\Documents\Exp1\test2.csv'
    'C:\Documents\Exp2\test1.csv'
    'C:\Documents\Exp2\test2.csv'


To sort by test, we can split the file path into pieces and then use natsortrows on the pieces:

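% Split on file separators. On Windows filesep is '\', which is a regular
% expression metacharacter, so escape it before using it as the pattern.
filepieces = regexp(files, regexptranslate('escape', filesep), 'split');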
filepieces = vertcat(filepieces{:});
disp(filepieces)
    'C:'    'Documents'    'Exp1'    'test1.csv'
    'C:'    'Documents'    'Exp2'    'test1.csv'
    'C:'    'Documents'    'Exp2'    'test2.csv'
    'C:'    'Documents'    'Exp1'    'test2.csv'

% Sort them by fourth column (test) then third column (experiment)
[~, idx] = natsortrows(filepieces,[4 3]);
disp(files(idx))
    'C:\Documents\Exp1\test1.csv'
    'C:\Documents\Exp2\test1.csv'
    'C:\Documents\Exp1\test2.csv'
    'C:\Documents\Exp2\test2.csv'


These files provide excellent help and are well documented.

My only suggestion for Stephen would be to provide these files together in one File Exchange entry (or as a fourth entry). This is solely because I am lazy, and downloading and unpacking all of the separate zip files took an extra few minutes. It is not optional, either: natsortfiles and natsortrows both depend on natsort, so they won't work on their own unless natsort is downloaded as well.

If given the choice, how do you choose to store your files? What challenges do you face when you receive files from others or from hardware? Let us know below.

Give it a try and let us know what you think here or leave a comment for Stephen.

Published with MATLAB® R2014b

### Note

Peter Mao replied on : 1 of 3

This looks like a very useful piece of code, but when I run into the problem of inconsistent file naming/numbering, I go into the emacs wdired mode (writable directory editor) and use [regex] search/replace, rectangular cut/paste, rectangular number insert (gse-number-rectangle), etc. All the power of emacs to make the file names consistent.

I realize emacs is a hard pill to swallow for many, but there’s nothing else like wdired-mode in any OS or application that I know of. Changing filenames in any directory structure is trivially easy in emacs.

Sean de Wolski replied on : 2 of 3

Good to know, thanks Peter!

Stephen Cobeldick replied on : 3 of 3

Thank you for your comments, and for selecting these functions for POTW.

The purpose of “natsort” and “natsortrows” should be fairly clear from the function names, but the function “natsortfiles” is actually a little bit more subtle than the examples given above. It sorts the filename and file extension separately so that the file extension-separator (period character) does not affect the sort results. This can be summarized in the following example:

A = {'test_x.m'; 'test-x.m'; 'test.m'};
sort(A)
ans =
test-x.m
test.m
test_x.m
natsortfiles(A)
ans =
test.m
test-x.m
test_x.m

This concept is then generalized to apply to the file-separator character as well, so that each directory level is sorted independently of the separator character.
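
The effect of the period can be reproduced with built-in functions alone. The following is a rough sketch of the idea, not the natsortfiles implementation: it splits off the extension with fileparts and sorts on the bare names only. The variable names are illustrative.

```matlab
A = {'test_x.m'; 'test-x.m'; 'test.m'};

% ASCII codes: '-' is 45, '.' is 46, '_' is 95, so in a plain sort the
% period of 'test.m' falls between 'test-x.m' and 'test_x.m'.
plainSorted = sort(A);

% Split off the extension so the period no longer takes part in the comparison
[~, names] = cellfun(@fileparts, A, 'UniformOutput', false);

% Sort on the bare names and reuse the index on the full file names
[~, idx] = sort(names);
nameSorted = A(idx);
disp(nameSorted)
```

With the extension out of the way, 'test' sorts ahead of 'test-x' and 'test_x' because it is a prefix of both.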

I considered placing these together in one submission, but it seems that most users arrive via search engine and are intimidated by collections of functions (rather than single files). The description and examples for each function require a reasonable amount of space, which would complicate this browsing. Perhaps others have had similar experiences?