File Exchange Pick of the Week

file1, file10, file2 sorting problem: solved

Posted by Doug Hull

Brett suggested this Pick of the Week, and I very much agree with his review:

It seems that every computer user, at one point or another, has been faced with sorting a selection of strings that contain numerical elements. Often, these strings represent filenames that were auto-generated in some sort of continuous scheme. Because the numbers embedded in the strings are themselves strings, sorting can be problematic. For instance, suppose your digital camera captured and named images sequentially, tacking a number to the end of a string. After a couple hundred shots, your file list might include an array of filenames like this:

filenames = {
'MyImage20.jpg',
'MyImage40.jpg',
'MyImage60.jpg',
'MyImage80.jpg',
'MyImage100.jpg',
'MyImage120.jpg',
'MyImage140.jpg',
'MyImage160.jpg',
'MyImage180.jpg',
'MyImage200.jpg'};

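If you were to sort this list using MATLAB’s built-in SORT function, you would misrepresent the order of your files. SORT compares the strings character by character, so the '1' in 'MyImage100.jpg' lands ahead of the '2' in 'MyImage20.jpg':

>> sort(filenames)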

'MyImage100.jpg'
'MyImage120.jpg'
'MyImage140.jpg'
'MyImage160.jpg'
'MyImage180.jpg'
'MyImage20.jpg'
'MyImage200.jpg'
'MyImage40.jpg'
'MyImage60.jpg'
'MyImage80.jpg'
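
To see where the ordering above comes from, here is a small sketch that finds the first character at which two of the names differ and prints the character codes being compared. The variable names a, b, n, and k are only for illustration.

```matlab
% Two names from the list above
a = 'MyImage100.jpg';
b = 'MyImage20.jpg';

% Find the first position where the two names differ
n = min(numel(a), numel(b));
k = find(a(1:n) ~= b(1:n), 1);

% Show the characters and their codes at that position
fprintf('Position %d: ''%c'' (%d) vs ''%c'' (%d)\n', ...
    k, a(k), double(a(k)), b(k), double(b(k)));
```

The comparison is decided at that single character, so the digits that follow never get a say.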

The typical way one gets around this is by using zero-padded digits in the strings. Douglas Schwarz’s SORT_NAT obviates this step by treating string-embedded digits as numbers, rather than characters. In a nice bit of code, Doug uses regular expressions to pre-parse the strings; the resulting sort gets it just right:

>> sort_nat(filenames)

'MyImage20.jpg'
'MyImage40.jpg'
'MyImage60.jpg'
'MyImage80.jpg'
'MyImage100.jpg'
'MyImage120.jpg'
'MyImage140.jpg'
'MyImage160.jpg'
'MyImage180.jpg'
'MyImage200.jpg'
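
The idea of treating embedded digits as numbers can be sketched in a few lines. This is not the SORT_NAT source, only a minimal illustration under two assumptions: every name contains exactly one run of digits, and the text around the digits is the same for all names. The variable names tokens, numbers, order, and sorted are made up for this example.

```matlab
% A few names in the order a plain character sort would give
filenames = {'MyImage100.jpg'; 'MyImage20.jpg'; 'MyImage200.jpg'; 'MyImage40.jpg'};

% Pull the first run of digits out of each name
% (assumption: each name has one, and the rest of the name is identical)
tokens = regexp(filenames, '\d+', 'match', 'once');

% Convert the digit strings to numeric values
numbers = str2double(tokens);

% Sort by numeric value and reorder the names to match
[~, order] = sort(numbers);
sorted = filenames(order);

disp(sorted)
```

A general-purpose natural sort has to handle names with several digit runs and differing text between them, which this sketch does not attempt.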

2 Comments

interesting: ASORT has been out for over a year… nevertheless: congrats from one schwarz to another…
us

I have always been dissatisfied with one aspect of sort_nat, namely that it doesn’t sort strings containing equal numbers and leaves those strings in their original order. For example, {'a000','a0','a00'} would be left in that order. I have fixed this and imposed a sort order that is the same as what one would get with a normal sort: {'a0','a00','a000'}. I have also relaxed the version requirement so sort_nat should work with much older versions of MATLAB. Oh yeah, it’s also a little faster. Enjoy!
Doug

These postings are the author's and don't necessarily represent the opinions of MathWorks.