File Exchange Pick of the Week

September 11th, 2006

file1, file10, file2 sorting problem: solved

Brett suggested this Pick of the Week, and I very much agree with his review:

It seems that every computer user, at one point or another, has been faced with sorting a selection of strings that contain numerical elements. Often, these strings represent filenames that were auto-generated in some sort of continuous scheme. Because the numbers embedded in the strings are themselves strings, sorting can be problematic. For instance, suppose your digital camera captured and named images sequentially, tacking a number to the end of a string. After a couple hundred shots, your file list might include an array of filenames like this:

filenames = {
'MyImage20.jpg',
'MyImage40.jpg',
'MyImage60.jpg',
'MyImage80.jpg',
'MyImage100.jpg',
'MyImage120.jpg',
'MyImage140.jpg',
'MyImage160.jpg',
'MyImage180.jpg',
'MyImage200.jpg'};

If you were to sort this list using MATLAB’s built-in SORT function, you would misrepresent the order of your files:

>> sort(filenames)

'MyImage100.jpg'
'MyImage120.jpg'
'MyImage140.jpg'
'MyImage160.jpg'
'MyImage180.jpg'
'MyImage20.jpg'
'MyImage200.jpg'
'MyImage40.jpg'
'MyImage60.jpg'
'MyImage80.jpg'
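
To see why this happens, it can help to compare two of the names character by character. The short sketch below is only an illustration (the variable names a, b, n, and k are made up for this example); it finds the first position where the two names differ and prints the character codes that decide the order.

```matlab
a = 'MyImage100.jpg';
b = 'MyImage20.jpg';

% Find the first position where the two names differ.
n = min(numel(a), numel(b));
k = find(a(1:n) ~= b(1:n), 1);

% The character codes at that position decide the order:
% '1' has a lower code than '2', so a sorts before b
% even though 100 is greater than 20.
fprintf('Position %d: ''%s'' (%d) vs ''%s'' (%d)\n', ...
    k, a(k), double(a(k)), b(k), double(b(k)));
```

The comparison stops at the first differing character, so the digits that follow never get a say.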

The typical way one gets around this is by using zero-padded digits in the strings. Douglas Schwarz’s SORT_NAT obviates this step by treating string-embedded digits as numbers, rather than characters. In a nice bit of code, Doug uses regular expressions to pre-parse the strings; the resulting sort gets it just right:

>> sort_nat(filenames)

'MyImage20.jpg'
'MyImage40.jpg'
'MyImage60.jpg'
'MyImage80.jpg'
'MyImage100.jpg'
'MyImage120.jpg'
'MyImage140.jpg'
'MyImage160.jpg'
'MyImage180.jpg'
'MyImage200.jpg'
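
For a rough feel of the idea, here is a minimal sketch that uses a regular expression to pull the digits out of each name and then sorts on their numeric value. This is not Doug's code and is far less general than SORT_NAT: it assumes every name shares the same prefix and contains exactly one run of digits, and the variable names (tokens, values, idx, sorted) are made up for this example. The [~, idx] syntax needs a reasonably recent MATLAB release.

```matlab
filenames = {'MyImage100.jpg'; 'MyImage20.jpg'; ...
             'MyImage200.jpg'; 'MyImage40.jpg'};

% Pull the first run of digits out of each name
% (assumption: one numeric run per name).
tokens = regexp(filenames, '\d+', 'match', 'once');

% Convert the digit strings to numbers and sort on those values.
values = str2double(tokens);
[~, idx] = sort(values);

% Reorder the original names by their numeric value.
sorted = filenames(idx)
```

Names with several numeric runs, or with different prefixes, need the fuller treatment that SORT_NAT provides.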

2 Responses to “file1, file10, file2 sorting problem: solved”

  1. Urs (us) Schwarz replied:

    interesting: ASORT has been out for over a year… nevertheless: congrats from one schwarz to another…
    us

  2. Doug Schwarz replied:

    I have always been dissatisfied with one aspect of sort_nat, namely that it doesn’t sort strings containing equal numbers and leaves those strings in their original order. For example, {'a000','a0','a00'} would be left in that order. I have fixed this and imposed a sort order that is the same as what one would get with a normal sort: {'a0','a00','a000'}. I have also relaxed the version requirement so sort_nat should work with much older versions of MATLAB. Oh yeah, it’s also a little faster. Enjoy!
    Doug

Bob, Brett & Jiro share their favorite user-contributed submissions from the File Exchange.

These postings are the author's and don't necessarily represent the opinions of The MathWorks.