File Exchange Pick of the Week

Our best user submissions

file1, file10, file2 sorting problem: solved

Brett suggested this Pick of the Week, and I very much agree with his review:

It seems that every computer user, at one point or another, has been faced with sorting a selection of strings that contain numerical elements. Often, these strings represent filenames that were auto-generated in some sort of continuous scheme. Because the numbers embedded in the strings are themselves strings, sorting can be problematic. For instance, suppose your digital camera captured and named images sequentially, tacking a number to the end of a string. After a couple hundred shots, your file list might include an array of filenames like this:

filenames = {
'MyImage20.jpg',
'MyImage40.jpg',
'MyImage60.jpg',
'MyImage80.jpg',
'MyImage100.jpg',
'MyImage120.jpg',
'MyImage140.jpg',
'MyImage160.jpg',
'MyImage180.jpg',
'MyImage200.jpg'};

If you were to sort this list using MATLAB’s built-in SORT function, you would misrepresent the order of your files:

>> sort(filenames)

'MyImage100.jpg'
'MyImage120.jpg'
'MyImage140.jpg'
'MyImage160.jpg'
'MyImage180.jpg'
'MyImage20.jpg'
'MyImage200.jpg'
'MyImage40.jpg'
'MyImage60.jpg'
'MyImage80.jpg'
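
To see why the order comes out this way, it may help to compare two of the names character by character. The snippet below is only an illustration; the variable names a, b, and k are made up for this example. It finds the first position where the two strings differ and prints the character codes that decide the comparison.

```matlab
a = 'MyImage100.jpg';
b = 'MyImage20.jpg';

% Compare only the overlapping part and find the first mismatch.
k = find(a(1:numel(b)) ~= b, 1);

% The character with the smaller code sorts first, whatever number follows.
fprintf('%c (%d) vs %c (%d)\n', a(k), double(a(k)), b(k), double(b(k)));
% prints: 1 (49) vs 2 (50)
```

Because '1' has a smaller character code than '2', every name starting with 'MyImage1' lands before 'MyImage20.jpg', even though 100 is greater than 20.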

The typical way one gets around this is by using zero-padded digits in the strings. Douglas Schwarz’s SORT_NAT obviates this step by treating string-embedded digits as numbers, rather than characters. In a nice bit of code, Doug uses regular expressions to pre-parse the strings; the resulting sort gets it just right:

>> sort_nat(filenames)

'MyImage20.jpg'
'MyImage40.jpg'
'MyImage60.jpg'
'MyImage80.jpg'
'MyImage100.jpg'
'MyImage120.jpg'
'MyImage140.jpg'
'MyImage160.jpg'
'MyImage180.jpg'
'MyImage200.jpg'
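
For readers curious about the general idea, here is a rough sketch of sorting on the embedded number with a regular expression. This is not Doug's code, and it is far less general than SORT_NAT: it assumes every name contains exactly one run of digits and that the surrounding text is the same in each name. The variable names (names, digitStrs, nums, idx, sorted) are made up for this example.

```matlab
names = {'MyImage100.jpg', 'MyImage20.jpg', 'MyImage200.jpg', 'MyImage40.jpg'};

% Pull the first run of digits out of each name (assumes each name has one).
digitStrs = regexp(names, '\d+', 'match', 'once');

% Convert the digit strings to numbers so they compare by value.
nums = str2double(digitStrs);

% Sort numerically, then reorder the names with the resulting index.
[~, idx] = sort(nums);
sorted = names(idx);

disp(sorted(:));
```

With these four names, sorted comes back as 20, 40, 100, 200. Names with several digit runs or differing prefixes need the more careful parsing that SORT_NAT provides.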
