File Exchange Pick of the Week

Our best user submissions

Glob File Searching in MATLAB

Greg’s pick this week is Expand wildcards for files and directory names by Peter van den Biggelaar.

You want to list all the files in a folder, including those in subfolders. So you read the MATLAB documentation for DIR and LS.

You find you can do name pattern matching using the '*' character. But you soon learn that you have to call those functions recursively in order to include the files in subfolders.
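To make the recursion concrete, here is a minimal sketch of what you end up writing by hand. The function name listFilesRecursive is hypothetical (it is not part of MATLAB or of Peter's submission), and it uses only plain, non-recursive DIR calls.

```matlab
function files = listFilesRecursive(folder, pattern)
% listFilesRecursive  Hypothetical helper: collect files matching PATTERN
% in FOLDER and all of its subfolders, using only non-recursive DIR calls.

% Files that match the pattern directly inside this folder.
listing = dir(fullfile(folder, pattern));
listing = listing(~[listing.isdir]);
files = cell(numel(listing), 1);
for k = 1:numel(listing)
    files{k} = fullfile(folder, listing(k).name);
end

% Recurse into each subfolder, skipping the '.' and '..' entries.
entries = dir(folder);
subfolders = entries([entries.isdir]);
for k = 1:numel(subfolders)
    name = subfolders(k).name;
    if strcmp(name, '.') || strcmp(name, '..')
        continue
    end
    files = [files; listFilesRecursive(fullfile(folder, name), pattern)]; %#ok<AGROW>
end
end
```

Calling listFilesRecursive(pwd, '*.m') returns a cell array of full paths, one per matching file.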

Operating systems have commands with this ability, so why not MATLAB***?

I’m not sure I can answer the question “Why not MATLAB?”, but Peter has shared an excellent solution. It is one I have used in my own development projects.

Here’s an example that finds all of the MATLAB-files in one of my larger projects:

glob('C:\X\Project\PMSM\Demo\**.m')
C:\X\Project\PMSM\Demo\startDemo.m
C:\X\Project\PMSM\Demo\+task\+dataDictionary\t0_initWorkFolder.m
C:\X\Project\PMSM\Demo\+task\+dataDictionary\t1_openTestBench.m
C:\X\Project\PMSM\Demo\+task\+dataDictionary\t2_importEnums.m
C:\X\Project\PMSM\Demo\+task\+dataDictionary\t3_partitionData.m
C:\X\Project\PMSM\Demo\+task\+dataDictionary\t4_multipleDictionaries.m
C:\X\Project\PMSM\Demo\+task\+mac2015\t1_openTestBench.m
C:\X\Project\PMSM\Demo\+task\+mac2015\t2_generateCodeAndCopyToParentProject.m
C:\X\Project\PMSM\Demo\+task\+pcgF28035\t1_openTestBench.m
C:\X\Project\PMSM\Demo\+task\+pcgF28035\t2_generateCodeAndCopyToParentProject.m
...

Note:*** As of R2016b, the DIR function supports recursive searches.

What’s a Glob?

A glob is a pattern containing wildcard characters that stand in for sets of other characters. The best known is "*", which matches any number of any characters. There’s a good definition here along with some examples.
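One way to get a feel for what "*" and "?" mean is to translate a glob into a regular expression with the built-in regexptranslate function and test it against a few names. This is only an illustration of the idea, not a description of how the GLOB submission works internally, and the file names below are made up.

```matlab
% Translate the glob 't?_*.m' into an equivalent regular expression.
pattern = regexptranslate('wildcard', 't?_*.m');

% Made-up file names to test against.
names = {'t1_openTestBench.m', 'startDemo.m', 't10_x.m'};

% Anchor the expression so the whole name has to match, not just a part of it.
hits = regexp(names, ['^' pattern '$'], 'match', 'once');
isMatch = ~cellfun(@isempty, hits);
```

Only the first name matches: '?' stands for exactly one character, so 't10_x.m' is rejected.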

How does this differ from using the DIR function in MATLAB?

Prior to R2016b, the DIR function could only return items found directly in the search folder; it did not return items found in subfolders.

Results from DIR in R2016a

Now, as of R2016b, you can use wildcard characters to perform recursive searches with the DIR function in MATLAB.

dir('C:\X\Project\PMSM\Demo\*\*.m')
Files Found in: C:\X\Project\PMSM\Demo\+test

runAll.m                   runOnlyPcg.m               
runAllForBaselineF28035.m  runShort.m                 
runAllForBaselineF28069.m  

Files Found in: C:\X\Project\PMSM\Demo\Common

addCommonPath.m     removeCommonPath.m  
getCommonPath.m     startup.m           
...
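When you capture the output of DIR instead of printing it, you get a struct array. The sketch below uses the "**" wildcard and the folder field, both available in R2016b and later, to build full paths. Searching from the current folder is an assumption made only to keep the example runnable anywhere.

```matlab
% Assumption: search from the current folder; substitute your own root.
root = pwd;

% '**' matches the root folder and every folder beneath it (R2016b or later).
listing = dir(fullfile(root, '**', '*.m'));

% DIR returns a struct array; join the folder and name fields to get full paths.
paths = arrayfun(@(f) fullfile(f.folder, f.name), listing, ...
    'UniformOutput', false);
```

The result, paths, is a cell array of character arrays, the same kind of container that GLOB returns.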

However, even in R2016b, DIR doesn’t support additional glob special characters like ? or character sets using [] or {}.

% Find all MATLAB-files that start with t followed by a single character and an underscore.
glob('C:\X\Project\PMSM\Demo\**\t?_*.m')
% Find all MATLAB-files that start with "t1_", "t2_", or "t3_"
glob('C:\X\Project\PMSM\Demo\**\t[123]_*.m')
C:\X\Project\PMSM\Demo\+task\+dataDictionary\t0_initWorkFolder.m
C:\X\Project\PMSM\Demo\+task\+dataDictionary\t1_openTestBench.m
C:\X\Project\PMSM\Demo\+task\+dataDictionary\t2_importEnums.m
C:\X\Project\PMSM\Demo\+task\+dataDictionary\t3_partitionData.m
C:\X\Project\PMSM\Demo\+task\+dataDictionary\t4_multipleDictionaries.m
C:\X\Project\PMSM\Demo\+task\+mac2015\t1_openTestBench.m
C:\X\Project\PMSM\Demo\+task\+mac2015\t2_generateCodeAndCopyToParentProject.m
C:\X\Project\PMSM\Demo\+task\+pcgF28035\t1_openTestBench.m
C:\X\Project\PMSM\Demo\+task\+pcgF28035\t2_generateCodeAndCopyToParentProject.m
C:\X\Project\PMSM\Demo\+task\+pcgF28035\t3_loadAndPlotHwData.m
...
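If you don't have GLOB on hand, one possible workaround for the character-set case is to let DIR do the recursive search and then filter the names with a regular expression. This is a rough sketch rather than a replacement for GLOB; the search root is an assumption, and the expression covers only the "t[123]_*.m" example.

```matlab
% Assumption: search from the current folder; substitute your own root.
root = pwd;

% Recursive listing of all MATLAB-files (R2016b or later).
listing = dir(fullfile(root, '**', '*.m'));

% Keep only names that start with t1_, t2_, or t3_.
names = {listing.name};
hits = regexp(names, '^t[123]_.*\.m$', 'match', 'once');
keep = ~cellfun(@isempty, hits);
matches = listing(keep);
```

Note that regexp is case-sensitive by default, which may differ from how the file system treats names on Windows.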

Enhance GLOB function in R2016b using the new String class.

Strings have become first-class citizens of the MATLAB Language. Prior to R2016b, if you wanted to represent a set of strings, you had to use a cell array of character arrays (see also: cellstr).

The GLOB function returns a cellstr (a cell array of character arrays):

files = glob('C:\X\Project\PMSM\Demo\**.m');
files(1:3)
class(files)
class(files{1})
ans =

  3×1 cell array

    'C:\X\Project\PMSM\Demo\startDemo.m'
    'C:\X\Project\PMSM\Demo\+task\+dataDictionary\t0_initWorkFolder.m'
    'C:\X\Project\PMSM\Demo\+task\+dataDictionary\t1_openTestBench.m'


...

Remember that you have to use {} to extract the contents of an element of a cell array.
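A quick illustration of the difference between the two kinds of indexing, using a small made-up cell array:

```matlab
% Made-up file names stored as a cell array of character arrays.
files = {'a.m'; 'b.m'; 'c.m'};

% Parentheses return a smaller cell array.
subset = files(1:2);
classOfSubset = class(subset);   % 'cell'

% Curly braces return the contents of the element.
first = files{1};
classOfFirst = class(first);     % 'char'
```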

The String class to the rescue!

files = string(files)
class(files)
files = 

  1414×1 string array

    "C:\X\Project\PMSM\Demo\startDemo.m"
    "C:\X\Project\PMSM\Demo\+task\+dataDictionary\t0_initWorkFolder.m"
    "C:\X\Project\PMSM\Demo\+task\+dataDictionary\t1_openTestBench.m"
    "C:\X\Project\PMSM\Demo\+task\+dataDictionary\t2_importEnums.m"
    "C:\X\Project\PMSM\Demo\+task\+dataDictionary\t3_partitionData.m"
...

My favorite feature of the STRING function is that it can convert a cell array of character arrays to a string array.

Now you can start applying common string methods like contains, endsWith, startsWith, extractBetween, etc. to filter and analyze the result.

hasF28035 = contains(files, 'F28035');
files(hasF28035)
ans = 

  59×1 string array

    "C:\X\Project\PMSM\Demo\+task\+pcgF28035\t1_openTestBench.m"
    "C:\X\Project\PMSM\Demo\+task\+pcgF28035\t2_generateCodeAndCopyToParent..."
    "C:\X\Project\PMSM\Demo\+task\+pcgF28035\t3_loadAndPlotHwData.m"
    "C:\X\Project\PMSM\Demo\+task\+pcgF28035\t4_openFloatingPointTestBench.m"
    "C:\X\Project\PMSM\Demo\+task\+prototypeF28035\t1_openTestBench.m"
...
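Here is a small sketch of a few more of these functions working together. The three paths are shortened, made-up stand-ins for the real project listing, and they are built with STRING from character arrays so that the code also runs in R2016b.

```matlab
% Made-up paths standing in for the output of GLOB.
files = string({ ...
    'C:\Demo\+task\+pcgF28035\t1_openTestBench.m'; ...
    'C:\Demo\+test\runAll.m'; ...
    'C:\Demo\Common\startup.m'});

% Logical masks, one element per path.
inTask    = contains(files, '\+task\');
isStartup = endsWith(files, '\startup.m');

% Pull out the package folder that follows '+task\'.
package = extractBetween(files(inTask), '+task\', '\');
```

Each mask can be used to index files directly, just like hasF28035 above.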

Potential Alternative: Use Operating System Commands

This approach doesn’t necessarily reproduce everything that the GLOB function provides, but it is a mechanism I have used to do similar things.

You can apply operating system commands from MATLAB using the SYSTEM function. On Windows systems you can perform a recursive directory search with the following example.

[~, files] = system('dir /s/B C:\X\Project\PMSM\Demo\*.m')
files =

C:\X\Project\PMSM\Demo\startDemo.m
C:\X\Project\PMSM\Demo\+task\+dataDictionary\t0_initWorkFolder.m
C:\X\Project\PMSM\Demo\+task\+dataDictionary\t1_openTestBench.m
C:\X\Project\PMSM\Demo\+task\+dataDictionary\t2_importEnums.m
C:\X\Project\PMSM\Demo\+task\+dataDictionary\t3_partitionData.m
C:\X\Project\PMSM\Demo\+task\+dataDictionary\t4_multipleDictionaries.m
C:\X\Project\PMSM\Demo\+task\+mac2015\t1_openTestBench.m
...

The variable files is a single character array:

s = size(files)
c = class(files)
s =

           1      129792


c =

char

You can split this using the STRSPLIT function:

files = strsplit(string(files))'
files = 

  1513×1 string array

    "C:\X\Project\PMSM\Demo\startDemo.m"
    "C:\X\Project\PMSM\Demo\+task\+dataDictionary\t0_initWorkFolder.m"
    "C:\X\Project\PMSM\Demo\+task\+dataDictionary\t1_openTestBench.m"
    "C:\X\Project\PMSM\Demo\+task\+dataDictionary\t2_importEnums.m"
    "C:\X\Project\PMSM\Demo\+task\+dataDictionary\t3_partitionData.m"
...
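One caveat: with no delimiter argument, STRSPLIT splits on any whitespace, so a path that contains a space would be broken into pieces. A possible variation is to split on line breaks only. In this sketch, the variable raw is a made-up stand-in for the character array returned by SYSTEM.

```matlab
% Made-up stand-in for the SYSTEM output: one path per line, trailing newline.
raw = sprintf('C:\\Demo\\startDemo.m\r\nC:\\Demo\\My Folder\\run.m\r\n');

% Trim the trailing line break, then split on LF or CR+LF only.
lines = regexp(strtrim(raw), '\r?\n', 'split');

% Convert to a column string array.
files = string(lines)';
```

Trimming first avoids an empty element at the end of the array.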

Any thoughts on this entry?

Let us know here.

Published with MATLAB® 9.1
