File Exchange Pick of the Week

Recursive directory searching for multiple file specs?

Posted by Brett Shoelson

Brett's Pick this week is ... nonexistent. (Instead, here's another challenge.)

A Recursive DIR Command that Accommodates a Complex Filterspec

For a project that I'm currently working on, I want to return a directory listing of all files matching one of several specified formats. I wanted to include in my directory search, for example, all files of type 'doc', 'xml', or 'html'. I can easily select such files using MATLAB's uigetfile command:

[filename, pathname] = uigetfile({'*.doc;*.xml;*.html','Brett''s File Formats';...
        '*.*','All Files' },'Brett''s Search',...
        'C:\Brett\Miscellaneous Documents\myFile.xml',...
        'multiselect','on')

But listing those files in a dir-like format is another (more challenging) matter. Moreover, just to complicate the issue, I want to be able to specify recursive or non-recursive searching--that is, to tell MATLAB to include or exclude, respectively, subdirectories below my initial search path.

The dir command doesn't support that capability.
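
To make the limitation concrete: dir takes a single filespec per call, so covering several formats in one folder means calling it once per pattern and concatenating the results. Here is a minimal sketch of that idea. The helper name dirMulti is hypothetical (it is not a built-in or a File Exchange submission), and the sketch does not recurse:

```matlab
function listing = dirMulti(folder, patterns)
%DIRMULTI Non-recursive DIR over several filespecs (illustrative sketch).
%   folder   - char, directory to search
%   patterns - cell array of filespecs, e.g. {'*.doc','*.xml','*.html'}
listing = dir(fullfile(folder, patterns{1}));
for k = 2:numel(patterns)
    % Struct arrays returned by DIR share the same fields, so they
    % can be concatenated directly.
    listing = [listing; dir(fullfile(folder, patterns{k}))]; %#ok<AGROW>
end
% Keep files only; drop any folders whose names happen to match a pattern
listing = listing(~[listing.isdir]);
end
```

A call such as dirMulti(pwd, {'*.doc','*.xml','*.html'}) would return a dir-style struct array. Note that overlapping patterns (say, '*.*' together with '*.xml') would list the same file twice; this sketch makes no attempt to remove duplicates.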

Rather than write my own custom directory-listing code, my first thought was to search the File Exchange to see if someone had already done so. I was not (initially) disappointed; there are several submissions that appear to do what I need the function to do. (What an amazing resource the File Exchange is!) In fact, there were so many promising files on the Exchange that I decided to get even more restrictive in my criteria. Specifically, I wanted to find a directory-searching function that:

  1. can search for multiple file formats;
  2. can search recursively or non-recursively;
  3. provides a command-line (non-GUI) interface;
  4. works "out-of-the-box" in R2012b--without requiring a lot of effort on my part;
  5. returns results in the same format that the dir command returns; and
  6. is shared under a BSD license.

Would I have any luck with these more restrictive requirements?

As it turns out, not so much. I found many files that satisfy some of my criteria, but none that satisfies them all. Armed with that knowledge, and with some insights I had gained by looking at some other contributors' code, I figured that I could either relax my criteria (some of them are "must-haves," others are just niceties), or I could write my own function. But then I had another idea: this is great fodder for a blog post! Why not challenge readers to create a function to meet those specs? (How far are you willing to go for MATLAB swag, and for recognition in the Pick-of-the-Week blog?) :)

Did I Miss Something?

First, if I overlooked (or misjudged) a particular submission, please let me know. If I agree that a file currently on the exchange meets all of the criteria listed above, I'll recognize your file publicly, apologize humbly, and send you some swag! It's quite possible that I missed something.

So Here's the Challenge

I will send some swag to the first person who shares on the File Exchange some code that meets the six criteria specified above. You can start from scratch, or you can modify an existing file--yours, or someone else's. And I will feature your submission as a Pick-of-the-Week in a future blog.
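
For anyone starting from scratch, criteria 1, 2, 3, and 5 might be approached roughly as follows. This is only a sketch, not a tested submission or anything taken from the File Exchange: the name rdirSketch and its argument order are hypothetical, and storing the full path in the name field is an assumption about what "the same format as dir" should mean once results come from several folders. It walks the tree with an explicit stack rather than recursive function calls.

```matlab
function listing = rdirSketch(root, patterns, recurse)
%RDIRSKETCH Multi-filespec DIR with optional recursion (illustrative sketch).
%   root     - char, starting directory
%   patterns - cell array of filespecs, e.g. {'*.doc','*.xml','*.html'}
%   recurse  - logical, true to descend into subdirectories
listing = dir(fullfile(root, patterns{1}));
listing = listing([]);          % empty struct array with DIR's fields
pending = {root};               % explicit stack of folders still to visit
while ~isempty(pending)
    folder = pending{end};
    pending(end) = [];
    for k = 1:numel(patterns)
        hits = dir(fullfile(folder, patterns{k}));
        hits = hits(~[hits.isdir]);
        % Store the full path in NAME so that results from different
        % folders remain distinguishable (an assumption of this sketch).
        for n = 1:numel(hits)
            hits(n).name = fullfile(folder, hits(n).name);
        end
        listing = [listing; hits(:)]; %#ok<AGROW>
    end
    if recurse
        % Queue every subfolder except the '.' and '..' entries
        sub = dir(folder);
        sub = sub([sub.isdir]);
        names = {sub.name};
        names = names(~ismember(names, {'.', '..'}));
        pending = [pending, cellfun(@(s) fullfile(folder, s), names, ...
            'UniformOutput', false)];
    end
end
end
```

For example, rdirSketch(pwd, {'*.doc','*.xml','*.html'}, true) would search the current folder and everything below it, while passing false restricts the search to the top level. The sketch does not guard against symbolic-link cycles or remove duplicates from overlapping patterns.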

As always, I welcome your thoughts and comments.


Published with MATLAB® R2013a

17 Comments

@Evgeny:
Absolutely _nothing_ to be ashamed of, Evgeny! In fact, not only was that the fastest response that I’ve ever gotten to a blog challenge, but I was able to confirm that your function meets all of my requirements–except the one that says it must be shared on the File Exchange under a BSD license. And I was able to do it despite the fact that your comments are in Russian! (I think.)

Why not share it on the Exchange? You’ll earn yourself some swag, and a promised Pick-of-the-Week!

Thanks!
Brett

I just submitted my solution to the file exchange. It performs all of the tasks you asked for.

@Jonathan:
Excellent, Jonathan! I’m looking forward to testing it as soon as it goes live. Assuming I can verify your claim, you’ll have earned yourself some swag.

What a great forum–already two potential winners! Keep ’em coming, folks. Additional prizes may ensue! (Fastest search, best documentation, easiest interface?)

Cheers,
Brett

@bshoelso

Thanks for your review! I think my function is too crude; it is not suitable for use “in production”. :)
I want to rewrite it.

The function dir2 written by @Jonathan is cool and simple to use! I would choose it… :)
But there is one problem: a “recursion limit” error.

For example:

dir2(fullfile(matlabroot, 'toolbox'), '-r', '*.cpp')

This was my function from many moons ago. It does not meet your wants exactly, but a simple loop over SEARCHSTRING input would do the trick. Jonathan’s submission looks much more thought out at first glance.

function files = filesearch(searchString,startPath,recurse)
%FILESEARCH Search (recursively) for files in a directory
%
% See also DIR.

% Version: 1.0
% Date: 2009/08/06
% Author: Shaun

% Default output, so that the early returns below leave FILES defined
files = {};

% Some simple error checking ----------------------------------------------
if nargin == 0
    help(mfilename);
    return;
elseif nargin == 1
    startPath = cd;
    recurse = 1;
elseif nargin == 2 || ~isnumeric(recurse)
    recurse = 1;
end

if ~ischar(searchString) || ~ischar(startPath)
    help(mfilename);
    return;
end
% -------------------------------------------------------------------------

% Recursively search from STARTPATH
if recurse
    % Generate the list of folders applicable to a recursive search
    str = genpath(startPath);

    % STR is a character array which must be broken up (i.e. each folder
    % is terminated by PATHSEP, which is a semicolon on Windows)
    inString = strfind(str,pathsep);

    % Obtain a start/stop index for each folder in STR
    start = inString+1; start = [1, start(1:end-1)];
    stop = inString-1;

    % Create a cell array holding file information
    fileData = ...
        arrayfun(@(x) dir(fullfile(str(start(x):stop(x)),searchString)),...
        1:length(start),'UniformOutput',0);

    % Return path with the filename / seems sub-optimal...works
    for i = 1:length(start)
        for j = 1:length(fileData{i})
            fileData{i}(j).name = fullfile(str(start(i):stop(i)),...
                fileData{i}(j).name);
        end
    end

    % I just want the name of each file
    files = cellfun(@(x) {x.name}',fileData,'UniformOutput',0);
    files = vertcat(files{:});

% Search non-recursively (i.e. just STARTPATH)
else
    fn = dir(fullfile(startPath,searchString));
    files = {fn.name}';
end

% -------------------------------------------------------------------------
% END FUNCTION sps

Thinking about this problem again; I would put a wrapper on the listing below. Of course, I am using windows. DOS-DIR can be recursive or not…multiple extensions or not…everything you want…and fast…super-duper fast

I think the UNIX equivalent is something like "ls -R".

[~,str] = system(['dir "' fullfile(matlabroot, 'toolbox') '\*.cpp" /s /b']);
str = regexp(str,[strrep(fullfile(matlabroot, 'toolbox'),'\','\\') '\\'],'split');
str = str(2:end)';

I’m submitting my solution to the file exchange. It’s a modification from my earlier submission (ls_mod) to meet these targets.

@Evgeny

Thanks for letting me know about this bug. It seems to be present in older versions of MATLAB. I’ve modified the code slightly to account for the different behavior of strcat in older versions of MATLAB, and it is not available on the FEX.

@Shaun: If you could generalize the system call and create that wrapper, that would be a useful and welcome addition to the File Exchange!

@Mikko: I’m looking forward to seeing your revision as soon as it’s available.

@Jonathan: Thanks for the code. I think you meant that the modification is NOW available…not that it is NOT available! (Big difference a single letter can make, eh?) I’ve downloaded your new version and have to say: it rocks! Nice job! (And thanks, Evgeny, for pointing out the recursion issue.)

Just a quick note. I added a self-compiling MEX implementation of the code, which runs significantly faster (~20x). It is now included in my submission on the File Exchange. It is only for Windows users, but includes logic to use the standard (non-MEX) implementation for those using other platforms.

These postings are the author's and don't necessarily represent the opinions of MathWorks.