Loren on the Art of MATLAB

Processing a Set of Files

Posted by Loren Shure

There continue to be questions on the MATLAB newsgroup regarding processing a set of files. So, even though Steve covered this topic in his blog, I thought I'd put an answer on record here as well.

Setup

I should start with a clean workspace and with no WAV-files in my blog publishing directory.

clear all
delete *.wav

Problem Statement

Suppose I want to convert sounds stored in MATLAB MAT-files to files saved in WAV format for Windows. Here's a way to proceed without explicitly hardcoding the filenames.

Collect the MAT-files Containing Sounds

matfiles = dir(fullfile(matlabroot,'toolbox','matlab','audiovideo','*.mat'))
matfiles = 
7x1 struct array with fields:
    name
    date
    bytes
    isdir
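
Note that dir returns bare file names without their folder, so loading a file by name alone works only when that folder is on the MATLAB path or is the current directory. A minimal sketch of a more general pattern follows; the scratch folder and file names are hypothetical and exist only for the illustration. It keeps the folder in a variable and rebuilds each full path with fullfile.

```matlab
% Hypothetical scratch folder, created only for this illustration.
folder = tempname;
mkdir(folder);

% Create two small MAT-files so that there is something to list.
y = (1:3)';  Fs = 8192;
save(fullfile(folder, 'a.mat'), 'y', 'Fs');
save(fullfile(folder, 'b.mat'), 'y', 'Fs');

% List the MAT-files, then rebuild each full path from folder + name.
listing = dir(fullfile(folder, '*.mat'));
fullNames = cell(numel(listing), 1);
for k = 1:numel(listing)
    fullNames{k} = fullfile(folder, listing(k).name);
end

% Loading by full path works whether or not the folder is on the path.
s = load(fullNames{1});

% Clean up the scratch folder and its contents.
rmdir(folder, 's');
```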

Check out the Files

We can see from the length of matfiles that I have 7 MAT-files. I know I don't care about the first one in this case so I am starting my analysis with the second one.

load(matfiles(2).name)
whos
  Name           Size                    Bytes  Class

  Fs             1x1                         8  double array
  matfiles       7x1                      2435  struct array
  y          13129x1                    105032  double array

Grand total is 13390 elements using 107475 bytes

Calling load without an output argument, as above, places the MAT-file contents directly into the workspace. Calling it with an output argument instead returns a structure from which I can extract the information I need and write it back out. The sound files that MATLAB ships with store the data in y and the sampling frequency in Fs. Let's look at the first signal.

N = length(y);
% Time axis in seconds: sample index divided by the sampling frequency
plot((0:N-1)/Fs,y), title(matfiles(2).name(1:end-4))
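
load behaves differently depending on whether it is called with an output argument. The sketch below shows the structure form under stated assumptions: the file name and signal values are made up, and only the field names y and Fs follow the convention of the shipped sound files.

```matlab
% Hypothetical MAT-file, written to a temporary location for illustration.
fname = [tempname '.mat'];
y = sin(2*pi*(0:99)'/20);   % made-up signal
Fs = 8192;                  % made-up sampling frequency
save(fname, 'y', 'Fs');
clear y Fs

% With an output argument, load returns a structure and leaves the
% workspace untouched, so nothing already defined gets overwritten.
data = load(fname);
hasSound = isfield(data, 'y') && isfield(data, 'Fs');

% The time axis in seconds follows from the sampling frequency.
t = (0:numel(data.y)-1)' / data.Fs;

delete(fname);
```

Checking the fields first, as hasSound does, is one way to skip MAT-files that do not hold a sound.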

Loops over the Files

for ind = 2:length(matfiles)
    data = load(matfiles(ind).name);
    wavwrite(data.y,data.Fs,matfiles(ind).name(1:end-4));
end
Warning: Data clipped during write to file:gong
Warning: Data clipped during write to file:splat
Warning: Data clipped during write to file:train
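
The clipping warnings indicate that some samples in those three files fall outside the range wavwrite accepts for floating-point input (amplitudes from -1 up to just under +1). One possible way to avoid clipping, at the cost of changing the playback volume, is to rescale the signal before writing. This is a minimal sketch with a made-up signal, and the 0.99 headroom factor is an arbitrary choice.

```matlab
% Made-up signal whose peaks exceed the [-1, 1) range.
y = 1.5 * sin(2*pi*(0:199)'/50);

% Rescale so the largest magnitude sits just inside the valid range.
peak = max(abs(y));
if peak >= 1
    yScaled = 0.99 * y / peak;
else
    yScaled = y;    % already in range; leave untouched
end

% The rescaled signal could then be passed to wavwrite in place of y.
inRange = all(yScaled >= -1 & yScaled < 1);
```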

What about Those Files?

dir *.wav
chirp.wav     handel.wav    splat.wav     
gong.wav      laughter.wav  train.wav     

Read a File Back for Verification

I'll double-check the last file I just read in and wrote out. ind still holds its final value even though the for loop has finished. Let's check both the frequencies and the signals themselves.

[ywav, Fswav] = wavread(matfiles(ind).name(1:end-4));
eqFreqs = isequal(Fswav, data.Fs)
datadiff = norm(ywav-data.y)
eqFreqs =
     1
datadiff =
    0.0010

The data stored in the WAV-file is NOT exactly the same as that stored in the MAT-file. Reading the help for wavwrite gives some insight; the data in the WAV-file, by default, is stored as 16-bit data vs. MATLAB's standard 64-bit double.
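
The size of this discrepancy can be illustrated without writing any file. The sketch below assumes that 16-bit storage amounts to rounding each sample to the nearest multiple of 2^-15. That is a simplified model of what wavwrite does, not a copy of its implementation, and the signal is made up.

```matlab
% Made-up signal, kept well inside [-1, 1) so that nothing is clipped.
N = 13000;
y = 0.9 * sin(2*pi*440*(0:N-1)'/8192);

% Simplified model of 16-bit storage: round to the nearest multiple of 2^-15.
step = 2^-15;
yq = round(y / step) * step;

% Each sample moves by at most half a step ...
maxErr = max(abs(yq - y));

% ... and the norm of the difference grows with the square root of N.
datadiffModel = norm(yq - y);
roughEstimate = sqrt(N) * step / sqrt(12);   % uniform rounding-error model
```

For a signal of about 13,000 samples, roughEstimate comes out near 1e-3, the same order of magnitude as the datadiff reported above.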

Thoughts?

Does it confuse people here that I don't worry about vectorization? Any other thoughts on this topic?


Published with MATLAB® 7.2

14 Comments

I don't worry about vectorization for speed when looping over files, since the operations on them usually take much longer than the act of looping.

But, I do find the wonderful, somewhat new, vectorization functions in MATLAB to be a blessing when I need to manipulate the file names.

Today I needed to get a list of file names, tack on a new path to them and string them out as input into a command line function. Arrayfun with anonymous functions makes life a little easier and more pleasant:

files = dir('*.mat');
nameWithSpace = arrayfun(@(x) [newpath x.name ' '], files, 'UniformOutput', false); % prepends a path and appends a space to each file name

system([blahblahblah ' ' [nameWithSpace{:}]])

I continually find new uses for the combination of anonymous functions and these new vectorization functions (structfun, arrayfun, cellfun).

Here is a plug for a new request: How about rowfun, colfun, or something that allows me to vectorize over any dimension? That can be very useful. Maybe call it applyfun, where you can apply it to any dimension. There are ways to do this with accumarray, but it requires a lot of manipulation to do what I think should be straightforward.

Stephen

Gary-

WAV-files written this way store 16-bit samples by default, which is a lossy step, so when you read the data back in, it doesn't match the MAT-file data exactly.

But I think you are pointing out my mistake in the M-code! The for loop should read

%% Loops over the Files
% 
for ind = 2:length(matfiles)
    data = load(matfiles(ind).name);
    wavwrite(data.y,data.Fs,matfiles(ind).name(1:end-4));
end

I have fixed up that and another small bug in the code in the blog itself.

–Loren

To Stephen:

I think this may be what you want:

% arr - the n-dimensional input array
% dim - the dimension to iterate over
% fh  - the function handle applied to each slice
function iterate_dim(arr, dim, fh)
    n = ndims(arr);
    m = size(arr, dim);
    idx = repmat({':'}, 1, n);   % one ':' subscript per dimension
    for i = 1:m
        idx{dim} = i;            % pick the i-th slice along dim
        r = arr(idx{:});
        fh(r);
    end
end

To use:

a = rand(10);
iterate_dim(a, 1, @(row) disp(row))
iterate_dim(a, 2, @(col) disp(col))

I have not checked the code above, so I don't know if it works as advertised :). Also, I think the inner indexing will be slow, as MATLAB will not be able to JIT the indexing. Is there a smarter way to do the indexing?

B
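
One possible alternative to indexing inside a loop is to let num2cell cut the array into slices along the chosen dimension and then hand the slices to cellfun. This is a sketch rather than a benchmarked answer: the variable names are made up, and it has not been compared for speed against an explicit loop.

```matlab
a = magic(4);          % example input
dim = 1;               % iterate over rows

% Group every dimension except dim into each cell, so that cell k holds
% the slice a(k,:) when dim is 1, or a(:,k) when dim is 2.
otherDims = setdiff(1:ndims(a), dim);
slices = num2cell(a, otherDims);

% Apply a function to each slice; here, the sum of each row.
rowSums = cellfun(@sum, slices);
```

For functions that return non-scalar results, cellfun would need the 'UniformOutput', false option so that the results come back in a cell array.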

@StephenL:

You can have the concatenation even easier, without resorting to arrayfun and anonymous functions:

files = dir('*.mat');
namesWithSpace = strcat(newpath,{files.name},{' '});

Michael

Brad: Thanks. Agree about the speed. I put in an enhancement request for this.

Michael: Thanks. That’s perfect. To get to the end result needed: [nameWithSpace{:}] strings everything out into a char array.

Stephen
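
The strcat approach and the final concatenation step can be sketched end to end as follows. The file names and the path are hypothetical stand-ins for the dir output and the real new path.

```matlab
% Hypothetical inputs standing in for the dir output and the new path.
names = {'a.mat', 'b.mat'};
newpath = '/new/path/';

% strcat with cell-array inputs returns a cell array and keeps the
% trailing space supplied in the {' '} cell.
namesWithSpace = strcat(newpath, names, {' '});

% Expanding the cell array inside brackets concatenates everything
% into one character row vector.
argString = [namesWithSpace{:}];
```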

Is it allowed to shamelessly plug my own contribution on the File Exchange in this context? Filefun will handle these kinds of tasks in a seemingly vectorized fashion à la cellfun et al.

Jerker
——
Yupp, that’s my name. Same as the IKEA desk.

MJ-

Got any guesses since it’s wavwrite to write them out? Try using lookfor at the MATLAB prompt or searching the doc. What happens if you look up wavwrite? Does it help you at all?

–loren

I run my program n times, and each time I want to save the run report in a file based on the name test_report.xls. Over the runs, the files should be named test_report1.xls, test_report2.xls, …, test_reportn.xls. Please let me know how I can do that (adding 1, 2, …, n to the file name).

Thanks
Ibrahim

Ibrahim-

Use the functional form for writing your report and build up the filename, something like this:

for n = 1:numFiles
  fn = ['report', int2str(n), '.xls'];
  % now use this filename, e.g.,
  xlswrite(fn, ...)
end

–Loren
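
An alternative way to build the numbered names is sprintf, which formats the number and the fixed text in one call. This is a minimal sketch: numFiles is a hypothetical run count, and the names are only collected in a cell array rather than written to disk.

```matlab
numFiles = 3;                      % hypothetical number of runs
names = cell(numFiles, 1);
for n = 1:numFiles
    % %d inserts the run number; use %03d instead for zero-padded names
    % such as test_report001.xls, which sort correctly in a listing.
    names{n} = sprintf('test_report%d.xls', n);
end
```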

I want to open a list of files with a common name scheme of this type: “filestart + changing characters”.

Example:

MOD08_M3.A2000032.005.2006255182125.hdf
MOD08_M3.A2000061.005.2008271011815.hdf
MOD08_M3.A2000092.005.2006265004700.hdf
MOD08_M3.A2000122.005.2006256101527.hdf

I’m having problems with filling in the changing characters of the filenames… =(

Thanks for your help.

Ricardo-

You have to somehow gather the list of names. Perhaps in your case, you get the names like this:

names = dir('MOD08_M3.A*.hdf');

Once you do, you can loop over the names similar to the example above.

–Loren

I finally found out… More or less this is the code that does the job I want =)

hdf_files = dir('MOD08_M3.A*.hdf');
for k = 1:length(hdf_files)
    filename = hdf_files(k).name;
    data = hdfread(filename, 'LST_Day_CMG');
end

Thank you very much for your help!
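
In a loop like this one, each pass overwrites data, so only the last file's contents remain afterwards. If every result is needed, one option is to store them in a cell array. The sketch below uses hypothetical text files and fileread as stand-ins for the HDF files and hdfread, purely so that it runs without any HDF data.

```matlab
% Hypothetical scratch folder with two small text files standing in for
% the HDF files; fileread stands in for hdfread.
folder = tempname;
mkdir(folder);
fid = fopen(fullfile(folder, 'MOD_A1.txt'), 'w'); fprintf(fid, 'first');  fclose(fid);
fid = fopen(fullfile(folder, 'MOD_A2.txt'), 'w'); fprintf(fid, 'second'); fclose(fid);

files = dir(fullfile(folder, 'MOD_A*.txt'));
results = cell(numel(files), 1);      % one slot per file
for k = 1:numel(files)
    filename = fullfile(folder, files(k).name);
    results{k} = fileread(filename);  % kept, not overwritten
end

rmdir(folder, 's');
```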

These postings are the author's and don't necessarily represent the opinions of MathWorks.