Loren on the Art of MATLAB

Turn ideas into MATLAB

Note

Loren on the Art of MATLAB has been archived and will not be updated.

New MAT-File Functionality in R2011b

Today I’d like to introduce guest blogger Sarah Wait Zaranek, who works on the MATLAB Marketing team here at The MathWorks. Sarah has previously written about speeding up customer code to get acceptable performance. She and I will be writing about the new capabilities for MAT-files in R2011b.

What's New for MAT-Files?

The new matfile function in R2011b lets you efficiently load or save parts of variables in MAT-files. If you are running into memory limits, loading part of a variable requires less memory than loading its entire contents. Previously, you could read individual variables from a MAT-file, but you could not load part of a single variable. Let’s see this in action!

Example: Using Partial Reading

We have a MAT-file (myBigData.mat) that contains the variable X. You can use the following to see which variables are available in your MAT-file:

whos -file myBigData
  Name          Size                   Bytes  Class     Attributes

  X         10000x10000            800000000  double              

Wow - X is almost a gigabyte in size. Perfect for trying out this new partial read functionality.

To read in part of a variable, first create an object that corresponds to a MAT-file:

matObj = matfile('myBigData.mat');

By running whos you can see that matObj is a matlab.io.MatFile object:

whos matObj
  Name        Size            Bytes  Class                Attributes

  matObj      1x1               112  matlab.io.MatFile              

Now you can access variables in the MAT-file as properties of matObj, with dot notation. This is similar to how you access the fields of structures in MATLAB.

loadedData = matObj.X(1:4,1:4);
disp(loadedData)
      0.90579      0.35507      0.89227      0.31907
      0.12699        0.997       0.2426      0.98605
      0.91338      0.22417       0.1296      0.71818
      0.63236      0.65245      0.22507      0.41318

This loads only the requested part of X into the workspace. Indices can be a single value, a range of values, or a colon (:). Note, however, that using the end syntax causes MATLAB to load the entire variable into memory.

To avoid loading the whole variable, index this way

[nrows, ncols] = size(matObj, 'X');
loadedData = matObj.X(nrows-10:nrows, ncols-10:ncols);

instead of this way:

loadedData2 = matObj.X(end-10:end, end-10:end);
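Here is a small, self-contained sketch of the indexing forms described above. It assumes a tiny demo file in the temporary folder (the name demoIdx.mat and the variable names are made up for illustration, not part of the original example), so you can try it without the large data set.

```matlab
% Hypothetical demo file; not part of the original example.
fname = fullfile(tempdir, 'demoIdx.mat');
X = magic(6);
save(fname, 'X', '-v7.3');          % partial loading needs Version 7.3

m = matfile(fname);
[nrows, ncols] = size(m, 'X');      % query the size without loading X

oneValue = m.X(2, 3);               % a single value
block    = m.X(1:2, 4:ncols);       % ranges of values
lastRow  = m.X(nrows, :);           % a colon, with nrows in place of end
```

Each read returns an ordinary MATLAB array, so the results can be compared directly with the corresponding parts of the in-memory X.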

You can now treat your loaded data just as you would any data in MATLAB. Let’s calculate and plot the row averages of our data. Every read carries some overhead, so we want to balance the number of reads against the size of the data we bring into MATLAB's memory.

[nrows, ncols] = size(matObj,'X');
dataAvg = zeros(nrows,1);
stepSize = 100;

for ii = 1:stepSize:nrows
    loadedData = matObj.X(ii:ii+stepSize-1,:);
    dataAvg(ii:ii+stepSize-1) = mean(loadedData,2);
end

plot(dataAvg)
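The loop above relies on the number of rows being a multiple of stepSize. As a hedged variation, the sketch below clamps the final block so that the same idea works for other sizes. The file name demoChunks.mat and the small 25-by-8 matrix are assumptions chosen only to keep the example quick to run.

```matlab
% Hypothetical small file so the sketch runs quickly.
fname = fullfile(tempdir, 'demoChunks.mat');
X = rand(25, 8);
save(fname, 'X', '-v7.3');

m = matfile(fname);
[nrows, ncols] = size(m, 'X');
stepSize = 10;                       % 25 is not a multiple of 10
rowAvg = zeros(nrows, 1);

for first = 1:stepSize:nrows
    last = min(first + stepSize - 1, nrows);   % clamp the final block
    chunk = m.X(first:last, :);                % one partial read per block
    rowAvg(first:last) = mean(chunk, 2);
end
```

A larger stepSize means fewer reads but more memory per read, so the right value depends on your data and your machine.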

Example: Using Partial Writing

Again we create an object that corresponds to a MAT-File but focus on writing instead of reading data.

matObj = matfile('myBigData2.mat','Writable',true);

For existing files, you need to set the 'Writable' option to true before you can write to the file. For new files, writing is allowed by default.
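As a sketch of how the flag behaves, the following opens an existing file without the option and then turns writing on afterwards. It assumes that your release exposes the Properties.Writable property of the matlab.io.MatFile object; the file name demoWritable.mat is made up for illustration.

```matlab
% Hypothetical demo file; not part of the original example.
fname = fullfile(tempdir, 'demoWritable.mat');
X = zeros(4);
save(fname, 'X', '-v7.3');

m = matfile(fname);                   % existing file, no 'Writable' option
wasWritable = m.Properties.Writable;  % query the current setting

m.Properties.Writable = true;         % assumed equivalent to 'Writable',true
m.X(1, 1) = 7;                        % the write is now allowed
```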

To partially write to a MAT-file, assign new data to an indexed part of the variable, just as you normally would for variables in MATLAB.

matObj.X(81:100,81:100) = magic(20);

If you do not index into the variable, the full variable is replaced.
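To make the difference concrete, here is a minimal sketch that compares an indexed write with an unindexed one. The file name demoReplace.mat and the small matrices are assumptions used only for illustration.

```matlab
% Hypothetical demo file; not part of the original example.
fname = fullfile(tempdir, 'demoReplace.mat');
X = zeros(5);
save(fname, 'X', '-v7.3');

m = matfile(fname, 'Writable', true);

m.X(2:3, 2:3) = ones(2);             % indexed: only this block changes
sizeAfterPartial = size(m, 'X');     % X keeps its original size

m.X = magic(3);                      % not indexed: X is replaced entirely
sizeAfterFull = size(m, 'X');        % X now has the size of the new data
```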

You can also create new variables or append to existing data:

whos -file myBigData2

matObj.NewVar = rand;
matObj.X(:,10001) = rand(10000,1);

whos -file myBigData2
  Name          Size                   Bytes  Class     Attributes

  X         10000x10000            800000000  double              

  Name            Size                   Bytes  Class     Attributes

  NewVar          1x1                        8  double              
  X           10000x10001            800080000  double              

Format and Indexing Limits

matfile supports partial loading and saving only for MAT-files in Version 7.3 format. If you index into a variable in a Version 7 or earlier MAT-file, MATLAB warns and temporarily loads the entire contents of the variable. To save in Version 7.3 format, use the following syntax:

save('mydata.mat','-v7.3');
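If you already have a file in an older format, one possible way to convert it is to load it and save it again in Version 7.3 format. This is only a sketch: it loads the whole file into memory once, so it assumes the data fits, and the file names demoV7.mat and demoV73.mat are made up for illustration.

```matlab
% Hypothetical demo files; not part of the original example.
oldFile = fullfile(tempdir, 'demoV7.mat');
newFile = fullfile(tempdir, 'demoV73.mat');
A = magic(4);
B = 'hello';
save(oldFile, 'A', 'B', '-v7');      % an older-format file to start from

S = load(oldFile);                   % loads everything; needs enough memory
save(newFile, '-struct', 'S', '-v7.3');  % each field becomes a variable

m = matfile(newFile);
corner = m.A(1:2, 1:2);              % partial read from the converted file
```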

matfile does not support linear indexing or multilevel indexing, such as indexing into cells of cell arrays or fields of structure arrays. To save structure fields as separate variables to your MAT-file, you can use the following:

S.a = 'try to save';
S.b = 42;
S.c = magic(10);

save('newstruct.mat', '-struct', 'S','-v7.3')
save('newstruct2.mat', '-struct', 'S', 'a', 'c','-v7.3')

whos -file newstruct.mat
whos -file newstruct2.mat
  Name       Size            Bytes  Class     Attributes

  a          1x11               22  char                
  b          1x1                 8  double              
  c         10x10              800  double              

  Name       Size            Bytes  Class     Attributes

  a          1x11               22  char                
  c         10x10              800  double              
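Since matfile does not support linear indexing, one workaround is to convert a linear index into one subscript per dimension with ind2sub before reading. The sketch below assumes a small demo file (demoLinear.mat is a made-up name) and a two-dimensional variable.

```matlab
% Hypothetical demo file; not part of the original example.
fname = fullfile(tempdir, 'demoLinear.mat');
X = magic(5);
save(fname, 'X', '-v7.3');

m = matfile(fname);
k = 12;                              % the linear index we would like to read
[r, c] = ind2sub(size(m, 'X'), k);   % convert to row and column subscripts
value = m.X(r, c);                   % one subscript per dimension
```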

Learn More

You can learn more about this new functionality by reading the matfile reference page or by watching this video. There are known performance issues in some cases when using matfile, which are currently being looked into here at MathWorks. We will update this blog later with suggested best practices for performance when using matfile.

Here's How to Create myBigData

Run the code in MakeData, and when it's complete, make a copy of the large MAT-file myBigData.mat named myBigData2.mat. Then you should be able to run this example with your R2011b installation.

type MakeData
% Create a large data file
matObj = matfile('myBigData.mat','Writable',true); 

matObj.X(10000,10000) = 0;

for ii = 1:10
   rangmin = (ii-1)*1000 + 1;
   rangmax = rangmin + 999;
   matObj.X(rangmin:rangmax,1:10000) = rand(1000,10000);
end

Try it Out

Experiment with this new functionality, and let us know what you think about it by leaving a comment here.




Published with MATLAB® 7.13

