Loren on the Art of MATLAB

November 19th, 2009

Coordinating Zero Removals from Multiple Arrays

I've fielded some questions recently about how to coordinate multiple arrays changing simultaneously. One example is removing elements for two arrays in the case where either array holds a zero for the location. This is a good opportunity to reiterate the use of logical arrays and some useful associated functions (such as any and all).

Contents

Identify Pairs to Remove

Let's say I have 2 arrays

a = [ 1  4  9  0 25  0 49  0]
b = [ 1  0  3  0  0  6  7  8]
a =
     1     4     9     0    25     0    49     0
b =
     1     0     3     0     0     6     7     8

and I would like to delete the corresponding elements in a and b when either of them contains a zero value.

First Algorithm

There are several possible algorithms, each with their own trade-offs. Here's the first one.

anyzero = any([a;b] == 0)
a(anyzero) = []
b(anyzero) = []
anyzero =
     0     1     0     1     1     1     0     1
a =
     1     9    49
b =
     1     3     7

This algorithm combines the two arrays into one, a potentially costly move if the arrays are large. Then check for values that equal zero. And finally, check columnwise, using the function any, to identify the columns that have at least one zero. Finally, use this array of logical indices to delete the appropriate elements of a and b.

Second Algorithm

This algorithm (courtesy of Mirek L. in this post doesn't suffer from combining the two arrays.

x1 = a(a.*b ~= 0)
y1 = b(a.*b ~= 0)
x1 =
     1     9    49
y1 =
     1     3     7

But it calculates the same temporary array twice (and it's the size of one of the vectors). To be able to recalculate the temporary array this way, I can't overwrite the initial arrays as you see in the first algorithm. And finally, is there is a NaN or Inf corresponding to a 0, this algorithm won't find it.

Always Tradeoffs

There are always tradeoffs to make like the ones I mention here, at least when I program. How do you choose which tradeoffs to make? Which one would you choose here? Or would you choose an entirely different algorithm (which I hope you'll post). Let us know here.


Get the MATLAB code

Published with MATLAB® 7.9

November 12th, 2009

Empty Arrays with Flow of Control and Logical Operators

After reading last week's post on calculating with empty arrays, one of my colleagues mentioned some other behaviors with empty arrays that have tripped him up in the past. Today I will discuss how empty arrays work in the contexts of flow of control expressions (both conditional and looping, i.e., if and while) and short-circuit operators (i.e., && and | |).

Contents

Empty Arrays in Flow of Control

Let me first start with plain empty arrays in flow of control situations. For example, what will this code do?

    E = [];
    if E
       disp('Empty is true')
    else
       disp('Empty is false')
    end

Readers who remember my comment on last week's blog will correctly guess that the empty expression for the if statement is treated as false. Why? The way I think about it is this. If I am looking for locations of some condition in an array, and I don't find them, I end up with an empty output. This very output is the kind of expression I am likely to want to use, somehow, in an if statement. Let's try it to be sure.

E = [];
if E
    disp('Empty is true')
else
    disp('Empty is false')
end
Empty is false

The situation gets a bit more complicated if there is a logical expression for the if or while statement that has an empty array as one of its elements. Let me show you what I mean. Paraphrasing the documentation,

 There are some conditions however under which while evaluates as true
 on an empty array. Two examples of this are
                  A = [];
                  while all(A), doSomething, end
                  while 1|A, doSomething, end

Let's see what's going on in each of these examples. In the first one, the function all is being called with an empty input. According to the second reference below (on empty arrays), the function all is one of the functions that returns a nonzero value for empty input. Let's see.

allE = all(E)
allEislogical = islogical(allE)
allE =
     1
allEislogical =
     1

The way I think about this is that there are no false values in E, hence the true result.

Empty Arrays with Logical Operators

The second expression involves an elementwise logical operator ( | ). In this case, the first part of the expression, 1, is true, so the second part, after the elementwise or, is never evaluated. So the fact that an empty result returns false never comes into play here. Why? Because & and | operators short-circuit when and only when they are in the context of if or while expressions. Otherwise, the elementwise operators do not short-circuit.

In contrast, the logical operators, && and | |, always short-circuit, regardless of context.

Short-circuit Logical Operators (| | and &&)

The next important idea to remember is that the short-circuit logical operators expect scalars as the inputs for the expressions. This means that an empty array, not being a scalar, may cause you some grief if you are unprepared for that situation. Let me show you what I mean. Compare the following 2 code snippets.

true || E
ans =
     1
try
    E || true
catch ME
    disp(ME.message)
end
Operands to the || and && operators must be convertible to logical scalar values.

In the second snippet, the expression E || true

produced an error, because E isn't a scalar value. Once the error occurs, the second operand is never evaluated. Contrast that with the snippet, where the first input evaluates to true. Short-circuiting then takes over and the second operand, which would cause an error in this context, is never evaluated.

Examples

Here are a few more code examples to help you see the patterns. Try to figure out the answers before reading the results.

if []
    disp('hello')
else
    disp('bye')
end
bye
true | []
ans =
     []
[] | true
ans =
     []
true || []
ans =
     1
try
    [] || true
catch ME
    disp(ME.message)
end
Operands to the || and && operators must be convertible to logical scalar values.
if true | []
    disp('hello')
else
    disp('bye')
end
hello
if [] | true
    disp('hello')
else
    disp('bye')
end
bye
if true || []
    disp('hello')
else
    disp('bye')
end
hello
try
    if [] || true
        disp('hello')
    else
        disp('bye')
    end
catch ME
    disp(ME.message)
end
Operands to the || and && operators must be convertible to logical scalar values.

References

Here are a bunch of references to the MATLAB documentation where all of this information is covered.

Empty Thoughts?

The behaviors with empties in MATLAB are, I believe, consistent and useful. Nonetheless, the behaviors have lots of details to master and can be confusing. If you have any thoughts on the matter, please respond here.


Get the MATLAB code

Published with MATLAB® 7.9

November 4th, 2009

Calculus with Empty Arrays

MATLAB has had empty arrays since before I started using the program. When I started, the only size empty array was 0x0. When version 5 was released, empty arrays came along for the N-dimensional ride and got more shapely.

Contents

Even relatively simple expressions involving empty arrays cause confusion from time to time, especially in concert with other rules in MATLAB (such as NaN values usually propagate from inputs to outputs). Let's play around a little with some empty arrays to get some insight.

From the Newsgroup

This post on the MATLAB newsgroup motivated me to talk about empty arrays. Here's is an excerpt from the original post:

 I knew that Nan+4 = NaN, but why is []+4 = [] ? Is there more 'black hole
 behaviour' I should know about?

Dimensions Matter in MATLAB

Dimensions matter in MATLAB and you will get error messages when dimensions don't agree. Early on, we found it was convenient to treat scalars as an exception with operators (e.g., ,.) and to treat them as if they were expanded to the size of the other operand. So

rand(3,3) + pi
ans =
          4.10648118878907          4.09875960183274          3.28347899221701
          3.29920573526734          3.62696830231263          3.56335393621607
          4.11218543535041          3.94187312247859          4.05732817877886

adds the value pi to the random 3x3 just created. It's as if a 3x3 constant array filled with the value pi was created and added elementwise to the other array.

So what does that mean for an empty operand? Its size has at least one 0 value. So MATLAB expands pi in this expression [] + pi to be the same size as [] (which happens to be 0x0 here). When that happens, MATLAB creates a second empty array, and then adds the two empty arrays. Hence we get

[] + pi
ans =
     []

returing an empty output.

MATLAB applies the scalar expansion rule to operators. So, for example, size(anything+scalar) is size(anything). As a logical but sometimes surprising consequence, empty arrays often (but not always) propagate through calculations. When doesn't that happen? With some functions where mathematically logical analysis demands different results.

sum([])
ans =
     0
prod([])
ans =
     1

So, why is the sum zero and the product 1? Because they are the identity elements (as in group theory) for sum and product respectively. Think about how to start the computation for a sum or a product and how you would initialize the first value. That's the value given before adding or multiplying any of the array elements, hence the values 0 and 1 respectively.

Empty Array Shapes

Empty arrays can be N-dimensional, and don't need all dimensions to be 0. However, they still must obey the rules about dimensions needing to agree. There are consequences that may surprise you, but they do follow logically. Here's an example of trying to add two empty arrays, ending up in an error!

a = zeros(0,1)
b = zeros(1,0)
try
    a+b
catch MEplus
    fprintf('\n')
    disp(MEplus.message)
end
a =
   Empty matrix: 0-by-1
b =
   Empty matrix: 1-by-0

Matrix dimensions must agree.

Reference

For more information about empty arrays, check out this page in the documentation.

Got an Empty Question?

Got any empty questions (really, questions about empty!)? Or comments? Post them here.


Get the MATLAB code

Published with MATLAB® 7.9

October 21st, 2009

Dealing with Cells

A customer recently asked me this question at the MATLAB Virtual Conference.

Contents

Question about Summing Cell Rows

  I was hoping you would cover cells some day. Here is a particular
  problem I was hoping to have a more elegant solutions for.
  A is a cell that has String (say names) in the first column, Numbers
  (say scores in tests 1 and 2) in the next two columns. Assume that each
  cell has only one value. Is there an easier way to calculate, say the
  sum of the two test scores than a for loop?

Example

Let's make some sample data.

c(1:4,1) = {'Fred' 'Alice' 'Lucy' 'Tom'};
c(1:4,2) = { 90  80  55 102 };
c(1:4,3) = { 43  91  80  44 }
c = 
    'Fred'     [ 90]    [43]
    'Alice'    [ 80]    [91]
    'Lucy'     [ 55]    [80]
    'Tom'      [102]    [44]

Answer

To sum the entries in a row for this cell array, I can simply turn the numeric values into a numeric array via comma-separated list notation, and then sum the rows. Let's see the pieces for clarity. Here's the comma-separated list.

c{:,2:3}
ans =
    90
ans =
    80
ans =
    55
ans =
   102
ans =
    43
ans =
    91
ans =
    80
ans =
    44

Now place the results into a vector.

vec = [c{:,2:3}]
vec =
    90    80    55   102    43    91    80    44

Reshape the vector appropriately into a matrix.

array = reshape(vec,[],2)
array =
    90    43
    80    91
    55    80
   102    44

Now sum the results along the rows.

tot = sum(array,2)
tot =
   133
   171
   135
   146

Or, in one bigger statement, try this.

tot = sum(reshape([c{:,2:3}],[],2),2)
tot =
   133
   171
   135
   146

Cell Array Questions

Do you use cell arrays? Can you do what you want without undo contortions? Let me know here.


Get the MATLAB code

Published with MATLAB® 7.9

October 15th, 2009

Concatenating structs

From time to time, I get asked or see queries about how to concatenate two struct arrays to merge the fields. There's even a section in the documentation covering this topic. I thought I'd show it here to help people out.

Contents

Example Data

Suppose I've got some system for which I have collected information on a couple of individuals, including their names and ages.

s1.name = 'fred';
s1.age = 42;
s1(2).name = 'alice';
s1(2).age = 29;

Later, I go back and collect the individual's heights (in cm).

s2.height = 170;
s2(2).height = 160;

It would be great to merge these arrays now.

s1
s2
s1 = 
1x2 struct array with fields:
    name
    age
s2 = 
1x2 struct array with fields:
    height

Let me collect the field names.

fn1 = fieldnames(s1);
fn2 = fieldnames(s2);
fn = [fn1; fn2];

Next, I ensure the fieldnames are unique.

ufn = length(fn) == unique(length(fn))
ufn =
     1

Next let me convert the data in my structs into cell arrays using struct2cell.

c1 = struct2cell(s1)
c2 = struct2cell(s2)
c1(:,:,1) = 
    'fred'
    [  42]
c1(:,:,2) = 
    'alice'
    [   29]
c2(:,:,1) = 
    [170]
c2(:,:,2) = 
    [160]

Next I merge the data from the 2 cell arrays.

c = [c1;c2]
c(:,:,1) = 
    'fred'
    [  42]
    [ 170]
c(:,:,2) = 
    'alice'
    [   29]
    [  160]

And finally, I construct the new struct using cell2struct.

s = cell2struct(c,fn,1)
s = 
1x2 struct array with fields:
    name
    age
    height

I can check the output now.

s(1)
s(2)
ans = 
      name: 'fred'
       age: 42
    height: 170
ans = 
      name: 'alice'
       age: 29
    height: 160

mergeStruct

Here's the function mergeStruct that I created to encapsulate the steps in this process, including some error checks.

dbtype mergeStruct
1     function sout = mergestruct(varargin)
2     %MERGESTRUCT Merge structures with unique fields.
3     
4     %   Copyright 2009 The MathWorks, Inc.
5     
6     % Start with collecting fieldnames, checking implicitly 
7     % that inputs are structures.
8     fn = [];
9     for k = 1:nargin
10        try
11            fn = [fn ; fieldnames(varargin{k})];
12        catch MEstruct
13            throw(MEstruct)
14        end
15    end
16    
17    % Make sure the field names are unique.
18    if length(fn) ~= length(unique(fn))
19        error('mergestruct:FieldsNotUnique',...
20            'Field names must be unique');
21    end
22    
23    % Now concatenate the data from each struct.  Can't use 
24    % structfun since input structs may not be scalar.
25    c = [];
26    for k = 1:nargin
27        try
28            c = [c ; struct2cell(varargin{k})];
29        catch MEdata
30            throw(MEdata);
31        end
32    end
33    
34    % Construct the output.
35    sout = cell2struct(c, fn, 1);

Do You Need to Merge struct data?

Do you ever need to merge data stored in structs when you gather new information? Or are your structs static in nature? If the information you collect is always the same sort of information, then you probably know the form of the structure when you start, even if all the data isn't available at first.

There is code on the File Exchange to concatenate structures; I confess I have not tried it, but it is highly rated. Does this post or the File Exchange contribution help your work? Let me know here.


Get the MATLAB code

Published with MATLAB® 7.9

October 9th, 2009

Handling Discrete Data

Discrete data arise in many applications and the data may be numeric, or non-numeric, often referred to as categorical. Not all data are strictly numeric, and other characteristics can be pertinent or useful. You can use a variety of techniques and data representations in MATLAB for storing and manipulation discrete data.

Contents

Example: Periodic Table of Elements

The periodic table of elements provides a rich basis for this discussion. If you remember your chemistry class (I did, very imperfectly), you probably remember that each element has a fixed number of protons associated with it, also called the atomic number. You might also remember that the periodic table is arranged so that certain columns of elements have certain characteristics. For example, apart from Hydrogen, the left-most column holds the alkali metals (such as sodium and potassium) and the left side generally contains metallic elements. The right-most column holds Noble gases, with the next column to their left holding halogens. Nonmetals are towards the right side of the table. At standard temperature and pressure (known as STP), elements are in one of three states: solid, liquid, or gas.

We've just talked about 3 different things:

  • atomic number (number of protons in the nucleus), positive integers and therefore numeric
  • solid, liquid, gas states, which can be considered ordinal (that is, solid < liquid < gas)
  • element categories include the metals, nonmetals, etc., and these are simply labels or names, and therefore nominal (notice how these categories aren't strictly columnar in the periodic table)
  • Discrete Data Representation Options

    I will try to use the periodic table example for each of the possible discrete data representations, but some of these will be forced examples.

    logical

    You could imagine use a logical variable to indicate elements that are in the Noble category as true and false for the remainder.

    String or Coded Integer

    logical variables are fine if you are representing exactly two possible categories. However, looking at the periodic table, you see that I collapsed a subset of categories into "not Noble" (false). Rather than collapsing the set of categories into Noble and not Noble, the entire collection might be better represented as integers that are arbitrarily given meaning, or by strings. For example, I could code alkali metals as 'alkali metals' or as 1, and so on. The first 3 elements of the table would be represented with [8 10 1] or as {'other nonmetals' 'noble gases' 'alkali metals'}.

    Coded integers are useful for doing comparisons, for example, subsetting the data, and are memory-efficient. However, integers can cause you some mental overhead trying to remember the mapping between them and their labels.

    Strings are good for representing the data in a readable form, but are harder to manipulate, especially for an ordered list such as phase states. Also you may prefer to use numeric type operations such as ==, ~=, and < since they are more direct and show intent. In addition, strings take more memory, especially when your data comprise many repetitions of the same value.

    nominal Array

    You have additional choices with nominal and ordinal arrays from Statistics Toolbox.

    Let's imagine that we create an array representing the element categories by atomic number. The first 10 elements of this array look like this. I collapse some of the traditional categories for brevity.

    catLabels = {'metals', 'metalloids','other nonmetals','halogens', ...
        'noble gases', 'unknown'};
    elemCats = nominal([3 5 1 1 2 3 3 3 4 5]', catLabels, 1:length(catLabels))
    elemNames = {'hydrogen','helium','lithium','beryllium','boron', ...
        'carbon','nitrogen','oxygen','fluorine','neon'}';
    elemCats = 
         other nonmetals 
         noble gases 
         metals 
         metals 
         metalloids 
         other nonmetals 
         other nonmetals 
         other nonmetals 
         halogens 
         noble gases 
    
    

    Here is the list of all possible values for the categories. Notice that this nominal array carries that information with it. (You can modify this by adding, renaming, and deleting as necessary.)

    getlabels(elemCats)
    ans = 
      Columns 1 through 5
        'metals'    'metalloids'    'other nonmetals'    'halogens'    'noble gases'
      Column 6
        'unknown'
    

    Let's find all the 'other nonmetals' in the list.

    nonmets = find(elemCats == 'other nonmetals')
    elemNames(nonmets)
    nonmets =
         1
         6
         7
         8
    ans = 
        'hydrogen'
        'carbon'
        'nitrogen'
        'oxygen'
    

    ordinal Array

    Let's investigate the states of the first 10 elements now. For some purposes, the states can be regarded as nominal. However, they can also be viewed as an ordered set, solid < liquid < gas. (At STP, only mercury and bromine are in liquid state.)

    stLabels = {'solid' 'liquid' 'gas'};
    elemSts = ordinal([3 3 1 1 1 1 3 3 3 3]', stLabels, 1:length(stLabels))
    elemSts = 
         gas 
         gas 
         solid 
         solid 
         solid 
         solid 
         gas 
         gas 
         gas 
         gas 
    
    

    Here is the list of all possible values for the states.

    getlabels(elemSts)
    ans = 
        'solid'    'liquid'    'gas'
    

    Let's find all the gases in the list.

    gases = find(elemSts > 'liquid') % or == 'gas'
    elemNames(gases)
    gases =
         1
         2
         7
         8
         9
        10
    ans = 
        'hydrogen'
        'helium'
        'nitrogen'
        'oxygen'
        'fluorine'
        'neon'
    

    Let's sort the element list by state.

    [sts, elemNo] = sort(elemSts)
    [cellstr(sts) elemNames(elemNo)]
    sts = 
         solid 
         solid 
         solid 
         solid 
         gas 
         gas 
         gas 
         gas 
         gas 
         gas 
    
    elemNo =
         3
         4
         5
         6
         1
         2
         7
         8
         9
        10
    ans = 
        'solid'    'lithium'  
        'solid'    'beryllium'
        'solid'    'boron'    
        'solid'    'carbon'   
        'gas'      'hydrogen' 
        'gas'      'helium'   
        'gas'      'nitrogen' 
        'gas'      'oxygen'   
        'gas'      'fluorine' 
        'gas'      'neon'     
    

    Integer

    For the given list of elements, the atomic number happens to match exactly with the index into the elemNames array. So I only need to create the relevant integer values 1:10 to represent the atomic number. This list is itself clearly ordered, and has numerical meaning (unlike ordinal arrays which are ordered but the spacing between elements has no definition).

    Going All the Way

    To continue this example, you can collect all the information together into a single dataset array (from Statistics Toolbox). The advantages include being able to "index" by the element name!

    atomicNo = (1:length(elemNames))';
    periodicTable = dataset(atomicNo ,elemCats, elemSts, 'obsnames', elemNames)
    periodicTable = 
                     atomicNo    elemCats           elemSts
        hydrogen      1          other nonmetals    gas    
        helium        2          noble gases        gas    
        lithium       3          metals             solid  
        beryllium     4          metals             solid  
        boron         5          metalloids         solid  
        carbon        6          other nonmetals    solid  
        nitrogen      7          other nonmetals    gas    
        oxygen        8          other nonmetals    gas    
        fluorine      9          halogens           gas    
        neon         10          noble gases        gas    
    

    Your Data or Experiment

    Do you have a situation where some of your data can be represented with nominal or ordinal arrays? How do you manage that information now? Let me know by posting here.


    Get the MATLAB code

    Published with MATLAB® 7.9

    October 5th, 2009

    October 14 - Virtual MATLAB Conference

    We're having a MATLAB conference on October 14, in the virtual world. There'll be presentations, forums, chats, live discussion, etc. Please check out the event and join in the fun!

    Here's the presentation schedule to help you find what you'd like to participate in.

    I'm planning to attend some of the forums and listen to many of the talks, including ones focused on MATLAB for education. Hope to meet you there!

    October 2nd, 2009

    Using parfor Loops: Getting Up and Running

    Today I’d like to introduce a guest blogger, Sarah Wait Zaranek, who is an application engineer here at The MathWorks. Sarah previously has written about speeding up code from a customer to get acceptable performance. She again will be writing about speeding up MATLAB applications, but this time her focus will be on using the parallel computing tools.

    Contents

    Introduction

    I wanted to write a post to help users better understand our parallel computing tools. In this post, I will focus on one of the more commonly used functions in these tools: the parfor-loop.

    This post will focus on getting a parallel code using parfor up and running. Performance will not be addressed in this post. I will assume that the reader has a basic knowledge of the parfor-loop construct. Loren has a very nice introduction to using parfor in one of her previous posts. There are also some nice introductory videos.

    Note for clarity : Since Loren's introductory post, the toolbox used for parallel computing has changed names from the Distributed Computing Toolbox to the Parallel Computing Toolbox. These are not two separate toolboxes.

    Method

    In some cases, you may only need to change a for-loop to a parfor-loop to get their code running in parallel. However, in other cases you may need to slightly alter the code so that parfor can work. I decided to show a few examples highlighting the main challenges that one might encounter. I have separated these examples into four encompassing categories:

    • Independence
    • Globals and Transparency
    • Classification
    • Uniqueness

    Background on parfor-loops

    In a parfor-loop (just like in a standard for-loop) a series of statements known as the loop body are iterated over a range of values. However, when using a parfor-loop the iterations are run not on the client MATLAB machine but are run in parallel on MATLAB workers.

    Each worker has its own unique workspace. So, the data needed to do these calculations is sent from the client to workers, and the results are sent back to the client and pieced together. The cool thing about parfor is this data transfer is handled for the user. When MATLAB gets to the parfor-loop, it statically analyzes the body of the parfor-loop and determines what information goes to which worker and what variables will be returning to the client MATLAB. Understanding this concept will become important when understanding why particular constraints are placed on the use of parfor.

    Opening the matlabpool

    Before looking at some examples, I will open up a matlabpool so I can run my loops in parallel. I will be opening up the matlabpool using my default local configuration (i.e. my workers will be running on the dual-core laptop machine where my MATLAB has been installed).

    if matlabpool('size') == 0 % checking to see if my pool is already open
        matlabpool open 2
    end
    Starting matlabpool using the 'local' configuration ... connected to 2 labs.
    

    Note : The 'size' option was new in R2008b.

    Independence

    The parfor-loop is designed for task-parallel types of problems where each iteration of the loop is independent of each other iteration. This is a critical requirement for using a parfor-loop. Let's see an example of when each iteration is not independent.

    type dependentLoop.m
    % Example of a dependent for-loop
    a = zeros(1,10);
    
    parfor it = 1:10 
        a(it) = someFunction(a(it-1));
    end
    

    Checking the above code using M-Lint (MATLAB's static code analyzer) gives a warning message that these iterations are dependent and will not work with the parfor construct. M-Lint can either be accessed via the editor or command line. In this case, I use the command line and have defined a simple function displayMlint so that the display is compact.

    output = mlint('dependentLoop.m');
    displayMlint(output)
    The PARFOR loop cannot run due to 
     the way variable 'a' is used. 
    
    In a PARFOR loop, variable 'a' is 
     indexed in different ways, 
     potentially causing dependencies 
     between iterations. 
    
    

    Sometimes loops are intrinsically or unavoidably dependent, and therefore parfor is not a good fit for that type of calculation. However, in some cases it is possible to reformulate the body of the loop to eliminate the dependency or separate it from the main time-consuming calculation.

    Globals and Transparency

    All variables within the body of a parfor-loop must be transparent. This means that all references to variables must occur in the text of the program. Since MATLAB is statically analyzing the loops to figure out what data goes to what worker and what data comes back, this seems like an understandable restriction.

    Therefore, the following commands cannot be used within the body of a parfor-loop : evalc, eval, evalin, and assignin. load can also not be used unless the output of load is assigned to a variable name. It is possible to use the above functions within a function called by parfor, due to the fact that the function has its own workspace. I have found that this is often the easiest workaround for the transparency issue.

    Additionally, you cannot define global variables or persistent variables within the body of the parfor loop. I would also suggest being careful with the use of globals since changes in global values on workers are not automatically reflected in local global values.

    Classification

    A detailed description of the classification of variables in a parfor-loop is in the documentation. I think it is useful to view classification as representing the different ways a variable is passed between client and worker and the different ways it is used within the body of the parfor-loop.

    Challenges with Classification

    Often challenges arise when first converting for-loops to parfor-loops due to issues with this classification. An often seen issue is the conversion of nested for-loops, where sliced variables are not indexed appropriately.

    Sliced variables are variables where each worker is calculating on a different part of that variable. Therefore, sliced variables are sliced or divided amongst the workers. Sliced variables are used to prevent unneeded data transfer from client to worker.

    Using parfor with Nested for-Loops

    The loop below is nested and encounters some of the restrictions placed on parfor for sliced variables.

    type parforNestTry.m
    A1 = zeros(10,10); 
    
    parfor ix = 1:10
        for jx = 1:10
            A1(ix, jx) = ix + jx;
        end
    end
    
    output = mlint('parforNestTry.m');
    displayMlint(output);
    The PARFOR loop cannot run due to 
     the way variable 'A1' is used. 
    
    Valid indices for 'A1' are 
     restricted in PARFOR loops. 
    
    

    In this case, A1 is a sliced variable. For sliced variables, the restrictions are placed on the first-level variable indices. This allows parfor to easily distribute the right part of the variable to the right workers.

    The first level indexing ,in general, refers to indexing within the first set of parenthesis or braces. This is explained in more detail in the same section as classification in the documentation.

    One of these first-level indices must be the loop counter variable or the counter variable plus or minus a constant. Every other first-level index must be a constant, a non-loop counter variable, a colon, or an end.

    In this case, A1 has an loop counter variable for both first level indices (ix and jx).

    The solution to this is make sure a loop counter variable is only one of the indices of A1 and make the other index a colon. To implement this, the results of the inner loop can be saved to a new variable and then that variable can be saved to the desired variable outside the nested loop.

    A2 = zeros(10,10);
    
    parfor ix = 1:10
        myTemp = zeros(1,10);
        for jx = 1:10
            myTemp(jx) = ix + jx;
        end
        A2(ix,:) = myTemp;
    end

    You can also solve this issue by using cells. Since jx is now in the second level of indexing, it can be an loop counter variable.

    A3 = cell(10,1);
    
    parfor ix = 1:10
        for jx = 1:10
            A3{ix}(jx) = ix + jx;
        end
    end
    
    A3 = cell2mat(A3);

    I have found that both solutions have their benefits. While cells may be easier to implement in your code, they also result in A3 using more memory due to the additional memory requirements for cells. The call to cell2mat also adds additional processing time.

    A similar technique can be used for several levels of nested for-loops.

    Uniqueness

    Doing Machine Specific Calculations

    This is a way, while using parfor-loops, to determine which machine you are on and do machine specific instructions within the loop. An example of why you would want to do this is if different machines have data files in different directories, and you wanted to make sure to get into the right directory. Do be careful if you make the code machine-specific since it will be harder to port.

    % Getting the machine host name
    
    [~,hostname] = system('hostname');
    
    % If the loop iterations are the same as the size of matlabpool, the
    % command is run once per worker.
    
    parfor ix = 1:matlabpool('size')
        [~,hostnameID{ix}] = system('hostname');
    end
    
    % Can then do host/machine specific commands
    hostnames = unique(hostnameID);
    checkhost = hostnames(1);
    
    parfor ix = 1:matlabpool('size')
        [~,myhost] = system('hostname');
        if strcmp(myhost,checkhost)
           display('On Machine 1')
        else
            display('NOT on Machine 1')
        end
    end
    On Machine 1
    On Machine 1
    

    In my case since I am running locally -- all of the workers are on the same machine.

    Here's the same code running on a non-local cluster.

    matlabpool close
    matlabpool open speedy
    parfor ix = 1:matlabpool('size')
        [~,hostnameID{ix}] = system('hostname');
    end
    
    % Can then do host/machine specific commands
    hostnames = unique(hostnameID);
    checkhost = hostnames(1);
    
    parfor ix = 1:matlabpool('size')
        [~,myhost] = system('hostname');
        if strcmp(myhost,checkhost)
           display('On Machine 1')
        else
            display('NOT on Machine 1')
        end
    end
    Sending a stop signal to all the labs ... stopped.
    Starting matlabpool using the 'speedy' configuration ... connected to 16 labs.
    On Machine 1
    On Machine 1
    On Machine 1
    NOT on Machine 1
    On Machine 1
    NOT on Machine 1
    NOT on Machine 1
    NOT on Machine 1
    NOT on Machine 1
    NOT on Machine 1
    NOT on Machine 1
    NOT on Machine 1
    NOT on Machine 1
    NOT on Machine 1
    NOT on Machine 1
    NOT on Machine 1
    

    Note: The ~ feature is new in R2009b and discussed as a new feature in one of Loren's previous blog posts.

    Doing Worker Specific Calculations

    I would suggest using the new spmd functionality to do worker specific calculations. For more information about spmd, check out the documentation.

    Clean up

    matlabpool close
    Sending a stop signal to all the labs ... stopped.
    

    Your examples

    Tell me about some of the ways you have used parfor-loops or feel free to post questions regarding non-performance related issues that haven't been addressed here. Post your questions and thoughts here.


    Get the MATLAB code

    Published with MATLAB® 7.9

    September 11th, 2009

    MATLAB Release 2009b - Best New Feature or ~?

    MATLAB R2009b was recently released. My favorite new language feature is the introduction of the ~ notation to denote missing inputs in function declarations, and missing outputs in calls to functions. Let me show you how this works.

    Contents

    Unused Outputs

    I have occasionally found that I would like the indices from sorting a vector, but I don't need the sorted values. In the past, I wrote one of these code variants :

              [dummy, ind] = sort(X)
              [ind, ind] = sort(X)

    In the first case, I end up with a variable dummy in my workspace that I don't need. If my data to sort, X, has a large number of elements, I will have an unneeded large array hanging around afterwards. In the second case, I am banking on MATLAB assigning outputs in order, left to right, and I create somewhat less legible code, but I don't have an extra array hanging around afterwards.

    Now you can write this instead:

              [~, ind] = sort(X)

    and I hope you find your code readable, with the clear intention to not use the first output variable.

    Unused Inputs

    You can similarly designate unused inputs with ~ in function declarations. Here's how you'd define the interface where the second input is ignored.

              function out = mySpecialFunction(X,~,dim)

    You might ask why that is useful. If I don't use the second input, why put it in at all? The answer is that your function might be called by some other function that expects to send three inputs. This happens for many GUI callbacks, and particularly those you generate using guide. So your function needs to take three inputs. But if it is never going to use the second input, you can denote the second one with ~.

    Can M-Lint Help?

    Yes! Consider this function mySpecialFunction shown here.

    type mySpecialFunction
    function ind = mySpecialFunction(X,second,dim)
    % mySpecialFunction Function to illustrate ~ for inputs and outputs.
    
    [dummy,ind] = sort(X,dim);
    

    Running mlint on this code produces two messages.

    msgs = mlint('mySpecialFunction');
    disp(msgs(1).message(1:50))
    disp(msgs(1).message(51:end))
    disp(' ')
    disp(msgs(2).message(1:49))
    disp(msgs(2).message(50:end))
    Input argument 'second' might be unused, although 
    a later one is used.  Consider replacing it by ~.
     
    The value assigned here to 'dummy' appears to be 
    unused.  Consider replacing it by ~.
    

    Since M-Lint is running continuously in the editor, you would see these messages as you edit the file. Here's a cleaned up version of the file.

    type mySpecialFunction1
    function ind = mySpecialFunction1(X,~,dim)
    % mySpecialFunction Function to illustrate ~ for inputs and outputs.
    
    [~,ind] = sort(X,dim);
    

    And let's see what M-Lint finds.

    mlint mySpecialFunction1

    It finds nothing at all.

    What's Your Favorite New Feature?

    Have you looked through the new features for R2009b? What's your favorite? Let me know here.


    Get the MATLAB code

    Published with MATLAB® 7.9

    September 3rd, 2009

    Rounding Results

    There are frequent questions on the MATLAB newsgroup about rounding results to a certain number of decimal places. MATLAB itself doesn't provide this functionality explicitly, though it is easy to accomplish.

    Contents

    Sidetrack : A Little MathWorks History

    MathWorks' first Massachusetts office phone number was 653-1415 (ignoring country and area codes). The astute reader will notice that the last 5 digits are an approximation for (or, in MATLAB, pi). A local resident called one day to say that she kept getting calls for MathWorks and she wasn't sure why. But it was quite inconvenient for her because she spent lots of time on the second floor of her home, and the phone was on the first floor. The excess round-trips were taxing her! To understand what was happening, you should know that in some of the early MathWorks materials, the phone number was listed as .

    Example

    format long
    x = 1.23456789
    x =
       1.234567890000000
    

    Use round. Here we get no decimals at all.

    round(x)
    ans =
         1
    

    There are many ways to get the number of decimals you want. Here's one way to round to 3 decimals.

    round(x*1000)/1000
    ans =
       1.235000000000000
    

    Here's another way.

    sprintf('%0.3f',x)
    ans =
    1.235
    

    And another. In this case, using the "easy" way to specify the format, you need to know how many integral digits there are as well.

    str2num(num2str(x,4))
    ans =
       1.235000000000000
    

    Tools for Rounding Solutions

    There are a lot of tools for helping you round numbers. The functions I list here for MATLAB form the basis of many, if not all, of the specific solutions.

    MATLAB

    Mapping Toolbox

    File Exchange Rounding Tools

    % N= num2str(X,SF);
    % N= str2num(N);

    Question for You

    When you round values, do you want this for display only, or wanted for calculations? Some applications I can think of might include processing data that has a smaller number of bits of precision to start with. What applications do you need rounding for in your calculations? Post here with your thoughts.


    Get the MATLAB code

    Published with MATLAB® 7.8


    Loren Shure works on design of the MATLAB language at The MathWorks. She writes here about once a week on MATLAB programming and related topics.

    • Jun: I totally can not believe it, Loren. You are really helpful. Thank you so much, MATLAB master!
    • Loren: Wow folks- Always lots of interest when there’s a quickie to try out! I will only make 2 general...
    • Loren: Jun- ismember is your friend here: >> [aa,ind] = ismember(Array2,Arra y1) aa = 1 1 1 1 1 1 1 ind = 1 2 1 4 4 3...
    • Dan: I like the first way better than the second way. Combining the arrays into one and running any is nice, although...
    • James Myatt: How about I = (a == 0 | b == 0); a(I) = []; b(I) = [];
    • Tunc: Hello Loren, love your blog because of such inspiring and challenging comments to such ’small’...
    • Pekka Kumpulainen: Here is my tradeoff. I usually want to keep the original variables as they are most probably...
    • Iain: Followup: Of course, to allow NaNs (counting them as non-zero): mask = (a~=0) & (b~=0); The mask says “a...
    • Matt Fig: I would usually go with something like this: y = a&b; x = a(y); y = b(y); But I was surprised to find...
    • kk: c=all([a;b]) a(c) a(b)

    These postings are the author's and don't necessarily represent the opinions of The MathWorks.