Loren on the Art of MATLAB

November 19th, 2009

Coordinating Zero Removals from Multiple Arrays

I've fielded some questions recently about how to coordinate changes to multiple arrays simultaneously. One example is removing elements from two arrays wherever either array holds a zero at that location. This is a good opportunity to reiterate the use of logical arrays and some useful associated functions (such as any and all).

Identify Pairs to Remove

Let's say I have 2 arrays

a = [ 1  4  9  0 25  0 49  0]
b = [ 1  0  3  0  0  6  7  8]
a =
     1     4     9     0    25     0    49     0
b =
     1     0     3     0     0     6     7     8

and I would like to delete the corresponding elements in a and b when either of them contains a zero value.

First Algorithm

There are several possible algorithms, each with their own trade-offs. Here's the first one.

anyzero = any([a;b] == 0)
a(anyzero) = []
b(anyzero) = []
anyzero =
     0     1     0     1     1     1     0     1
a =
     1     9    49
b =
     1     3     7

This algorithm combines the two arrays into one, a potentially costly move if the arrays are large. It then checks for values that equal zero and, using the function any, checks columnwise to identify the columns that have at least one zero. Finally, it uses this array of logical indices to delete the appropriate elements of a and b.
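
As a minimal sketch, the one-liner above can be split into its intermediate steps. The names stacked and iszero are illustrative and not part of the original code, and the dimension argument to any is spelled out here (an addition to the original) so the result does not depend on the default.

```matlab
a = [ 1  4  9  0 25  0 49  0];
b = [ 1  0  3  0  0  6  7  8];

stacked = [a; b];            % 2-by-8 temporary holding both arrays (hypothetical name)
iszero  = (stacked == 0);    % 2-by-8 logical, true where an element is zero
anyzero = any(iszero, 1);    % 1-by-8 logical, true where either row has a zero

a(anyzero) = [];             % delete the flagged columns from a
b(anyzero) = [];             % and the same columns from b
```

The temporary stacked is twice the size of either input, which is the cost mentioned above.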

Second Algorithm

This algorithm (courtesy of Mirek L. in this post) doesn't suffer from combining the two arrays.

x1 = a(a.*b ~= 0)
y1 = b(a.*b ~= 0)
x1 =
     1     9    49
y1 =
     1     3     7

But it calculates the same temporary array twice (and it's the size of one of the vectors). To be able to recalculate the temporary array this way, I can't overwrite the initial arrays as you see in the first algorithm. And finally, if there is a NaN or Inf corresponding to a 0, this algorithm won't find it, because the product is then NaN rather than 0.
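
The NaN and Inf caveat can be seen in a short sketch with made-up inputs. All names here (aTest, bTest, keep, keepDirect, and so on) are illustrative and not from the original post. The sketch also stores the mask once instead of computing it twice, and then shows one possible alternative that compares each array with zero directly.

```matlab
aTest = [0   2  0   3];
bTest = [NaN 5  Inf 0];

p    = aTest .* bTest;     % [NaN 10 NaN 0], because 0*NaN and 0*Inf are both NaN
keep = (p ~= 0);           % [1 1 1 0]: NaN ~= 0 is true, so zeros in aTest slip through
xProd = aTest(keep);       % [0 2 0], still contains zeros
yProd = bTest(keep);       % [NaN 5 Inf]

% Comparing each array with zero directly avoids the problem
keepDirect = (aTest ~= 0) & (bTest ~= 0);   % [0 1 0 0]
xDirect = aTest(keepDirect);                % 2
yDirect = bTest(keepDirect);                % 5
```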

Always Tradeoffs

There are always tradeoffs to make like the ones I mention here, at least when I program. How do you choose which tradeoffs to make? Which one would you choose here? Or would you choose an entirely different algorithm (which I hope you'll post)? Let us know here.


Published with MATLAB® 7.9

13 Responses to “Coordinating Zero Removals from Multiple Arrays”

  1. Brian replied on :

    This way is fairly quick:

    idx = ~(a&b);
    a(idx)= [];
    b(idx) = [];
    
  2. Cris Luengo replied on :

    I would have done

    anyzero = (a==0)|(b==0);
    

    Is the option with ANY more efficient?

  3. Kheng replied on :

    Hi Loren,

    How about the straightforward (naive?) way of:

    msk = a == 0 | b == 0
    a(msk) = []
    b(msk) = []
    

    From what I understand, computing mask avoids the costly combining of arrays [a;b]. And as msk is computed once, and stored in a logical vector, there is no duplication in computation.

    Cheers,
    Kheng.

  4. Iain replied on :

    My knee-jerk response was:

    mask = a & b;
    a = a(mask);
    b = b(mask);
    

    This is clearer to me as I don’t have the exact semantics of any() in my head. It deals with Infs ok, but fails on NaNs, which can’t be converted to logicals.

  5. kk replied on :

    c=all([a;b])
    a(c)
    b(c)

  6. Matt Fig replied on :

    I would usually go with something like this:

     
    y = a&b;
    x = a(y);
    y = b(y);
     

    But I was surprised to find that the second solution you posted was faster, even with the multiplication and then logical comparison all happening twice! Why is that?
    I thought maybe it was because the memory for the index wasn’t specifically stored, but this is even slower (as I would have expected):

     
    x2 = a(a&b);
    y2 = b(a&b);
     

    So how can a simple & be slower than element by element multiplication then a logical comparison?
    I used a = round(rand(10000,1)*3); and similar for b.

  7. Iain replied on :

    Followup:

    Of course, to allow NaNs (counting them as non-zero):

    mask = (a~=0) & (b~=0);

    The mask says “a and b should be non-zero”.

    If, on reflection, NaNs should veto inclusion as well as zeros, that’s easy too:

    mask = ~((a==0) | isnan(a) | (b==0) | isnan(b));
    
  8. Pekka Kumpulainen replied on :

    Here is my tradeoff.
    I usually want to keep the original variables as they are most probably needed as such later on. So instead of cutting the variables by assigning empty, I would invert the logic.

    a_z = a~=0;
    b_z = b~=0;
    Two logical arrays, which need less memory than temporarily combining a and b
    Third logical
    ind = a_z|b_z;
    or
    ind = a_z&b_z;
    depending on whether you want any or all zeros.
    New variables
    na = a(ind); nb = b(ind);
    This avoids combining the a and b as in the first example. And also the multiplication that was used in the second.
    If you want to overwrite the original values, just negate the logical operators used here and assign
    a(ind) = [];
    The variables will still be moved to a new location in memory, right? As their size will change we can’t use the (1:end) index trick to keep them in the same location.

  9. Tunc replied on :

    Hello Loren,

    love your blog because of such inspiring and challenging comments to such ‘small’ programming questions. My quick & dirty approach which did the job for me:

    a(find(a~=0 & b~=0))
    b(find(a~=0 & b~=0))
    
  10. James Myatt replied on :

    How about

    I = (a == 0 | b == 0);
    a(I) = [];
    b(I) = [];
    
  11. Dan replied on :

    I like the first way better than the second way. Combining the arrays into one and running any is nice, although as you mention it could be costly. I would do this:

    a = [ 1  4  9  0 25  0 49  0];
    b = [ 1  0  3  0  0  6  7  8];
    anyzero = unique([find(a == 0), find(b == 0)]);
    a(anyzero) = []
    b(anyzero) = []
    

    As with most of my code, it may not work, but it seems to match your answer. It also calls find twice, which may be costly. Finally, I am not sure if unique is better/worse than any.

  12. Loren replied on :

    Wow folks-

    Always lots of interest when there’s a quickie to try out! I will only make 2 general comments so far.

    1) why the math is faster than & may have to do with the JIT and/or multi-threading – but I am not sure.
    2) Solutions using find may be able to be sped up using just the logical expression instead as a logical index. It’s being computed in the find expression already, and doesn’t have to translate to locations and then index back in.

    –Loren

  13. Arthur replied on :

    I spend a lot of time handling many-dimensioned matrices. Even if a concatenate based solution works in the N-dimensional case, I would still avoid a solution with any(matrix) because any()’s behavior is not obvious for non-vectors. It’s a pet peeve of mine that any, all, min, max, sum, and their relatives don’t return a scalar by default. I would much rather any(X) return any(X(:)) instead of its current behavior, any(X,1). In that form, any(X) would actually answer the question “is any element true?” To me, that would make code written with any & friends much more readable and dependable by avoiding unexpected matrix results.
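
The behavior described in this last comment can be seen in a short sketch. X is an illustrative matrix, not taken from the post. For a matrix, any works down each column and returns a row vector, while any(X(:)) reduces over every element to a scalar.

```matlab
X = [0 0 1;
     0 0 0];

byColumn = any(X);       % [0 0 1], one result per column (same as any(X, 1))
byRow    = any(X, 2);    % [1; 0], one result per row
overall  = any(X(:));    % true, a scalar answer to "is any element nonzero?"
```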


MathWorks
Loren Shure works on design of the MATLAB language at MathWorks. She writes here about once a week on MATLAB programming and related topics.

These postings are the author's and don't necessarily represent the opinions of The MathWorks.