Loren on the Art of MATLAB

June 14th, 2006

Inverse Mapping from Values to Indices

I am aware of frequent customer requests for replacing multiple values in an array. And the result should be compact code and execute quickly, right?! It came up again on a recent thread on the MATLAB newsgroup so it seemed timely to write an article on this topic for the archives. In this recent thread, the question of timing came up - which method is fastest? Your mileage may vary and it would be interesting to compare results, especially if yours are qualitatively different than those run from my laptop. You can post your results here.

Contents

Tidy up to start.

clear
format short g

Sample Input and Results

Supppose I have an array a containing values that I expect to find in another array, Minit.

a = [1 3 5 7 8]
Minit = [1 3 5; ...
    3 5 7; ...
    3 7 8]
a =
     1     3     5     7     8
Minit =
     1     3     5
     3     5     7
     3     7     8

I would like to see a replacement matrix for Minit in which the values are replaced with their indices in the list a.

Mwanted = [1 2 3; ...
     2 3 4; ...
     2 4 5]
Mwanted =
     1     2     3
     2     3     4
     2     4     5

Algorithms

There are at least three ways to accomplish the task, one, a brute force loop, and two that require knowledge of other very useful MATLAB functions, histc and ismember.

Let's first look at the for-loop solution.

type loopFR
function M1 = loopFR(M,A)
% loopFR replaces M with its indices from A.

% First set up the inverse mapping array.  Initialize it to zero, and the
% size is the maximum number the array.  Note that I get away with this
% here because all of my numbers in both A and M are positive integers.
imap = zeros(1,max(A));  % inverse mapping array
% Now fill the inverse map, but placing in the location specified in each
% element of A, the index of that element in A.
% If A = [1 5 4], then imap = [1 0 0 3 2].
imap(A) = 1:length(A);  
M1 = M;
for k = 1:numel(M)
    M1(k) = imap(M(k));
end

Here, I set up imap, an array that allows me to do the inverse mapping from values in a back to indices in a.

And check that loopFR gets the right result.

isequal(loopFR(Minit,a), Mwanted)
ans =
     1

Now let's check both ismember and histc before spending effort timing the results.

[mem, mem] = ismember(Minit,a);
[his, his] = histc(Minit,a);
isequal(Mwanted, mem, his)
ans =
     1

Time the Algorithms

N = [1 10 100 1000 10000 100000];  % repetitions to test
t = zeros(length(N),3); % set up the timing collection array
for n = 1:length(N)
    % Make a larger matrix to search.
    % loopFR
    M = repmat(Minit,N(n),1);
    tic
    M1 = loopFR(M,a);
    t(n,1) = toc;
    % ismember
    tic
    [M2,M2] = ismember(M,a);
    t(n,2) = toc;
    % histc
    tic
    [M3,M3] = histc(M,a);
    t(n,3) = toc;
    if ~isequal(M1,M2,M3)
        disp('oops')
    end
end

Timing Results

disp('       Counter     numel(M)     loop       ismember     histc')
disp([N' 9*N' t]);
subplot(1,2,1)
plot(N', t, '--*')
xlabel('Number of repeated arrays')
ylabel('Execution time')
legend({'loop','ismember','histc'},'Location', 'NorthWest')
subplot(1,2,2)
semilogx(N', t, '--*')
xlabel('Number of repeated arrays')
ylabel('Execution time')
legend({'loop','ismember','histc'},'Location', 'NorthWest')
       Counter     numel(M)     loop       ismember     histc
            1            9  4.5537e-005   0.00012627  1.2292e-005
           10           90  2.8775e-005  8.8838e-005  1.3689e-005
          100          900  6.6489e-005   0.00022042  5.8108e-005
         1000         9000   0.00046542    0.0017497   0.00051291
        10000        90000     0.004678     0.015192    0.0059801
       1e+005       9e+005     0.065959      0.20439     0.063979

Comments

Two important things to point out. First, the matrix M that I use throughout is probably not representative of such arrays as the array size grows since I created it via replication. Often in such a case, the number of unique elements in a, the inverse mapping matrix, would also grow, at least a bit. Second, I used preallocation for the loop code to make sure I didn't get a time hit from growing the array, but I was not expecting the for loop results to be so competitive with those from histc. Please add your thoughts or results here.


Published with MATLAB® 7.2

5 Responses to “Inverse Mapping from Values to Indices”

  1. Markus replied on :

    I suggest to use function cputime for performance measurements, like here:

    t0 = cputime;
    % evaluate function
    dt = cputime - t0;
    disp(sprintf(’Elapsed cpu time = %g seconds.’, dt));

    The difference to tic and toc is that here only the cpu time used by Matlab is counted, not that of other running processes.

  2. Petr Pošík replied on :

    Hi Loren and blog readers!

    There is another simple way of doing that:


    function M1 = loopFR(M,A)
    % loopFR replaces M with its indices from A.

    M1 = M;
    for i = 1:length(A),
    M1(find(M == A(i))) = i;
    end

    This is, however, not a scalable solution as the number of passes through the matrix M increases with the number of vector A elements. In the tests, it is slightly better than the ismember function, maybe they are implemented in a similar way.

  3. Lucio replied on :

    We can also vectorize loopFR, it looks nicer and a little faster

    function M1 = loopFR(M,A)

    imap(A) = 1:length(A);
    M1 = imap(M);

  4. Loren replied on :

    Lucio-

    Thanks! I should have seen that :-(

    –Loren

  5. Ken Crounse replied on :

    Another one function solution — it’s amazing how many Matlab functions do the same exact thing ;-)

    M = interp1(a, 1:numel(a), Minit)

    Ken

Leave a Reply

Wrap code fragments inside <pre> tags, like this:

<pre class="code">
a = magic(3);
sum(a)
</pre>

If you have a "<" character in your code, either follow it with a space or replace it with "&lt;" (including the semicolon).


Loren Shure works on design of the MATLAB language at The MathWorks. She writes here about once a week on MATLAB programming and related topics.

  • Jun: I totally can not believe it, Loren. You are really helpful. Thank you so much, MATLAB master!
  • Loren: Wow folks- Always lots of interest when there’s a quickie to try out! I will only make 2 general...
  • Loren: Jun- ismember is your friend here: >> [aa,ind] = ismember(Array2,Arra y1) aa = 1 1 1 1 1 1 1 ind = 1 2 1 4 4 3...
  • Dan: I like the first way better than the second way. Combining the arrays into one and running any is nice, although...
  • James Myatt: How about I = (a == 0 | b == 0); a(I) = []; b(I) = [];
  • Tunc: Hello Loren, love your blog because of such inspiring and challenging comments to such ’small’...
  • Pekka Kumpulainen: Here is my tradeoff. I usually want to keep the original variables as they are most probably...
  • Iain: Followup: Of course, to allow NaNs (counting them as non-zero): mask = (a~=0) & (b~=0); The mask says “a...
  • Matt Fig: I would usually go with something like this: y = a&b; x = a(y); y = b(y); But I was surprised to find...
  • kk: c=all([a;b]) a(c) a(b)

These postings are the author's and don't necessarily represent the opinions of The MathWorks.