Loren on the Art of MATLAB

Turn ideas into MATLAB

Inverse Mapping from Values to Indices

I am aware of frequent customer requests for replacing multiple values in an array. And the result should be compact code and execute quickly, right?! It came up again on a recent thread on the MATLAB newsgroup so it seemed timely to write an article on this topic for the archives. In this recent thread, the question of timing came up - which method is fastest? Your mileage may vary and it would be interesting to compare results, especially if yours are qualitatively different than those run from my laptop. You can post your results in the comments below.

Contents

Tidy up to start.

clear
format short g

Sample Input and Results

Supppose I have an array a containing values that I expect to find in another array, Minit.

a = [1 3 5 7 8]
Minit = [1 3 5; ...
    3 5 7; ...
    3 7 8]
a =
     1     3     5     7     8
Minit =
     1     3     5
     3     5     7
     3     7     8

I would like to see a replacement matrix for Minit in which the values are replaced with their indices in the list a.

Mwanted = [1 2 3; ...
     2 3 4; ...
     2 4 5]
Mwanted =
     1     2     3
     2     3     4
     2     4     5

Algorithms

There are at least three ways to accomplish the task, one, a brute force loop, and two that require knowledge of other very useful MATLAB functions, histc and ismember.

Let's first look at the for-loop solution.

type loopFR
function M1 = loopFR(M,A)
% loopFR replaces M with its indices from A.

% First set up the inverse mapping array.  Initialize it to zero, and the
% size is the maximum number the array.  Note that I get away with this
% here because all of my numbers in both A and M are positive integers.
imap = zeros(1,max(A));  % inverse mapping array
% Now fill the inverse map, but placing in the location specified in each
% element of A, the index of that element in A.
% If A = [1 5 4], then imap = [1 0 0 3 2].
imap(A) = 1:length(A);  
M1 = M;
for k = 1:numel(M)
    M1(k) = imap(M(k));
end

Here, I set up imap, an array that allows me to do the inverse mapping from values in a back to indices in a.

And check that loopFR gets the right result.

isequal(loopFR(Minit,a), Mwanted)
ans =
     1

Now let's check both ismember and histc before spending effort timing the results.

[mem, mem] = ismember(Minit,a);
[his, his] = histc(Minit,a);
isequal(Mwanted, mem, his)
ans =
     1

Time the Algorithms

N = [1 10 100 1000 10000 100000];  % repetitions to test
t = zeros(length(N),3); % set up the timing collection array
for n = 1:length(N)
    % Make a larger matrix to search.
    % loopFR
    M = repmat(Minit,N(n),1);
    tic
    M1 = loopFR(M,a);
    t(n,1) = toc;
    % ismember
    tic
    [M2,M2] = ismember(M,a);
    t(n,2) = toc;
    % histc
    tic
    [M3,M3] = histc(M,a);
    t(n,3) = toc;
    if ~isequal(M1,M2,M3)
        disp('oops')
    end
end

Timing Results

disp('       Counter     numel(M)     loop       ismember     histc')
disp([N' 9*N' t]);
subplot(1,2,1)
plot(N', t, '--*')
xlabel('Number of repeated arrays')
ylabel('Execution time')
legend({'loop','ismember','histc'},'Location', 'NorthWest')
subplot(1,2,2)
semilogx(N', t, '--*')
xlabel('Number of repeated arrays')
ylabel('Execution time')
legend({'loop','ismember','histc'},'Location', 'NorthWest')
       Counter     numel(M)     loop       ismember     histc
            1            9  4.5537e-005   0.00012627  1.2292e-005
           10           90  2.8775e-005  8.8838e-005  1.3689e-005
          100          900  6.6489e-005   0.00022042  5.8108e-005
         1000         9000   0.00046542    0.0017497   0.00051291
        10000        90000     0.004678     0.015192    0.0059801
       1e+005       9e+005     0.065959      0.20439     0.063979

Comments

Two important things to point out. First, the matrix M that I use throughout is probably not representative of such arrays as the array size grows since I created it via replication. Often in such a case, the number of unique elements in a, the inverse mapping matrix, would also grow, at least a bit. Second, I used preallocation for the loop code to make sure I didn't get a time hit from growing the array, but I was not expecting the for loop results to be so competitive with those from histc. Please add your thoughts or results below.


Published with MATLAB® 7.2

|
  • print
  • send email

Comments

To leave a comment, please click here to sign in to your MathWorks Account or create a new one.