The other day, one of my MathWorks buddies, Peter, asked me if I knew any tricks for vectorizing the notion of the colon (:) operator. For me, this sort of inquiry sometimes has an effect similar to waving a flag in front of a bull! The challenge was on.
I know people have posted methods in the past on the MathWorks newsgroup. I quickly had a few more ideas and implemented them.
Contents
Problem Statement
Suppose I have matching pairs of integers depicting the starting and stopping indices for selecting points from another vector. I've shown in other posts that vectorizing the indexing can have tremendous performance benefits for your code. What I'd like to do is create one full list of indices so I ultimately only index into the data vector once.
Solution Variants
Here are a few ways to solve the problem.
- Build up the array of indices, looping (for) over each pair, one at a time.
- Write the indices to a string and have MATLAB evaluate the various : expressions.
- Use arrayfun (two algorithms) as the engine for creating the output.
- Modify the idea of run-length decoding, inspired by Peter Acklam's run length decoder. MathWorks Peter used this as a basis for the rld-based algorithm.
Test the Solutions
Let's prove that the solutions all get the same answers first before delving into the algorithm details. I'll give a vector of starting indices followed by a vector of ending indices. If an ending value is smaller than a starting value, that pair is effectively skipped. Duplicate starting points are allowed.
starts = [1 7 7 15 17]; ends = [3 8 6 15 19]; expectedAnswer = [1 2 3 7 8 15 17 18 19];
Round up the files to run.
d = dir('coloncat*.m');
d.nameans = coloncatarrf1.m ans = coloncatarrf2.m ans = coloncatfor.m ans = coloncatrld.m ans = coloncatstr.m
Run each file with the test data.
for k = 1:length(d) func = str2func(d(k).name(1:end-2)); outputs{k} = func(starts, ends); end
Validate that all the methods calculated the expected results.
rightAnswers = isequal(expectedAnswer,outputs{:})rightAnswers =
1
Look at the Algorithms
Let's look at the variety of algorithms and coding techniques I used. If you look at the first chunk of code in each file, you will see they are algorithmically identical - checking for sequences that have length of at least one (1) and deleting the other entries. What we really want to compare are the lines that follow that section.
Use a for Loop
This is perhaps the most straightforward method.
dbtype coloncatfor1 function x = coloncatfor(start, stop) 2 % COLONCAT Concatenate colon expressions 3 % X = COLONCAT(START,STOP) returns a vector containing the values 4 % [START(1):STOP(1) START(2):STOP(2) START(END):STOP(END)]. 5 6 len = stop - start + 1; 7 8 % keep only sequences whose length is positive 9 pos = len > 0; 10 start = start(pos); 11 stop = stop(pos); 12 len = len(pos); 13 if isempty(len) 14 x = []; 15 return; 16 end 17 18 % find end and beginning indices for each chunk 19 partialLength = cumsum(len); 20 cumStart = [1 partialLength(1:end-1)+1]; 21 22 % preallocate output 23 % then loop through start/stop pairs, in order, and fill 24 numtot = sum(len); 25 x = zeros(1,numtot); 26 for k = 1:length(start) 27 x(cumStart(k):partialLength(k)) = start(k):stop(k); 28 end
After culling the start and stop indices, I calculate where each successive sequence ends (partialLength) and where they each begin (cumStart). I next preallocate the output array to its final size. And finally, as I loop through the matched pairs of start and stop values, I generate the colon sequence and fill these values into the correct output locations.
Use sprintf and eval
This method uses sprintf which allows me to gather all of the index start and stop values into a string without looping. I use eval evaluate the string, thereby computing the output.
dbtype coloncatstr
1 function x = coloncatstr(start, stop)
2 % COLONCAT Concatenate colon expressions
3 % X = COLONCAT(START,STOP) returns a vector containing the values
4 % [START(1):STOP(1) START(2):STOP(2) START(END):STOP(END)].
5
6 len = stop - start + 1;
7
8 % keep only sequences whose length is positive
9 pos = len > 0;
10 start = start(pos);
11 stop = stop(pos);
12 len = len(pos);
13 if isempty(len)
14 x = [];
15 return;
16 end
17
18 % create list of indices for : and evaluate
19 indices = [start; stop];
20 list = sprintf('%d:%d ',indices);
21 x = eval(['[', list, ']']);
Use arrayfun - Method 1
The next two methods use arrayfun to do the work. Here's the first approach.
dbtype coloncatarrf1
1 function x = coloncatarrf1(start, stop)
2 % COLONCAT Concatenate colon expressions
3 % X = COLONCAT(START,STOP) returns a vector containing the values
4 % [START(1):STOP(1) START(2):STOP(2) START(END):STOP(END)].
5
6 len = stop - start + 1;
7
8 % keep only sequences whose length is positive
9 pos = len > 0;
10 start = start(pos);
11 stop = stop(pos);
12 len = len(pos);
13 if isempty(len)
14 x = [];
15 return;
16 end
17
18 % put entries into a cell array for later use by arrayfun
19 cstart = num2cell(start);
20 cstop = num2cell(stop);
21 vals = arrayfun(@(x,y){x{1}:y{1}},cstart,cstop);
22 x = [vals{:}];
Place the start and stop values into cell arrays to feed to arrayfun. The entries for the two arrays act in pairs, and the function that I have arrayfun apply places in a cell array the sequence of values between the numbers in the paired cells. Finally, take all the output cells in the cell array and string out the values into a single numeric vector.
Use arrayfun - Method 2
Here's the second approach using arrayfun. It involves a for loop as well.
dbtype coloncatarrf2
1 function x = coloncatarrf2(start, stop)
2 % COLONCAT Concatenate colon expressions
3 % X = COLONCAT(START,STOP) returns a vector containing the values
4 % [START(1):STOP(1) START(2):STOP(2) START(END):STOP(END)].
5
6 len = stop - start + 1;
7
8 % keep only sequences whose length is positive
9 pos = len > 0;
10 start = start(pos);
11 stop = stop(pos);
12 len = len(pos);
13 if isempty(len)
14 x = [];
15 return;
16 end
17
18 % Find unique sequence lengths
19 uniqlens = unique(len);
20
21 % preallocate space for cell outputs of arrayfun
22 chunks = cell(1,length(len));
23
24 % Now loop over the lengths, being sure to place the
25 % outputs in the right positions.
26 for ilen = uniqlens(:)'
27 idxlen = find(len==ilen);
28 chunks(idxlen) = arrayfun(@(xx,yy)xx:yy,...
29 start(idxlen),stop(idxlen),'UniformOutput',false);
30 end
31 % reassemble chunks into output
32 x = [chunks{:}];
The concept here is to loop over sequences that have the same length. At most, this for loop is the same length as the number of values in the initial start vector. If the length of the inputs is long, however, it might be advantageous to loop over sequences that generate the same length (commensurate size) output. Placing the output into cells allows me to not worry that I might be intermingling output vectors of different sizes. Once all the output sequences are collected in the cell array, I flatten it into a vector the same way I did with method 1.
Use Run-Length Decoding Idea
Here's an algorithm based on the idea of run-length decoding.
dbtype coloncatrld1 function x = coloncatrld(start, stop) 2 % COLONCAT Concatenate colon expressions 3 % X = COLONCAT(START,STOP) returns a vector containing the values 4 % [START(1):STOP(1) START(2):STOP(2) START(END):STOP(END)]. 5 6 % Based on Peter Acklam's code for run length decoding. 7 len = stop - start + 1; 8 9 % keep only sequences whose length is positive 10 pos = len > 0; 11 start = start(pos); 12 stop = stop(pos); 13 len = len(pos); 14 if isempty(len) 15 x = []; 16 return; 17 end 18 19 % expand out the colon expressions 20 endlocs = cumsum(len); 21 incr = ones(1, endlocs(end)); 22 jumps = start(2:end) - stop(1:end-1); 23 incr(endlocs(1:end-1)+1) = jumps; 24 incr(1) = start(1); 25 x = cumsum(incr);
The idea is to expand the colon sequences out. I know the lengths of each sequence so I know the starting points in the output array. Fill the values after the start values with 1s. Then I figure out how much to jump from the end of one sequence to the beginning of the next one. If there are repeated start values, the jumps might be negative. Once this array is filled, the output is simply the cumulative sum or cumsum of the sequence.
Other Methods?
These methods I've shown, for vectorizing the notion of the colon operator, are ones that came very quickly. I haven't spent concerted time looking for other algorithms though I am positive some others, including some clever ones, probably exist. Can you help me and provide other possible algorithmic ideas? Please post them here.
Get
the MATLAB code
Published with MATLAB® 7.7



It’s not super fast, and I haven’t provided error checking, but just another method that comes to mind:
function x = coloncatcell(start, stop)
c = cellfun(@colon,num2cell(start),num2cell(stop),’UniformOutput’,0);
x = cat(2,c{:});
Wow Matt, very slick. I hadn’t realized you could use cellfun to zip together more than one cell array like that. Following your lead, here it is as a one liner:
coloncatshort = @(starts,ends) cell2mat(arrayfun(@(a,b) {a:b},starts,ends))
isequal(coloncatshort(starts,ends),expectedAnswer)
ans = 1
Note that Matt’s method doesn’t need to do any extra checks for positive length sequences. It will do the right thing with negative length sequences (i.e. ignore them).
Here is a vectorized fast solution using “core” matlab, similar to the run-length decoding algorithm. I was inspired by one of my submissions to the FEX (NONES). The drawback is that it creates a (possibly large) intermediate array:
starts = [1 7 7 15 17];
ends = [3 8 6 15 19];
len = ends – starts + 1 ;
A = [len(:) repmat(-1,numel(len),max(len)-1)].’ ;
A = cumsum(A) > 0 ;
B = double(A) ;
B(1,:) = starts ;
B = cumsum(B) ;
C = B(A).’
I’ve been trying to find out a bit more about the performance of cellfun and arrayfun. I use these functions quite regularly to step through large cell arrays and I’m curious how they compare to doing a standard for loop (as discussed in this article.)
Does anyone have any thoughts pertaining to this?
Thanks!
Hi.
When reading the article I realize i didn’t quite understand the actual problem. Now that I do understand it, I wonder if this means that there will be plans to update the colon() to allow array inputs? Right now it seems it takes the very first value fo any array input and ignores the rest.
–DA
Nice additional algorithms, everyone!
Daniel,
This article does not imply any changes to the colon operator, or even the introduction of such a function into MATLAB. We will always consider requests along these lines if we hear enough from customers.
–Loren
Note that the one line solutions easily generalize to variable increments, e.g.,
>> coloncatshort = @(starts,steps,ends) cell2mat(arrayfun(@(a,b,c) {a:b:c},starts,steps,ends))
coloncatshort =
@(starts,steps,ends)cell2mat(arrayfun(@(a,b,c){a:b:c},starts,steps,ends))
>> coloncatshort([1 3 2],[4 4 2],[5 7 8])
ans =
1 5 3 7 2 4 6 8
I am often in need of vectorizing an operator where the inputs aren’t the same size. This seems more challenging to me. For instance: given an array MAT = M(IDX) and a vector of unique values M, find IDX. This involves the == operator with different length inputs. The for loop approach is straightforward:
function [IDX] = getidx(MAT,M)
% Given MAT = M(IDX) and M, this finds IDX.
% Where length(unique(M)) = length(M).
IDX = zeros(size(MAT));
for ii = 1:length(M)
IDX(MAT==M(ii)) = ii;
end
I am at a loss to vectorize this, perhaps I just need more sleep :)
Tom-
Nice contribution!
Matt-
Check out bsxfun. I’ve talked about it in the blog once or twice.
–Loren
Rather than go too off topic (against Loren’s rules!), I created a thread on the newsreader that addresses my question. Anyone who is interested may find it here:
http://www.mathworks.com/matlabcentral/newsreader/view_thread/237608
I am not sure if this is bullet proof for all the cases, but I usually write something like this for this type of problems:
len = max(max(starts),max(ends))+1;
find(cumsum(accumarray(starts’,1,[len,1])-accumarray(ends’+1,1,[len,1])))
Lucio
A problem with the given solutions is that they do not handle the ‘end’ operator — a strange operator that contextually knows what the appropriate limits are for each dimension.
http://groups.google.ca/group/comp.soft-sys.matlab/browse_thread/thread/c1ef54d585340aea/00766f55e26eaf6c
Doug Schwarz wrote:
> b = a(:,setdiff(1:end,1));
which unexpectedly works, and I explored why this is unexpected (and hence why no user-level routine can
duplicate the meaning of ‘end’)
Walter-
You can use “end” in some expressions, IF the expressions are used for indexing into an array as Doug’s expression does. In my tests up above, you can’t use “end” because I am simply generating the sequence without indexing. In addition, Doug’s works because the value of “end” in that context is possible for MATLAB to ascertain. I think many, perhaps all, of the other solutions don’t have the indexing context explicit enough at the call site for MATLAB to discern the meaning of “end”.
–Loren
–Loren