Finding Patterns in Arrays
Recently, my colleague Jeff asked me if I would look at some code he wrote to find a pattern of numbers in a larger array. Without looking at his code, I asked if he had tried using strfind, despite his data not being strings. He found that it solved his problem and was faster than his M-file. In the meantime, I wanted to see what it took for me to write a simple algorithm I was thinking of in an M-file. I show and discuss the results here.
Contents
Simple Test Data
Let me start off with really simple test data to be sure all algorithms are getting the right answers.
a = [0 1 4 9 16 4 9];
b = double('The year is 2003.');First Algorithm : findpattern
Here's the first findpattern algorithm.
type findpattern
function idx = findpattern(in_array, pattern)
%FINDPATTERN Find a pattern in an array.
%
% K = FINDPATTERN(ARRAY, PATTERN) returns the starting indices
% of any occurences of the PATTERN in the ARRAY. ARRAY and PATTERN
% can be any mixture of character and numeric types.
%
% Examples:
% a = [0 1 4 9 16 4 9];
% b = double('The year is 2003.');
% findpattern(a, [4 9]) returns [3 6]
% findpattern(a, [9 4]) returns []
% findpattern(b, '2003') returns 13
% findpattern(b, uint8('2003')) returns 13
%
% See also STRFIND, STRCMP, STRNCMP, STRMATCH.
% Algorithm:
% Find all of the occurrences of each number of the pattern.
%
% For an n element pattern, the result is an n element cell array. The
% i-th cell contains the positions in the input array that match the i-th
% element of the pattern.
%
% When the pattern exists in the input stream, there will be a sequence
% of consecutive integers in consecutive cells.
% As currently implemented, this routine has poor performance for patterns
% with more than half a dozen elements where the first element in the
% pattern matches many positions in the array.
locations = cell(1, numel(pattern));
for p = 1:(numel(pattern))
locations{p} = find(in_array == pattern(p));
end
% Find instances of the pattern in the array.
idx = [];
for p = 1:numel(locations{1})
% Look for a consecutive progression of locations.
start_value = locations{1}(p);
for q = 2:numel(locations)
found = true;
if (~any((start_value + q - 1) == locations{q}))
found = false;
break;
end
end
if (found)
idx(end + 1) = locations{1}(p);
end
end
You'll notice that Jeff chooses to store derived information on the pattern being present in a cell array, and then looks for consecutive locations.
Here are some results using findpattern. First I set f to be a function handle to the function in question. Then I can reuse the same code for the other cases simply by redefining the function.
f = @findpattern t(4) = false; t(1) = isequal(f(a, [4 9]), [3 6]); t(2) = isempty(f(a, [9 4])); t(3) = isequal(f(b, '2003'),13); t(4) = isequal(f(b, uint8('2003')),13); AOK = all(t==true)
f =
@findpattern
AOK =
1
My Homebrew Algorithm : findPattern2
Here's my own algorithm. The idea here is to find possible pattern locations first, and winnow them out, marching through the pattern, which I am assuming is generally smaller, and often a lot smaller, than the data.
type findPattern2
function start = findPattern2(array, pattern)
%findPattern2 Locate a pattern in an array.
%
% indices = findPattern2(array, pattern) finds the starting indices of
% pattern within array.
%
% Example:
% a = [0 1 4 9 16 4 9];
% patt = [4 9];
% indices = findPattern2(a,patt)
% indices =
% 3 6
% Let's assume for now that both the pattern and the array are non-empty
% VECTORS, but there's no checking for this.
% For this algorithm, I loop over the pattern elements.
len = length(pattern);
% First, find candidate locations; i.e., match the first element in the
% pattern.
start = find(array==pattern(1));
% Next remove start values that are too close to the end to possibly match
% the pattern.
endVals = start+len-1;
start(endVals>length(array)) = [];
% Next, loop over elements of pattern, usually much shorter than length of
% array, to check which possible locations are valid still.
for pattval = 2:len
% check viable locations in array
locs = pattern(pattval) == array(start+pattval-1);
% delete false ones from indices
start(~locs) = [];
end
Get results and time it.
f = @findPattern2 t(1) = isequal(f(a, [4 9]), [3 6]); t(2) = isempty(f(a, [9 4])); t(3) = isequal(f(b, '2003'),13); t(4) = isequal(f(b, uint8('2003')),13); AOK = all(t==true)
f =
@findPattern2
AOK =
1
Using strfind
Next I test using the same data with strfind. Despite its name, strfind can happily handle non-character datatypes, and particularly doubles and integers.
f = @strfind t(1) = isequal(f(a, [4 9]), [3 6]); t(2) = isempty(f(a, [9 4])); t(3) = isequal(f(b, '2003'),13); t(4) = isequal(f(b, uint8('2003')),13); AOK = all(t==true)
f =
@strfind
AOK =
1
Use Case and Performance
Jeff described the problem he was solving in more detail. He has a file with multiple images in it, with data stored as uint8. The images are separated by a particular bit pattern. Let me show you one of the images in the sequence, after processing and extracting the frames.
load forloren image(X(:,:,:,17)), axis off whos X
Name Size Bytes Class Attributes X 4-D 26726400 uint8
Now let me show and time finding the pattern in the raw data. The data contain 29 images.
clear
load imrawdata
whos
pattern = [254 255 0 224];
f = @()findpattern(rawdata, pattern);
tfind = timeit(f);
f = @()findPattern2(rawdata, pattern);
tfind(2) = timeit(f);
f = @()strfind(rawdata, pattern);
tfind(3) = timeit(f) Name Size Bytes Class Attributes
rawdata 1x1259716 1259716 uint8
tfind =
0.80941 0.011273 0.019194
Puzzle and Next Steps
In the case of the larger dataset, strfind is not the fastest algorithm, though I found with much smaller data, strfind outperformed findPattern2. Some possible reasons why findPattern2 is the fastest of the three algorithms:
If I find out why, I will let you know. In the meantime, if you have any thoughts to add, please comment here.
- Category:
- Efficiency,
- Less Used Functionality


Comments
To leave a comment, please click here to sign in to your MathWorks Account or create a new one.