# Finding Patterns in Arrays

Recently, my colleague Jeff asked me if I would look at some code he wrote to find a pattern of numbers in a larger array. Without looking at his code,
I asked if he had tried using `strfind`, despite his data not being strings. He found that it solved his problem and was faster than his M-file. In the meantime,
I wanted to see what it took for me to write a simple algorithm I was thinking of in an M-file. I show and discuss the results
here.

### Contents

### Simple Test Data

Let me start off with really simple test data to be sure all algorithms are getting the right answers.

```
a = [0 1 4 9 16 4 9];
b = double('The year is 2003.');
```

### First Algorithm : findpattern

Here's the first `findpattern` algorithm.

`type findpattern`

function idx = findpattern(in_array, pattern) %FINDPATTERN Find a pattern in an array. % % K = FINDPATTERN(ARRAY, PATTERN) returns the starting indices % of any occurences of the PATTERN in the ARRAY. ARRAY and PATTERN % can be any mixture of character and numeric types. % % Examples: % a = [0 1 4 9 16 4 9]; % b = double('The year is 2003.'); % findpattern(a, [4 9]) returns [3 6] % findpattern(a, [9 4]) returns [] % findpattern(b, '2003') returns 13 % findpattern(b, uint8('2003')) returns 13 % % See also STRFIND, STRCMP, STRNCMP, STRMATCH. % Algorithm: % Find all of the occurrences of each number of the pattern. % % For an n element pattern, the result is an n element cell array. The % i-th cell contains the positions in the input array that match the i-th % element of the pattern. % % When the pattern exists in the input stream, there will be a sequence % of consecutive integers in consecutive cells. % As currently implemented, this routine has poor performance for patterns % with more than half a dozen elements where the first element in the % pattern matches many positions in the array. locations = cell(1, numel(pattern)); for p = 1:(numel(pattern)) locations{p} = find(in_array == pattern(p)); end % Find instances of the pattern in the array. idx = []; for p = 1:numel(locations{1}) % Look for a consecutive progression of locations. start_value = locations{1}(p); for q = 2:numel(locations) found = true; if (~any((start_value + q - 1) == locations{q})) found = false; break; end end if (found) idx(end + 1) = locations{1}(p); end end

You'll notice that Jeff chooses to store derived information on the pattern being present in a cell array, and then looks for consecutive locations.

Here are some results using `findpattern`. First I set `f` to be a function handle to the function in question. Then I can reuse the same code for the other cases simply by redefining
the function.

f = @findpattern t(4) = false; t(1) = isequal(f(a, [4 9]), [3 6]); t(2) = isempty(f(a, [9 4])); t(3) = isequal(f(b, '2003'),13); t(4) = isequal(f(b, uint8('2003')),13); AOK = all(t==true)

f = @findpattern AOK = 1

### My Homebrew Algorithm : findPattern2

Here's my own algorithm. The idea here is to find possible `pattern` locations first, and winnow them out, marching through the `pattern`, which I am assuming is generally smaller, and often a lot smaller, than the data.

`type findPattern2`

function start = findPattern2(array, pattern) %findPattern2 Locate a pattern in an array. % % indices = findPattern2(array, pattern) finds the starting indices of % pattern within array. % % Example: % a = [0 1 4 9 16 4 9]; % patt = [4 9]; % indices = findPattern2(a,patt) % indices = % 3 6 % Let's assume for now that both the pattern and the array are non-empty % VECTORS, but there's no checking for this. % For this algorithm, I loop over the pattern elements. len = length(pattern); % First, find candidate locations; i.e., match the first element in the % pattern. start = find(array==pattern(1)); % Next remove start values that are too close to the end to possibly match % the pattern. endVals = start+len-1; start(endVals>length(array)) = []; % Next, loop over elements of pattern, usually much shorter than length of % array, to check which possible locations are valid still. for pattval = 2:len % check viable locations in array locs = pattern(pattval) == array(start+pattval-1); % delete false ones from indices start(~locs) = []; end

Get results and time it.

f = @findPattern2 t(1) = isequal(f(a, [4 9]), [3 6]); t(2) = isempty(f(a, [9 4])); t(3) = isequal(f(b, '2003'),13); t(4) = isequal(f(b, uint8('2003')),13); AOK = all(t==true)

f = @findPattern2 AOK = 1

### Using strfind

Next I test using the same data with `strfind`. Despite its name, `strfind` can happily handle non-character datatypes, and particularly doubles and integers.

f = @strfind t(1) = isequal(f(a, [4 9]), [3 6]); t(2) = isempty(f(a, [9 4])); t(3) = isequal(f(b, '2003'),13); t(4) = isequal(f(b, uint8('2003')),13); AOK = all(t==true)

f = @strfind AOK = 1

### Use Case and Performance

Jeff described the problem he was solving in more detail. He has a file with multiple images in it, with data stored as `uint8`. The images are separated by a particular bit pattern. Let me show you one of the images in the sequence, after processing
and extracting the frames.

load forloren image(X(:,:,:,17)), axis off whos X

Name Size Bytes Class Attributes X 4-D 26726400 uint8

Now let me show and time finding the pattern in the raw data. The data contain 29 images.

```
clear
load imrawdata
whos
pattern = [254 255 0 224];
f = @()findpattern(rawdata, pattern);
tfind = timeit(f);
f = @()findPattern2(rawdata, pattern);
tfind(2) = timeit(f);
f = @()strfind(rawdata, pattern);
tfind(3) = timeit(f)
```

Name Size Bytes Class Attributes rawdata 1x1259716 1259716 uint8 tfind = 0.80941 0.011273 0.019194

### Puzzle and Next Steps

In the case of the larger dataset, `strfind` is not the fastest algorithm, though I found with much smaller data, `strfind` outperformed `findPattern2`. Some possible reasons why `findPattern2` is the fastest of the three algorithms:

If I find out why, I will let you know. In the meantime, if you have any thoughts to add, please comment here.

**Category:**- Efficiency,
- Less Used Functionality