{"id":153,"date":"2008-09-08T09:01:17","date_gmt":"2008-09-08T14:01:17","guid":{"rendered":"https:\/\/blogs.mathworks.com\/loren\/2008\/09\/08\/finding-patterns-in-arrays\/"},"modified":"2008-09-08T09:04:26","modified_gmt":"2008-09-08T14:04:26","slug":"finding-patterns-in-arrays","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/loren\/2008\/09\/08\/finding-patterns-in-arrays\/","title":{"rendered":"Finding Patterns in Arrays"},"content":{"rendered":"<div xmlns:mwsh=\"https:\/\/www.mathworks.com\/namespace\/mcode\/v1\/syntaxhighlight.dtd\" class=\"content\">\r\n   <introduction>\r\n      <p>Recently, my colleague <a title=\"http:\/\/jeffmatherphotography.com\/dispatches\/ (link no longer works)\">Jeff<\/a> asked me if I would look at some code he wrote to find a pattern of numbers in a larger array.  Without looking at his code,\r\n         I asked if he had tried using <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/strfind.html\"><tt>strfind<\/tt><\/a>, despite his data not being strings.  He found that it solved his problem and was faster than his M-file.  In the meantime,\r\n         I wanted to see what it took for me to write a simple algorithm I was thinking of in an M-file. I show and discuss the results\r\n         here.\r\n      <\/p>\r\n   <\/introduction>\r\n   <h3>Contents<\/h3>\r\n   <div>\r\n      <ul>\r\n         <li><a href=\"#1\">Simple Test Data<\/a><\/li>\r\n         <li><a href=\"#2\">First Algorithm : findpattern<\/a><\/li>\r\n         <li><a href=\"#5\">My Homebrew Algorithm : findPattern2<\/a><\/li>\r\n         <li><a href=\"#7\">Using strfind<\/a><\/li>\r\n         <li><a href=\"#8\">Use Case and Performance<\/a><\/li>\r\n         <li><a href=\"#10\">Puzzle and Next Steps<\/a><\/li>\r\n      <\/ul>\r\n   <\/div>\r\n   <h3>Simple Test Data<a name=\"1\"><\/a><\/h3>\r\n   <p>Let me start off with really simple test data to be sure all algorithms are getting the right answers.<\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">a = [0 1 4 9 16 4 9];\r\nb = double(<span style=\"color: #A020F0\">'The year is 2003.'<\/span>);<\/pre><h3>First Algorithm : findpattern<a name=\"2\"><\/a><\/h3>\r\n   <p>Here's the first <tt>findpattern<\/tt> algorithm.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">type <span style=\"color: #A020F0\">findpattern<\/span><\/pre><pre style=\"font-style:oblique\">\r\nfunction idx = findpattern(in_array, pattern)\r\n%FINDPATTERN  Find a pattern in an array.\r\n%\r\n%    K = FINDPATTERN(ARRAY, PATTERN) returns the starting indices\r\n%    of any occurences of the PATTERN in the ARRAY.  ARRAY and PATTERN\r\n%    can be any mixture of character and numeric types.\r\n%\r\n%    Examples:\r\n%        a = [0 1 4 9 16 4 9];\r\n%        b = double('The year is 2003.');\r\n%        findpattern(a, [4 9])         returns [3 6]\r\n%        findpattern(a, [9 4])         returns []\r\n%        findpattern(b, '2003')        returns 13\r\n%        findpattern(b, uint8('2003')) returns 13\r\n%\r\n%    See also STRFIND, STRCMP, STRNCMP, STRMATCH.\r\n\r\n% Algorithm:\r\n% Find all of the occurrences of each number of the pattern.\r\n%\r\n% For an n element pattern, the result is an n element cell array.  The\r\n% i-th cell contains the positions in the input array that match the i-th\r\n% element of the pattern.\r\n%\r\n% When the pattern exists in the input stream, there will be a sequence\r\n% of consecutive integers in consecutive cells.\r\n\r\n% As currently implemented, this routine has poor performance for patterns\r\n% with more than half a dozen elements where the first element in the\r\n% pattern matches many positions in the array.\r\n\r\nlocations = cell(1, numel(pattern));\r\nfor p = 1:(numel(pattern))\r\n    locations{p} = find(in_array == pattern(p));\r\nend\r\n\r\n% Find instances of the pattern in the array.\r\nidx = [];\r\nfor p = 1:numel(locations{1})\r\n    \r\n    % Look for a consecutive progression of locations.\r\n    start_value = locations{1}(p);\r\n    for q = 2:numel(locations)\r\n        \r\n        found = true;\r\n        \r\n        if (~any((start_value + q - 1) == locations{q}))\r\n            found = false;\r\n            break;\r\n        end\r\n        \r\n    end\r\n    \r\n    if (found)\r\n        idx(end + 1) = locations{1}(p);\r\n    end\r\n    \r\nend\r\n<\/pre><p>You'll notice that Jeff chooses to store derived information on the pattern being present in a cell array, and then looks\r\n      for consecutive locations.\r\n   <\/p>\r\n   <p>Here are some results using <tt>findpattern<\/tt>.  First I set <tt>f<\/tt> to be a function handle to the function in question.  Then I can reuse the same code for the other cases simply by redefining\r\n      the function.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">f = @findpattern\r\nt(4) = false;\r\nt(1) = isequal(f(a, [4 9]), [3 6]);\r\nt(2) = isempty(f(a, [9 4]));\r\nt(3) = isequal(f(b, <span style=\"color: #A020F0\">'2003'<\/span>),13);\r\nt(4) = isequal(f(b, uint8(<span style=\"color: #A020F0\">'2003'<\/span>)),13);\r\nAOK = all(t==true)<\/pre><pre style=\"font-style:oblique\">f = \r\n    @findpattern\r\nAOK =\r\n     1\r\n<\/pre><h3>My Homebrew Algorithm : findPattern2<a name=\"5\"><\/a><\/h3>\r\n   <p>Here's my own algorithm.  The idea here is to find possible <tt>pattern<\/tt> locations first, and winnow them out, marching through the <tt>pattern<\/tt>, which I am assuming is generally smaller, and often a lot smaller, than the data.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">type <span style=\"color: #A020F0\">findPattern2<\/span><\/pre><pre style=\"font-style:oblique\">\r\nfunction start = findPattern2(array, pattern)\r\n%findPattern2 Locate a pattern in an array.\r\n%\r\n%   indices = findPattern2(array, pattern) finds the starting indices of\r\n%   pattern within array.\r\n%\r\n%   Example:\r\n%   a = [0 1 4 9 16 4 9];\r\n%   patt = [4 9];\r\n%   indices = findPattern2(a,patt)\r\n%   indices =\r\n%        3     6\r\n\r\n% Let's assume for now that both the pattern and the array are non-empty\r\n% VECTORS, but there's no checking for this. \r\n% For this algorithm, I loop over the pattern elements.\r\nlen = length(pattern);\r\n% First, find candidate locations; i.e., match the first element in the\r\n% pattern.\r\nstart = find(array==pattern(1));\r\n% Next remove start values that are too close to the end to possibly match\r\n% the pattern.\r\nendVals = start+len-1;\r\nstart(endVals&gt;length(array)) = [];\r\n% Next, loop over elements of pattern, usually much shorter than length of\r\n% array, to check which possible locations are valid still.\r\nfor pattval = 2:len\r\n    % check viable locations in array\r\n    locs = pattern(pattval) == array(start+pattval-1);\r\n    % delete false ones from indices\r\n    start(~locs) = [];\r\nend\r\n<\/pre><p>Get results and time it.<\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">f = @findPattern2\r\nt(1) = isequal(f(a, [4 9]), [3 6]);\r\nt(2) = isempty(f(a, [9 4]));\r\nt(3) = isequal(f(b, <span style=\"color: #A020F0\">'2003'<\/span>),13);\r\nt(4) = isequal(f(b, uint8(<span style=\"color: #A020F0\">'2003'<\/span>)),13);\r\nAOK = all(t==true)<\/pre><pre style=\"font-style:oblique\">f = \r\n    @findPattern2\r\nAOK =\r\n     1\r\n<\/pre><h3>Using strfind<a name=\"7\"><\/a><\/h3>\r\n   <p>Next I test using the same data with <tt>strfind<\/tt>.  Despite its name, <tt>strfind<\/tt> can happily handle non-character datatypes, and particularly doubles and integers.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">f = @strfind\r\nt(1) = isequal(f(a, [4 9]), [3 6]);\r\nt(2) = isempty(f(a, [9 4]));\r\nt(3) = isequal(f(b, <span style=\"color: #A020F0\">'2003'<\/span>),13);\r\nt(4) = isequal(f(b, uint8(<span style=\"color: #A020F0\">'2003'<\/span>)),13);\r\nAOK = all(t==true)<\/pre><pre style=\"font-style:oblique\">f = \r\n    @strfind\r\nAOK =\r\n     1\r\n<\/pre><h3>Use Case and Performance<a name=\"8\"><\/a><\/h3>\r\n   <p>Jeff described the problem he was solving in more detail.  He has a file with multiple images in it, with data stored as <tt>uint8<\/tt>.  The images are separated by a particular bit pattern.  Let me show you one of the images in the sequence, after processing\r\n      and extracting the frames.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">load <span style=\"color: #A020F0\">forloren<\/span>\r\nimage(X(:,:,:,17)), axis <span style=\"color: #A020F0\">off<\/span>\r\nwhos <span style=\"color: #A020F0\">X<\/span><\/pre><pre style=\"font-style:oblique\">  Name      Size               Bytes  Class    Attributes\r\n\r\n  X         4-D             26726400  uint8              \r\n\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/153\/blogFindPattern_01.png\"> <p>Now let me show and time finding the pattern in the raw data.  The data contain 29 images.<\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">clear\r\nload <span style=\"color: #A020F0\">imrawdata<\/span>\r\nwhos\r\npattern = [254 255 0 224];\r\nf = @()findpattern(rawdata, pattern);\r\ntfind = timeit(f);\r\nf = @()findPattern2(rawdata, pattern);\r\ntfind(2) = timeit(f);\r\nf = @()strfind(rawdata, pattern);\r\ntfind(3) = timeit(f)<\/pre><pre style=\"font-style:oblique\">  Name         Size                   Bytes  Class    Attributes\r\n\r\n  rawdata      1x1259716            1259716  uint8              \r\n\r\ntfind =\r\n      0.80941     0.011273     0.019194\r\n<\/pre><h3>Puzzle and Next Steps<a name=\"10\"><\/a><\/h3>\r\n   <p>In the case of the larger dataset, <tt>strfind<\/tt> is not the fastest algorithm, though I found with much smaller data, <tt>strfind<\/tt> outperformed <tt>findPattern2<\/tt>.  Some possible reasons why <tt>findPattern2<\/tt> is the fastest of the three algorithms:\r\n   <\/p>\r\n   <li>it is not as general purpose and does no error checking,<\/li>\r\n   <li>it was only written to work on vectors,<\/li>\r\n   <li>it does nothing to handle cases where |NaN|s might be involved.<\/li>\r\n   <p>If I find out why, I will let you know.  In the meantime, if you have any thoughts to add, please comment <a href=\"https:\/\/blogs.mathworks.com\/loren\/?p=153#respond\">here<\/a>.\r\n   <\/p><script language=\"JavaScript\">\r\n<!--\r\n\r\n    function grabCode_0bea633db9034cefa598cbe2dfbe163e() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='0bea633db9034cefa598cbe2dfbe163e ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 0bea633db9034cefa598cbe2dfbe163e';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. \r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        author = 'Loren Shure';\r\n        copyright = 'Copyright 2008 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add author and copyright lines at the bottom if specified.\r\n        if ((author.length > 0) || (copyright.length > 0)) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (author.length > 0) {\r\n                d.writeln('% _' + author + '_');\r\n            }\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n      \r\n      d.title = title + ' (MATLAB code)';\r\n      d.close();\r\n      }   \r\n      \r\n-->\r\n<\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_0bea633db9034cefa598cbe2dfbe163e()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n            the MATLAB code \r\n            <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; 7.6<br><\/p>\r\n<\/div>\r\n<!--\r\n0bea633db9034cefa598cbe2dfbe163e ##### SOURCE BEGIN #####\r\n%% Finding Patterns in Arrays\r\n% Recently, my colleague <http:\/\/jeffmatherphotography.com\/dispatches\/ Jeff> \r\n% asked me if I would look at some code he wrote to find a pattern of\r\n% numbers in a larger array.  Without looking at his code, I asked if he\r\n% had tried using\r\n% <https:\/\/www.mathworks.com\/help\/matlab\/ref\/strfind.html |strfind|>,\r\n% despite his data not being strings.  He found that it solved his problem\r\n% and was faster than his M-file.  In the meantime, I wanted to see what it\r\n% took for me to write a simple algorithm I was thinking of in an M-file.\r\n% I show and discuss the results here.\r\n%% Simple Test Data\r\n% Let me start off with really simple test data to be sure all algorithms\r\n% are getting the right answers.\r\na = [0 1 4 9 16 4 9];\r\nb = double('The year is 2003.');\r\n\r\n%% First Algorithm : findpattern \r\n% Here's the first |findpattern| algorithm.\r\ntype findpattern\r\n%%\r\n% You'll notice that Jeff chooses to store derived information on the\r\n% pattern being present in a cell array, and then looks for consecutive\r\n% locations.\r\n%%\r\n% Here are some results using |findpattern|.  First I set |f| to be a\r\n% function handle to the function in question.  Then I can reuse the same\r\n% code for the other cases simply by redefining the function.\r\nf = @findpattern\r\nt(4) = false;\r\nt(1) = isequal(f(a, [4 9]), [3 6]);\r\nt(2) = isempty(f(a, [9 4]));\r\nt(3) = isequal(f(b, '2003'),13);\r\nt(4) = isequal(f(b, uint8('2003')),13);\r\nAOK = all(t==true)\r\n%% My Homebrew Algorithm : findPattern2\r\n% Here's my own algorithm.  The idea here is to find possible |pattern|\r\n% locations first, and winnow them out, marching through the |pattern|,\r\n% which I am assuming is generally smaller, and often a lot smaller, than\r\n% the data.\r\ntype findPattern2\r\n%%\r\n% Get results and time it.\r\nf = @findPattern2\r\nt(1) = isequal(f(a, [4 9]), [3 6]);\r\nt(2) = isempty(f(a, [9 4]));\r\nt(3) = isequal(f(b, '2003'),13);\r\nt(4) = isequal(f(b, uint8('2003')),13);\r\nAOK = all(t==true)\r\n%% Using strfind\r\n% Next I test using the same data with |strfind|.  Despite its name,\r\n% |strfind| can happily handle non-character datatypes, and particularly\r\n% doubles and integers.\r\nf = @strfind\r\nt(1) = isequal(f(a, [4 9]), [3 6]);\r\nt(2) = isempty(f(a, [9 4]));\r\nt(3) = isequal(f(b, '2003'),13);\r\nt(4) = isequal(f(b, uint8('2003')),13);\r\nAOK = all(t==true)\r\n%% Use Case and Performance\r\n% Jeff described the problem he was solving in more detail.  He has a file\r\n% with multiple images in it, with data stored as |uint8|.  The images are\r\n% separated by a particular bit pattern.  Let me show you one of the images\r\n% in the sequence, after processing and extracting the frames.\r\nload forloren\r\nimage(X(:,:,:,17)), axis off\r\nwhos X\r\n%%\r\n% Now let me show and time finding the pattern in the raw data.  The data\r\n% contain 29 images.\r\nclear\r\nload imrawdata\r\nwhos\r\npattern = [254 255 0 224];\r\nf = @()findpattern(rawdata, pattern);\r\ntfind = timeit(f);\r\nf = @()findPattern2(rawdata, pattern);\r\ntfind(2) = timeit(f);\r\nf = @()strfind(rawdata, pattern);\r\ntfind(3) = timeit(f)\r\n%% Puzzle and Next Steps\r\n% In the case of the larger dataset, |strfind| is not the fastest\r\n% algorithm, though I found with much smaller data, |strfind| outperformed\r\n% |findPattern2|.  Some possible reasons why |findPattern2| is the fastest\r\n% of the three algorithms:\r\n%\r\n% # it is not as general purpose and does no error checking,\r\n% # it was only written to work on vectors,\r\n% # it does nothing to handle cases where |NaN|s might be involved.\r\n%\r\n% If I find out why, I will let you know.  In the meantime, if you have any\r\n% thoughts to add, please comment\r\n% <https:\/\/blogs.mathworks.com\/loren\/?p=153#respond here>.\r\n##### SOURCE END ##### 0bea633db9034cefa598cbe2dfbe163e\r\n-->","protected":false},"excerpt":{"rendered":"<p>\r\n   \r\n      Recently, my colleague Jeff asked me if I would look at some code he wrote to find a pattern of numbers in a larger array.  Without looking at his code,\r\n         I... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/loren\/2008\/09\/08\/finding-patterns-in-arrays\/\">read more >><\/a><\/p>","protected":false},"author":39,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[10,15],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/153"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/comments?post=153"}],"version-history":[{"count":0,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/153\/revisions"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/media?parent=153"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/categories?post=153"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/tags?post=153"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}