{"id":5816,"date":"2015-01-30T09:00:41","date_gmt":"2015-01-30T14:00:41","guid":{"rendered":"https:\/\/blogs.mathworks.com\/pick\/?p=5816"},"modified":"2015-01-29T00:03:41","modified_gmt":"2015-01-29T05:03:41","slug":"fast-programmatic-string-search-in-matlab-files","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/pick\/2015\/01\/30\/fast-programmatic-string-search-in-matlab-files\/","title":{"rendered":"Fast, programmatic string search in MATLAB files"},"content":{"rendered":"<div xmlns:mwsh=\"https:\/\/www.mathworks.com\/namespace\/mcode\/v1\/syntaxhighlight.dtd\" class=\"content\">\r\n   <p><a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/authors\/15007\">Jiro<\/a>'s pick this week is <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/48065-fast--programmatic-string-searching-in-directories-of-matlab-code-files\"><tt>findInM<\/tt><\/a> by our very own <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/845693\">Brett Shoelson<\/a>.\r\n   <\/p>\r\n   <p>If you know Brett, which you probably do if you've spent any time on MATLAB Central, you know about all the useful <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/845693?detail=fileexchange\">File Exchange entries<\/a> he has contributed. It's no surprise he's ranked number 8. Aside from many of the entries related to image processing, he\r\n      has uploaded a number of utility functions, and <tt>findInM<\/tt> is a must-have. The title of the File Exchange entry starts with \"FAST, PROGRAMMATIC string searching...\", and that's exactly\r\n      what it is. It searches for a string of text in MATLAB files, but it's fast and it's programmatic. 
There are already a number\r\n      of entries in File Exchange that search for text within files, including <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/813-mfilegrep\"><tt>mfilegrep<\/tt><\/a>, <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/9594-mgrep\"><tt>mgrep<\/tt><\/a>, and <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/9647-grep--a-pedestrian--very-fast-grep-utility\"><tt>grep<\/tt><\/a>. There is also an <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/matlab_env\/finding-files-and-folders.html\">interactive way<\/a> of searching from the toolstrip.\r\n   <\/p>\r\n   <p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/pick\/jiro\/potw_findinm\/potw_findinm_screen.png\"> <\/p>\r\n   <p>From the description of Brett's entry, <tt>findInM<\/tt> \"can be much faster than any other method [he's] seen.\" Brett makes the search this fast by first creating\r\n      an index of the MATLAB files in the folders. This step takes some time, but afterwards the search runs against the index file\r\n      and is very efficient.\r\n   <\/p>\r\n   <p>In this example, I first created an index of the \/toolbox folder (and its subfolders) of my MATLAB installation. 
Searching\r\n      for some text across more than 20,000 files then took less than 10 seconds.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">tic\r\ns = findInM(<span style=\"color: #A020F0\">'graph theory'<\/span>,<span style=\"color: #A020F0\">'toolbox'<\/span>)\r\ntoc<\/pre><pre style=\"font-style:oblique\">\r\nSORTED BY DATE, MOST RECENT ON TOP:\r\n\r\ns = \r\n    'C:\\Program Files\\MATLAB\\R2014b\\toolbox\\bioinfo\\bioinfo\\Contents.m'\r\n    'C:\\Program Files\\MATLAB\\R2014b\\toolbox\\bioinfo\\biodemos\\graphtheorydemo.m'\r\n    'C:\\Program Files\\MATLAB\\R2014b\\toolbox\\matlab\\sparfun\\Contents.m'\r\n    'C:\\Program Files\\MATLAB\\R2014b\\toolbox\\matlab\\sparfun\\gplot.m'\r\n    'C:\\Program Files\\MATLAB\\R2014b\\toolbox\\bioinfo\\graphtheory\\Contents.m'\r\nElapsed time is 8.334972 seconds.\r\n<\/pre><p>For comparison, the interactive \"Find Files\" tool in MATLAB took over 5 minutes to do the same search.<\/p>\r\n   <p>Thanks for this great tool, Brett! I do have a couple of suggestions for improvement.<\/p>\r\n   <div>\r\n      <ul>\r\n         <li>Every 7 days, the function prompts the user to see if he\/she wants to re-generate the index file. Perhaps this could be somewhat\r\n            automated if the indexing process captured the state of the files (file sizes, modified dates). It could automatically recommend\r\n            re-generating the index if it notices a change in the state.\r\n         <\/li>\r\n         <li>The index file is a DOC file. It's easy to open\/edit a DOC file. It might be better to use a non-standard extension, so that\r\n            it can't be accidentally opened and is easily distinguished from a regular DOC file. For example, in Windows, some folders\r\n            with images have a thumbnail database called \"Thumbs.db\". 
Perhaps <tt>findInM<\/tt> can create a file called \"mIndex.mi\" or something like that.\r\n         <\/li>\r\n      <\/ul>\r\n   <\/div>\r\n   <p><b>Comments<\/b><\/p>\r\n   <p>Give it a try and let us know what you think <a href=\"https:\/\/blogs.mathworks.com\/pick\/?p=5816#respond\">here<\/a> or leave a <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/48065-fast--programmatic-string-searching-in-directories-of-matlab-code-files#comments\">comment<\/a> for Brett.\r\n   <\/p><script language=\"JavaScript\">\r\n<!--\r\n\r\n    function grabCode_9a1d2af7e3f9462293bfd7a4d871a14e() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='9a1d2af7e3f9462293bfd7a4d871a14e ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 9a1d2af7e3f9462293bfd7a4d871a14e';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. 
\r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        author = '';\r\n        copyright = 'Copyright 2015 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add author and copyright lines at the bottom if specified.\r\n        if ((author.length > 0) || (copyright.length > 0)) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (author.length > 0) {\r\n                d.writeln('% _' + author + '_');\r\n            }\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n      \r\n      d.title = title + ' (MATLAB code)';\r\n      d.close();\r\n      }   \r\n      \r\n-->\r\n<\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_9a1d2af7e3f9462293bfd7a4d871a14e()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n            the MATLAB code \r\n            <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; R2014b<br><\/p>\r\n<\/div>\r\n<!--\r\n9a1d2af7e3f9462293bfd7a4d871a14e ##### SOURCE BEGIN #####\r\n%%\r\n% <https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/authors\/15007\r\n% Jiro>'s pick this week is\r\n% <https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/48065-fast--programmatic-string-searching-in-directories-of-matlab-code-files |findInM|> by\r\n% our very own\r\n% <https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/845693 Brett\r\n% Shoelson>.\r\n%\r\n% If you know Brett, which you probably do if you've spent any time on\r\n% MATLAB Central, you know about all the useful\r\n% <https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/845693?detail=fileexchange\r\n% File 
Exchange entries> he has contributed. It's no surprise he's ranked\r\n% number 8. Aside from many of the entries related to image processing, he\r\n% has uploaded a number of utility functions, and |findInM| is a must-have.\r\n% The title of the File Exchange entry starts with \"FAST, PROGRAMMATIC\r\n% string searching...\", and that's exactly what it is. It searches for a\r\n% string of text in MATLAB files, but it's fast and it's programmatic.\r\n% There are already a number of entries in File Exchange that search for\r\n% text within files, including\r\n% <https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/813-mfilegrep |mfilegrep|>,\r\n% <https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/9594-mgrep |mgrep|>, and\r\n% <https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/9647-grep--a-pedestrian--very-fast-grep-utility |grep|>. There\r\n% is also an\r\n% <https:\/\/www.mathworks.com\/help\/matlab\/matlab_env\/finding-files-and-folders.html\r\n% interactive way> of searching from the toolstrip.\r\n%\r\n% <<potw_findinm_screen.png>>\r\n%\r\n% From the description of Brett's entry, |findInM| \"can be much faster than\r\n% any other method [he's] seen.\" Brett makes the search this fast by\r\n% first creating an index of the MATLAB files in the folders. This step\r\n% takes some time, but afterwards the search runs against the index file\r\n% and is very efficient.\r\n%\r\n% In this example, I first created an index of the \/toolbox folder (and its\r\n% subfolders) of my MATLAB installation. Searching for some text across\r\n% more than 20,000 files then took less than 10 seconds.\r\n\r\ntic\r\ns = findInM('graph theory','toolbox')\r\ntoc\r\n\r\n%%\r\n% For comparison, the interactive \"Find Files\" tool in MATLAB took over 5\r\n% minutes to do the same search. \r\n%\r\n% Thanks for this great tool, Brett! 
I do have a couple of suggestions for\r\n% improvement.\r\n%\r\n% * Every 7 days, the function prompts the user to see if he\/she wants to\r\n% re-generate the index file. Perhaps this could be somewhat automated if\r\n% the indexing process captured the state of the files (file sizes,\r\n% modified dates). It could automatically recommend re-generating the index\r\n% if it notices a change in the state.\r\n% * The index file is a DOC file. It's easy to open\/edit a DOC file. It\r\n% might be better to use a non-standard extension, so that it can't be\r\n% accidentally opened and is easily distinguished from a regular DOC file.\r\n% For example, in Windows, some folders with images have a thumbnail\r\n% database called \"Thumbs.db\". Perhaps |findInM| can create a file called\r\n% \"mIndex.mi\" or something like that.\r\n%\r\n% *Comments*\r\n%\r\n% Give it a try and let us know what you think\r\n% <https:\/\/blogs.mathworks.com\/pick\/?p=5816#respond here> or leave a\r\n% <https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/48065-fast--programmatic-string-searching-in-directories-of-matlab-code-files#comments\r\n% comment> for Brett.\r\n\r\n##### SOURCE END ##### 9a1d2af7e3f9462293bfd7a4d871a14e\r\n-->","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/images\/pick\/jiro\/potw_findinm\/potw_findinm_screen.png\" onError=\"this.style.display ='none';\" \/><\/div><p>\r\n   Jiro's pick this week is findInM by our very own Brett Shoelson.\r\n   \r\n   If you know Brett, which you probably do if you've spent any time on MATLAB Central, you know about all the useful File... 
<a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/pick\/2015\/01\/30\/fast-programmatic-string-search-in-matlab-files\/\">read more >><\/a><\/p>","protected":false},"author":35,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[16],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/posts\/5816"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/users\/35"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/comments?post=5816"}],"version-history":[{"count":6,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/posts\/5816\/revisions"}],"predecessor-version":[{"id":5823,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/posts\/5816\/revisions\/5823"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/media?parent=5816"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/categories?post=5816"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/tags?post=5816"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}