{"id":11803,"date":"2020-10-15T18:14:19","date_gmt":"2020-10-15T22:14:19","guid":{"rendered":"https:\/\/blogs.mathworks.com\/pick\/?p=11803"},"modified":"2020-10-23T12:42:24","modified_gmt":"2020-10-23T16:42:24","slug":"r2020b-pattern-new-way-to-regular-express","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/pick\/2020\/10\/15\/r2020b-pattern-new-way-to-regular-express\/","title":{"rendered":"R2020b: pattern (new way to regular express)"},"content":{"rendered":"<div class=\"content\"><p><a href=\"http:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/869871\">Jiro<\/a>'s Pick this week is the new <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/pattern.html\">pattern matching<\/a> capabilities that were added in the newest release, R2020b.<\/p><p>I've always had a love-hate relationship with <a href=\"https:\/\/en.wikipedia.org\/wiki\/Regular_expression\">regular expressions<\/a>. They are a powerful technique for string searching, but they are quite complicated as well. The fact that entire books are written about regular expressions goes to show that they are not trivial to master. We have Picked a couple of entries related to regular expressions in the past, such as <a href=\"https:\/\/blogs.mathworks.com\/pick\/2009\/11\/20\/regular-expressionsa-cheat-sheet\/\">a regular expression cheat sheet<\/a> and <a href=\"https:\/\/blogs.mathworks.com\/pick\/2013\/09\/27\/regexp-builder\/\">a regular expression builder app<\/a>.<\/p><p>With R2020b, there is a whole new way of searching and modifying text. It involves a much simpler method: you build pattern expressions by combining simple functions.<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/pick\/jiro\/potw_r2020b_pattern\/pattern_functions.png\" alt=\"\"> <\/p><p>Let's look at an example. 
Here is a string vector.<\/p><pre class=\"codeinput\">str = [<span class=\"string\">\"When I joined MathWorks, the version of MATLAB was R2006a.\"<\/span>\r\n  <span class=\"string\">\"When I moved to Japan, r2014a had just been released.\"<\/span>\r\n  <span class=\"string\">\"Now, six years later, the current version is R2020B.\"<\/span>];\r\n<\/pre><p>Let's say that I want to extract all of the MATLAB releases from the text. All I need to do is search for patterns starting with \"R\", followed by 4 digits, followed by \"a\" or \"b\".<\/p><pre class=\"codeinput\">pat = <span class=\"string\">\"R\"<\/span> + digitsPattern(4) + (<span class=\"string\">\"a\"<\/span>|<span class=\"string\">\"b\"<\/span>)\r\n<\/pre><pre class=\"codeoutput\">pat = \r\n  pattern\r\n  Matching:\r\n    \"R\" + digitsPattern(4) + (\"a\" | \"b\")\r\n<\/pre><p>To make the search case-insensitive,<\/p><pre class=\"codeinput\">pat = caseInsensitivePattern(pat)\r\n<\/pre><pre class=\"codeoutput\">pat = \r\n  pattern\r\n  Matching:\r\n    caseInsensitivePattern(\"R\" + digitsPattern(4) + (\"a\" | \"b\"))\r\n<\/pre><p>Now we simply extract!<\/p><pre class=\"codeinput\">extract(str, pat)\r\n<\/pre><pre class=\"codeoutput\">ans = \r\n  3&times;1 string array\r\n    \"R2006a\"\r\n    \"r2014a\"\r\n    \"R2020B\"\r\n<\/pre><p>For reference, if we were to do this with regular expressions,<\/p><pre class=\"codeinput\">regexp(str, <span class=\"string\">\"[Rr]\\d{4}[aAbB]\"<\/span>, <span class=\"string\">\"match\"<\/span>)\r\n<\/pre><pre class=\"codeoutput\">ans =\r\n  3&times;1 cell array\r\n    {[\"R2006a\"]}\r\n    {[\"r2014a\"]}\r\n    {[\"R2020B\"]}\r\n<\/pre><p>You be the judge as to which one is easier to understand.<\/p><p>Take a look at <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/matlab_prog\/build-pattern-expressions.html\">this page<\/a> for more examples. Patterns will not be able to completely replace regular expressions, but not to worry! 
You can use <a href=\"https:\/\/www.mathworks.com\/help\/releases\/R2020b\/matlab\/ref\/regexppattern.html\"><tt>regexpPattern<\/tt><\/a> to match regular expressions.<\/p>\r\n<p><\/p>\r\nEDIT: Check out <a href=\"https:\/\/blogs.mathworks.com\/pick\/2020\/10\/15\/r2020b-pattern-new-way-to-regular-express\/#reply_36493\">one of the comments<\/a> below to see a slightly more complicated example.\r\n<p><\/p>\r\n<p><b>Comments<\/b><\/p><p>Give it a try and let us know what you think <a href=\"http:\/\/blogs.mathworks.com\/pick\/?p=11803#respond\">here<\/a>.<\/p><p class=\"footer\"><br>\r\n      Published with MATLAB&reg; R2020b<br><\/p><\/div>","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/images\/pick\/jiro\/potw_r2020b_pattern\/pattern_functions.png\" onError=\"this.style.display ='none';\" \/><\/div><p>Jiro's Pick this week is the new pattern matching capabilities that were added in the newest release, R2020b.I've always had a love-hate relationship with regular expressions. It's a powerful... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/pick\/2020\/10\/15\/r2020b-pattern-new-way-to-regular-express\/\">read more >><\/a><\/p>","protected":false},"author":35,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[16],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/posts\/11803"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/users\/35"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/comments?post=11803"}],"version-history":[{"count":6,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/posts\/11803\/revisions"}],"predecessor-version":[{"id":11833,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/posts\/11803\/revisions\/11833"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/media?
parent=11803"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/categories?post=11803"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/tags?post=11803"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}