Comments on: MATLAB, Strings, and Regular Expressions
https://blogs.mathworks.com/loren/2006/04/05/regexp-how-tos/?s_tid=feedtopost
Loren Shure is interested in the design of the MATLAB language. She is an application engineer and writes here about MATLAB programming and related topics.
Thu, 28 Jul 2016 19:20:20 +0000

By: Jason Breslau https://blogs.mathworks.com/loren/2006/04/05/regexp-how-tos/#comment-32862 Tue, 27 Dec 2011 15:11:29 +0000

This is a limitation of dynamic expressions which is not well documented.

The expression generated dynamically needs to be a complete expression. What this means is that if you take the pattern returned by the dynamic operator, it should be able to match as a standalone expression.

The dynamic portion of your pattern produces the expression “4” which will match the number “4”, as opposed to being used as a quantifier for the greater expression which contains it. The way you handle this is to generate more of your expression as part of the dynamic component:

>> ptrn = '^(\d)((??(\\s+[A-D]){$1}))';
>> [m,t]=regexpi(tst,ptrn,'match','tokens','once')

m =

4 A B C D

t =

'4' ' A B C D'

A few things to note here:

1) Changing the dynamic component to (??{$1}) is not enough, as it is still not a complete expression, just the quantifier.

2) You have to escape the \s in the expression, as the entire subexpression is parsed like a replace string for regexprep.

3) To capture the second portion, I needed to add another set of parentheses, as dynamic expressions cannot create new capturing groups.
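
For comparison, the same match can also be obtained without any dynamic expression by working in two passes: read the count first, then build an ordinary pattern with sprintf. This is only a sketch of an alternative, not the dynamic-expression technique described above. It assumes the string always starts with a single digit, and it uses a non-capturing group (?:...) so that only two tokens come back.

```matlab
tst = '4 A B C D 3';

% Pass 1: read the leading digit that says how many letters follow.
% Assumption: the string always starts with a digit; n is empty otherwise.
n = regexp(tst, '^\d', 'match', 'once');

% Pass 2: build a static pattern with that digit as the quantifier.
% The doubled backslashes are for sprintf, which turns '\\' into '\'.
ptrn = sprintf('^(\\d)((?:\\s+[A-D]){%s})', n);

[m, t] = regexpi(tst, ptrn, 'match', 'tokens', 'once');
```

Here ptrn ends up as ^(\d)((?:\s+[A-D]){4}), so m and t should match the static result shown in the original question.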

I hope that helps,

-=>J

By: Adrian Thompson https://blogs.mathworks.com/loren/2006/04/05/regexp-how-tos/#comment-32861 Sat, 24 Dec 2011 03:26:15 +0000

Loren,

Something’s not working for me.

Why does the following use of a dynamic regular expression not return the same result as its static equivalent? Am I missing something about when dynamic expressions can be used in R2011a?

>> tst='4 A B C D 3';
>> ptrn='^(\d)(\s+[A-D]){4}';
>> [m,t]=regexpi(tst,ptrn,'match','tokens','once')

m =

4 A B C D

t =

'4' ' A B C D'

>> ptrn='^(\d)(\s+[A-D]){(??$1)}'

ptrn =

^(\d)(\s+[A-D]){(??$1)}

>> [m,t]=regexpi(tst,ptrn,'match','tokens','once')

m =

t =

{}

NOTE: The only thing that has changed is that I’ve replaced the number 4 with the dynamic regular expression (??$1) that should equate to exactly the same character ‘4’, as per the tokens shown above. I also have the same problem when I use a dynamic function call, as in (?@return_same_character($1)). The token is definitely captured, but it seems like the string is not being properly updated before the final matching is attempted.

Thanks for your help on this; I’m stymied.

Regards,
Adrian

By: altreus https://blogs.mathworks.com/loren/2006/04/05/regexp-how-tos/#comment-32548 Sun, 16 Oct 2011 08:07:51 +0000

Although this post is from 2006, it’s now 2011, so we’ve caught up!

The same regex can be used in Perl: they just didn’t because it’s horrible :) Clear code is preferable unless you’re intentionally obfuscating.

use List::Util qw(shuffle); #List::Util is core
while (<>) {
  s{(?<=\w)(\w{2,})(?=\w)}{join '', shuffle split //, $1}e;
  print $_, "\n";
}

Code evaluation like this has always been possible in Perl 5 as far as I can tell.

We also have named capture groups, which are stored in the special hash %+ since changing the way regexes return would introduce inconsistency:

(dump function provided by Data::Dump)

use 5.010;
use Data::Dump qw(dump);
my @date = '11/26/1977' =~ m{(\d+)/(\d+)/(\d+)};
dump \@date;

# [11, 26, 1977]


'11/26/1977' =~ m{(?<month>\d+)/(?<day>\d+)/(?<year>\d+)};
my %date = %+;

dump \%date;

# { day => 26, month => 11, year => 1977 }
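
For readers following along in MATLAB rather than Perl, regexp offers a comparable feature through named tokens and the 'names' option. A minimal sketch using the same date string (the field names month, day, and year are taken from the Perl output above; the variable names d and y are just for illustration):

```matlab
% Named tokens: each (?<name>...) group becomes a field of the result struct.
d = regexp('11/26/1977', '(?<month>\d+)/(?<day>\d+)/(?<year>\d+)', 'names');

% The fields hold char arrays, not numbers, so convert when needed.
y = str2double(d.year);
```

The struct plays roughly the role that the %+ hash plays in the Perl version.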

This is a feature of 5.10. Although many places are still running 5.8.8, it’s hardly Perl’s fault that people can’t keep up :) 5.10 is itself end-of-life; 5.14 is current and 5.12 is considered old.

Case preservation is a thing we don’t do. I suspect that if you ask one of the core team why, they will give you one of two answers:

1. We don’t know of anyone who has ever wanted it
2. We can’t make it work in a consistent way.

Kirk out

By: Brad Stiritz https://blogs.mathworks.com/loren/2006/04/05/regexp-how-tos/#comment-31924 Mon, 20 Dec 2010 15:32:36 +0000

Hi Loren, thanks for your empathy & encouragement. I’ve submitted an enhancement request & referred to our dialogue here as supporting evidence. Fingers crossed! ;)

Happy holidays,
Brad

By: Loren https://blogs.mathworks.com/loren/2006/04/05/regexp-how-tos/#comment-31923 Mon, 20 Dec 2010 12:02:52 +0000

Brad-

I have the same troubles with regexp as you describe and would love to see a GUI that could help me out! Please make this into an enhancement request by using the support link on the right side of my blog. Thanks!

–Loren

By: Brad Stiritz https://blogs.mathworks.com/loren/2006/04/05/regexp-how-tos/#comment-31922 Sun, 19 Dec 2010 17:27:48 +0000

I occasionally need to use regexp() within MATLAB. When I do, it’s usually a frustrating experience. The documentation for regular expressions is by necessity & by nature really long & really dense! The doc page: User’s Guide\Programming Fundamentals\Basic Program Components\Regular Expressions is 48 letter-size pages long, 13,000 words!!

Consequently, if I’m doing anything even slightly non-trivial, it can be very time-consuming to work out the proper match expression by trial-and-error at the MATLAB command-line.

In desperation I Googled for a Reg. Expr. utility & found a free .NET-based offering called Expresso. This is a multi-pane app (similar in look to MATLAB) that allows one to see (in a tree view) how match expressions parse out. It’s really lowered the stress-level for me & speeds things up dramatically when I have to use regexp() in MATLAB.

IMHO, MathWorks should consider offering some kind of analyzer like Expresso within the MATLAB environment. Otherwise, in my experience, regular expressions can be a real bottleneck for rapid code development.

Any comments appreciated,
Respectfully,
Brad

By: Ray https://blogs.mathworks.com/loren/2006/04/05/regexp-how-tos/#comment-30976 Tue, 19 Jan 2010 04:27:36 +0000

It would be nice to have a MATLAB language/text processing toolbox.

By: Prakash https://blogs.mathworks.com/loren/2006/04/05/regexp-how-tos/#comment-30880 Mon, 07 Dec 2009 06:43:55 +0000

Figured it out just now. To get the number from the 7th row and 31st column of a cell array, use

str2double(c{7,31}{1})

Long live braces. Thanks for the platform.
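
The double-brace indexing above can be sketched on a small made-up example. Everything here is hypothetical (the 2x2 input strs, the pattern, and the variable names); the sketch assumes tokens were collected with the 'tokens' and 'once' options, so every cell of c holds a 1x1 cell containing one char array, and that every string matches.

```matlab
% Hypothetical 2x2 input; the real case in this thread was 8x42.
strs = {'x = 12', 'y = 3.5'; 'z = 7', 'w = 0.25'};

% With a cell array input, regexp returns a cell array of the same size.
% Each c{i,j} is itself a cell holding the captured token text.
c = regexp(strs, '=\s*([\d.]+)', 'tokens', 'once');

% First brace pair picks the row/column, second pair picks the token.
v = str2double(c{2,2}{1});

% Convert every element at once. This errors if any string failed to
% match, because t{1} does not exist for an empty token cell.
allv = cellfun(@(t) str2double(t{1}), c);
```

The outer braces select which string's result to look at, and the inner braces unwrap the token, which is why a single pair of braces is not enough.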

By: Prakash https://blogs.mathworks.com/loren/2006/04/05/regexp-how-tos/#comment-30879 Mon, 07 Dec 2009 05:55:35 +0000

I am unable to get numbers from tokens. Say the tokens variable c is displayed as 8x42 (a cell array?). How do I get the number corresponding to, say, the element in the 7th row and 31st column?

How do I use str2double? For a 1x1 cell array,
str2double(c{1}{1});
seems to work. I cannot figure out what would work for later elements of the cell array.

By: Loren https://blogs.mathworks.com/loren/2006/04/05/regexp-how-tos/#comment-30707 Tue, 27 Oct 2009 17:06:17 +0000

Alexander-

The documentation covers what MATLAB supports.

–Loren
