Recently someone at MathWorks asked me how he could automate renaming a bunch of M-files whose names contain underscores ('_'), replacing each name with a derived one that drops the underscores and uses camelCasing instead. You may have similar name manipulation operations you need to perform.
My First Attempt
Of course I resorted to using MATLAB for the task, despite other options. I chose the following requirements.
- Don't worry about leading _
- Don't worry about cell arrays of strings or string matrices (vectors only need apply)
- Do worry about multiple consecutive _
- Do worry about trailing _
Some Sample Names
I first create a list of some sample names so I have a test suite to try out.
names = {'foo_bar','foo_bar_','foo__bar', ...
'foo_bar__', 'foo_3','foo_3_','foo_3a', ...
'foo_bar____baz___234___'};
allnames = names'
allnames =
'foo_bar'
'foo_bar_'
'foo__bar'
'foo_bar__'
'foo_3'
'foo_3_'
'foo_3a'
'foo_bar____baz___234___'
My Solution
Let's first try out my solution on these.
for name = names
    disp(camelCase(name{1}));
end
fooBar
fooBar
fooBar
fooBar
foo3
foo3
foo3a
fooBarBaz234
And now let's look at the code.
type camelCase

function y = camelCase(x)
%camelCase Convert name with underscores to camelCase.

% find the underscores
indall = find(x=='_');
% figure out where consecutive _ are
% and remove all but the last
consec = diff(indall)==1;
ind = indall;
ind(consec) = [];
y = x;
y(min(ind+1,end)) = upper(y(min(ind+1,end)));
y(indall) = '';
I first find all the underscores. Then I look for consecutive ones since I really only want the last one in each sequence, since it's the following character that I want to turn into upper case. That is, if a following character exists! So I have to check for that too. I then have an array of indices to upper case (though I allow myself to uppercase _ at the end if it's the last character so I don't have to lengthen my input array; upper('_') is the same as '_'). Now, I go back and use the original indices pointing to all the instances of '_' and remove them. Voila!
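To make those steps concrete, here is a small sketch that traces the intermediate values for one short input. The variable names follow camelCase; the input 'foo__bar_' and the name target are illustrative choices only, and numel(x) stands in for the end keyword that the function uses inside its indexing expression.

```matlab
x = 'foo__bar_';                 % illustrative input: a double _ and a trailing _
indall = find(x == '_');         % positions of every underscore: [4 5 9]
consec = diff(indall) == 1;      % true where the next _ is adjacent: [1 0]
ind = indall;
ind(consec) = [];                % keep only the last _ of each run: [5 9]
target = min(ind + 1, numel(x)); % characters to uppercase, clamped to the end: [6 9]
y = x;
y(target) = upper(y(target));    % 'b' becomes 'B'; the trailing '_' is unchanged
y(indall) = '';                  % delete every underscore
disp(y)                          % fooBar
```

The clamp in target is what handles the trailing underscore: index 10 does not exist, so position 9 (the underscore itself) is "uppercased" harmlessly and then deleted with the rest.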
History Lesson
And then I got some pangs, because I am well aware that MATLAB supports regular expressions. First some history. Did you know that Stephen Kleene, an American mathematician, was the inventor of regular expressions? He has also been credited with developing a very approachable proof of Gödel's incompleteness theorems. And some punster then said, "Kleeneliness is next to Gödeliness".
Using regexprep
My friend, colleague, and regexp guru, Jason Breslau gave me the regexprep solution to the problem. Using the same names as before, I next show you Jason's magical 1-line expression, producing the same output as my M-file above.
for name = names
    disp(regexprep(name{1}, '_+(\w?)', '${upper($1)}'));
end
fooBar
fooBar
fooBar
fooBar
foo3
foo3
foo3a
fooBarBaz234
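For readers less at home with regular expressions, the one-liner can be read piece by piece. The sketch below applies it to the longest sample name; the variable names pattern, replacement, and result are introduced here only for illustration.

```matlab
pattern = '_+(\w?)';            % _+   : one or more underscores, as many as possible
                                % (\w?): capture the next word character, if there is one
replacement = '${upper($1)}';   % dynamic expression: call upper on captured token 1
result = regexprep('foo_bar____baz___234___', pattern, replacement);
disp(result)                    % fooBarBaz234
```

Each run of underscores is consumed together with the character that follows it, and the whole match is replaced by the uppercased character. At the end of the string the token is empty, so the trailing underscores are replaced by nothing.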
Conclusions
My code is still easier for me to understand, and I conclude from that that I should spend some time trying to master regular expressions. In addition, the regular expression code requires no temporary variables, some of which could be large if the input string is long enough. It also occurs to me that regular expressions are a topic worthy of students learning well in college. What do you think? Let me know here.
Published with MATLAB® 7.6



Thanks for the great read, Loren. I’ve always been in awe of regular expressions, and those who can leverage them efficiently. They’ve always mystified me, and have long been on my list of something to learn some day. For now, though, I think I’ll strive to write code as clean as your first stab!
I’m a casual user of regular expressions and found a good tool for providing immediate feedback on example text invaluable. I’ve seen them in Eclipse and jEdit, and I’m happy to see someone may have already written one for Matlab.
RegexpHelper:
http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=15215&objectType=file
I suppose it really matters where you’re coming from:
The regexprep answer looks so much more straightforward to me.
Also, regexprep handles cell arrays of strings directly, so the loop isn’t necessary.
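As a sketch of the loop-free form mentioned in this comment, the whole cell array can be passed to regexprep in one call. The shortened list of names and the variable camelNames are illustrative only.

```matlab
names = {'foo_bar', 'foo_bar_', 'foo__bar', 'foo_3a'};
% Passing the cell array returns a cell array of the same size,
% with the replacement applied to each element.
camelNames = regexprep(names, '_+(\w?)', '${upper($1)}');
disp(camelNames')
```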
You are right Jason, it matters where you are coming from. If you have never written (and debugged) code like Loren did for parsing strings, you might think the regular expression looks weird and complicated. However, after writing such functions several times, you might one day discover regular expressions and their power and in the future will be glad you did discover them!
Markus
Hi,
have a meta-comment, wasn’t sure where to post it. just wanted to let you know that the quotation marks in the title to this post don’t display properly on http://blogs.mathworks.com/
regards,
david
David- Thanks for mentioning the issue. I have passed it along to someone here who I hope can fix it.
Jason- I could have used cellfun in the non-regexprep version of my code, or written a method for cell arrays of strings. I didn’t think that added much to this conversation. The regexprep code still has to do something different (dereferencing?) with cells than with naked strings.
–Loren
My position is more similar to Loren’s. I have used, and will continue to use regular expressions (admittedly only the basic aspects of them), but I have never found a set of documentation which makes it clear to me how to effectively construct the regexp that I want. I do use the regExpHelper from FEX, but if anybody knows of a good reference which does a good job of explaining how to construct the more complex regexp’s I would love to know about it. I think that the regexp suffers from its very power. It has so much capability and so many options that the learning curve is very steep indeed! (Hmmm… come to think of it, that’s exactly what a co-worker said to me recently about MATLAB!) I guess that’s another argument for: it all depends on what you’re used to. Anyhow, if anybody knows of the equivalent of “RegExp for Dummies”, please let me know.
Thanks,
Dan
I think the regular expression is one of those: if-you-know-how-to-use-it-more-power-to-you type features. I had a little experience using them some years ago: on the one hand, when they worked they were incredible; but on the other hand when they didn’t work I found them a nightmare to debug. I think it’s well worth remembering that they’re available in Matlab, but like many features of computing, unless you have cause to use them on a regular basis it’s unlikely you’ll become a master.
Regular expressions are great; they were the reason I learned Perl. I use Matlab most of the time and would prefer to use regexps without leaving my favorite tool. However, I am not aware of any book discussing regexps in Matlab.
So here’s a request: a Matlab book on string parsing with regexps oriented to scientific problems (bioinformatics, text mining, etc.). This is much needed.
Antunes
Loren,
A nice little piece. Once again, I find I learn the most about programming by reading OPC (Other People’s Code!). In this case, ‘type’ doesn’t appear as a keyword when I do a search, but it works.
Confronted with this problem, I would think “regexp”, then struggle with the rules for a while until I got a solution, perhaps consulting with a C/C++ colleague down the hall, and perhaps looking at some other code in FEX. It is a shame that there isn’t a usable tool to help with this problem - see my comments on
http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=15215&objectType=file
which appears to be a foray in the right direction.
The ability to loop ‘for’ over a list, as in ‘for name = names’ isn’t described in the TMW documentation either. Since the latter structure appears in some languages, I had previously tried this construction, but found that it works for some instances and not others. Having said that, I have just tried to get it to fail, and can’t do it. Since the documentation doesn’t describe this usage, could you go over the allowed syntax?
Scott
Loren,
Having now tried the latest version of regExpHelper, I find that it is better than the initial release. There is enough in the comment lines to sort out how to use it, and I was able to get a part of your regexp code to run:
‘_+(\w?)’. That alone is potentially a great help.
A *usable* tool is certainly available in the form of regExpHelper. Now if only it could learn from examples…
Scott
Scott-
In this part of the doc under “Using Arrays as Indices”, you will see you can loop over the columns of any kind of array, not just a row vector of numbers. MATLAB has behaved like that from the inception.
–Loren
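A minimal sketch of that column-wise behavior, using made-up data: the loop variable receives one column of the array per iteration, whatever the array's type.

```matlab
names = {'foo_bar', 'foo_3'};
for name = names            % each iteration gets one column: a 1-by-1 cell
    disp(name{1})           % so the string is extracted with {1}
end

M = [1 2 3; 4 5 6];
colSums = zeros(1, 0);
for col = M                 % each iteration gets a 2-by-1 column of M
    colSums(end+1) = sum(col);
end
disp(colSums)               % 5 7 9
```

One consequence is that a cell array stored as a column is a single column, so the loop body runs only once with the whole array. That may be why the construction appears to work in some cases and not others; transposing to a row first avoids it.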
Scott-
Indeed yes, train with examples. That’s the holy grail.
–Loren
I agree, regular expressions should be part of the training for programmers or any relevant disciplines.
Though I had fiddled with regular expressions before, they really clicked after reading the first part of “Mastering Regular Expressions” by Jeffrey Friedl. He explains what regular expression engines do internally, which helped me organize the myriad of special regexp symbols and gave me an intuition into what is possible.
Loren - Assume I want to replace occurrences of ‘Monday’, ‘Tuesday’, ‘Wednesday’, in a string with ‘mon’, ‘tue’, ‘wed’ respectively. Can this be done with one or two lines of code? I am currently using case statements.
Vijay-
I would loop through the replacement strings myself. Not sure if there’s a regexp way to do it.
–Loren
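For what it's worth, here is a sketch of both routes. The sample sentence and the variable names are made up for illustration. The first option is the loop suggested above; the second relies on regexprep accepting cell arrays for both the pattern and the replacement, pairing them element by element.

```matlab
str = 'Meet on Monday or Wednesday, not Tuesday.';
longNames  = {'Monday', 'Tuesday', 'Wednesday'};
shortNames = {'mon', 'tue', 'wed'};

% Option 1: loop over the pairs, replacing one day name at a time.
out1 = str;
for k = 1:numel(longNames)
    out1 = strrep(out1, longNames{k}, shortNames{k});
end

% Option 2: regexprep pairs each pattern with its replacement.
out2 = regexprep(str, longNames, shortNames);

disp(out2)   % Meet on mon or wed, not tue.
```

Both forms apply the replacements in order, so the result of an earlier replacement can be matched by a later pattern. That does not matter for these day names, but it is worth checking when patterns and replacements overlap.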