Loren on the Art of MATLAB

March 27th, 2008

A Way to Automate “Regular” Renaming

Recently someone at MathWorks asked me how he could automate renaming a bunch of M-files whose names contain underscores ('_'), giving each a derived name that drops the underscores and uses camelCase instead. You may have similar name manipulation operations you need to perform.

My First Attempt

Of course I resorted to using MATLAB for the task, despite other options. I chose the following requirements.

  • Don't worry about leading _
  • Don't worry about cell arrays of strings or string matrices (vectors only need apply)
  • Do worry about multiple consecutive _
  • Do worry about trailing _

Some Sample Names

I first create a list of some sample names so I have a test suite to try out.

names = {'foo_bar','foo_bar_','foo__bar', ...
    'foo_bar__', 'foo_3','foo_3_','foo_3a', ...
    'foo_bar____baz___234___'};
allnames = names'
allnames = 
    'foo_bar'
    'foo_bar_'
    'foo__bar'
    'foo_bar__'
    'foo_3'
    'foo_3_'
    'foo_3a'
    'foo_bar____baz___234___'

My Solution

Let's first try out my solution on these.

for name = names
    disp(camelCase(name{1}));
end
fooBar
fooBar
fooBar
fooBar
foo3
foo3
foo3a
fooBarBaz234

And now let's look at the code.

type camelCase
function y = camelCase(x)
%camelCase Convert name with underscores to camelCase.

% find the underscores 
indall = find(x=='_');
% figure out where consecutive _ are 
% and remove all but the last 
consec = diff(indall)==1;
ind = indall;
ind(consec) = [];

y = x;
y(min(ind+1,end)) = upper(y(min(ind+1,end)));
y(indall) = '';

I first find all the underscores. Then I look for runs of consecutive underscores, because I only want the last one in each run: it's the character following it that I want to turn into upper case. That is, if a following character exists! So I have to check for that too, which is what min(ind+1,end) does by clamping the index to the end of the string. I then have an array of indices to uppercase. (If the name ends in '_', the clamped index points at that trailing underscore, which is harmless since upper('_') is the same as '_', and it saves me from lengthening my input array.) Finally, I go back and use the original indices pointing to all the instances of '_' and remove them. Voila!
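
To make the index bookkeeping concrete, here is a minimal trace of the same steps on a single sample name. The name 'foo__bar_' and the variable target are my own choices for illustration, and I write numel(x) where the function uses end inside the index expression; the values in the comments are what each step should produce for this input.

```matlab
% Illustrative trace only; 'foo__bar_' and the name "target" are assumptions.
x = 'foo__bar_';

indall = find(x == '_');           % every underscore: [4 5 9]
consec = diff(indall) == 1;        % true where the next index is also an underscore
ind = indall;
ind(consec) = [];                  % last underscore of each run: [5 9]

target = min(ind + 1, numel(x));   % characters to capitalize, clamped: [6 9]

y = x;
y(target) = upper(y(target));      % 'b' becomes 'B'; the trailing '_' is unchanged
y(indall) = [];                    % delete every underscore
disp(y)                            % fooBar
```

The clamp is the only place the trailing-underscore requirement shows up: without it, ind + 1 would point one past the end of the string.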

History Lesson

And then I got some pangs, because I am well aware that MATLAB supports regular expressions. First some history. Did you know that Stephen Kleene, an American mathematician, invented regular expressions? He has also been credited with developing a very approachable proof of Gödel's incompleteness theorems. And some punster then said, "Kleeneliness is next to Gödeliness".

Using regexprep

My friend, colleague, and regexp guru, Jason Breslau, gave me the regexprep solution to the problem. Using the same names as before, I next show you Jason's magical 1-line expression, producing the same output as my M-file above.

for name = names
    disp(regexprep(name{1}, '_+(\w?)', '${upper($1)}'));
end
fooBar
fooBar
fooBar
fooBar
foo3
foo3
foo3a
fooBarBaz234
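
For readers who, like me, need the one-liner unpacked, here is a sketch of how I read its pieces. The variable names pattern, replacement, and converted are mine, and the comments are my interpretation rather than Jason's own explanation. It also passes the whole cell array to regexprep in a single call, so no explicit loop is needed.

```matlab
% Pattern pieces, as I read them:
%   _+      one or more underscores, matched greedily so a whole run is consumed
%   (\w?)   token 1: the next word character if there is one, otherwise empty
% Replacement:
%   ${upper($1)}   dynamic expression; MATLAB calls upper() on token 1
pattern = '_+(\w?)';
replacement = '${upper($1)}';

sampleNames = {'foo_bar', 'foo__bar', 'foo_3a', 'foo_bar____baz___234___'};

% regexprep accepts a cell array of strings and returns one of the same size.
converted = regexprep(sampleNames, pattern, replacement);
disp(converted')
```

Because the token may be empty, a trailing run of underscores is simply replaced by nothing, which covers the "trailing _" requirement without any special casing.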

Conclusions

My code is still easier for me to understand, which tells me I should spend some time trying to master regular expressions. In addition, the regular expression code requires no explicit temporary variables, some of which could be large if the input string is long enough. It also occurs to me that regular expressions are a topic students would do well to learn in college. What do you think? Let me know here.
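
Coming back to the original request, here is a minimal sketch of how the one-liner might be applied to files on disk. It is not the script my colleague ended up using. The scratch folder, the two file names, and the variable names are all made up for illustration, and the work happens in a temporary folder so nothing real gets renamed.

```matlab
% Hypothetical setup: a scratch folder holding two empty M-files.
workDir = tempname;
mkdir(workDir);
fclose(fopen(fullfile(workDir, 'foo_bar.m'), 'w'));
fclose(fopen(fullfile(workDir, 'foo__baz_3.m'), 'w'));

% Find M-files with an underscore in the name and rename each one.
listing = dir(fullfile(workDir, '*_*.m'));
for k = 1:numel(listing)
    [unusedPath, base, ext] = fileparts(listing(k).name);
    newBase = regexprep(base, '_+(\w?)', '${upper($1)}');
    movefile(fullfile(workDir, listing(k).name), ...
             fullfile(workDir, [newBase ext]));
end

renamed = dir(fullfile(workDir, '*.m'));
disp({renamed.name}')
% rmdir(workDir, 's');   % uncomment to remove the scratch folder afterwards
```

Two cautions for real use: different underscore names can collapse to the same camelCase name (foo_bar and foo__bar both become fooBar), so check for collisions before moving anything; and renaming the file does not update the function line inside it or any callers.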


Published with MATLAB® 7.6

15 Responses to “A Way to Automate “Regular” Renaming”

  1. Scott Hirsch replied on :

    Thanks for the great read, Loren. I’ve always been in awe of regular expressions, and those who can leverage them efficiently. They’ve always mystified me, and have long been on my list of something to learn some day. For now, though, I think I’ll strive to write code as clean as your first stab!

  2. Bob replied on :

    I’m a casual user of regular expressions and found a good tool for providing immediate feedback on example text invaluable. I’ve seen them in Eclipse and jEdit, and I’m happy to see someone may have already written one for Matlab.

    RegexpHelper:
    http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=15215&objectType=file

  3. Jason Breslau replied on :

    I suppose it really matters where you’re coming from:

    The regexprep answer looks so much more straightforward to me.

    Also, regexprep handles cell arrays of strings directly, so the loop isn’t necessary.

  4. Markus replied on :

    You are right Jason, it matters where you are coming from. If you have never written (and debugged) code like Loren did for parsing strings, you might think the regular expression looks weird and complicated. However, after writing such functions several times, you might one day discover regular expressions and their power and in the future will be glad you did discover them!

    Markus

  5. David replied on :

    Hi,
    have a meta-comment, wasn’t sure where to post it. just wanted to let you know that the quotation marks in the title to this post don’t display properly on http://blogs.mathworks.com/

    regards,
    david

  6. Loren replied on :

    David- Thanks for mentioning the issue. I have passed it along to someone here who I hope can fix it.

    Jason- I could have used cellfun in the non-regexprep version of my code, or written a method for cell arrays of strings. I didn’t think that added much to this conversation. The regexprep code still has to do something different (dereferencing?) with cells than with naked strings.

    –Loren

  7. Dan K replied on :

    My position is more similar to Loren’s. I have used, and will continue to use regular expressions (admittedly only the basic aspects of them), but I have never found a set of documentation which makes it clear to me how to effectively construct the regexp that I want. I do use the regExpHelper from FEX, but if anybody knows of a good reference which does a good job of explaining how to construct the more complex regexp’s I would love to know about it. I think that the regexp suffers from its very power. It has so much capability and so many options that the learning curve is very steep indeed! (Hmmm… come to think of it, that’s exactly what a co-worker said to me recently about MATLAB!) I guess that’s another argument for: it all depends on what you’re used to. Anyhow, if anybody knows of the equivalent of “RegExp for Dummies”, please let me know.

    Thanks,
    Dan

  8. Graeme E. Smith replied on :

    I think the regular expression is one of those: if-you-know-how-to-use-it-more-power-to-you type features. I had a little experience using them some years ago: on the one hand, when they worked they were incredible; but on the other hand when they didn’t work I found them a nightmare to debug. I think it’s well worth remembering that they’re available in Matlab, but like many features of computing, unless you have cause to use them on a regular basis it’s unlikely you’ll become a master.

  9. Antunes replied on :

    Regular expressions are great; this was the reason why I learned Perl. I use Matlab most of the time and would prefer to use regexps without leaving my favorite tool. However, I am not aware of any book discussing regexps in Matlab.

    So here’s a request: a Matlab book on string parsing with regexps oriented to scientific problems (bioinformatics, text mining, etc.). This is much needed.

    Antunes

  10. Scott Miller replied on :

    Loren,

    A nice little piece. Once again, I find I learn the most about programming by reading OPC (Other People’s Code!). In this case, ‘type’ doesn’t appear as a keyword when I do a search, but it works.

    Confronted with this problem, I would think “regexp”, then struggle with the rules for a while until I got a solution, perhaps consulting with a C/C++ colleague down the hall, and perhaps looking at some other code in FEX. It is a shame that there isn’t a usable tool to help with this problem - see my comments on

    http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=15215&objectType=file

    which appears to be a foray in the right direction.

    The ability to loop ‘for’ over a list, as in ‘for name = names’ isn’t described in the TMW documentation either. Since the latter structure appears in some languages, I had previously tried this construction, but found that it works for some instances and not others. Having said that, I have just tried to get it to fail, and can’t do it. Since the documentation doesn’t describe this usage, could you go over the allowed syntax?

    Scott

  11. Scott Miller replied on :

    Loren,

    Having now tried the latest version of regExpHelper, I find that it is better than the initial release. There is enough in the comment lines to sort out how to use it, and I was able to get a part of your regexp code to run:
    ‘_+(\w?)’. That alone is potentially a great help.

    A *usable* tool is certainly available in the form of regExpHelper. Now if only it could learn from examples…

    Scott

  12. Loren replied on :

    Scott-

    In this part of the doc under “Using Arrays as Indices”, you will see you can loop over the columns of any kind of array, not just a row vector of numbers. MATLAB has behaved like that from the inception.

    –Loren

  13. Loren replied on :

    Scott-

    Indeed yes, train with examples. That’s the holy grail.

    –Loren

  14. Francois replied on :

    I agree, regular expressions should be part of the training for programmers or any relevant disciplines.

  15. Gordon Saksena replied on :

    Though I had fiddled with regular expressions before, they really clicked after reading the first part of “Mastering Regular Expressions” by Jeffrey Friedl. He explains what regular expression engines do internally, which helped me organize the myriad of special regexp symbols and gave me an intuition into what is possible.


Loren Shure works on design of the MATLAB language at The MathWorks. She writes here about once a week on MATLAB programming and related topics.


These postings are the author's and don't necessarily represent the opinions of The MathWorks.
