Loren on the Art of MATLAB

Turn ideas into MATLAB

Singing the Praises of Strings

Posted by Loren Shure,

There is a new way to work with textual data in MATLAB R2016b. Until recently, I hadn't given the new string datatype the attention it deserves. I have been chatting with colleagues Matt Tearle and Adam Sifounakis, and each of us has discovered a similar, beautiful code pattern in MATLAB for generating a sequence of strings.

MathWorks History with Textual Data

Early on, MATLAB had character arrays. Let's create one.

myCharPets = ['dog ';'cat ';'fish']
myCharPets =
dog 
cat 
fish

Notice how I had to add trailing blanks to the first two pets? Every row of a character array must have the same number of characters, and my final pet, a fish, required more memory (like Dory from Finding Nemo).

I can find my second pet, but, to be fair, I also have to remove the trailing blank.

pet2 = deblank(myCharPets(2,:))
pet2 =
cat
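Typing the padding by hand is optional: the char function pads shorter rows with trailing blanks for you, and cellstr removes trailing whitespace when converting the rows back to a cell array. A minimal sketch, assuming R2016b or later; the variable names paddedPets and petList are illustrative only.

```matlab
% char pads the shorter inputs with trailing blanks so every row has 4 characters
paddedPets = char('dog', 'cat', 'fish');

% cellstr turns each row into its own cell and removes the trailing blanks
petList = cellstr(paddedPets);
pet2 = petList{2};   % 'cat', without the trailing blank
```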

With MATLAB 5.0, we introduced cell arrays and then cell arrays of strings. Since each cell contains its own MATLAB array, there is no need for each array to contain the same number of elements. So we can do this, exploiting some "new" syntax.

myCellPets = {'dog';'cat';'fish'}
myCellPets =
  3×1 cell array
    'dog'
    'cat'
    'fish'

I can get the second pet from the list with more, similar, "new" syntax: curly braces.

pet2 = myCellPets{2}
pet2 =
cat
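The two kinds of cell indexing are easy to mix up: parentheses return a smaller cell array, while curly braces return the character vector stored inside the cell. A small sketch; cellPart and contents are illustrative names.

```matlab
myCellPets = {'dog'; 'cat'; 'fish'};

% Parentheses index the cell array itself, returning a 1x1 cell
cellPart = myCellPets(2);

% Curly braces extract what is stored in that cell, a char row vector
contents = myCellPets{2};
```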

String Datatype

In MATLAB Release R2016b, we introduced the notion of a string. Now I can create an array of textual data another way.

myStringPets = string(myCellPets)
myStringPets = 
  3×1 string array
    "dog"
    "cat"
    "fish"

And I can find my second pet again, this time with ordinary parentheses.

pet2 = myStringPets(2)
pet2 = 
  string
    "cat"
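When a string has to go to code that only accepts character arrays or cell arrays of character vectors, char and cellstr convert it back, and strlength reports lengths without any padding. A minimal sketch, assuming R2016b or later; it builds strings with string(...) because double-quoted literals arrived in a later release (R2017a). The names pet2char, backToCell, and lens are illustrative.

```matlab
myStringPets = string({'dog'; 'cat'; 'fish'});

% Parentheses on a string array return a string scalar
pet2 = myStringPets(2);

% char converts a string scalar to a character vector
pet2char = char(pet2);

% cellstr converts the whole string array to a cell array of char vectors
backToCell = cellstr(myStringPets);

% strlength gives the number of characters in each element: 3, 3, 4
lens = strlength(myStringPets);
```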

I think this notation feels much more natural. And I can concatenate strings with the plus operator.

allofmypets = myStringPets(1) + ' & ' + myStringPets(2) + ' & ' + myStringPets(3)
allofmypets = 
  string
    "dog & cat & fish"

OK, yes, I really should vectorize that. And I can do that with strings!
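One way to vectorize the concatenation above is the join method (it appears in the list of string methods later in this post), which places a delimiter between consecutive elements. A minimal sketch, assuming R2016b or later and using string(...) instead of double-quoted literals.

```matlab
myStringPets = string({'dog'; 'cat'; 'fish'});

% join combines the 3x1 array into one string, inserting ' & ' between elements
allofmypets = join(myStringPets, ' & ');   % "dog & cat & fish"
```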

But wait, there's more!

You may remember that Steve Eddins recently posted on my blog about implicit expansion. Well, we can take good advantage of that with strings.

Suppose I want to create an array of directory names, each ending with a year from a sequence.

dirnames = string('C:\work\data\yob\') + (2000:2010)'
dirnames = 
  11×1 string array
    "C:\work\data\yob\2000"
    "C:\work\data\yob\2001"
    "C:\work\data\yob\2002"
    "C:\work\data\yob\2003"
    "C:\work\data\yob\2004"
    "C:\work\data\yob\2005"
    "C:\work\data\yob\2006"
    "C:\work\data\yob\2007"
    "C:\work\data\yob\2008"
    "C:\work\data\yob\2009"
    "C:\work\data\yob\2010"
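The shape of the result comes from the shape of the numeric input: plus converts each year to text and appends it to the scalar prefix, so the transposed (column) year vector yields a column of names and an untransposed row vector yields a row. A small sketch with illustrative variable names.

```matlab
prefix = string('C:\work\data\yob\');   % 1x1 string
years = (2000:2010)';                    % 11x1 double column

% The scalar prefix is expanded to match the 11x1 column of years
dirnames = prefix + years;               % 11x1 string array

% Without the transpose, the same expression gives a 1x11 row instead
dirnamesRow = prefix + (2000:2010);
```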

And if I want to add months, I can do that too.

quarterlyMonths = string({'Jan','Apr','Jul','Oct'});
dirname = string('C:\root\') + quarterlyMonths + (2000:2010)'
dirname = 
  11×4 string array
  Columns 1 through 3
    "C:\root\Jan2000"    "C:\root\Apr2000"    "C:\root\Jul2000"
    "C:\root\Jan2001"    "C:\root\Apr2001"    "C:\root\Jul2001"
    "C:\root\Jan2002"    "C:\root\Apr2002"    "C:\root\Jul2002"
    "C:\root\Jan2003"    "C:\root\Apr2003"    "C:\root\Jul2003"
    "C:\root\Jan2004"    "C:\root\Apr2004"    "C:\root\Jul2004"
    "C:\root\Jan2005"    "C:\root\Apr2005"    "C:\root\Jul2005"
    "C:\root\Jan2006"    "C:\root\Apr2006"    "C:\root\Jul2006"
    "C:\root\Jan2007"    "C:\root\Apr2007"    "C:\root\Jul2007"
    "C:\root\Jan2008"    "C:\root\Apr2008"    "C:\root\Jul2008"
    "C:\root\Jan2009"    "C:\root\Apr2009"    "C:\root\Jul2009"
    "C:\root\Jan2010"    "C:\root\Apr2010"    "C:\root\Jul2010"
  Column 4
    "C:\root\Oct2000"
    "C:\root\Oct2001"
    "C:\root\Oct2002"
    "C:\root\Oct2003"
    "C:\root\Oct2004"
    "C:\root\Oct2005"
    "C:\root\Oct2006"
    "C:\root\Oct2007"
    "C:\root\Oct2008"
    "C:\root\Oct2009"
    "C:\root\Oct2010"

How cool is that!
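Here the 1x4 row of months and the 11x1 column of years expand against each other, so element (i,j) of the result combines month j with year i. A short sketch that checks one element and flattens the grid into a single column, which can be handy for looping over every directory; root and allDirs are illustrative names.

```matlab
root = string('C:\root\');                            % 1x1 string
quarterlyMonths = string({'Jan','Apr','Jul','Oct'});  % 1x4 string row
years = (2000:2010)';                                 % 11x1 double column

% root + quarterlyMonths is 1x4; adding the 11x1 column expands it to 11x4
dirname = root + quarterlyMonths + years;

% Row 3 is the year 2002 and column 2 is April
thirdRowSecondCol = dirname(3, 2);                    % "C:\root\Apr2002"

% Linear indexing with (:) stacks the columns into a 44x1 string array
allDirs = dirname(:);
```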

Is There More?

This is just the beginning for strings. You can find out what else is available now with the methods function.

methods(string)
Methods for class string:

cellstr         extractAfter    le              split           
char            extractBefore   lower           splitlines      
compose         extractBetween  lt              startsWith      
contains        ge              ne              strip           
count           gt              pad             strlength       
double          insertAfter     plus            upper           
endsWith        insertBefore    replace         
eq              ismissing       replaceBetween  
erase           issorted        reverse         
eraseBetween    join            sort            
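To give a flavor of a few of these methods, the sketch below cleans up a comma-separated list and then queries and edits every element at once. It only uses methods from the list above (strip, split, startsWith, contains, replace, upper); the input text and variable names are made up for illustration.

```matlab
pets = string('  dog, cat, fish  ');

% strip removes leading and trailing whitespace
pets = strip(pets);

% split breaks a string scalar at each delimiter into a column string array
petList = split(pets, ', ');                 % ["dog"; "cat"; "fish"]

% startsWith and contains return one logical value per element
startsWithC = startsWith(petList, 'c');      % [false; true; false]
hasI = contains(petList, 'i');               % [false; false; true]

% replace and upper operate on every element of the array
louder = upper(replace(petList, 'fish', 'goldfish'));
```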

And you can bet we have plans to add more capabilities for strings over time. What features would you like to see us add? Let us know here.

Published with MATLAB® R2016b

7 Comments

Julian replied (3 of 7)

It's great to see string theory going in MATLAB, but I guess at this time of year many people will be Singing Praises of Hymns…

I independently discovered how you can use the string + operator with the new implicit expansion to do really nice things like this. It will likely lead to the retirement of strprod, a function I wrote years ago and use frequently for just this problem.

You ask about possible ideas to extend string capability. What I would like is not so much new methods for strings, but for existing functions to accept strings interchangeably with cellstr arrays, e.g. when accessing variables in a table: Var("peter", "var1").

I really like that the string() constructor works with numbers (as did my own strprod), but it would be even better if its second (format) argument could set the format of numeric inputs and not just datetimes, just like the old num2str; e.g., string(pi, '%6g') would give us "3.14159".

Merry Christmas & Happy New Year to all at MathWorks.com.

Loren Shure replied (4 of 7)

@Julian-

Thanks for the feedback. It is our intention to have existing functions work seamlessly with strings; we didn't have time to do it all in the first round.

I have passed your feedback on to the development team as well.

And to ALL, a wonderful, festive holiday season!

–Loren

Eric replied (5 of 7)

This is something of a pain in the neck. Suppose I have a function that passes a user-supplied character array input to a function that does not support the new string class. I either have to change the documentation to say "character array" instead of "string" in the description of the input variable, or handle the conversion from string to character array in the code. Otherwise the user will assume a string object can be passed in and be surprised when an error is thrown. This is perhaps sloppy documentation on my part, but for a long time "string" and "character array" have been synonymous. I daresay more MATLAB users would call 'hello' a string than a character array.

I do like the new string class, but the roll-out should have waited until its support was much more universal. Even simple functions like dir(), cd(), fullfile(), and fileparts() don't yet accept the new string class. It seems only minimal effort was made to support the new string class in this initial roll-out. I understand support will get better, but I will have to support users potentially using R2016b for years to come.