Loren on the Art of MATLAB

Turn ideas into MATLAB

String Things

Working with text in MATLAB has evolved over time. Way back, text data was stored in double arrays with an internal flag to denote that it was meant to be text. We then transformed this representation so character arrays were their very own type. And I mentioned earlier that we introduced a string datatype to make working with text data more efficient and natural. Let me show you a little more.

How to Compare Text: the Olden Days

Early on in MATLAB, we used the function strcmp to compare strings. A big caveat for many people is that strcmp does not behave the same way as its C-language counterpart: MATLAB's strcmp returns true (1) when the inputs match, while C's strcmp returns 0 for a match. Over time we added a few more comparison functions, strcmpi, strncmp, and strncmpi, to allow case-insensitive matches and to constrain the match to at most n characters.

Let's do some comparisons now. First on cell arrays of strings...

cellChars = {'Mercury','Venus','Earth','Mars'}
cellChars =
  1×4 cell array
    {'Mercury'}    {'Venus'}    {'Earth'}    {'Mars'}
TF = strcmp('fred',cellChars)
TF =
  1×4 logical array
   0   0   0   0
TF = strcmp('Venus',cellChars)
TF =
  1×4 logical array
   0   1   0   0
TF = strncmp('Mars', cellChars, 2)
TF =
  1×4 logical array
   0   0   0   1
TF = strncmp('Marvelous', cellChars, 2)
TF =
  1×4 logical array
   0   0   0   1
TF = strncmp('Marvelous', cellChars, 4)
TF =
  1×4 logical array
   0   0   0   0
TF = strcmpi('mars', cellChars)
TF =
  1×4 logical array
   0   0   0   1
TF = strcmpi('mar', cellChars)
TF =
  1×4 logical array
   0   0   0   0
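
The comparisons above use one option at a time. As a small sketch, strncmpi combines the two, ignoring case while looking only at the first n characters. The inputs here are made up for illustration. The first line also shows the return convention that trips up C programmers: a match gives logical 1 (true).

```matlab
cellChars = {'Mercury','Venus','Earth','Mars'};

% A match returns logical 1 (true); C's strcmp would return 0 here.
isSame = strcmp('Mars', 'Mars');

% Case-insensitive comparison limited to the first 3 characters.
TF = strncmpi('MARVELOUS', cellChars, 3);

% Use the logical result to pull out the matching element(s).
matched = cellChars(TF);
```

Only 'Mars' shares its first three characters with 'MARVELOUS' once case is ignored, so matched holds that single element.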

More Modern, Not Identical Use

We also introduced categorical arrays for cases where limiting the set of string choices was appropriate. When using categorical variables, you may use == for comparisons.

catStr = categorical(cellChars)
catStr = 
  1×4 categorical array
     Mercury      Venus      Earth      Mars 
TF = 'Mars' == catStr
TF =
  1×4 logical array
   0   0   0   1
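
To illustrate what limiting the set of choices can look like, here is a hedged sketch that passes an explicit list of allowed values as the second argument to categorical. The variable names and the 'Pluto' entry are illustrative assumptions, not part of the examples above. Values outside the allowed set become undefined instead of quietly creating a new category.

```matlab
allowed = {'Mercury','Venus','Earth','Mars'};   % the permitted choices
raw     = {'Mars','Pluto','Venus'};             % 'Pluto' is not in the allowed set

% Build the categorical array against the fixed value set.
c = categorical(raw, allowed);

% Elements that fall outside the value set are marked undefined.
isBad = isundefined(c);

% Comparison with == still works element by element.
isMars = c == 'Mars';
```

Checking isundefined right after conversion is one way to catch typos or unexpected values early.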

String Comparisons Circa 2020

And now for string comparisons.

str = string(cellChars) % or ["Mercury","Venus","Earth","Mars"]
str = 
  1×4 string array
    "Mercury"    "Venus"    "Earth"    "Mars"

I can still use the str*cmp* functions. But we are not restricted to them.

TF = strcmp('Mars', str)
TF =
  1×4 logical array
   0   0   0   1

We can now use == and related operators without worrying about the size-mismatch issues that arise with character arrays, where == compares character by character and requires the arrays to have compatible sizes.

TF = str ~= "Mars"
TF =
  1×4 logical array
   1   1   1   0
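
To make the contrast with character arrays concrete, here is a minimal sketch. The variable names are just for illustration. With char vectors, == works one character at a time and fails when the lengths differ; with strings, each element is compared as a whole piece of text.

```matlab
% With char vectors, == compares character by character.
tfChar = 'Mars' == 'Mare';      % 1-by-4 logical, one entry per character

% Char vectors of different lengths cannot be compared this way.
try
    tfBad = 'Mars' == 'Venus';  % 1-by-4 versus 1-by-5: this errors
    sizeError = false;
catch
    sizeError = true;
end

% With strings, == compares whole pieces of text, whatever their lengths.
tfStr = "Mars" == "Venus";      % scalar logical

% A string array can also be compared against a char vector.
str = ["Mercury","Venus","Earth","Mars"];
TF = str == 'Mars';
```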

And most recently, we introduced the function matches.

TF = matches(str,"Earth")
TF =
  1×4 logical array
   0   0   1   0

It has some nifty features for handling string arrays, such as matching against several candidates at once. For example, we can look for the planets whose orbits lie inside Earth's.

TF = matches(str,["Mercury","Venus"])
TF =
  1×4 logical array
   1   1   0   0
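
For comparison, here is a sketch of the same multi-candidate query written with the older functions. The names inner, TFold, and TFmember are illustrative. Under the assumption that an exact, case-sensitive match is wanted, all three forms should agree.

```matlab
str = ["Mercury","Venus","Earth","Mars"];
inner = ["Mercury","Venus"];

% matches returns true where an element equals any of the candidates.
TF = matches(str, inner);

% The same result with strcmp needs one call per candidate.
TFold = strcmp(str, "Mercury") | strcmp(str, "Venus");

% ismember gives the same answer for exact, case-sensitive membership.
TFmember = ismember(str, inner);
```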

And I can, of course, ignore case, with code that, to me, appears less cryptic.

TF = matches(str,"earth","IgnoreCase",true)
TF =
  1×4 logical array
   0   0   1   0
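
As a rough side-by-side, these are a few ways to write the same case-insensitive test. The variable names are illustrative. Which one reads best is a matter of taste, but the name-value pair spells out the intent.

```matlab
str = ["Mercury","Venus","Earth","Mars"];

% The option is named explicitly.
TFmatches = matches(str, "earth", "IgnoreCase", true);

% The trailing "i" carries the meaning.
TFstrcmpi = strcmpi(str, "earth");

% Normalize the case first, then compare.
TFlower = lower(str) == "earth";
```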

As is true in all of these cases, we can index into the original array with the logical output to extract the relevant item(s).

str(TF)
ans = 
    "Earth"

My Advice: Err on the Side of Code Readability

I haven't touched on performance here, but one of the drivers for the recent string datatype is efficiency and performance. We've worked hard to overlay that with functions that make your code highly readable. This makes code maintenance and code transfer go much more smoothly. I tend to favor this over eking out the last fractional second of speed. In the case of strings, you may not even need to make that tradeoff.

String Adoption

Have you seen enough evidence that strings are the future for working with textual data in MATLAB? Tell us what you think here.




Published with MATLAB® R2019b
