Loren on the Art of MATLAB

Finding Strings

Posted by Loren Shure

Over the years, MATLAB has become a friendlier environment for working with character information. MATLAB has a rich set of text handling functions, ranging from the simple to the all-powerful regexp functionality (covered here). I'm going to cover a few of the simple and very useful string functions today.

Use strfind

Use strfind instead of findstr or find for string searches.

  • Preferred
             strfind('abc','a')
  • Not recommended
             findstr('abc','a')

This usage is potentially a bit slower and can cause confusion: findstr searches for the shorter string within the longer one regardless of argument order, so there is no way to know which string was found in the other.

  • Not recommended
             find('abc'=='a')

This usage is about 5 times slower than strfind, and is not robust, since it works as a search only if one of the arguments to == is scalar (a single character).

  • Benefits
      - Speed improvement, less memory (no temporary for the result of the logical expression inside find)
      - No ambiguity on which string to index into later, if desired
      - Code is more robust than with find, which can't handle as general a case and is not as fast.
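
To make the comparison concrete, here is a minimal sketch of the points above. The variable names and the sample string are made up for illustration; only strfind, find, and == are used.

```matlab
str = 'abcabc';

% strfind(text, pattern) always searches the first argument for the second
idx = strfind(str, 'bc');        % starting index of each match: [2 5]

% Swapping the arguments is unambiguous: the pattern is longer than the
% text, so nothing is found
none = strfind('bc', str);       % empty

% find with == handles a single-character pattern
idxChar = find(str == 'b');      % [2 5]

% With a multi-character pattern of a different length, == raises a size
% error, which is the robustness problem mentioned above
try
    find(str == 'bc');
    eqWorked = true;
catch
    eqWorked = false;
end
```

With these inputs the == comparison fails, while strfind returns the match positions directly.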

Use strrep

Use strrep instead of replacing values via indexing.

  • Preferred (removing blanks from a string)
             str = strrep(str,' ','')
  • Not recommended
             ind = find(str==' '); str(ind) = []
             str(str==' ') = []
  • Preferred (remove & from strings, e.g., menu accelerators)
             menuLabelStr = strrep(menuLabelStr,'&','')
  • Not recommended
             menuLabelStr(find(menuLabelStr=='&')) = []
  • Benefits
      - speed
      - readability
      - more general, i.e., replacement strings don't need to be the same
        size (or empty) as the strings they replace
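
A short sketch of the last benefit, using made-up sample strings: the same strrep call handles an empty replacement and a replacement that is longer than the text it replaces.

```matlab
% Remove a menu accelerator marker (replacement is empty)
label = '&Save As...';
plain = strrep(label, '&', '');        % 'Save As...'

% Replace a one-character pattern with a longer string; logical indexing
% cannot express this as a single assignment
range = 'a-b-c';
longer = strrep(range, '-', ' to ');   % 'a to b to c'
```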

Use strncmp

Use strncmp instead of strmatch with literal second input.

  • Preferred
             strncmp(str,'string',length(str))
  • Not recommended
             strmatch(str,'string')
  • Not recommended
             strmatch(str,'string','exact')
  • Benefits
      - speed
  • Note
      - strmatch returns indices where the string is found, while strncmp
        returns true/false, so upgrading code requires more than just copy/paste.
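
The note above can be sketched as follows. The candidate list is made up for illustration; the idea is that wrapping the logical result in find recovers indices comparable to the ones strmatch would have returned, and strcmp plays the role of the 'exact' option.

```matlab
str = 'str';

% Single comparison: does 'string' begin with str?
tf = strncmp(str, 'string', length(str));      % true

% Against a cell array of candidates, strncmp returns one logical per cell
names = {'string'; 'strong'; 'bring'};
mask = strncmp(str, names, length(str));       % [1; 1; 0]

% Convert the logical mask to indices when the old code relied on them
idx = find(mask);                              % [1; 2]

% For exact matches, compare whole strings with strcmp instead
exactMask = strcmp('string', names);           % [1; 0; 0]
```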

Use strcmpi

Use strcmpi instead of using strcmp with upper or lower.

  • Preferred
             strcmpi(str,'lcstring')
  • Not recommended
             strcmp(lower(str),'lcstring')
  • Benefits
      - speed
           - fewer function calls
           - fewer temporary variables
      - readability
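
As a small illustration (the variable names and values are hypothetical), strcmpi accepts either a single string or a cell array of strings, and strncmpi is the case-insensitive counterpart of strncmp:

```matlab
method = 'Linear';
isLinear = strcmpi(method, 'linear');          % true, no lowercase copy needed

% One call compares a whole cell array, ignoring case
methods = {'LINEAR', 'Cubic', 'nearest'};
mask = strcmpi(methods, 'cubic');              % [0 1 0]

% Compare only the first n characters, ignoring case
startsWithLin = strncmpi('LINEARLY', 'linear', 6);   % true
```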

Use ismember

Use ismember to vectorize string finding operations.

  • Preferred
             pets = {'cat';'dog';'dog';'dog';'giraffe';'hamster'}
             species = {'cat' 'dog'}
             [tf, loc] = ismember(pets, species)
  • Not recommended
             locs = zeros(length(pets),1);
             for k = 1:length(species)
                 tf =  strcmp(pets, species(k));
                 locs(tf) = k;
             end
  • Benefits
      - speed
  • Note
      - strfind works on cell arrays of strings and returns results
        in a cell array, with relevant indices.  It does partial matching.
      - ismember requires an exact match.  The outputs are different
        than strfind's, so coding is not just a matter of direct
        substitution.
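
To show how the outputs differ, here is a sketch that reuses the pets example. The search for the letter 'a' is only an illustration of partial matching with strfind.

```matlab
pets = {'cat';'dog';'dog';'dog';'giraffe';'hamster'};
species = {'cat' 'dog'};

% ismember: exact matches only; tf is logical, loc indexes into species
[tf, loc] = ismember(pets, species);
% tf  = [1;1;1;1;0;0]
% loc = [1;2;2;2;0;0]

% strfind: partial matching; the result is a cell array with one entry
% per input string, holding the start indices of each match (or empty)
hits = strfind(pets, 'a');

% Reduce the cell array to a logical mask of strings containing 'a'
hasA = ~cellfun('isempty', hits);
% hasA = [1;0;0;0;1;1]
```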

Summary

I've talked about a few simple string functions available in MATLAB. Do you have some simple string recommendations for users? Post your ideas here.


Published with MATLAB® 7.3

24 Comments

Kathirvel-

Menu accelerators are particular to Windows and they correspond to the underlined letters in the menus that you access with Alt-theChosenLetter. They allow you to navigate the menus without the mouse. They are different from Ctrl-someLetter shortcuts in that the latter don’t navigate the menu; they are simply a direct shortcut to a particular action.

–Loren

Loren,
Regarding your comments on strcmpi, the case where I do find myself having to use calls to lower is in switch statements. Is there a way around that? (Other than obviously coding so that I always use lower case, etc…. You know something that doesn’t require me to be smarter : )

Dan-

Instead of using lower in the switch statement itself, you can reduce the burden by having all your switch cases be lower case, then simply lower only the input string before entering the switch statement, like this:


switch lower(method)
   case {'linear','bilinear'}
      disp('Method is linear')
   case 'cubic'
      disp('Method is cubic')
   case 'nearest'
      disp('Method is nearest')
   otherwise
      disp('Unknown method.')
end

But that might be what you meant already. I don’t know of a way to totally avoid the lower, but at least you don’t have to do strcmp(lower(…)) everywhere.

–Loren

Thanks Loren,
That’s what I am already doing, mostly because I lifted the technique out of one of TMW’s toolboxes. I was just wondering if there was a lovely little undocumented switchi out there, or something…
Dan

Hello,
Is there a way to search a vector of numbers for a smaller set of numbers? Say, for example, I have a vector with the following numbers:

x = [1 2 3 4 5 1 2 3 4 5 4 6 7 1 2 3 4 5];
Now from this vector I want to know when [4 6 7] occurs or if it even occurs in that order, is there a way to do this?

Thanks,

-M. Zia

as it was mentioned over and over in ML’s NG CSSM, the prefix STR in STRFIND simply means a string (of bits) and does not imply a string of characters (in the end, every data type is represented in the computer’s memory as a boring string of 0s and 1s…)
hence
x=[1 2 3 4 5 1 2 3 4 5 4 6 7 pi 1 2 3 4 5];
ix=strfind(x,[4,6,7,pi])
% ix = 11
us
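
Applied to the vector from the original question, the approach in this comment could look like the sketch below. It relies on strfind accepting numeric row vectors, as the comment describes; that behaviour may vary between releases, so treat it as an assumption to check in your version.

```matlab
x = [1 2 3 4 5 1 2 3 4 5 4 6 7 1 2 3 4 5];

% Starting index of every occurrence of the sub-sequence, in that order
ix = strfind(x, [4 6 7]);                  % 11

% Does the sub-sequence occur at all?
occurs = ~isempty(ix);                     % true

% The same values in a different order are not found
reversedFound = ~isempty(strfind(x, [7 6 4]));   % false
```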

Hi,
For string comparisons, does using isequal instead of strcmpi or strncmp give any advantage in terms of speed?

Sj-

If you really want to be sure you are comparing strings, you should use the str* functions. isequal doesn’t care about class, so you would get the following to be true:


f = 'hello'
d = double(f)
isequal(f,d)
ans = 
     1
strcmp(f,d)
ans = 
     0

As a result, I think you can expect the string functions to be generally higher performance since no conversions take place.

–Loren

Loren,

Can you please comment on the speed of regexp and regexprep.
I am a perl user and work frequently with regular expressions for string manipulations. Matlab has all the necessary functions in place, but they seem to be quite slow.

Eric

Loren, In the past I have always found ‘ismember’ to be a v_e_r_y slow function, so I was quite surprised to see your posting recommending its use. Maybe I’m missing something here? I tried 1000 iterations of the code you suggested and found ‘ismember’ to be a factor of 10 or more slower! (I’m running 7.3.0.298 (R2006b) on a 1.67 GHz PowerPC G4 under Mac OS X 10.4.8 with 2 GB of RAM. Would it matter that I’m on a Mac?)

pets = {'cat';'dog';'dog';'dog';'giraffe';'hamster'};
species = {'cat' 'dog'};

tic
for lp = 1:1000
    [tf, loc] = ismember(pets, species);
end
toc

tic
for lp = 1:1000
    locs = zeros(length(pets),1);
    for k = 1:length(species)
        tf = strcmp(pets, species(k));
        locs(tf) = k;
    end
end
toc

isequal( loc, locs )

% My results: ‘ismember’ first, then ‘strcmp’
%——————————
Elapsed time is 2.670689 seconds.
Elapsed time is 0.100176 seconds.

ans =

1

Your thoughts?

Eric-

When I wrote what I wrote, I was not focusing on performance. In addition, timing results depend on the computer architecture, whether or not MATLAB has a JIT there, and many more parameters. Also, timing depends heavily on the size of the problem you pose. I personally am often (but not always) willing to live with lower performance for smaller inputs, provided there is enough benefit for large inputs. The reason I am not always willing to do this is that sometimes those smaller inputs occur in a loop and must be processed a huge number of times.

The reason to recommend ismember for vectorizing is if that aspect of the code is helpful to people. Sometimes shorter code is more readable and maintainable, if not quite as fast.

Design trade-offs are hard to make. And they are situational.

–Loren
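
Since the answer depends on problem size, one way to explore the trade-off is to repeat the comparison on a larger input. The sketch below is only an illustration: the sizes (200 species, 5000 pets) and the generated names are made up, and no particular timing result is implied.

```matlab
% Build a hypothetical larger problem
nSpecies = 200;
species = cell(1, nSpecies);
for k = 1:nSpecies
    species{k} = sprintf('species%d', k);
end
pets = species(mod(0:4999, nSpecies) + 1)';   % 5000-by-1 cell array

% Vectorized lookup
tic
[tf, loc] = ismember(pets, species);
tIsmember = toc;

% Loop over species with strcmp
tic
locs = zeros(length(pets), 1);
for k = 1:length(species)
    locs(strcmp(pets, species{k})) = k;
end
tLoop = toc;

% Both approaches should produce the same locations
sameAnswer = isequal(loc, locs);
```

Comparing tIsmember and tLoop for a few different sizes on your own machine gives a better picture than a single small test.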

Is there any way to get the selected string from an edit box into the program workspace without using Ctrl-C?

I mean getting a partially selected string. Is there a getselected command? The “String” property returns the entire string, so how can we extract the part the user has selected?
Let us assume a string in the edit box:
st='subject predicate noun';
If the user selects “noun” inside the edit box and clicks a menu item to perform some operation on this selection, how would the program know that “noun” has been selected out of “subject predicate noun”?
Is there any getselected?

Sridhar-

There is nothing built into MATLAB for that. You’d need to write your own code for getting the data from the edit box and analyzing it.

–Loren

You are probably not still checking this, but this is the closest info I can find related to my question.

I am trying to compare 2 string arrays and return only the full strings (not substrings) common to both (comparing 2 lists of names to find the matching names). Is there a simple way to do this? I can match using single strings, but not the whole array.

Thanks for any help you can provide.

Phoebe-

If you are looking to see if 2 strings are equal, check out the functions in the strcmp or ismember families.

–Loren
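
A minimal sketch of that suggestion, assuming the two lists are stored as cell arrays of strings (the names are made up for illustration). Both ismember and intersect match whole strings only, so 'Ann' does not match 'Annabel'.

```matlab
listA = {'Ann'; 'Bob'; 'Carla'; 'Dan'};
listB = {'Dan'; 'Ann'; 'Annabel'};

% Logical mask of the entries of listA that also appear in listB
tf = ismember(listA, listB);           % [1;0;0;1]
common = listA(tf);                    % {'Ann';'Dan'}, in listA order

% intersect returns the common strings directly, sorted and without repeats
commonSorted = intersect(listA, listB);
```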

Hi Loren,

Thanks to your suggestion of using ‘ismember’ instead of looping with ‘strcmp’ I managed to reduce processing time from 6 hours to 7 seconds. Thank you!

I have a question as to how to compare strings without case sensitivity. I am currently storing strings in an array and then I need to search this array to see if the option is there. I was using ismember, but that only works if the user enters it exactly as it appears in the array. Is there another function that does what ismember does, but ignoring case, for an array of string elements?

Thanks!
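
One possible approach, offered as a sketch rather than a definitive answer: convert both sides to lower case before calling ismember, or use strcmpi against the cell array. The option list and user input below are hypothetical.

```matlab
options = {'Linear', 'Cubic', 'Nearest'};
userInput = 'CUBIC';

% Normalize case on both sides, then use ismember as before
[tf, loc] = ismember({lower(userInput)}, lower(options));   % tf = 1, loc = 2

% Alternatively, strcmpi compares one string against every cell
mask = strcmpi(userInput, options);                         % [0 1 0]
```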

These postings are the author's and don't necessarily represent the opinions of MathWorks.