Loren on the Art of MATLAB

Unique Values Without Rearrangement

Posted by Loren Shure

In MATLAB, the simplest form of the function unique returns the unique values contained in a numeric vector, with the results sorted. This is often acceptable, but sometimes a user prefers the results in the order originally found in the data.

Algorithm for unique

The results are sorted because of the algorithm used by unique. Conceptually, the input data is sorted, and then adjacent elements are compared. Where there is a run of equal elements, all of them except the first or the last are removed (depending on how you call the function). Hence, the output is sorted.
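Here is a minimal sketch of that sort-then-compare idea on a tiny vector. It is only an illustration of the concept, not the actual source of unique, and the variable names are made up for this example.

```matlab
% Conceptual sketch only; not how unique is actually implemented.
x    = [3 1 2 3 1];
xs   = sort(x);                  % equal values are now adjacent: [1 1 2 3 3]
keep = [true, diff(xs) ~= 0];    % true where a value differs from its left neighbor
u    = xs(keep);                 % keeps the first element of each run of equal values
```

Because the comparison happens on the sorted copy, u comes out in sorted order, whatever the order of x was.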

Avoid the Sorted Output

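To get the unique values in their original order, you can sort the data yourself, retaining the indices from the sorting operation, and then use those indices to map the results back to the original positions. Study the examples for sort to see how to use its second output, the vector of indices.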
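As a quick reminder of what that second output holds, here is a small sketch with made-up data. The index vector says where each sorted value came from, and repeated values keep their original relative order.

```matlab
x = [30 10 20 10];
[xs, idx] = sort(x);
% xs  holds the sorted values:         [10 10 20 30]
% idx holds their original locations:  [2 4 3 1]
roundTrip = x(idx);   % indexing the original data with idx reproduces xs
```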

There were a couple of solutions posted with similar ideas but different implementations. I'll walk you through the one posted by Jan Simon. The idea Jan uses is to take the difference of the sorted values and find where the differences are nonzero (i.e., where a new value starts). Those flags are then placed at the original positions of the values, giving the logical vector UV. Finally, this logical vector is used to extract the required values from the original data. Notice that this solution doesn't call the function unique and calls the function sort only once.
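Put together, the steps look roughly like the sketch below. It is assembled from the statements used in this post and run on a small made-up vector, so treat it as an illustration rather than Jan's exact code. The preallocation of UV is my own addition, so that a leftover UV in the workspace can't affect the result.

```matlab
X = [5 3 5 1 3];

[Xs, SortVec] = sort(X(:));            % sorted values and their original locations
UV = false(size(X));                   % assumption: start from a clean logical vector
UV(SortVec) = ([1; diff(Xs)] ~= 0);    % flag the first occurrence of each value
Y = X(UV);                             % unique values in original order: [5 3 1]
```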

Code in Action

Let's create X and see what happens in the code.

myString = 'now is the time for cheering, tgif!';
X = double(myString)
X =
  Columns 1 through 13
   110   111   119    32   105   115    32   116   104   101    32   116   105
  Columns 14 through 26
   109   101    32   102   111   114    32    99   104   101   101   114   105
  Columns 27 through 35
   110   103    44    32   116   103   105   102    33

Next, sort the data. Xs holds the sorted values, and SortVec tracks the original location of each one.

[Xs, SortVec] = sort(X(:))
Xs =
    32
    32
    32
    32
    32
    32
    33
    44
    99
   101
   101
   101
   101
   102
   102
   103
   103
   104
   104
   105
   105
   105
   105
   109
   110
   110
   111
   111
   114
   114
   115
   116
   116
   116
   119
SortVec =
     4
     7
    11
    16
    20
    30
    35
    29
    21
    10
    15
    23
    24
    17
    34
    28
    32
     9
    22
     5
    13
    26
    33
    14
     1
    27
     2
    18
    19
    25
     6
     8
    12
    31
     3

Now mark the first occurrence of each value (where diff isn't 0) in a logical vector, using SortVec to place each flag at that value's original location.

UV(SortVec) = ([1; diff(Xs)] ~= 0)
UV =
  Columns 1 through 13
     1     1     1     1     1     1     0     1     1     1     0     0     0
  Columns 14 through 26
     1     0     0     1     0     1     0     1     0     0     0     0     0
  Columns 27 through 35
     0     1     1     0     0     0     0     0     1
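Two details in that one line are easy to miss, so here is a small sketch on made-up data that shows the intermediate values. The leading 1 is there because diff returns one element fewer than its input, and the first sorted element always starts a new run. Indexing with SortVec sends each flag back to the position its value had in the original data. Since sort keeps equal values in their original relative order, the flagged element in each run is the earliest occurrence. The names isFirst and mask are just for this illustration.

```matlab
X = [7 4 7 4 9];
[Xs, SortVec] = sort(X(:));   % Xs = [4;4;7;7;9], SortVec = [2;4;1;3;5]

d       = diff(Xs);           % [0;3;0;2], one element shorter than Xs
isFirst = [1; d] ~= 0;        % [1;0;1;0;1], padded so it lines up with Xs

mask = false(1, numel(X));
mask(SortVec) = isFirst;      % flags land at positions 2, 1, and 5
firstOccurrences = X(mask);   % [7 4 9]
```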

Use the logical vector to index into the original data. This extracts the unique values in the order in which they first appear.

Y = X(UV)
Y =
  Columns 1 through 13
   110   111   119    32   105   115   116   104   101   109   102   114    99
  Columns 14 through 16
   103    44    33
finalString = char(Y)
finalString =
now isthemfrcg,!
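If you'd like to cross-check the result, here are two other routes that should give the same answer. The first asks unique for the index of the first occurrence of each value and then sorts those indices. The second uses the 'stable' option of unique, which, as far as I know, only exists in releases newer than the 7.9 used for this post (I believe it arrived in R2012a), so treat that line as an assumption about your version.

```matlab
myString = 'now is the time for cheering, tgif!';
X = double(myString);

% Route 1: indices of first occurrences, put back into original order.
[vals, firstIdx] = unique(X, 'first');
Y1 = X(sort(firstIdx));

% Route 2: the 'stable' option (newer releases only).
Y2 = unique(X, 'stable');

sameAnswer  = isequal(Y1(:), Y2(:));
finalString = char(Y2);
```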

Do You Need Unique Data Values Unsorted?

Do you need unsorted unique values as part of your data processing? I'd love to hear more here. In the meantime, perhaps you could create a cryptic signature of the day by running your thoughts through this algorithm!


Published with MATLAB® 7.9

4 Comments

Your technique is clever, but it is difficult to follow at a glance. You’ve kind of left to the reader several important details. Like:

1. Padding the output of diff with the leading 1.

2. The details of how the chained use of SortVec & UV assures the order is related to the original order that we want.

Also, won’t the technique work properly with the string left in the character class? e.g., why was it necessary to convert the characters to numbers?

OysterEngineer-

The first value is unique since there are no other values to compare to, that’s why the leading 1.

I didn’t explain the details of the SortVec and UV because I think it helps to study the help and examples for sort/sortrows to understand and I didn’t feel like reproducing that here.

I converted to double since I have the Symbolic Toolbox on my path and there is an overloaded diff for the char datatype there that produces a derivative, which I didn’t want.

–Loren

These postings are the author's and don't necessarily represent the opinions of MathWorks.