# Unique Values Without Rearrangement

Posted by **Loren Shure**,

In MATLAB, the simplest form of the function `unique` returns the unique values contained in a numeric vector, with the results sorted. This is often acceptable, but sometimes a user prefers the results in the order in which they first appear in the data.

### Algorithm for unique

The results are sorted because of the algorithm `unique` uses. Conceptually, the input data is sorted, and then adjacent elements are compared. Where elements are equal, all but the first or the last are removed (depending on how you call the function). Hence, the output is sorted.
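To make the sort-then-compare idea concrete, here is a minimal sketch on a small made-up vector. It is a conceptual illustration only, not the actual implementation of `unique`, and the variable names are chosen just for this example.

```matlab
% Conceptual sketch: sort, then keep the first element of each run of equals.
x    = [3 1 2 3 1];               % example data (assumed for illustration)
xs   = sort(x);                   % xs   = [1 1 2 3 3]
keep = [true, diff(xs) ~= 0];     % keep = [1 0 1 1 0]
u    = xs(keep);                  % u    = [1 2 3]

% The result agrees with the built-in function, and it is sorted.
disp(isequal(u, unique(x)))       % displays 1
```

Because the duplicates are removed from the sorted copy, the original ordering of `x` is lost along the way.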

### Avoid the Sorted Output

To avoid the sorted output, you can `sort` the data yourself first, retaining the indices from the sorting operation so that results can be mapped back to their original positions. Study the examples for `sort` to see how to use its second output, the indices.
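As a small illustration of that second output, the sketch below sorts a three-element vector (made up for this example) and then uses the returned indices to put the sorted values back where they came from.

```matlab
x = [30 10 20];                   % example data (assumed for illustration)
[xs, idx] = sort(x);              % xs = [10 20 30], idx = [2 3 1]

% idx(k) is the position in x that the k-th sorted value came from,
% so indexing x with idx reproduces the sorted vector.
sameAsSorted = isequal(xs, x(idx));   % true

% Assigning through idx goes the other way and undoes the sort.
restored = zeros(size(x));
restored(idx) = xs;               % restored = [30 10 20]
```

The last assignment is the same kind of "scatter through the sort indices" that the solution in this post relies on.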

There were a couple of solutions posted with similar ideas but different implementations. I'll walk you through the one posted by Jan Simon. Jan's idea is to take the difference of the sorted values and find where the differences are not zero (i.e., where the values **are** different). These flags for the now *unique* values are then placed at their original positions in the logical vector `UV`. Finally, this logical index is used to extract the required values from the original data. Notice that this solution doesn't call the function `unique` and calls the function `sort` only once.
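Before running this on real data, here is the whole idea in a compact sketch on a short numeric vector invented for the purpose. It leans on the fact that `sort` keeps equal elements in their original relative order, so the first element of each run in the sorted data is also the first occurrence in the original data.

```matlab
X = [4 2 4 9 2 7];                  % example data (assumed for illustration)
[Xs, SortVec] = sort(X(:));         % Xs = [2 2 4 4 7 9]', SortVec = [2 5 1 3 6 4]'

isFirst = [true; diff(Xs) ~= 0];    % [1 0 1 0 1 1]': first of each run of equals
UV = false(size(X));                % one flag per original element
UV(SortVec) = isFirst;              % UV = [1 1 0 1 0 1]

Y = X(UV);                          % Y = [4 2 9 7], first occurrences in order
```

Preallocating `UV` with `false` is a choice made here for clarity; it fixes the size and class of `UV` before the indexed assignment.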

### Code in Action

Let's create `X` and see what happens in the code.

```
myString = 'now is the time for cheering, tgif!';
X = double(myString)
```

    X =
      Columns 1 through 13
       110   111   119    32   105   115    32   116   104   101    32   116   105
      Columns 14 through 26
       109   101    32   102   111   114    32    99   104   101   101   114   105
      Columns 27 through 35
       110   103    44    32   116   103   105   102    33

Next, sort the data. The sorted values land in `Xs`, and `SortVec` tracks the original locations of the values.

    [Xs, SortVec] = sort(X(:))

    Xs =
        32  32  32  32  32  32  33  44  99 101 101 101 101 102 102 103 103 104
       104 105 105 105 105 109 110 110 111 111 114 114 115 116 116 116 119
    SortVec =
         4   7  11  16  20  30  35  29  21  10  15  23  24  17  34  28  32   9
        22   5  13  26  33  14   1  27   2  18  19  25   6   8  12  31   3

Both outputs are column vectors; they are shown here as rows to save space.

Now mark the unique values (where `diff` isn't 0) in a logical vector, placing each flag at the value's original position via `SortVec`. The leading 1 flags the first sorted value, which has no predecessor to compare against.

    UV(SortVec) = ([1; diff(Xs)] ~= 0)

    UV =
      Columns 1 through 13
         1     1     1     1     1     1     0     1     1     1     0     0     0
      Columns 14 through 26
         1     0     0     1     0     1     0     1     0     0     0     0     0
      Columns 27 through 35
         0     1     1     0     0     0     0     0     1
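If the indexed assignment `UV(SortVec) = ...` is hard to read at a glance, it may help to see it written as an explicit loop. The sketch below uses a tiny vector made up for illustration and checks that the loop and the one-line form agree; `UVloop` and `UVvec` are names introduced only for this comparison.

```matlab
X = [5 3 5 3 7];                    % example data (assumed for illustration)
[Xs, SortVec] = sort(X(:));         % Xs = [3 3 5 5 7]', SortVec = [2 4 1 3 5]'
isFirst = [1; diff(Xs)] ~= 0;       % [1 0 1 0 1]'; diff alone has one element fewer

% Loop form: the k-th sorted value came from position SortVec(k),
% so its "first of its kind" flag is stored at that original position.
UVloop = false(size(X));
for k = 1:numel(SortVec)
    UVloop(SortVec(k)) = isFirst(k);
end

% Vectorized form, as used in the post.
UVvec = false(size(X));
UVvec(SortVec) = isFirst;

disp(isequal(UVloop, UVvec))        % displays 1; both are [1 1 0 0 1]
```

Since the flags sit at original positions, indexing `X` with them returns `[5 3 7]`, the unique values in the order they first appear.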

Use the logical vector to index into the original data. This undoes the sorting and extracts the unique values in their original order.

    Y = X(UV)

    Y =
      Columns 1 through 13
       110   111   119    32   105   115   116   104   101   109   102   114    99
      Columns 14 through 16
       103    44    33

    finalString = char(Y)

    finalString =
    now isthemfrcg,!
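As a cross-check, MATLAB releases newer than the one used for this post accept a `'stable'` option in `unique` that returns values in order of first appearance. The sketch below assumes such a release is available and compares it with the index-based route through the `'first'` option; `viaStable` and `viaFirst` are names introduced here for illustration.

```matlab
myString = 'now is the time for cheering, tgif!';

% Assumes a release in which unique supports the 'stable' option.
viaStable = unique(myString, 'stable');

% Alternative: take the index of each value's first occurrence,
% then sort those indices to restore the original order.
[~, ix]  = unique(myString, 'first');
viaFirst = myString(sort(ix));

disp(viaStable)                      % now isthemfrcg,!
disp(isequal(viaStable, viaFirst))   % displays 1
```

Both routes work directly on the character data, so no conversion to `double` is needed for them.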

### Do You Need Unique Data Values Unsorted?

Do you need unsorted unique values as part of your data processing? I'd love to hear more here. In the meantime, perhaps you could create a cryptic signature of the day by running your thoughts through this algorithm!

Published with MATLAB® 7.9

## 4 Comments

**1** of 4

My two cents and lines:

    [~, ix] = unique(myString, 'first');
    finalString = myString(sort(ix))

**2** of 4

Sorry, the two cents just got devaluated. Should have read the discussion first…

**3** of 4

Your technique is clever, but it is difficult to follow at a glance. You’ve kind of left to the reader several important details. Like:

1. Padding the output of `diff` with the leading 1.

2. The details of how the chained use of `SortVec` and `UV` ensures the order is related to the original order that we want.

Also, won’t the technique work properly with the string left in the character class? That is, why was it necessary to convert the characters to numbers?

**4** of 4

OysterEngineer-

The first sorted value is unique since there is no earlier value to compare it to; that’s why the leading 1.

I didn’t explain the details of `SortVec` and `UV` because I think it helps to study the help and examples for `sort`/`sortrows` to understand them, and I didn’t feel like reproducing that here.

I converted to `double` since I have the Symbolic Toolbox on my path, and there is an overloaded `diff` for the `char` datatype there that produces a derivative, which I didn’t want.

–Loren
