Code covered by the BSD License

### Highlights from Determine and count unique values of an array

4.72727
4.7 | 12 ratings Rate this file 15 Downloads (last 30 days) File Size: 2.51 KB File ID: #23333 Version: 1.5

# Determine and count unique values of an array

### Anthony Kendall (view profile)

17 Mar 2009 (Updated )

Very fast function to determine and count unique values of numeric, logical, char, cell arrays.

File Information
Description

This function determines the unique values of an N-D array, and counts instances of those values using MATLAB's accumarray function, or in cases where accumarray is inappropriate the more traditional sort-diff method is used.

Its primary use is to very quickly count the number of instances of unique values within an array.

However, if only returning unique values (and not counts), it is slightly faster than MATLAB's built-in 'unique' function for arrays of intermediate to large sizes. This is particularly true for integer-valued arrays (not necessarily just integer type). For float arrays, the speed increase is due mainly to fewer input validation tests and other options. Unlike 'unique' it does not have 'rows', 'first', or 'last' options.

It returns the unique values in a sorted vector, and counts of those values are in the same order.

- Usage:
>> [uniques] = count_unique(largeArray);
>> [uniques,numUnique] = count_unique(largeArray);

- Example (a rather trivial example):
>> a = repmat((1:1000),1000,1); %build an unsorted array of values
>> [uniques,numUnique] = count_unique(a);
>> size(uniques)
ans =
1000 1
>> numUnique(1:10)
ans =
1000
1000
1000
1000
1000
1000
1000
1000
1000
1000

- Example (a less trivial example):
>> a = floor(rand(1e6,1)*1e6);
>> [uniques,numUnique] = count_unique(a);
>>uniques(1:10),numUnique(1:10)
ans =
0
1
4
5
7
9
11
12
15
19

ans =
2
2
2
1
2
3
1
1
1
1

For returning only unique values:
- Speed Comparison (integer valued):
>> a = floor(rand(1e7,1)*1e6);
>> tic;[uniques] = count_unique(a);toc %count_unique
Elapsed time is 0.603863 seconds.
>>tic;[uniques] = unique(a);toc %built-in MATLAB
Elapsed time is 2.022784 seconds.

- Speed Comparison (decimal valued):
>>a = rand(1e7,1)*1e6;
>>tic;[uniques] = count_unique(a,'float');toc
Elapsed time is 2.159629 seconds.
>>tic;[uniques] = unique(a);toc
Elapsed time is 2.123154 seconds.

MATLAB release MATLAB 7.7 (R2008b)
Other requirements Requires 'accumarray' function, present in recent MATLAB versions.
02 Jul 2016 Manolis Trypakis

25 Oct 2014 Ege

### Ege (view profile)

Is there a way to use this on categorical data?

Comment only
25 Oct 2014 Ege

### Ege (view profile)

When I try this on tables like
[uniques,numUnique] = count_unique(Table(:,1));
I get error You can not subscript a table using only one subscript. Table subscripting requires both row and variable subscripts. Am I using it wrong?

04 Feb 2014 Jason Nicholson

### Jason Nicholson (view profile)

I have used this function over and over. I have referred to many times when solving different problems. This is great work! I am very grateful for your posting of count_unique!

04 Jan 2013 Jason Nicholson

### Jason Nicholson (view profile)

LIne 130 reads: uniqueLocs = [true;~strcmp(x(1:end-1),x(2:end)) ~= 0]

isn't this the same?

uniqueLocs = [true;~strcmp(x(1:end-1),x(2:end))]

12 Oct 2012 Steven

### Steven (view profile)

thank you!

09 Dec 2011 Daniel

### Daniel (view profile)

I haven't actually tried this yet, but how about using the 'sparse' parameter in accumarray? And why not add a 'dim' parameter to your function...I think it's still possible to do this with accumarray.

Comment only
30 Nov 2011 Jhay

### Jhay (view profile)

How to count elements in a row?
For Example: [0 0 0 1 1 1 1 0 0 0 1 1 0 1 ]
I need to find the position of 1's in a row and count it seperately. Can some1 suggest me solution?

16 Feb 2011 Tyson

### Tyson (view profile)

I was wondering if the code could be modified to accept a 'rows' option, like the unique script? I have x,y,z data that can have more than one column identical. Thanks!

Comment only
11 Sep 2010 Alexander

### Alexander (view profile)

Thanks. Nice code. Saves a lot of time over "unique" in counting unique integer values from a relatively large matrix. I didn't try it on integer values.

Comment only
05 Aug 2010 Warwick

### Warwick (view profile)

This is a useful script for me. Just what I needed at the time.

16 Apr 2010 Anthony Kendall

### Anthony Kendall (view profile)

Matt - The N that you refer to is simply the second output of this routine, you just need to request it.

Kristi - Thanks! That helped speed up large integer arrays considerably. I've updated the file to reflect your suggestions.

Comment only
15 Apr 2010 Kristi

### Kristi (view profile)

Inside int_log_unique, instead of maxVal you should calculate the size of the accumarray output matrix as
maxIndex = max(x(:)) - min(x(:)) + 1;
since x may have all negative integers.

Also, the int_log_unique code might be slightly faster if you don't make the second call to accumarray, and you should always subtract out the minVal to make accumarray more efficient.

function [uniques,numUnique] = int_log_unique(x,nOut)
%First, determine the offset for negative values
minVal = min(x(:));
%Check to see if accumarray is appropriate for this function
maxIndex = max(x(:)) - minVal + 1;

if maxIndex / numel(x) > 1000
error('Accumarray is inefficient for arrays when ind values are >> than the number of elements')
end

%Now, offset to get the index
index = x(:) - minVal + 1;

%Count the occurrences of each index value
numUnique = accumarray(index,1);

%Get the values which occur more than once
z=1:length(numUnique);
uniques = z(numUnique>0) + minVal - 1;

if nOut == 2
%Trim the numUnique array
numUnique = numUnique(numUnique>0);
end
end

Comment only
14 Dec 2009 Matt Fetterman

### Matt Fetterman (view profile)

THanks very good program ! I might suggest to add another output N so that N is a vector listing the number of repetitions of each output number. N can be easily found from your data but still it might be convenient to have.

08 Apr 2009 dave matthews

### dave matthews (view profile)

it would be nice to extend this to the n-d case, or at least the 2-d case, i.e., find and list unique pairs?

x = [1 2; 1 3; 1 2];
then, uniques = [1 2], [1 3];
and, numuniques = [2 1];

thanks

08 Apr 2009 Ilya

### Ilya (view profile)

21 Mar 2009 Anthony Kendall

### Anthony Kendall (view profile)

Bruno,
Thanks for making this point. This can be addressed by specifying the 'float' option in count_unique. However, I've added also added a check that when the maximum value of the input array is much greater than its number of elements, accumarray is not used.

I ran your example on the unmodified code, and got results similar to yours. Now, on the modified code (without specifying the 'float' option, which does not use accumarray), the run times are both trivial in your example. The change increased run times for large array integer values slightly, and the description above is modified to reflect that.

Thanks again.

Comment only
21 Mar 2009 Bruno Luong

### Bruno Luong (view profile)

Inside the code, the author uses ACCUMARRAY on the values of the input array. This is a poor choice because it might create a large chunk of memory and slow down greatly.

Here is a code to check this effect. On my computer the worse time is 2.5 s, and unique can accomplish in less than 1 ms.

power=1:64;
time=zeros(length(power),2);

for k=1:length(power)

twoelem=[0 2.^power(k)-1];
disp(twoelem)

tic
count_unique(twoelem);
time(k,1) = toc;

tic
unique(twoelem);
time(k,2) = toc;
end
fprintf('worse time = %g s\n', max(time,[],1))
plot(power,time);
ylabel('second');
title('Time for array of two elements')
legend('count unique','unique')

18 Mar 2009 John D'Errico

### John D'Errico (view profile)

Ok. That was quick. This is looking better now. As a potential downloader, the description gives me some reason to use it, especially if I'm looking to squeeze some speed out of a tight piece of code.

A look inside the code is always interesting. I like to see when the author has taken care in the code. Friendly code worries about things like the presence of nans, infs, different variable classes, etc. Also lots of comments. White space. Intelligently named, mnemonic variable names. These things make any piece of code an easy read, and easy to debug in the future.

Good help, examples. etc. All well done. Thanks for the changes.

18 Mar 2009 Anthony Kendall

### Anthony Kendall (view profile)

John,
The code is primarily used to count the number of each unique instance within an array, but as a side benefit is slightly faster than the existing unique function (and quite a bit faster for integer-valued arrays). I'll modify the description to make this more obvious.

Comment only
18 Mar 2009 John D'Errico

### John D'Errico (view profile)

I'm confused. Even the author admits this is only slightly faster than the existing unique, and that the main reason is probably because it does not bother to check the arguments. He also admits that it has considerably less capability than the existing unique.

Anyone truly in need of the maximum speed is better off with a direct call to accumarray anyway, which will further reduce any overhead.

Why does this exist? Perhaps some time comparisons might be of value, and some examples that show why this is a better choice than the existing alternatives.

The code does look to be carefully written at a glance, and the help seems reasonable, were I to rate this.

Comment only
18 Mar 2009 1.1

An improvement to make code more general.

18 Mar 2009 1.2

Added more thorough description and examples, in response to user comment.

18 Mar 2009 1.3

Further modifying description, correcting an error.

21 Mar 2009 1.4

Added check for arrays with large values but few elements, which are not well-suited to accumarray, at suggestion of user.

16 Apr 2010 1.5

Speed improvement for large integer arrays.