Count unique ID occurrences in a column after grouping based on another column

Hi all,
I'm wondering if exists a way using varfun and unique or something else to solve the following problem replacing the 'for' loop.
I have a table made of two columns (ID, plane). I would like to get unique plane identifiers and, for each of them, sum the number of unique IDs in the first column.
For instance, given the table:
1 plane5
1 plane5
2 plane5
1 plane8
1 plane8
3 plane6
I'd like to get the result:
2 plane5
1 plane8
1 plane6
I've done this with the following code, but I think there is a way to avoid the for loop
unqPlanes = unique(data.Planes);
result = table(zeros(length(unqPlanes),1), unqPlanes);
for i = 1:length(unqPlanes)
result{i,1} = height(unique(data(data.Planes==unqValue(i,1),'ID')));
end
Any suggestion? Thanks

Answers (1)

>> C = {'plane5';'plane5';'plane5';'plane8';'plane8';'plane6'};
>> V = [ 1; 1; 2; 1; 1; 3];
>> [U,~,X] = unique(C);
>> accumarray(V,X,[],@(v)numel(unique(v)))
ans =
2
1
1
>> U
U =
'plane5'
'plane6'
'plane8'

10 Comments

Or since the input is a table:
rowfun(@(v) numel(unique(v)), data, 'GroupingVariables', 'plane')
So referring to the example, the size of my dataset is 1397032. The number of distinct planes is 11203. What I expect at the end is a result with a number of records equal to the same number of distinct planes but I don't (358538). What I notice is that there are a lot of zeros... Should I do some post processing actions?
@Savino Dambra: please show us the code that you tried. Getting zeros is very strange, as the minimum number of unique values should be one.
>> [unqValueA,~,unqValueC] = unique(data.Planes);
>> accumarray(data.ID,unqValueC,[],@(v)numel(unique(v)));
>> length(ans)
ans =
358538
>> length(unqValueA)
ans =
11203
>>length(data.Planes)
ans =
1397032
>>length(data.ID)
ans =
1397032
@Savino Dambra: do NOT use length, as its output is basically useless (it changes which dimension it measures, depending on the input data size. Ugh! Using size or numel is much more robust).
Please do the following:
  • allocate to an output variable, e.g. out = accumarray(...);
  • use size and show us the sizes of all of those variables.
  • upload data.Planes and data.ID in one .mat file (as two separate variables please. I can't open tables on my version, so don't bother with a table).
The results are exactly the same, besides the fact that size also lists the number of columns (1 in all the cases)
>> size(data.Planes)
ans =
1397032 1
>> size(data.ID)
ans =
1397032 1
>> size(out)
ans =
358538 1
>> size(unqValueA)
ans =
11203 1
>> size(unqValueC)
ans =
1397032 1
Does the rowfun version I suggested also return an incorrect result?
Note that it is pretty much equivalent to Stephen's answer. rowfun in effect performs the unique and accumarray for you.
I'm with Stephen on length. I think it should have been retired ages ago. Note that for tables instead of
length(data.ID)
you can use
height(data)
Since by necessity all variables of a table have the same height/length, there's no point querying for both variables.
This answer shows me an error of too many input arguments.
If you're referring to my rowfun suggestion, then you'll get a too many input arguments error if there are other variables than ID and plane in the table. It's easily fixed:
rowfun(@(v) numel(unique(v)), data, 'GroupingVariables', 'plane', 'InputVariables', 'ID')
It works perfectly! Thank you... Maybe the table format does not work properly with accumarray?
Ps: If you want to put your code in a different answer I can accept it. Thanks to everyone

Sign in to comment.

Categories

Asked:

on 7 Sep 2018

Edited:

on 10 Sep 2018

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!