Count unique ID occurrences in a column after grouping based on another column
Show older comments
Hi all,
I'm wondering if exists a way using varfun and unique or something else to solve the following problem replacing the 'for' loop.
I have a table made of two columns (ID, plane). I would like to get unique plane identifiers and, for each of them, sum the number of unique IDs in the first column.
For instance, given the table:
1 plane5
1 plane5
2 plane5
1 plane8
1 plane8
3 plane6
I'd like to get the result:
2 plane5
1 plane8
1 plane6
I've done this with the following code, but I think there is a way to avoid the for loop
unqPlanes = unique(data.Planes);
result = table(zeros(length(unqPlanes),1), unqPlanes);
for i = 1:length(unqPlanes)
result{i,1} = height(unique(data(data.Planes==unqValue(i,1),'ID')));
end
Any suggestion? Thanks
Answers (1)
Stephen23
on 10 Sep 2018
>> C = {'plane5';'plane5';'plane5';'plane8';'plane8';'plane6'};
>> V = [ 1; 1; 2; 1; 1; 3];
>> [U,~,X] = unique(C);
>> accumarray(V,X,[],@(v)numel(unique(v)))
ans =
2
1
1
>> U
U =
'plane5'
'plane6'
'plane8'
10 Comments
Or since the input is a table:
rowfun(@(v) numel(unique(v)), data, 'GroupingVariables', 'plane')
Savino Dambra
on 10 Sep 2018
Savino Dambra
on 10 Sep 2018
@Savino Dambra: do NOT use length, as its output is basically useless (it changes which dimension it measures, depending on the input data size. Ugh! Using size or numel is much more robust).
Please do the following:
- allocate to an output variable, e.g. out = accumarray(...);
- use size and show us the sizes of all of those variables.
- upload data.Planes and data.ID in one .mat file (as two separate variables please. I can't open tables on my version, so don't bother with a table).
Savino Dambra
on 10 Sep 2018
Guillaume
on 10 Sep 2018
Does the rowfun version I suggested also return an incorrect result?
Note that it is pretty much equivalent to Stephen's answer. rowfun in effect performs the unique and accumarray for you.
I'm with Stephen on length. I think it should have been retired ages ago. Note that for tables instead of
length(data.ID)
you can use
height(data)
Since by necessity all variables of a table have the same height/length, there's no point querying for both variables.
Savino Dambra
on 10 Sep 2018
Guillaume
on 10 Sep 2018
If you're referring to my rowfun suggestion, then you'll get a too many input arguments error if there are other variables than ID and plane in the table. It's easily fixed:
rowfun(@(v) numel(unique(v)), data, 'GroupingVariables', 'plane', 'InputVariables', 'ID')
Savino Dambra
on 10 Sep 2018
Edited: Savino Dambra
on 10 Sep 2018
Categories
Find more on Matrices and Arrays in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!