Find duplicate entries and sum up their associated values then put everything back in a table

19 views (last 30 days)
Hello everyone,
I have an excel sheet with two columns. The first column is a list of names, and the second column is the score of each person. Some names have multiple score and I want to find duplicate entries and sum up their associated values then put everything back in a table. Basically clean up the duplicate but want the sum of the scores. The excel file is attached as well
I would really appreciate if you have a script or a function can achieve the results.
Here where I’m at
Data = readtable("Data.xlsx")
unique(Data.Names)
[C,ia,idx] = unique(Data(:,1),'stable');
val = accumarray(idx,Data(:,2),[],@sum);
mat = [C val]

Accepted Answer

dpb
dpb on 8 Jul 2021
tData = readtable("Data.xlsx");
tData.Properties.VariableNames={'Name','Score'}; % set a known set of variable names
tSum=rowfun(@sum,tData,'InputVariables','Score','GrouptingVariables','Name');
writetable(tSum,'PickAnOutputFileName.xls')

More Answers (1)

Cris LaPierre
Cris LaPierre on 8 Jul 2021
It gets a bit trickly since you don't want to change the order of the names, but I think you could do this with groupsummary.
% Load the data and add a column for keeping track of the original order
Data = readtable("AladdinData.xlsx");
Data.Order = (1:height(Data))'
Data = 50×3 table
Names Scores Order _____________________ ______ _____ {'Daniel Nelson' } 78 1 {'Henry Cunningham' } 28 2 {'Adison Dixon' } 41 3 {'Haris Carroll' } 15 4 {'Gianna Jones' } 3 5 {'Ted Alexander' } 64 6 {'Connie Hill' } 43 7 {'Abigail Clark' } 7 8 {'Nicole Brooks' } 2 9 {'Heather Tucker' } 67 10 {'Alen Crawford' } 97 11 {'Dexter Carroll' } 32 12 {'Elian Thompson' } 9 13 {'Nicholas Campbell'} 35 14 {'Alina Davis' } 72 15 {'Adelaide Gibson' } 57 16
% Use groupsummary to find sum and min of scores and order for each unique name
sumTbl = groupsummary(Data,"Names",["sum","min"],["Scores" "Order"])
sumTbl = 18×6 table
Names GroupCount sum_Scores min_Scores sum_Order min_Order _____________________ __________ __________ __________ _________ _________ {'Abigail Clark' } 3 135 7 61 8 {'Adelaide Gibson' } 3 160 45 103 16 {'Adison Dixon' } 4 242 41 107 3 {'Alen Crawford' } 2 101 4 43 11 {'Alina Davis' } 2 76 4 51 15 {'Connie Hill' } 4 81 4 107 7 {'Daniel Nelson' } 2 103 25 44 1 {'Dexter Carroll' } 2 89 32 45 12 {'Elian Thompson' } 2 95 9 47 13 {'Elian Wells' } 2 130 45 55 17 {'Gianna Jones' } 4 122 3 115 5 {'Haris Carroll' } 4 181 15 111 4 {'Heather Tucker' } 3 112 13 67 10 {'Henry Cunningham' } 2 128 28 46 2 {'Leonardo Elliott' } 2 46 9 57 18 {'Nicholas Campbell'} 2 74 35 49 14
% sort the results, which are alphabetical, so they are back in the original order
sumTbl = sortrows(sumTbl,"min_Order");
% create a new table with just Names and total scores
finalData = sumTbl(:,["Names","sum_Scores"]);
% rename 'sum_Scores' back to 'Scores'
finalData.Properties.VariableNames(2) = "Scores"
finalData = 18×2 table
Names Scores _____________________ ______ {'Daniel Nelson' } 103 {'Henry Cunningham' } 128 {'Adison Dixon' } 242 {'Haris Carroll' } 181 {'Gianna Jones' } 122 {'Ted Alexander' } 190 {'Connie Hill' } 81 {'Abigail Clark' } 135 {'Nicole Brooks' } 105 {'Heather Tucker' } 112 {'Alen Crawford' } 101 {'Dexter Carroll' } 89 {'Elian Thompson' } 95 {'Nicholas Campbell'} 74 {'Alina Davis' } 76 {'Adelaide Gibson' } 160
  3 Comments
dpb
dpb on 8 Jul 2021
I didn't think of groupsummary, Chris. Good thinking. Was keeping the order an input requirement? I missed it if it was. I've done same as you before several times; have one humongous spreadsheet that has to get updated piecemeal -- it has a (hidden) column that contains =ROW() precisely for the purpose.
Cris LaPierre
Cris LaPierre on 8 Jul 2021
I had interpreted it as having to be in the same order, but on a re-read, that is not stated. If not, it gets much simpler.
% Load the data and add a column for keeping track of the original order
Data = readtable("AladdinData.xlsx");
% Use groupsummary to find sum scores for each unique name
finalData = groupsummary(Data,"Names","sum")
This keeps the GroupCount information, but that's not bad info to have either.

Sign in to comment.

Products


Release

R2020b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!