aggregate data of a dataset

 Accepted Answer

Check out splitapply. You may need to change the format of your data, but it does exactly what you want:
G = findgroups(ds.seats);
mean_dist = splitapply(@mean,ds.score,G);
Switching to tables is probably a good idea:
ds = readtable("datasetT.csv");
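For anyone who wants to try the pattern without the original file, here is a minimal sketch on a small made-up table. The variable names Seat and score and all values are assumptions for illustration, not the asker's data.

```matlab
% Hypothetical data: one seat number and one score per observation
Seat  = [1; 1; 2; 2; 2];
score = [3; 5; 2; 4; 9];
T = table(Seat, score);

% Assign a group index to each row, one group per unique Seat value
G = findgroups(T.Seat);

% Apply mean to the scores of each group; one output row per group
mean_score = splitapply(@mean, T.score, G);
```

Here mean_score has one entry per distinct seat, in the sorted order of the seat values.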

17 Comments

Is there no aggregate function like in R?
I'm not familiar with R, but (based on a little googling of R's aggregate function) it looks like splitapply does basically the same thing, just with a little less in the way of wrapping. Look at the documentation for examples.
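If a single call that returns a table is preferred, groupsummary may be closer in spirit to R's aggregate. The sketch below assumes R2018a or later and uses made-up data; Seat and score are illustrative names.

```matlab
% Hypothetical data: one seat number and one score per observation
Seat  = [1; 1; 2; 2; 2];
score = [3; 5; 2; 4; 9];
T = table(Seat, score);

% Group by Seat and compute the mean of score in each group.
% The result is a table with variables Seat, GroupCount and mean_score.
S = groupsummary(T, 'Seat', 'mean', 'score');
```

The group labels come back in the same table as the statistic, so no separate lookup of which row belongs to which group is needed.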
Thanks Sindar
@Sindar It doesn't work. I just get 2 rows with NaN :/
Okay, I ran dataset2table and that worked out, so now I have a table, but splitapply still didn't work. Do you know why? At least now I know it's not because of the dataset type.
Most likely, you have NaNs in your data. Sounds like you'll need to do some extra work (but this will help in the future). First, try using the import tool: https://www.mathworks.com/help/matlab/ref/importtool-app.html
This should allow you to figure out why readtable isn't working. Once everything looks good, you can generate code using the arrow just under "Import Selection".
Then, look here for how to handle the missing data that produced those NaNs. Some of it can be done during import, too. https://www.mathworks.com/help/matlab/data_analysis/missing-data-in-matlab.html
Try this to replace any missing values with 0:
fillmissing(ds,'constant',0)
Did you try it out with my table?
I didn't understand your last comment :(
You can look at my table; I don't have missing values.
Sindar on 16 Feb 2020
Edited: Sindar on 16 Feb 2020
I hadn't tried before, but this works:
ds=readtable('datasetT.xlsx');
G = findgroups(ds.Seat);
mean_dist = splitapply(@mean,ds.score,G);
mean_dist =
3.4286
3.7576
There don't seem to be any missing values or issues with readtable
Oh okay I found it
Megan on 16 Feb 2020
Edited: Megan on 16 Feb 2020
I added a short version of my dataset here. In my original one I do have NaNs... Sorry :/
fillmissing(ds,'constant',0)
This is not working.
Error using fillmissing/checkArrayType (line 522)
Invalid fill constant type.
Error in fillmissing/fillTableVar (line 166)
[intConstVj,extMethodVj] = checkArrayType(Avj,intMethod,intConstVj,extMethodVj,x,true);
Error in fillmissing/fillTable (line 144)
B.(vj) = fillTableVar(indVj,A.(vj),intMethod,intConst,extMethod,x,useJthFillConstant,useJthExtrapConstant);
Error in fillmissing (line 127)
B = fillTable(A,intM,intConstOrWinSize,extM,x,dataVars);
Sorry, I haven't actually used fillmissing much, so I'm not sure what's up. Regardless, I realized removing rows with missing entries is probably better for your purpose:
ds=readtable('datasetT.xlsx');
clean_ds = rmmissing(ds);
G = findgroups(clean_ds.Seat);
mean_dist = splitapply(@mean,clean_ds.score,G);
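A note on the fillmissing error above: one plausible cause (an assumption, since the full table isn't shown in the thread) is that the table contains a non-numeric variable, for which the numeric constant 0 is not a valid fill value. The sketch below uses a made-up table with a text column to show two alternatives: restricting fillmissing to numeric variables, and keeping all rows while ignoring NaN only during averaging.

```matlab
% Hypothetical table with a gap in score and a text column (name)
Seat  = [1; 1; 2; 2];
score = [3; NaN; 2; 6];
name  = {'a'; 'b'; 'c'; 'd'};
T = table(Seat, score, name);

% Option 1: fill only the numeric variables with 0.
% Caution: the filled zeros take part in the mean and pull it toward 0.
T_filled = fillmissing(T, 'constant', 0, 'DataVariables', @isnumeric);

% Option 2: keep every row and skip NaN values when averaging
G = findgroups(T.Seat);
mean_score = splitapply(@(x) mean(x, 'omitnan'), T.score, G);
```

Option 2 differs from rmmissing in that a row is not discarded just because some other, unrelated variable is missing in it.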
That worked out well, thanks!!!
One last question: now I have two rows with mean values.
How can I tell which row belongs to which seat number?
Look at the second output from findgroups:
[G,G_seat] = findgroups(clean_ds.Seat);
At the end, you can make a summary table:
sum_table = table(G_seat,mean_dist)
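The same idea extends to more than one statistic per group. As a sketch on made-up data (the seat numbers 7 and 12, the column n_obs, and the values are all hypothetical), a group count can sit next to the mean in the summary table:

```matlab
% Hypothetical data with non-consecutive seat numbers
Seat  = [12; 12; 7; 7; 7];
score = [3; 5; 2; 4; 9];
T = table(Seat, score);

% G_seat holds the unique seat values in sorted order (7, then 12)
[G, G_seat] = findgroups(T.Seat);

% One statistic per splitapply call, each returning one row per group
mean_score = splitapply(@mean, T.score, G);
n_obs      = splitapply(@numel, T.score, G);

% Row k of every column refers to the seat in G_seat(k)
sum_table = table(G_seat, mean_score, n_obs);
```

Because every column is built from the same G, the rows stay aligned with G_seat.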


Asked: on 16 Feb 2020
Edited: on 19 Feb 2020
