splitapply doesn't split well into bins

לק"י
Hi guys,
I wanted splitapply to split my data into 90 different bins, but for some reason it returns only 50.
Here is what I did:
First, I loaded 'cell1areas' (size 18800x1), a variable that contains a vector of areas.
Then I created the bins (groups) from 0 to 90000 in steps of 1000 and stored them in the 'edges' variable.
After that, I applied the discretize function to the area vector. The maximum value of the result, dis, is 62 (max(dis)).
I applied isfinite to check whether each value is a finite number or NaN, and stored the result in 'valid'.
Finally, I called splitapply with @sum to sum all the values in each group.
The problem is that the spltsum variable has only 50 elements instead of the desired 90 (the number of bins in edges), or even 62 (!), since discretize returned bin numbers only up to 62 and not 90.
Thanks in advance, this community is great and really helpful!
The code:
edges=[0 0:1000:90000 90000];
dis=discretize(cell1areas, edges);
valid=isfinite(cell1areas);
spltsum=splitapply(@sum , cell1areas(valid) , findgroups(dis(valid)) );

 Accepted Answer

Matt J on 11 Oct 2021
Edited: Matt J on 13 Oct 2021
You can use accumarray instead.
spltsum=accumarray(dis(valid), cell1areas(valid) , [90,1]);
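
A small sketch of why the two calls differ, using made-up toy data rather than the asker's cell1areas (the variable names and values here are assumptions for illustration only). findgroups renumbers only the bin numbers that actually occur, so splitapply returns one sum per occupied bin; presumably that is where the 50 comes from. accumarray uses the bin number itself as the output index and fills empty bins with zeros.

```matlab
% Toy data (hypothetical values, not the asker's cell1areas)
areas = [150; 250; 2400; 2600; 4100];
edges = 0:1000:5000;                 % 6 edges -> 5 bins (no repeated edges)
dis   = discretize(areas, edges);    % bin numbers: [1; 1; 3; 3; 5]

% findgroups renumbers the occupied bins (1, 3, 5) as groups 1, 2, 3,
% so splitapply returns one sum per occupied bin only.
sumsOccupied = splitapply(@sum, areas, findgroups(dis));    % 3-by-1

% accumarray uses dis directly as the output row index and pads the
% bins that received no data with zeros.
sumsAllBins = accumarray(dis, areas, [numel(edges)-1, 1]);  % 5-by-1
```

Here sumsOccupied has 3 elements while sumsAllBins has 5, with zeros in rows 2 and 4.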

5 Comments

לק"י
Hi Matt, thanks a lot!
I got this error:
>> spltsum=accumarray(dis(valid), cell1areas(valid) , [90,1]);
Error using accumarray
Second input VAL must be a vector with one element for each row in SUBS, or
a scalar.
Both dis and cell1areas have the same length, but beyond that I couldn't figure out what the problem could be.
Thanks!
לק"י
Hi Matt! I looked up the problem in the forums and saw that someone suggested transposing the vectors. I did and it worked, but my question is whether the result is still good (valid)?
I only added ' to the first two arguments:
spltsum=accumarray(dis(valid)', cell1areas(valid)' , [90,1]);
Thanks!
Amit.
Matt J on 13 Oct 2021
Edited: Matt J on 13 Oct 2021
"Hi matt! I looked up the problem in the forums and saw that someone suggested to transpose the vectors. i did and it worked. but my question is whether it is still good (valid)?"
The inputs to accumarray should indeed be column vectors. Since you said in your post that cell1areas was a column vector (18800x1), the transpose really shouldn't have been necessary. If it was actually a row vector, then everything makes sense.
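
As a hedged illustration of the orientation issue: as far as I understand, accumarray treats each row of its first input as one subscript, so a 1-by-N row vector is read as a single N-dimensional subscript rather than N bin numbers, which would explain the error above. The sketch below uses hypothetical row-vector data and (:) to force columns regardless of the original orientation, as an alternative to the transpose.

```matlab
% Hypothetical row-vector inputs, mimicking the case that triggered the error
disRow   = [1 1 3 3 5];
areasRow = [150 250 2400 2600 4100];

% accumarray(disRow, areasRow, [5, 1]) would fail here: the 1-by-5 subs
% is one row, so only a single value (or a scalar) would be accepted.

% (:) reshapes either orientation into a column, so this works whether
% the inputs started out as rows or as columns.
sums = accumarray(disRow(:), areasRow(:), [5, 1]);
```

Unlike the ' transpose, (:) leaves a vector that is already a column unchanged, so the same line is safe in both cases.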
לק"י
Thanks!
One more (last) question: I want the data to be split into the bins defined by:
edges=[0 0:1000:90000 90000];
but as far as I understand, accumarray divides the data into 90 bins arbitrarily, without regard to the bin widths I require (because of the last argument, [90,1]). Is that true?
spltsum=accumarray(dis(valid), cell1areas(valid) , [90,1]);
If so, I need a way to split the data by the edges vector alone.
Or, to put it in other words:
I assume accumarray only sums the values in cell1areas that share the same bin number (an integer).
The binning of cell1areas itself is done beforehand by the discretize function (the dis variable in this example).
accumarray then only sums all the values in cell1areas that have the same bin number in dis.
If so, why do I have to pass the [90,1] size to accumarray? It should know that I want 90 bins, each 1000 wide, up to 90000, not arbitrary bins that MATLAB thinks are suitable for dividing the data I give it.
Thanks!
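Matt J
Not all 90 bins contain counts. accumarray sees only the bin numbers in dis, not your edges, so it does no binning of its own. If you don't tell accumarray how many bins you have, it will assume you only have max(dis(valid)) bins.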
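
To make that concrete, here is a minimal sketch with toy data. It assumes plain edges without repeated end points, so that the number of bins is numel(edges)-1; that differs from the edges vector in the question, and the variable names are made up for the example. The size argument only pads the output with zeros up to the full bin count; which bin each value lands in is still decided by discretize.

```matlab
areas = [150; 250; 2400];            % hypothetical data
edges = 0:1000:5000;                 % assumed edges: 6 edges -> 5 bins
dis   = discretize(areas, edges);    % bin numbers: [1; 1; 3]

% Without a size, the output stops at the largest bin number that occurs
shortSums = accumarray(dis, areas);                        % 3-by-1

% With a size taken from edges, every bin gets a row, empty ones are 0
fullSums  = accumarray(dis, areas, [numel(edges)-1, 1]);   % 5-by-1

% Row k of fullSums belongs to the bin that starts at edges(k)
binStart = edges(1:end-1).';
result   = table(binStart, fullSums);
```

Deriving the size from edges instead of typing 90 keeps the output length tied to the bins that discretize actually used. Note that discretize returns NaN for values outside the edges, and accumarray does not accept NaN subscripts, so such entries would have to be masked out first.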

Asked on 11 Oct 2021
Commented on 13 Oct 2021