Discover MakerZone

MATLAB and Simulink resources for Arduino, LEGO, and Raspberry Pi

Learn more

Discover what MATLAB® can do for your career.

Opportunities for recent engineering grads.

Apply Today

Thread Subject:
average if: average values in column if values in other colum are equal

Subject: average if: average values in column if values in other colum are equal

From: Daphne

Date: 15 Jul, 2011 16:43:26

Message: 1 of 14


I would like to average values in column 2 of a matrix that correspond to equal (and unique) values in column 1, a simple example:

a= [1.0000 0.5000
    1.0000 0.4500
    1.0000 0.6000
    2.0000 1.0000
    2.0000 1.2000
    3.0000 3.0000
    3.0000 3.5000
    3.0000 3.0000];

should become

b = [1.0000 0.5167
    2.0000 1.1000
    3.0000 3.1667];

is there a non-loop way to do this, for matrices with >100k elements?...
I have tried to do logicals in a loop, but these take too long (example of 126 sec on 8653 calls from the profiler). My try:

unique_a = unique(a(:,1));
b = NaN(size(unique_a,1),2);
for unique_ind = 1:size(unique_a,1)
  data2avg = a(a(:,1) == unique_a(unique_ind),2);
  b(unique_ind,:) = [unique_a(unique_ind),mean(data2avg) ];
end

Any ideas on how to improve on this or do this better?

Thanks,
Daphne

Subject: average if: average values in column if values in other colum are equal

From: Matt J

Date: 15 Jul, 2011 18:52:07

Message: 2 of 14

"Daphne" <daphnew_too_nospam@yahoo.com> wrote in message <ivpqne$cbd$1@newscl01ah.mathworks.com>...
>
> I would like to average values in column 2 of a matrix that correspond to equal (and unique) values in column 1, a simple example:
================

 accumarray(a(:,1),a(:,2))./accumarray(a(:,1),1)

Subject: average if: average values in column if values in other colum

From: dpb

Date: 15 Jul, 2011 19:01:33

Message: 3 of 14

On 7/15/2011 11:43 AM, Daphne wrote:
>
> I would like to average values in column 2 of a matrix that correspond
> to equal (and unique) values in column 1, a simple example:
>
> a= [1.0000 0.5000
> 1.0000 0.4500
> 1.0000 0.6000
> 2.0000 1.0000
> 2.0000 1.2000
> 3.0000 3.0000
> 3.0000 3.5000
> 3.0000 3.0000];
>
> should become
> b = [1.0000 0.5167
> 2.0000 1.1000
> 3.0000 3.1667];
>
> is there a non-loop way to do this, for matrices with >100k elements?...
> I have tried to do logicals in a loop, but these take too long (example
> of 126 sec on 8653 calls from the profiler). My try:
> unique_a = unique(a(:,1));
> b = NaN(size(unique_a,1),2);
> for unique_ind = 1:size(unique_a,1)
> data2avg = a(a(:,1) == unique_a(unique_ind),2);
> b(unique_ind,:) = [unique_a(unique_ind),mean(data2avg) ];
> end
>
> Any ideas on how to improve on this or do this better?
...

Not w/o a loop, but preallocating result and cleaning up slightly may
help some...

 >> u=unique(a(:,1));
 >> b=zeros(length(u),2);
 >> b(:,1)=u;
 >> for idx=1:length(u)
      b(idx,2)=mean(a(a(:,1)==u(idx),2));
    end
 >> b
b =
     1.0000 0.5167
     2.0000 1.1000
     3.0000 3.1667
 >>

--

Subject: average if: average values in column if values in other colum are equal

From: Roger Stafford

Date: 15 Jul, 2011 20:13:28

Message: 4 of 14

"Daphne" <daphnew_too_nospam@yahoo.com> wrote in message <ivpqne$cbd$1@newscl01ah.mathworks.com>...
>
> I would like to average values in column 2 of a matrix that correspond to equal (and unique) values in column 1, a simple example:
>
> a= [1.0000 0.5000
> 1.0000 0.4500
> 1.0000 0.6000
> 2.0000 1.0000
> 2.0000 1.2000
> 3.0000 3.0000
> 3.0000 3.5000
> 3.0000 3.0000];
>
> should become
>
> b = [1.0000 0.5167
> 2.0000 1.1000
> 3.0000 3.1667];
>
> is there a non-loop way to do this, for matrices with >100k elements?...
> I have tried to do logicals in a loop, but these take too long (example of 126 sec on 8653 calls from the profiler). My try:
>
> unique_a = unique(a(:,1));
> b = NaN(size(unique_a,1),2);
> for unique_ind = 1:size(unique_a,1)
> data2avg = a(a(:,1) == unique_a(unique_ind),2);
> b(unique_ind,:) = [unique_a(unique_ind),mean(data2avg) ];
> end
>
> Any ideas on how to improve on this or do this better?
>
> Thanks,
> Daphne
- - - - - - - - -
  If elements of a(:,1) are not necessarily indices, do this:

 [u,~,n] = unique(a(:,1));
 sz = [size(u,1),1];
 b = [u,accumarray(n,a(:,2),sz)./accumarray(n,1,sz)];

Roger Stafford

Subject: average if: average values in column if values in other colum are equal

From: Matt J

Date: 15 Jul, 2011 21:31:11

Message: 5 of 14

"Matt J" wrote in message <ivq28n$5qi$1@newscl01ah.mathworks.com>...
> "Daphne" <daphnew_too_nospam@yahoo.com> wrote in message <ivpqne$cbd$1@newscl01ah.mathworks.com>...
> >
> > I would like to average values in column 2 of a matrix that correspond to equal (and unique) values in column 1, a simple example:
> ================
>
> accumarray(a(:,1),a(:,2))./accumarray(a(:,1),1)

=================

BTW, I deliberately do not recommend using
accumarray(...,@mean). This will be considerably slower, as demonstrated in the following test:

%fake data
N=1e6;
K=1000;
a=[randi(K,N,1), rand(N,1)];

tic;
result1=accumarray(a(:,1),a(:,2))./accumarray(a(:,1),1);
toc;
%Elapsed time is 0.060626 seconds.

tic;
 result2=accumarray(a(:,1),a(:,2),[],@mean);
toc;
%Elapsed time is 0.270781 seconds.

Subject: average if: average values in column if values in other colum are equal

From: Daphne

Date: 16 Jul, 2011 06:37:07

Message: 6 of 14


Thanks so much for the great suggestions!
The accumarray approach reduced calculation time to 1.27 seconds in a test case which I ran and took at least minutes (don't know exactly, because I stopped it and waited for a better option).

Thanks again!



"Matt J" wrote in message <ivqbiv$3jm$1@newscl01ah.mathworks.com>...
> "Matt J" wrote in message <ivq28n$5qi$1@newscl01ah.mathworks.com>...
> > "Daphne" <daphnew_too_nospam@yahoo.com> wrote in message <ivpqne$cbd$1@newscl01ah.mathworks.com>...
> > >
> > > I would like to average values in column 2 of a matrix that correspond to equal (and unique) values in column 1, a simple example:
> > ================
> >
> > accumarray(a(:,1),a(:,2))./accumarray(a(:,1),1)
>
> =================
>
> BTW, I deliberately do not recommend using
> accumarray(...,@mean). This will be considerably slower, as demonstrated in the following test:
>
> %fake data
> N=1e6;
> K=1000;
> a=[randi(K,N,1), rand(N,1)];
>
> tic;
> result1=accumarray(a(:,1),a(:,2))./accumarray(a(:,1),1);
> toc;
> %Elapsed time is 0.060626 seconds.
>
> tic;
> result2=accumarray(a(:,1),a(:,2),[],@mean);
> toc;
> %Elapsed time is 0.270781 seconds.

Subject: average if: average values in column if values in other colum are equal

From: Daphne

Date: 16 Jul, 2011 07:17:10

Message: 7 of 14


I've compared the two accumarray approaches suggested here and Matt J's is actually much faster although both give the same results.

A follow up question, how would you calculate the std of the means? that would be a third column in the results matrix.

Thanks!



"Daphne" <daphnew_too_nospam@yahoo.com> wrote in message <ivrbij$kku$1@newscl01ah.mathworks.com>...
>
> Thanks so much for the great suggestions!
> The accumarray approach reduced calculation time to 1.27 seconds in a test case which I ran and took at least minutes (don't know exactly, because I stopped it and waited for a better option).
>
> Thanks again!
>
>
>
> "Matt J" wrote in message <ivqbiv$3jm$1@newscl01ah.mathworks.com>...
> > "Matt J" wrote in message <ivq28n$5qi$1@newscl01ah.mathworks.com>...
> > > "Daphne" <daphnew_too_nospam@yahoo.com> wrote in message <ivpqne$cbd$1@newscl01ah.mathworks.com>...
> > > >
> > > > I would like to average values in column 2 of a matrix that correspond to equal (and unique) values in column 1, a simple example:
> > > ================
> > >
> > > accumarray(a(:,1),a(:,2))./accumarray(a(:,1),1)
> >
> > =================
> >
> > BTW, I deliberately do not recommend using
> > accumarray(...,@mean). This will be considerably slower, as demonstrated in the following test:
> >
> > %fake data
> > N=1e6;
> > K=1000;
> > a=[randi(K,N,1), rand(N,1)];
> >
> > tic;
> > result1=accumarray(a(:,1),a(:,2))./accumarray(a(:,1),1);
> > toc;
> > %Elapsed time is 0.060626 seconds.
> >
> > tic;
> > result2=accumarray(a(:,1),a(:,2),[],@mean);
> > toc;
> > %Elapsed time is 0.270781 seconds.

Subject: average if: average values in column if values in other colum are equal

From: Matt J

Date: 16 Jul, 2011 07:38:09

Message: 8 of 14

"Daphne" <daphnew_too_nospam@yahoo.com> wrote in message <ivrdtm$q64$1@newscl01ah.mathworks.com>...
>
> A follow up question, how would you calculate the std of the means? that would be a third column in the results matrix.
>

Same kind of thing...

N=accumarray(a(:,1),1);
av=accumarray(a(:,1),a(:,2));
avsq=accumarray(a(:,1),a(:,2).^2);

standardDevs=sqrt((avsq-av.^2)./N)

Subject: average if: average values in column if values in other colum are equal

From: Daphne

Date: 16 Jul, 2011 08:03:13

Message: 9 of 14


Perfect!

I noticed a strange phenomenon, the average (and now also the std) gives me a vector that has several NaN and on the bottom the actual values I am looking for.
This didn't happen with the sample matrices, but does for my real data. Any ideas why?
Since the values themselves are fine, I just toss the NaNs and keep the values, but would be happy to know why this is happening.
It doesn't happen in the accumarry approach that Roger Stafford offered.

Thanks,
Daphne



"Matt J" wrote in message <ivrf50$t0k$1@newscl01ah.mathworks.com>...
> "Daphne" <daphnew_too_nospam@yahoo.com> wrote in message <ivrdtm$q64$1@newscl01ah.mathworks.com>...
> >
> > A follow up question, how would you calculate the std of the means? that would be a third column in the results matrix.
> >
>
> Same kind of thing...
>
> N=accumarray(a(:,1),1);
> av=accumarray(a(:,1),a(:,2));
> avsq=accumarray(a(:,1),a(:,2).^2);
>
> standardDevs=sqrt((avsq-av.^2)./N)

Subject: average if: average values in column if values in other colum are equal

From: Matt J

Date: 16 Jul, 2011 08:09:08

Message: 10 of 14

"Matt J" wrote in message <ivrf50$t0k$1@newscl01ah.mathworks.com>...
> "Daphne" <daphnew_too_nospam@yahoo.com> wrote in message <ivrdtm$q64$1@newscl01ah.mathworks.com>...
> >
> > A follow up question, how would you calculate the std of the means? that would be a third column in the results matrix.
> >
>
> Same kind of thing...
>
> N=accumarray(a(:,1),1);
> av=accumarray(a(:,1),a(:,2));
> avsq=accumarray(a(:,1),a(:,2).^2);
>
> standardDevs=sqrt((avsq-av.^2)./N)
==========================

Although, it would be marginally more numerically stable to extend Roger's code as follows:


[u,i,j] = unique(a(:,1));
 sz = size(i);

N=accumarray(j,1,sz);

col1=u;
col2=accumarray(j,a(:,2),sz)./N;
col3=sqrt(accumarray(j, ( a(:,2)-col2(j) ).^2 ,sz)./N);

 result = [col1,col2,col3];

Subject: average if: average values in column if values in other colum are equal

From: Matt J

Date: 16 Jul, 2011 08:20:09

Message: 11 of 14

"Daphne" <daphnew_too_nospam@yahoo.com> wrote in message <ivrgk1$38u$1@newscl01ah.mathworks.com>...
>
> Perfect!
>
> I noticed a strange phenomenon, the average (and now also the std) gives me a vector that has several NaN and on the bottom the actual values I am looking for.
> This didn't happen with the sample matrices, but does for my real data. Any ideas why?
> Since the values themselves are fine, I just toss the NaNs and keep the values, but would be happy to know why this is happening.
===================

Because in your actual data unique(a(:,1)) contains fewer values than
1:max(a(:,1)). ACCUMARRAY assigns zeros to these missing locations and you end up doing a 0/0 operation there.

Roger's use of UNIQUE is more neat and tidy, and more efficient I guess since 0/0 operations are more time-consuming than well-defined division operations.

Subject: average if: average values in column if values in other colum are equal

From: Daphne

Date: 16 Jul, 2011 08:37:09

Message: 12 of 14


Got it.

Well, my final version, which also works if the unique values in column 1 are not integers (thanks for noting that Roger):

[unique_val,~,ind_unique] = unique(a(:,1));
sz = [size(unique_val,1),1];
N = accumarray(ind_unique,1,sz); % number of measurements
mean_a = accumarray(ind_unique,a(:,2),sz)./ N; % average
std_a = sqrt(accumarray(ind_unique, ( a(:,2)-mean_a(ind_unique) ).^2 ,sz) ./ (N-1)); % sample std, use N for population std
b = [unique_val, mean_a, std_a];


Thanks a lot Matt and Roger!
Daphne



"Matt J" wrote in message <ivrhjp$5k6$1@newscl01ah.mathworks.com>...
> "Daphne" <daphnew_too_nospam@yahoo.com> wrote in message <ivrgk1$38u$1@newscl01ah.mathworks.com>...
> >
> > Perfect!
> >
> > I noticed a strange phenomenon, the average (and now also the std) gives me a vector that has several NaN and on the bottom the actual values I am looking for.
> > This didn't happen with the sample matrices, but does for my real data. Any ideas why?
> > Since the values themselves are fine, I just toss the NaNs and keep the values, but would be happy to know why this is happening.
> ===================
>
> Because in your actual data unique(a(:,1)) contains fewer values than
> 1:max(a(:,1)). ACCUMARRAY assigns zeros to these missing locations and you end up doing a 0/0 operation there.
>
> Roger's use of UNIQUE is more neat and tidy, and more efficient I guess since 0/0 operations are more time-consuming than well-defined division operations.

Subject: average if: average values in column if values in other colum are equal

From: Roger Stafford

Date: 16 Jul, 2011 18:47:09

Message: 13 of 14

"Daphne" <daphnew_too_nospam@yahoo.com> wrote in message <ivrijl$80q$1@newscl01ah.mathworks.com>...
> .......
> [unique_val,~,ind_unique] = unique(a(:,1));
> sz = [size(unique_val,1),1];
> N = accumarray(ind_unique,1,sz); % number of measurements
> mean_a = accumarray(ind_unique,a(:,2),sz)./ N; % average
> std_a = sqrt(accumarray(ind_unique, ( a(:,2)-mean_a(ind_unique) ).^2 ,sz) ./ (N-1)); % sample std, use N for population std
> b = [unique_val, mean_a, std_a];
> ........
- - - - - - - - - - -
  To avoid the zero-divided-by-zero phenomenon for single occurrences in a(:,1), you had better divide by max(N-1,1) rather than N-1, just as matlab's 'std' does.

Roger Stafford

Subject: average if: average values in column if values in other colum are equal

From: Daphne

Date: 16 Jul, 2011 20:25:12

Message: 14 of 14


thanks for the tip. I've done that.

Daphne


"Roger Stafford" wrote in message <ivsmbd$452$1@newscl01ah.mathworks.com>...
> "Daphne" <daphnew_too_nospam@yahoo.com> wrote in message <ivrijl$80q$1@newscl01ah.mathworks.com>...
> > .......
> > [unique_val,~,ind_unique] = unique(a(:,1));
> > sz = [size(unique_val,1),1];
> > N = accumarray(ind_unique,1,sz); % number of measurements
> > mean_a = accumarray(ind_unique,a(:,2),sz)./ N; % average
> > std_a = sqrt(accumarray(ind_unique, ( a(:,2)-mean_a(ind_unique) ).^2 ,sz) ./ (N-1)); % sample std, use N for population std
> > b = [unique_val, mean_a, std_a];
> > ........
> - - - - - - - - - - -
> To avoid the zero-divided-by-zero phenomenon for single occurrences in a(:,1), you had better divide by max(N-1,1) rather than N-1, just as matlab's 'std' does.
>
> Roger Stafford

Tags for this Thread

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Contact us