Got Questions? Get Answers.
Discover MakerZone

MATLAB and Simulink resources for Arduino, LEGO, and Raspberry Pi

Learn more

Discover what MATLAB® can do for your career.

Opportunities for recent engineering grads.

Apply Today

Thread Subject:
how to find unique rows and calculate the average.

Subject: how to find unique rows and calculate the average.

From: Alan Hitch

Date: 9 Apr, 2013 21:32:09

Message: 1 of 10

I have a matrix (2563x19) with text and numeric characters. The first row is a header.

e.g.
[Site Marshtype Date Species Throw Season Year UniqueID Total Pass1 Pass2 Pass3 Pass4 Pass5 Pass6 Pass7 Pass8 Pass9 Pass10
3BAB Brackish 27-Oct-04 PAPU 2 Fall 2004 3BAB38287PAPU 35 25 9 0 1 0 0 0 0 0 0
3BAB Brackish 27-Oct-04 POLA 2 Fall 2004 3BAB38287POLA 79 46 19 9 5 0 0 0 0 0 0
3BAB Brackish 27-Oct-04 GAAF 2 Fall 2004 3BAB38287GAAF 64 45 13 6 0 0 0 0 0 0 0
3BAB Brackish 27-Oct-04 CYVA 2 Fall 2004 3BAB38287CYVA 11 2 4 2 3 0 0 0 0 0 0
3BAB Brackish 27-Oct-04 LUPA 2 Fall 2004 3BAB38287LUPA 20 11 2 4 2 1 0 0 0 0 0]

I would like to get average values (Total) of particular rows that have identical column values (column of interest is UniqueID) and create a new matrix with the calculated mean and associated UniqueID. For each UniqueID there are between 2 and 3 values.

Like this:

[3BAB38287PAPU 234
3BAB38287POLA 456
3BAB38287GAAF 35]

Could someone help me with this? I have tried a lot of things and nothing seems to be working.

Thanks
Alan

Subject: how to find unique rows and calculate the average.

From: dpb

Date: 10 Apr, 2013 02:54:01

Message: 2 of 10

On 4/9/2013 4:32 PM, Alan Hitch wrote:
> I have a matrix (2563x19) with text and numeric characters. The first
> row is a header.
>
> e.g. [Site Marshtype Date Species Throw Season Year UniqueID Total Pass1
> Pass2 Pass3 Pass4 Pass5 Pass6 Pass7 Pass8 Pass9 Pass10
> 3BAB Brackish 27-Oct-04 PAPU 2 Fall 2004 3BAB38287PAPU 35 25 9 0 1 0 0 0
> 0 0 0
> 3BAB Brackish 27-Oct-04 POLA 2 Fall 2004 3BAB38287POLA 79 46 19 9 5 0 0
> 0 0 0 0
> 3BAB Brackish 27-Oct-04 GAAF 2 Fall 2004 3BAB38287GAAF 64 45 13 6 0 0 0
> 0 0 0 0
> 3BAB Brackish 27-Oct-04 CYVA 2 Fall 2004 3BAB38287CYVA 11 2 4 2 3 0 0 0
> 0 0 0
> 3BAB Brackish 27-Oct-04 LUPA 2 Fall 2004 3BAB38287LUPA 20 11 2 4 2 1 0 0
> 0 0 0]
>
> I would like to get average values (Total) of particular rows that have
> identical column values (column of interest is UniqueID) and create a
> new matrix with the calculated mean and associated UniqueID. For each
> UniqueID there are between 2 and 3 values.
>
> Like this:
>
> [3BAB38287PAPU 234
> 3BAB38287POLA 456
> 3BAB38287GAAF 35]
>
> Could someone help me with this? I have tried a lot of things and
> nothing seems to be working.

Well, not seeing what you tried can't try to fix it, but one
straightforward approach...

 >> type 'alan.dat'

3BAB Brackish 27-Oct-04 PAPU 2 Fall 2004 3BAB38287PAPU 35 25 9 0 1 0 0
0 0 0 0
3BAB Brackish 27-Oct-04 POLA 2 Fall 2004 3BAB38287POLA 79 46 19 9 5 0 0
0 0 0 0
3BAB Brackish 27-Oct-04 GAAF 2 Fall 2004 3BAB38287GAAF 64 45 13 6 0 0 0
0 0 0 0
3BAB Brackish 27-Oct-04 CYVA 2 Fall 2004 3BAB38287CYVA 11 2 4 2 3 0 0
0 0 0 0
3BAB Brackish 27-Oct-04 LUPA 2 Fall 2004 3BAB38287LUPA 20 11 2 4 2 1 0
0 0 0 0

 >> fid=fopen('alan.dat','rt');
 >> C=textscan(fid,[repmat('%*s',1,7) '%s'
repmat('%d',1,11)],'collectoutput',1);
 >> fid=fclose(fid);
 >> u=unique(C{1});
 >> for i=1:length(u),m=mean(C{2}(strcmp(C{1},u(i)),1));disp([u(i) m]),end
     '3BAB38287CYVA' [11]
     '3BAB38287GAAF' [64]
     '3BAB38287LUPA' [20]
     '3BAB38287PAPU' [35]
     '3BAB38287POLA' [79]
 >>

Note I didn't worry about the header and for simplicity of demo just
threw away all the data columns up to the uniqeID one. Then used
'collectoutput' to put rest in a single cell.

Then we found the unique IDs that are in the file (of which there is
only one of each in the sample and ran a loop over it to find the mean
by using a logical vector returned by strcmp as the row index into the
data. I chose the first column arbitrarily.

Salt to suit for your case, of course...

--

Subject: how to find unique rows and calculate the average.

From: Alan Hitch

Date: 10 Apr, 2013 21:10:12

Message: 3 of 10

dpb <none@non.net> wrote in message <kk2k8a$h1m$1@speranza.aioe.org>...
> On 4/9/2013 4:32 PM, Alan Hitch wrote:
> > I have a matrix (2563x19) with text and numeric characters. The first
> > row is a header.
> >
> > e.g. [Site Marshtype Date Species Throw Season Year UniqueID Total Pass1
> > Pass2 Pass3 Pass4 Pass5 Pass6 Pass7 Pass8 Pass9 Pass10
> > 3BAB Brackish 27-Oct-04 PAPU 2 Fall 2004 3BAB38287PAPU 35 25 9 0 1 0 0 0
> > 0 0 0
> > 3BAB Brackish 27-Oct-04 POLA 2 Fall 2004 3BAB38287POLA 79 46 19 9 5 0 0
> > 0 0 0 0
> > 3BAB Brackish 27-Oct-04 GAAF 2 Fall 2004 3BAB38287GAAF 64 45 13 6 0 0 0
> > 0 0 0 0
> > 3BAB Brackish 27-Oct-04 CYVA 2 Fall 2004 3BAB38287CYVA 11 2 4 2 3 0 0 0
> > 0 0 0
> > 3BAB Brackish 27-Oct-04 LUPA 2 Fall 2004 3BAB38287LUPA 20 11 2 4 2 1 0 0
> > 0 0 0]
> >
> > I would like to get average values (Total) of particular rows that have
> > identical column values (column of interest is UniqueID) and create a
> > new matrix with the calculated mean and associated UniqueID. For each
> > UniqueID there are between 2 and 3 values.
> >
> > Like this:
> >
> > [3BAB38287PAPU 234
> > 3BAB38287POLA 456
> > 3BAB38287GAAF 35]
> >
> > Could someone help me with this? I have tried a lot of things and
> > nothing seems to be working.
>
> Well, not seeing what you tried can't try to fix it, but one
> straightforward approach...
>
> >> type 'alan.dat'
>
> 3BAB Brackish 27-Oct-04 PAPU 2 Fall 2004 3BAB38287PAPU 35 25 9 0 1 0 0
> 0 0 0 0
> 3BAB Brackish 27-Oct-04 POLA 2 Fall 2004 3BAB38287POLA 79 46 19 9 5 0 0
> 0 0 0 0
> 3BAB Brackish 27-Oct-04 GAAF 2 Fall 2004 3BAB38287GAAF 64 45 13 6 0 0 0
> 0 0 0 0
> 3BAB Brackish 27-Oct-04 CYVA 2 Fall 2004 3BAB38287CYVA 11 2 4 2 3 0 0
> 0 0 0 0
> 3BAB Brackish 27-Oct-04 LUPA 2 Fall 2004 3BAB38287LUPA 20 11 2 4 2 1 0
> 0 0 0 0
>
> >> fid=fopen('alan.dat','rt');
> >> C=textscan(fid,[repmat('%*s',1,7) '%s'
> repmat('%d',1,11)],'collectoutput',1);
> >> fid=fclose(fid);
> >> u=unique(C{1});
> >> for i=1:length(u),m=mean(C{2}(strcmp(C{1},u(i)),1));disp([u(i) m]),end
> '3BAB38287CYVA' [11]
> '3BAB38287GAAF' [64]
> '3BAB38287LUPA' [20]
> '3BAB38287PAPU' [35]
> '3BAB38287POLA' [79]
> >>
>
> Note I didn't worry about the header and for simplicity of demo just
> threw away all the data columns up to the uniqeID one. Then used
> 'collectoutput' to put rest in a single cell.
>
> Then we found the unique IDs that are in the file (of which there is
> only one of each in the sample and ran a loop over it to find the mean
> by using a logical vector returned by strcmp as the row index into the
> data. I chose the first column arbitrarily.
>
> Salt to suit for your case, of course...
>
> --
dpb,

Thanks for responding. I neglected to tell you that the data is a cell array with different data types and textscan only works with double or string. I read the data in using xlsread and was using the raw data. I am relatively new to MatLab. How do I convert the cell array into a double or string? I tried using cell2mat but the contents of my cellarray are different data types.

Thanks again
Alan

Subject: how to find unique rows and calculate the average.

From: dpb

Date: 11 Apr, 2013 00:37:52

Message: 4 of 10

On 4/10/2013 4:10 PM, Alan Hitch wrote:
> dpb <none@non.net> wrote in message <kk2k8a$h1m$1@speranza.aioe.org>...
...

>> Well, not seeing what you tried can't try to fix it, but one
>> straightforward approach...
>>
>> >> type 'alan.dat'
>>
>> 3BAB Brackish 27-Oct-04 PAPU 2 Fall 2004 3BAB38287PAPU 35 25 9 0 1 0 0
>> 0 0 0 0
>> 3BAB Brackish 27-Oct-04 POLA 2 Fall 2004 3BAB38287POLA 79 46 19 9 5 0
>> 0 0 0 0 0
...


>> >> fid=fopen('alan.dat','rt');
>> >> C=textscan(fid,[repmat('%*s',1,7) '%s'
>> repmat('%d',1,11)],'collectoutput',1);
>> >> fid=fclose(fid);
>> >> u=unique(C{1});
>> >> for i=1:length(u),m=mean(C{2}(strcmp(C{1},u(i)),1));disp([u(i) m]),end
>> '3BAB38287CYVA' [11]
...
>> '3BAB38287POLA' [79]
>> >>
>>
>> Note I didn't worry about the header and for simplicity of demo just
>> threw away all the data columns up to the uniqeID one. Then used
>> 'collectoutput' to put rest in a single cell.
>>
>> Then we found the unique IDs that are in the file (of which there is
>> only one of each in the sample and ran a loop over it to find the mean
>> by using a logical vector returned by strcmp as the row index into the
>> data. I chose the first column arbitrarily.
>>
>> Salt to suit for your case, of course...
...

> Thanks for responding. I neglected to tell you that the data is a cell
> array with different data types and textscan only works with double or
> string. I read the data in using xlsread and was using the raw data. I
> am relatively new to MatLab. How do I convert the cell array into a
> double or string? I tried using cell2mat but the contents of my
> cellarray are different data types.

textscan() can read whatever you give it the formatting for; see above
that reads your file as you said it was/is. All you need is to use the
appropriate 'headerlines',N option to skip them.

Given the above you have

3BAB Brackish 27-Oct-04 PAPU 2 Fall 2004 3BAB38287PAPU

a 2 strings, a date (read as a string also is simplest to deal with), a
string, an integer, a string, integer, string followed by 8 numerics.

Expand the format string I gave as needed to return any/all fields you
wish/need; the 'collectoutput' flag will put them into two cells one
containing the strings, the other the values.

I don't see where the problem is; why can't you use what gave [almost]
directly?

--

Subject: how to find unique rows and calculate the average.

From: Peter Perkins

Date: 11 Apr, 2013 02:16:30

Message: 5 of 10

Alan, if you have access to the Statistics Toolbox, a combination of
cell2dataset (or create the dataset array directly by reading a file)
and grpstats solves this easily. Use UniqueID as the grouping variable.

Hope this helps.


On 4/9/2013 5:32 PM, Alan Hitch wrote:
> I have a matrix (2563x19) with text and numeric characters. The first
> row is a header.
>
> e.g. [Site Marshtype Date Species Throw Season Year
> UniqueID Total Pass1 Pass2 Pass3 Pass4 Pass5
> Pass6 Pass7 Pass8 Pass9 Pass10
> 3BAB Brackish 27-Oct-04 PAPU 2 Fall 2004
> 3BAB38287PAPU 35 25 9 0 1 0 0 0 0 0 0
> 3BAB Brackish 27-Oct-04 POLA 2 Fall 2004
> 3BAB38287POLA 79 46 19 9 5 0 0 0 0 0 0
> 3BAB Brackish 27-Oct-04 GAAF 2 Fall 2004
> 3BAB38287GAAF 64 45 13 6 0 0 0 0 0 0 0
> 3BAB Brackish 27-Oct-04 CYVA 2 Fall 2004
> 3BAB38287CYVA 11 2 4 2 3 0 0 0 0 0 0
> 3BAB Brackish 27-Oct-04 LUPA 2 Fall 2004
> 3BAB38287LUPA 20 11 2 4 2 1 0 0 0 0 0]
>
> I would like to get average values (Total) of particular rows that have
> identical column values (column of interest is UniqueID) and create a
> new matrix with the calculated mean and associated UniqueID. For each
> UniqueID there are between 2 and 3 values.
>
> Like this:
>
> [3BAB38287PAPU 234
> 3BAB38287POLA 456
> 3BAB38287GAAF 35]
>
> Could someone help me with this? I have tried a lot of things and
> nothing seems to be working.
>
> Thanks
> Alan

Subject: how to find unique rows and calculate the average.

From: dpb

Date: 11 Apr, 2013 02:50:08

Message: 6 of 10

On 4/10/2013 4:10 PM, Alan Hitch wrote:
...

> Thanks for responding. I neglected to tell you that the data is a cell
> array with different data types and textscan only works with double or
> string. I read the data in using xlsread and was using the raw data. I
> am relatively new to MatLab. How do I convert the cell array into a
> double or string? I tried using cell2mat but the contents of my
> cellarray are different data types.

OK, it just dawned on me--your problem is trying to use RAW; use TEXT as
the C{1} cell in my example and NUM as the C{2}. Since XLSREAD returns
the numeric fields as a double array instead of cell, just dispense w/
the curly brackets to dereference the cell contents and just use the
subscripting expressions directly into NUM. (Of course, use whatever
variable names you wish instead of the ML placeholders...)

--

Subject: how to find unique rows and calculate the average.

From: Bruno Luong

Date: 11 Apr, 2013 06:38:09

Message: 7 of 10

>> c = {'3BAB38287PAPU' 234
 '3BAB38287POLA' 456
 '3BAB38287GAAF' 35
 '3BAB38287PAPU' 232
 '3BAB38287POLA' 406
 '3BAB38287GAAF' 29}

c =

    '3BAB38287PAPU' [234]
    '3BAB38287POLA' [456]
    '3BAB38287GAAF' [ 35]
    '3BAB38287PAPU' [232]
    '3BAB38287POLA' [406]
    '3BAB38287GAAF' [ 29]

[id, ~, J] = unique(c(:,1));
[id num2cell(accumarray(J, [c{:,2}]', [], @mean))]

ans =

    '3BAB38287GAAF' [ 32]
    '3BAB38287PAPU' [233]
    '3BAB38287POLA' [431]

% Bruno

Subject: how to find unique rows and calculate the average.

From: Alan Hitch

Date: 11 Apr, 2013 22:50:08

Message: 8 of 10

dpb

I know this will work once I get the textscan line of code to work. If I am reading in a xls file with (txt,num,raw) which array should I put in the "fid" place? That is confusing me right now. After that is done everything should work fine.
Thanks again for all your help.
Alan




dpb <none@non.net> wrote in message <kk58cv$2k3$1@speranza.aioe.org>...
> On 4/10/2013 4:10 PM, Alan Hitch wrote:
> ...
>
> > Thanks for responding. I neglected to tell you that the data is a cell
> > array with different data types and textscan only works with double or
> > string. I read the data in using xlsread and was using the raw data. I
> > am relatively new to MatLab. How do I convert the cell array into a
> > double or string? I tried using cell2mat but the contents of my
> > cellarray are different data types.
>
> OK, it just dawned on me--your problem is trying to use RAW; use TEXT as
> the C{1} cell in my example and NUM as the C{2}. Since XLSREAD returns
> the numeric fields as a double array instead of cell, just dispense w/
> the curly brackets to dereference the cell contents and just use the
> subscripting expressions directly into NUM. (Of course, use whatever
> variable names you wish instead of the ML placeholders...)
>
> --

Subject: how to find unique rows and calculate the average.

From: dpb

Date: 11 Apr, 2013 23:51:45

Message: 9 of 10

On 4/11/2013 5:50 PM, Alan Hitch wrote:

...[top-posting repaired--don't do that; hard conversation follow makes]...

> dpb <none@non.net> wrote in message <kk58cv$2k3$1@speranza.aioe.org>...
>> On 4/10/2013 4:10 PM, Alan Hitch wrote:
>> ...
>>
>> > Thanks for responding. I neglected to tell you that the data is a cell
>> > array with different data types and textscan only works with double or
>> > string. I read the data in using xlsread and was using the raw data. I
>> > am relatively new to MatLab. How do I convert the cell array into a
>> > double or string? I tried using cell2mat but the contents of my
>> > cellarray are different data types.
>>
>> OK, it just dawned on me--your problem is trying to use RAW; use TEXT
>> as the C{1} cell in my example and NUM as the C{2}. Since XLSREAD
>> returns the numeric fields as a double array instead of cell, just
>> dispense w/ the curly brackets to dereference the cell contents and
>> just use the subscripting expressions directly into NUM. (Of course,
>> use whatever variable names you wish instead of the ML placeholders...)

...

> I know this will work once I get the textscan line of code to work.
> If I am reading in a xls file with (txt,num,raw) which array should I
> put in the "fid" place? That is confusing me right now. After that is
> done everything should work fine. Thanks again for all your help.

That's what I was saying just realized--if it is an Excel spreadsheet
that you're trying to read, TEXTSCAN() can't do it--it reads, well, 'text'.

I was just pointing out that when you use XLSREAD that text that is the
ID column, etc., is in the TXT cell array while the numeric values are
in NUM. The TEXTSCAN line goes away; fid is the file handle obtained
from fopen() on the file but xlsread takes the filename and opens it
internally instead.

I have to admit I've never used XLSREAD so I'm not absolutely positive
how it structures the cell array(s) it returns--one thing that is clear
is that you'll have to either start the location where it reads the file
from the cell below the header line whichever line that is on inside the
sheet or it will also be mixed up in the TXT array and I've no clue what
it'll do about the NUM array to match.

Alternatively, if TXT is a cell array of rows, it should be reasonably
easy to use Bruno's solution on it--I have only recently been blessed w/
a new-enough release of Matlab to incorporate accumarray and various
other new-fangled things despite 20 yrs w/ Matlab. Consequently I tend
to forget about them.

In summary, the TEXTSCAN solution works only if you save the file to a
text file and read it; you'll have to look at how XLSREAD returns the
data to figure out how to do the equivalent addressing that is shown in
both Bruno's and my approaches to find the correct rows and associated data.

--

Subject: how to find unique rows and calculate the average.

From: dpb

Date: 12 Apr, 2013 22:31:00

Message: 10 of 10

On 4/11/2013 6:51 PM, dpb wrote:
...
>>> OK, it just dawned on me--your problem is trying to use RAW; ...
...
> That's what I was saying just realized--if it is an Excel spreadsheet
> that you're trying to read, TEXTSCAN() can't do it--it reads, well, 'text'.
>
> I was just pointing out that when you use XLSREAD that text that is the
> ID column, etc., is in the TXT cell array while the numeric values are
> in NUM. The TEXTSCAN line goes away; fid is the file handle obtained
> from fopen() on the file but xlsread takes the filename and opens it
> internally instead.
>
> I have to admit I've never used XLSREAD so I'm not absolutely positive
> how it structures the cell array(s) it returns--one thing that is clear
> is that you'll have to either start the location where it reads the file
> from the cell below the header line whichever line that is on inside the
> sheet or it will also be mixed up in the TXT array and I've no clue what
> it'll do about the NUM array to match.
>
> Alternatively, if TXT is a cell array of rows, it should be reasonably
> easy to use Bruno's solution on it--I have only recently been blessed w/
> a new-enough release of Matlab to incorporate accumarray and various
> other new-fangled things despite 20 yrs w/ Matlab. Consequently I tend
> to forget about them.
...

Well decided I'd try a new and old together...I took the lines you
posted and the header line and put them in a .xls file (something I
almost also never use, but that aside...) and named it Alan.xls

Then the following using Bruno's solution works just fine on the RAW
cella array if you specify the lines w/o the header on the XLSREAD call
(another reason to avoid Excel--having to know the sizes a priori)...

 >> [~,~,R]=xlsread('alan.xls',1,'A2:S6');
 >> [id, ~, J] = unique(R(:,8));
 >> [id num2cell(accumarray(J, [R{:,9}]', [], @mean))]
ans =
     '3BAB38287CYVA' [11]
     '3BAB38287GAAF' [64]
     '3BAB38287LUPA' [20]
     '3BAB38287PAPU' [35]
     '3BAB38287POLA' [79]
 >>

There's a mishmash if you don't exclude the header and particularly when
separating the text and numeric data--it's confused by the ID columns
that thinks maybe should have been valid hex or somesuch it
seems--they're returned as NaN's. Consequently it's another hassle to
figure out which column it is to operate on there but at least while
it's counting, you can rely on which is which in the RAW format.

Would be simpler it still seems to me to import a text file instead form
that standpoint albeit another step to save the xls file as csv or
something.

Anyway, should solve your problem...

--

Tags for this Thread

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Contact us