Thread Subject: combine data?

Subject: combine data?

From: jay vaughan

Date: 16 Jul, 2008 08:00:20

Message: 1 of 8

Hi,

I am trying to find a way to combine data without using a
loop. My data are similar to the following.

width = [1 4 4 5 6 10 10 10 16];
weight = [1 1 2 1 1 4 2 2 1];

I would like to find a way to combine all entries where the
width was the same, finding the total weight, like below.

combined_width = [1 4 5 6 10 16];
combined_weight = [1 3 1 1 8 1];

I have to do this a million times or so and was hoping to
do it efficiently. Any ideas on how to vectorize something
like this?

Thanks,
J

Subject: combine data?

From: vedenev

Date: 16 Jul, 2008 08:15:42

Message: 2 of 8

here is the code:

v=[1 4 4 5 6 10 10 10 16]
dv=diff(v);
r=find(dv);
v1=[v(r) v(end)]

------------------------------------
Maxim Vedenev, Matlab freelancer
http://simulations.narod.ru/

Subject: combine data?

From: jay vaughan

Date: 16 Jul, 2008 08:37:10

Message: 3 of 8

Hi Maxim,

thanks for the speedy reply. Your code essentially does the
following...

v1=unique(v);

What I would like to do is to find the sum of those
elements in w whose corresponding elements in v are the
same. For example, in v the 2nd and 3rd elements are both
4, so I summed the 2nd & 3rd elements from w in the output
to get 1+2=3 in w_summed, etc.

v = [1 4 4 5 6 10 10 10 16]
w = [1 1 2 1 1 4 2 2 1];

v_unique = [1 4 5 6 10 16];
w_summed = [1 3 1 1 8 1];


Any thoughts? Thanks,
J

vedenev <vedenev.maxim@gmail.com> wrote in message
<2bc34cc4-1c0a-401f-af4a-0c5479ec0e4c@c65g2000hsa.googlegroups.com>...
> here is the code:
>
> v=[1 4 4 5 6 10 10 10 16]
> dv=diff(v);
> r=find(dv);
> v1=[v(r) v(end)]
>
> ------------------------------------
> Maxim Vedenev, Matlab freelancer
> http://simulations.narod.ru/
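
For what it's worth, the diff-based idea quoted above can probably be stretched to sum the weights as well. The following is only a sketch, not anything posted in the thread: it assumes v is already sorted and that v and w are row vectors, and the names last and c are made up for illustration.

```matlab
% Sketch only: group sums for data that is ALREADY SORTED by v (assumption).
v = [1 4 4 5 6 10 10 10 16];
w = [1 1 2 1 1 4 2 2 1];

last = [find(diff(v)) numel(v)];   % index of the last element of each run of equal values
c = cumsum(w);                     % running total of the weights
v_unique = v(last);                % one value per run
w_summed = diff([0 c(last)]);      % total at the end of a run minus total before the run
```

With floating-point weights, differencing a long cumulative sum can lose a little precision compared with summing each group on its own.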

Subject: combine data?

From: Rune Allnor

Date: 16 Jul, 2008 10:37:34

Message: 4 of 8

On 16 Jul, 10:00, "jay vaughan" <jvaughan5.nos...@gmail.com> wrote:
> Hi,
>
> I am trying to find a way to combine data without using a
> loop.

Sigh... the vixen of vectorisation in all her splendour...

It can't be done. There is no way to do this without a loop.

> My data are similar to the following.
>
> width = [1 4 4 5 6 10 10 10 16];
> weight = [1 1 2 1 1 4 2 2 1];
>
> I would like to find a way to combine all entries where the
> width was the same, finding the total weight, like below.
>
> combined_width = [1 4 5 6 10 16];
> combined_weight = [1 3 1 1 8 1];
>
> I have to do this a million times or so and was hoping to
> do it efficiently.

That's a completely different issue. What I would do
if I was constrained to matlab (not tested!):

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
wiv = unique(width);
wev = zeros(size(wiv));

for n=1:length(wev)
   idx=find(width==wiv(n));
   wev(n)=sum(weight(idx));
end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

If this is too slow, first learn a proper programming
language and then read

Knuth: "The Art of Computer Programming, vol 3:
       Sorting and Searching."

In fact, that's not such a bad idea even if the script
above runs fast enough for you.

Rune

Subject: combine data?

From: Peter Boettcher

Date: 16 Jul, 2008 13:16:11

Message: 5 of 8

"jay vaughan" <jvaughan5.nospam@gmail.com> writes:

> Hi,
>
> I am trying to find a way to combine data without using a
> loop. My data are similar to the following.
>
> width = [1 4 4 5 6 10 10 10 16];
> weight = [1 1 2 1 1 4 2 2 1];
>
> I would like to find a way to combine all entries where the
> width was the same, finding the total weight, like below.
>
> combined_width = [1 4 5 6 10 16];
> combined_weight = [1 3 1 1 8 1];
>
> I have to do this a million times or so and was hoping to
> do it efficiently. Any ideas on how to vectorize something
> like this?

We used to do this with the "sparse" function, which had the neat side
effect of summing any values that were specified with repeat indices.
Now there is a dedicated function for it: accumarray.

[combined_width, a, b] = unique(width);
combined_weight = accumarray(b(:), weight(:)).';

-Peter
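
As a rough illustration of the two idioms mentioned in this post, here is a sketch that runs both on the data from the original question. The variable names w_acc and w_sp are made up for this example, and the (:) reshapes are there only so the code does not depend on whether unique returns its index vector as a row or a column.

```matlab
width  = [1 4 4 5 6 10 10 10 16];
weight = [1 1 2 1 1 4 2 2 1];

[combined_width, a, b] = unique(width);   % b(k) is the group number of width(k)
b = b(:);                                 % force a column of subscripts

% accumarray sums the values that share a subscript
w_acc = accumarray(b, weight(:)).';

% older idiom: sparse adds up entries that are given with repeated indices
w_sp = full(sparse(b, ones(size(b)), weight(:))).';
```

Both w_acc and w_sp should come out as row vectors with one total per entry of combined_width.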

Subject: combine data?

From: Steven Lord

Date: 16 Jul, 2008 13:21:38

Message: 6 of 8


"jay vaughan" <jvaughan5.nospam@gmail.com> wrote in message
news:g5k9qk$kc5$1@fred.mathworks.com...
> Hi,
>
> I am trying to find a way to combine data without using a
> loop. My data are similar to the following.
>
> width = [1 4 4 5 6 10 10 10 16];
> weight = [1 1 2 1 1 4 2 2 1];
>
> I would like to find a way to combine all entries where the
> width was the same, finding the total weight, like below.
>
> combined_width = [1 4 5 6 10 16];
> combined_weight = [1 3 1 1 8 1];
>
> I have to do this a million times or so and was hoping to
> do it efficiently. Any ideas on how to vectorize something
> like this?

Use ACCUMARRAY. If your widths are not integers, UNIQUE them and use the
unique indices to construct the subs input.

--
Steve Lord
slord@mathworks.com
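
A possible reading of the integer case, sketched under the assumption that every width is a positive integer so it can be used directly as a subscript (a width of 0 would need an offset first). The names totals and counts are illustrative only.

```matlab
width  = [1 4 4 5 6 10 10 10 16];
weight = [1 1 2 1 1 4 2 2 1];

totals = accumarray(width(:), weight(:));   % totals(k) = summed weight for width k
counts = accumarray(width(:), 1);           % counts(k) = number of entries with width k

combined_width  = find(counts).';           % widths that actually occur
combined_weight = totals(combined_width).'; % their summed weights
```

Using counts rather than totals to find the occupied widths keeps a width whose weights happen to sum to zero. The intermediate vectors have max(width) elements, so this only makes sense when the widths are not huge.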


Subject: combine data?

From: jay vaughan

Date: 16 Jul, 2008 16:57:04

Message: 7 of 8

"Steven Lord" <slord@mathworks.com> wrote in message
<g5ksl2$ij3$1@fred.mathworks.com>...
>
> "jay vaughan" <jvaughan5.nospam@gmail.com> wrote in message
> news:g5k9qk$kc5$1@fred.mathworks.com...
> > Hi,
> >
> > I am trying to find a way to combine data without using a
> > loop. My data are similar to the following.
> >
> > width = [1 4 4 5 6 10 10 10 16];
> > weight = [1 1 2 1 1 4 2 2 1];
> >
> > I would like to find a way to combine all entries where the
> > width was the same, finding the total weight, like below.
> >
> > combined_width = [1 4 5 6 10 16];
> > combined_weight = [1 3 1 1 8 1];
> >
> > I have to do this a million times or so and was hoping to
> > do it efficiently. Any ideas on how to vectorize something
> > like this?
>
> Use ACCUMARRAY. If your widths are not integers, UNIQUE them and use the
> unique indices to construct the subs input.
>
> --
> Steve Lord
> slord@mathworks.com

Thanks Peter and Steve, accumarray works great!
J

Subject: combine data?

From: Rune Allnor

Date: 21 Jul, 2008 23:09:43

Message: 8 of 8

On 16 Jul, 12:37, Rune Allnor <all...@tele.ntnu.no> wrote:
> On 16 Jul, 10:00, "jay vaughan" <jvaughan5.nos...@gmail.com> wrote:
>
> > Hi,
>
> > I am trying to find a way to combine data without using a
> > loop.
>
> Sigh... the vixen of vectorisation in all her splendour...

It took a little bit of time to get this test done, but
at last some hard facts.

I implemented the three algorithms suggested in this thread:
Peter and Steven's use of ACCUMARRAY, my matlab code and
my C++ MEX program.

Below is the report from the profiler (R2006a) as well as
the code for the various files. My 'naive' matlab code
runs in half the time of the 'terse' code by Peter &
Steven - note that it is the call to UNIQUE which
takes the most time. My C++ MEX code runs in 10% of that
time again.

The C++ code is likely to become relatively slower
when there are a lot of different 'width' classes
(there are only 11 below); if there are only one or
two weights per width, the C++ code might in fact become
slower than at least the 'naive' matlab code.

Conclusions? Don't mistake the number of typed
characters for run-time efficiency...

Rune


%%%%% Profiler report %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Function                      Calls   Total time   Self-time

test                          1       0.698 s      0.070 s
unique                        2       0.458 s      0.458 s
alt2 (Peter & Steven)         1       0.386 s      0.027 s
alt1 (Rune)                   1       0.213 s      0.113 s
dataweights (MEX-function)    1       0.029 s      0.029 s
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%% test.m %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function test

N = 10;
M = 1000000;
width=int32(rand(M,1)*N);
weight=rand(M,1);

[wid1,wei1]=dataweights(width,weight);
[wid2,wei2]=alt1(width,weight);
[wid3,wei3]=alt2(width,weight);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%% alt1.m %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [wiv,wev]=alt1(width,weight)

wiv = unique(width);
wev = zeros(size(wiv));


for n=1:length(wev)
   idx=find(width==wiv(n));
   wev(n)=sum(weight(idx));
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%%%%%% alt2.m %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [wid,wei]=alt2(width,weight);

[wid, a, b] = unique(width);
wei=accumarray(b, weight).';
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

/// dataweights.cpp //////////////////////////////////////////////
#include<map>
#include "mex.h"

using namespace std;

extern void _main();

void mexFunction(
int nlhs,
mxArray *plhs[],
int nrhs,
const mxArray *prhs[]
)
{

    if (nrhs != 2)
    {
        mexErrMsgTxt("Function requires two arguments");
    }

    if (nlhs != 2)
    {
        mexErrMsgTxt("Function returns two results");
    }

    if ((mxGetClassID(prhs[0]) != mxINT32_CLASS)||
        (mxIsComplex(prhs[0])==true))
    {
        mexErrMsgTxt("Argument 1 must be real int32");
    }

    if ((mxGetClassID(prhs[1]) != mxDOUBLE_CLASS)||
        (mxIsComplex(prhs[1])==true))
    {
        mexErrMsgTxt("Argument 2 must be real doubles");
    }

    // prhs[0] holds the int32 widths (the keys); prhs[1] holds the double weights.
    // mwSize is the size type used by the MEX API in current releases.
    mwSize MWidths  = mxGetM(prhs[0]);
    mwSize NWidths  = mxGetN(prhs[0]);
    mwSize MWeights = mxGetM(prhs[1]);
    mwSize NWeights = mxGetN(prhs[1]);

    if (!((NWidths == 1) && (NWeights == 1)))
    {
        mexErrMsgTxt("Arguments must be column vectors");
    }

    if (MWidths != MWeights)
    {
        mexErrMsgTxt("Arguments must be same lengths");
    }

    // Accumulate the total weight per distinct width; std::map keeps the keys sorted.
    std::map<int,double> data;
    int* widths = (int*)mxGetData(prhs[0]);
    double* weights = (double*)mxGetData(prhs[1]);

    for (mwSize n=0;n<MWidths;n++)
    {
        data[widths[n]]+=weights[n];
    }

    // Copy the map into two column vectors: unique widths and summed weights.
    mwSize dims[2];
    dims[0]=data.size();
    dims[1]=1;
    plhs[0]=mxCreateNumericArray(2,dims,mxINT32_CLASS,mxREAL);
    plhs[1]=mxCreateNumericArray(2,dims,mxDOUBLE_CLASS,mxREAL);

    widths = (int*)mxGetData(plhs[0]);
    weights = (double*)mxGetData(plhs[1]);
    mwSize i = 0;
    std::map<int,double>::const_iterator j;
    for (j=data.begin();j!=data.end();++j)
    {
        widths[i]=(*j).first;
        weights[i]=(*j).second;
        ++i;
    }

    return;
}
///////////////////////////////////////////////////////////////////
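
The profiler report above compares speed only. A quick sanity check that the loop and the accumarray version return the same totals might look like the sketch below. It is not part of the posted test: M is shrunk so it runs instantly, the widths are drawn as positive doubles rather than int32, and max_diff is a made-up name.

```matlab
N = 10;
M = 1000;
width  = floor(rand(M,1)*N) + 1;   % random widths in 1..N (assumption, not the posted data)
weight = rand(M,1);

% loop version, as in alt1.m
wiv = unique(width);
wev = zeros(size(wiv));
for n = 1:length(wiv)
    wev(n) = sum(weight(width == wiv(n)));
end

% accumarray version, as in alt2.m (kept as a column here for easy comparison)
[wid, a, b] = unique(width);
wei = accumarray(b(:), weight);

max_diff = max(abs(wev - wei));    % should be at rounding-error level
```

The two results may differ in the last few bits because the weights are added in a different order, so comparing with a small tolerance is safer than isequal.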
