Thread Subject: Histogram with weighting factors?

Subject: Histogram with weighting factors?

From: Riku Jarvinen

Date: 11 Nov, 2009 12:48:01

Message: 1 of 7

I need to produce a histogram of one-dimensional data with weighting factors. By this I mean that in addition to a data vector (d) I also have a vector w telling how many counts each element in d represents.

Here's a poor man's solution which just duplicates data elements according to their weight.

d = rand(10,1); % my data
w = [1 5 2 3 1 1 2 1 7 1]'; % weighting factors (counts per element)

% create a new data vector
dnew = zeros(sum(w),1); % preallocate one slot per count
n = 1;
for i = 1:length(d)
    % add multiple counts
    for j = 1:w(i)
        dnew(n) = d(i);
        n = n + 1;
    end
end

hist(dnew);

Any ideas on how to do this quickly without wasting memory? My data vectors are quite long and the weighting factors vary over several orders of magnitude. I'm using Matlab 6.1.

Subject: Histogram with weighting factors?

From: ImageAnalyst

Date: 11 Nov, 2009 14:25:05

Message: 2 of 7

Short on memory? Just how large are your arrays? What is "quite
long"? Hundreds of megabytes? If they're only a few thousand
elements long I probably wouldn't worry about it - it'll be lightning
fast the way you have it plus it's readable, intuitive, and
understandable. (Sometimes I find the "one liner" solutions so
cryptic and hard to understand that it's not worth it in terms of code
readability.)

Subject: Histogram with weighting factors?

From: Riku Jarvinen

Date: 12 Nov, 2009 00:00:18

Message: 3 of 7

ImageAnalyst <imageanalyst@mailinator.com> wrote in message <e9d4bece-d445-48ef-8088-593b134fd7cd@k4g2000yqb.googlegroups.com>...

> Short on memory? Just how large are your arrays? What is "quite
> long"? Hundreds of megabytes? If they're only a few thousand
> elements long I probably wouldn't worry about it - it'll be lightning
> fast the way you have it plus it's readable, intuitive, and
> understandable. (Sometimes I find the "one liner" solutions so
> cryptic and hard to understand that it's not worth it in terms of code
> readability.)

I agree with you, complex one liner vectorizations aren't always the way to go. However, in this case perhaps some more clever methods are needed.

I typically have 1 to 10 million elements in the data vector*. The problem is the weighting factors, since they can vary over many orders of magnitude (for example w=1e16...1e22). This means that even if I normalize the weighting factors (smallest one to unity), I still get a million copies of the data element with the largest weight (w=1...1e6).

Now, depending on the distribution of the weighting factors, I can have up to 1e6*1e6 elements in the above poor man's duplicated vector, and 1e6*1e6 loop evaluations are needed to calculate those. So it's a problem with both memory and number crunching power.
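As a rough back-of-the-envelope check of that worst case, here is the arithmetic spelled out. It assumes that every one of the 1e6 data elements carries the largest normalized weight of 1e6 and that the duplicated vector is stored as 8-byte doubles. Both are assumptions for illustration, and a real weight distribution would give a smaller number.

```matlab
nd = 1e6;              % number of data points (assumed)
wmax = 1e6;            % largest normalized weight (assumed)
nElems = nd * wmax;    % worst-case length of the duplicated vector: 1e12
nBytes = nElems * 8;   % 8 bytes per double: 8e12 bytes, about 8 TB
fprintf('%g elements, about %g TB\n', nElems, nBytes/1e12);
```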

Here follows a bit more realistic (and more vectorized) example of what I'm trying to do. With this method I'd need to be able to use nd = 1e6 and wmagn = 1e6 which is just not possible in terms of memory or CPU power!

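nd = 1e3; % number of data points
wmagn = 1e3; % max weighting magnitude

d = rand(nd,1); % my data
w = randn(nd,1); % weighting factors
w = round((w-min(w))/(max(w)-min(w))*wmagn); % scale to integers 0...wmagn

% create a new data vector
dnew = zeros(sum(w),1); % preallocate; replaces any dnew left from an earlier run
n = 1;
for i = 1:length(d)
    a = n + w(i);
    dnew(n:a-1) = d(i); % w(i) copies of d(i); nothing is added when w(i) is 0
    n = a;
end

hist(dnew);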

*) The data I'm dealing with are, for example, particle energies from a kinetic plasma simulation. The w factors are statistical weights of the simulation particles and I'm trying to calculate weighted (energy) spectra.

Subject: Histogram with weighting factors?

From: ImageAnalyst

Date: 12 Nov, 2009 01:38:53

Message: 4 of 7

If you can quantize your original data so that there's one data point
per bin, then you can just multiply your histogram counts by your
weights. For example, let's say your data goes up to a million and you
specify a million bins, so that no bin has more than one data point in
it (or, if a bin does have more than one data point, the weights for
those data points are the same). In that case, just ask for a hist
with a million bins, then multiply the histogram counts by the weights
for each value - essentially boosting the count in each bin by a
factor of the weight. Then you can re-bin into bigger bins if you
want.
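
A minimal sketch of this idea, under the strong assumption that the data are already quantized to distinct integers 1..N, so that each unit-wide bin holds exactly one data point. The variable names (N, wbin, weighted, coarse) and the random test data are made up for illustration.

```matlab
N = 1000;
d = randperm(N)';            % hypothetical data: each integer 1..N exactly once
w = ceil(50*rand(N,1));      % hypothetical integer weights, 1..50

counts = hist(d, 1:N);       % one bin per value, so every count is 1
wbin = zeros(1, N);
wbin(d) = w;                 % weight of the data point that landed in each bin
weighted = counts .* wbin;   % boost each bin's count by its weight

% re-bin into 10 bigger bins, each covering 100 of the fine bins
coarse = sum(reshape(weighted, 100, N/100), 1);
```

Calling bar(coarse) would then plot the weighted histogram. If several points with different weights can share a fine bin, this shortcut no longer applies and the weights have to be summed per bin instead.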

Subject: Histogram with weighting factors?

From: Bruno Luong

Date: 12 Nov, 2009 06:50:19

Message: 5 of 7

The code below should work. You might change the edges to stick strictly with the center convention of HIST (without the C), but the idea is there:

edges=linspace(min(d),max(d),10);
[trash bin]=histc(d,edges);
count=accumarray(bin,w(:));

bar(count)

Bruno
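
As a sanity check on the snippet above, the following sketch compares the accumarray result with the brute-force duplication approach on a small case with integer weights. The test data and the names unused, dnew and ref are assumptions for illustration. Two things to keep in mind: histc returns bin 0 for values outside the edges (which accumarray would reject as a subscript), and the last bin only collects values exactly equal to edges(end).

```matlab
d = rand(1000,1);                  % hypothetical data
w = ceil(20*rand(1000,1));         % hypothetical integer weights, 1..20
edges = linspace(min(d), max(d), 10);

% weighted counts: sum the weights of the points falling in each bin
[unused, bin] = histc(d, edges);
count = accumarray(bin, w(:), [numel(edges) 1]);

% reference: duplicate each element w(i) times, then take a plain histogram
dnew = zeros(sum(w),1);
n = 1;
for i = 1:numel(d)
    dnew(n:n+w(i)-1) = d(i);
    n = n + w(i);
end
ref = histc(dnew, edges);

same = isequal(count, ref(:));     % true if both approaches agree
```

Passing the size [numel(edges) 1] explicitly keeps count the same length as edges even when the upper bins happen to be empty.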

Subject: Histogram with weighting factors?

From: Riku Jarvinen

Date: 23 Nov, 2009 16:23:19

Message: 6 of 7

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <hdgb7b$v1$1@fred.mathworks.com>...
> The code below should work. You might change the edges to stick strictly with the center convention of HIST (without the C), but the idea is there:
>
> edges=linspace(min(d),max(d),10);
> [trash bin]=histc(d,edges);
> count=accumarray(bin,w(:));
>
> bar(count)

Thanks a lot! This works really nicely. My 6.1 version doesn't have the accumarray function, but I tried your solution with a newer version. It seems like an interesting function in general, too.

Riku

Subject: Histogram with weighting factors?

From: Riku Jarvinen

Date: 23 Nov, 2009 16:45:27

Message: 7 of 7

"Riku Jarvinen" <riku.jarvinen@fmi.fi> wrote in message <heectn$eoc$1@fred.mathworks.com>...

> > edges=linspace(min(d),max(d),10);
> > [trash bin]=histc(d,edges);
> > count=accumarray(bin,w(:));

> Thanks a lot! This works really nicely. My 6.1 version doesn't have the accumarray function, but I tried your solution with a newer version. It seems like an interesting function in general, too.

I also found that I can do this without accumarray almost as fast with a simple for loop:

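[n,bin] = histc(d,edges); % bin(i) is the index of the bin containing d(i)
a = zeros(size(n)); % weighted counts, one per bin
for i = 1:length(w)
    a(bin(i)) = a(bin(i)) + w(i); % add the weight of element i to its bin
end
bar(edges,a,'histc');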
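
Another loop-free option that does not need accumarray may be the sparse constructor, which adds up values that share the same (row, column) index. The sketch below compares it with the loop on made-up test data. The name a2 and the data are assumptions for illustration, and I have not checked this on 6.1.

```matlab
d = rand(1000,1);                  % hypothetical data
w = ceil(20*rand(1000,1));         % hypothetical integer weights, 1..20
edges = linspace(min(d), max(d), 10);

[n, bin] = histc(d, edges);

% loop version, as above
a = zeros(size(n));
for i = 1:length(w)
    a(bin(i)) = a(bin(i)) + w(i);
end

% sparse version: entries with a repeated (row, column) index are summed
a2 = full(sparse(bin, ones(size(bin)), w, length(edges), 1));

same = isequal(a, a2);             % true if both give the same weighted counts
```

With non-integer weights the two results can differ by rounding error, since the order of summation is not the same, so a tolerance would be safer than isequal in that case.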
