Estimate pdf of data set spanning orders of magnitude


01 Sep 2010

Estimates the pdf of an empirical data set x whose values span several orders of magnitude.

[Xout pdf]=logbinpdf(x,BinNum)
function [Xout pdf]=logbinpdf(x,BinNum)
% logbinpdf.m
%
% B. Clarke 2/9/10
%
% Estimates the pdf of an empirical data set x whose values span several
% orders of magnitude.
% Data from some statistical processes vary over many decades. Estimating
% the pdf of such data requires bins of equal width on a log scale, which
% is what this function uses.
% 
%
% INPUT:
% x       data set generated by a statistical process (values must be >0,
%         since log10(x) is taken)
% BinNum  number of bins used to estimate the pdf
%
% OUTPUT:
% Xout    bin centers (centered on the log scale), one per bin
% pdf     estimated pdf value at each Xout
% Each (Xout, pdf) pair is a point on the estimated pdf curve.
%
% The start and stop values of the bins (Xstart and Xstop, in log10 units)
% and the counts in each bin (N) are computed inside the function and can be
% added to the output list if desired.
% 
% Example of use:
% x=rand(1000000,1); % uniform random numbers between 0 and 1
% out = -log(x); % exponentially distributed variable
% [Xout pdf]=logbinpdf(out,100); % estimate pdf using 100 log-spaced bins
% loglog(Xout,pdf)
% The problem with binning the original data linearly can be seen with hist:
% hist(out,100)
% Setting the x axis to log shows how the bin size varies on that scale


if nargin<2
    error('Must have 2 input arguments')
end
if BinNum<=0
    error('Number of bins must be >0')
end
if any(x(:)<=0)
    error('All values of x must be >0')
end

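lx=log10(x);                     % work in decades
max1=max(lx);
min1=min(lx);
Ncount=length(lx);               % total number of samples
step=(max1-min1)/BinNum;         % bin width in log10 units
[N X]=hist(lx,BinNum);           % counts N and bin centers X on the log scale
Xstart=X-step/2;                 % lower bin edges (log10 units)
Xstop=X+step/2;                  % upper bin edges (log10 units)
Xsize=10.^(Xstop)-10.^(Xstart);  % bin widths on the linear scale
pdf=N./Xsize/Ncount;             % density = count/(bin width * sample count)
Xout=10.^X;                      % bin centers converted back to linear scale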
% sum1=sum(pdf.*(10.^Xstop-10.^Xstart)) % tests to see if the total
% probability is 1.
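
The commented-out normalization test at the end of the function can also be run as a standalone check. The following is a minimal sketch, not the author's code: it repeats the same log-binning steps inline on exponentially distributed samples and confirms that the estimated density sums to 1 over the bins. The names `binNum`, `widths`, `pdfEst`, and `total` are introduced here for illustration only, and the sample size and bin count are arbitrary choices.

```matlab
% Standalone normalization check (illustrative sketch; variable names are
% hypothetical and not part of logbinpdf.m).
x = -log(rand(100000,1));        % exponentially distributed samples, all > 0
binNum = 50;                     % assumed bin count for this sketch
lx = log10(x);                   % work in decades
step = (max(lx)-min(lx))/binNum; % bin width in log10 units
[N, X] = hist(lx, binNum);       % counts and bin centers on the log scale
widths = 10.^(X+step/2) - 10.^(X-step/2);  % bin widths on the linear scale
pdfEst = N ./ widths / numel(x); % density = count / (width * sample size)
total = sum(pdfEst .* widths);   % total probability; should be 1 up to rounding
fprintf('total probability = %.12f\n', total);
```

Plotting `loglog(10.^X, pdfEst)` against `exp(-10.^X)`, the exact density for this example, gives a visual comparison; bins in the sparsely populated tails can be expected to scatter around the exact curve.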
