How to plot a smooth curve for Empirical probability density ?

Hi everyone,
I require to plot an empirical probability density curve of a data set (184 rows, 59 columns). Initially, I pick one column to plot the results but, the EPD curve looks so wired. May someone suggest to me how can I get desired result (Attached). Data is also attached for reference.
clear all
clc
X = load('R_0.01T.csv'); % input data
SzX = size(X) % matrix size
r=[X(1,:)]; % first row of the data set
ra = r(~isnan(r)); % remove all nan enteries
[f,x,flo,fhi] = ecdf(ra);
WinLen = 5
dfdxs = smoothdata(gradient(f)./gradient(x), 'movmedian',WinLen); % ... Then, Smooth Them Again ...
aaa = smooth(dfdxs);
plot(x, aaa)
This is what i get
Here is the expected results.

2 Comments

Change the line
WinLen = 35%
Modify this value and you can observe the difference
Still not expected, I have tried with different values but no significant change.

Sign in to comment.

 Accepted Answer

hello
tried a few options , smoothdata and spline fit. In fact, I was thinking you wanted something super smooth so I first opted for the spline fit. But I saw that the expected plot was supposed to have some waviness , so I changed to smoothdata
NB as I don't have the toolbox for ecdf , I found an alternative on FEX for it https://fr.mathworks.com/matlabcentral/fileexchange/32831-homemade-ecdf
but I had to correct a bug (the corrected function is attaced fyi)
hope the code below can help you
clear all
clc
X = load('R_0.01T.csv'); % input data
SzX = size(X); % matrix size
r=X(1,:); % first row of the data set
ra = r(~isnan(r)); % remove all nan enteries
% [f,x,flo,fhi] = ecdf(ra);
[f,x] = homemade_ecdf(ra); % FEX : https://fr.mathworks.com/matlabcentral/fileexchange/32831-homemade-ecdf
% remove duplicates
[x,ia,ic] = unique(x);
f = f(ia);
% resample the data (interpolation) on linearly spaced points
xx = linspace(min(x),max(x),100);
ff = interp1(x,f,xx);
% smoothing
ffs = smoothdata(ff, 'loess',40);
% or spline fit ?
% Breaks interpolated from data (log x scale to get more points in the
% lower x range and less in the upper x range)
breaks = logspace(log10(min(x)),log10(max(x)),5); %
p = splinefit(xx,ff,breaks); %
ff2 = ppval(p,xx);
figure(1),
plot(x,f,'*',xx,ffs,'-',xx,ff2,'-')
legend('raw','smoothed','spline fit');
% compute gradient from spline fitted data (my choice)
dx = mean(diff(xx));
dfdxs = gradient(ffs)./dx;
figure(2),plot(xx, dfdxs)

14 Comments

ppval is not work on 2021a version. However other function works perfectly:
clear all
clc
X = load('R_0.01T.csv'); % input data
SzX = size(X); % matrix size
r=X(1,:); % first row of the data set
ra = r(~isnan(r)); % remove all nan enteries
% [f,x,flo,fhi] = ecdf(ra);
[f,x] = ecdf(ra); % FEX : https://fr.mathworks.com/matlabcentral/fileexchange/32831-homemade-ecdf
% remove duplicates
[x,ia,ic] = unique(x);
f = f(ia);
% resample the data (interpolation) on linearly spaced points
xx = linspace(min(x),max(x),100);
ff = interp1(x,f,xx);
% % smoothing
ffs = smoothdata(ff, 'loess',40);
dx = mean(diff(xx));
dfdxs = gradient(ffs)./dx;
figure(2),plot(xx, dfdxs)
I not quiet well understand why we need to apply unique function.
hello again
if you need to resample or interpolate, the x data must be unique and monotonic
I am surprised you don't have ppval function in your release
that's part of the "basic" matlab package (polynomial functions)
Even, I am using online. It should be ,....
I attempted to generalize your script but it seems there are some problems. May you suggest some wayout :
clear all
clc
X = load('R_0.01T.csv');
X = load('R_0.01T.csv');
SzX = size(X)
r{1}=[X(1,:)];
r{2}=[X(2,:)];
r{3}=[X(3,:)];
for k = 1:numel(r)
ra{k} = r{k}(~isnan(r{k}));
[f{k},x{k}] = ecdf(ra{k});
[x{k},ia,ic] = unique(x{k});
f{k} = f(ia);
xx{k} = linspace(min(x{k}),max(x{k}),100);
ff{k} = interp1(x{k},f{k},xx{k});
ffs{k} = smoothdata(ff{k}, 'loess',40);
dx{k} = mean(diff(xx{k}));
dfdxs{k} = gradient(ffs{k})./dx{k};
end
figure
hold on
for k = 1:numel(r) % ... Then, Plot The End Result!
plot(xx{k}, dfdxs{k}, 'DisplayName',sprintf('event{%2d}',k))
end
hold off
grid
legend('Location','best')
hello
maybe you are not obliged to store all computed datas in cell arrays
if the purpose is simply to make plots , there is a less complicated and memory intensive code :
clear all
clc
X = load('R_0.01T.csv'); % input data
SzX = size(X); % matrix size
rows = (1:5); % define here which rows to process
figure(2), hold on
for ci =1:numel(rows)
r=X(rows(ci),:); % row of the data set
r = r(~isnan(r)); % remove all nan enteries
% [f,x,flo,fhi] = ecdf(r);
[f,x] = homemade_ecdf(r); % FEX : https://fr.mathworks.com/matlabcentral/fileexchange/32831-homemade-ecdf
% remove duplicates
[x,ia,ic] = unique(x);
f = f(ia);
% resample the data (interpolation) on linearly spaced points
xx = linspace(min(x),max(x),100);
ff = interp1(x,f,xx);
% smoothing
ffs = smoothdata(ff, 'loess',40);
% compute gradient from spline fitted data (my choice)
dx = mean(diff(xx));
dfdxs = gradient(ffs)./dx;
plot(xx, dfdxs);
legend_str{ci} = (['Rows # :' num2str(ci)]);
end
legend(legend_str);
hold off
And the smoothed curves integrate to 1 over the interval ? Because they are supposed to be "pdf" curves.
seems so
i added one line in the for loop to check the values .
out = trapz(dfdxs)*dx % check if integral = 1
Got more or less a result in range 0.9 - 0.95 for the first 5 rows tested
X = load('R_0.01T.csv'); % input data
SzX = size(X); % matrix size
rows = (1:5); % define here which rows to process
figure(2), hold on
for ci =1:numel(rows)
r=X(rows(ci),:); % row of the data set
r = r(~isnan(r)); % remove all nan enteries
% [f,x,flo,fhi] = ecdf(r);
[f,x] = homemade_ecdf(r); % FEX : https://fr.mathworks.com/matlabcentral/fileexchange/32831-homemade-ecdf
% remove duplicates
[x,ia,ic] = unique(x);
f = f(ia);
% resample the data (interpolation) on linearly spaced points
xx = linspace(min(x),max(x),100);
ff = interp1(x,f,xx);
% smoothing
ffs = smoothdata(ff, 'loess',40);
% compute gradient from spline fitted data (my choice)
dx = mean(diff(xx));
dfdxs = gradient(ffs)./dx;
plot(xx, dfdxs);
legend_str{ci} = (['Rows # :' num2str(ci)]);
out = trapz(dfdxs)*dx % check if integral = 1
end
legend(legend_str);
hold off
Ok. Nevertheless, in place of the OP, I'd also try the "ksdensity" function from MATLAB to recover the PDF from the empirical data.
It seems especially designed for this purpose and should also be able to cope with noisy data.
@Mathieu NOE the key issue is my data set is not have sane dimenions so i lefgt with only one choice to store in a cell array. I again attemed to plot the expected result using your script but not working. may you have a look:
clear all
clc
X = load('PDS_case.csv'); % input data
C_sort = sortrows(X,5);
Tri = C_sort(1:86,:);
data1 = sortrows(Tri,7);
PDS=(data1(:,7))';
data2=data1(:,8:66);
data4=data2';
Y=data4(:,1:86);
X=Y';
UU=(10, 50, 80, 120, 160, 200) % these values linked with each color bar, should be make a color bar
r{1}=[X(1:60,:)];
r{2}=[X(61:68,:)];
r{3}=[X(69:76,:)];
r{4}=[X(77:79,:)];
r{5}=[X(80:81,:)];
r{6}=[X(82:85,:)];
figure
hold on
for k = 1:numel(r) % ... Then, Remove The 'NaN' Elements ...
r{k} = r{k}(~isnan(r{k}));
[f{k},x{k}] = ecdf(r{k});
[x{k},ia{k},ic{k}] = unique(x);
[x{k},ia{k},ic{k}] = unique(x{k});
f{k} = f(ia{k});
xx{k} = linspace(min(x{k}),max(x{k}),100);
ff{k} = interp1(x{k,f{k,xx{k});
% smoothing
ffs{k} = smoothdata(ff{k}, 'loess',40)
% compute gradient from spline fitted data (my choice)
dx{k} = mean(diff(xx{k}));
subplot(numel(rows)/2, 2, ci)
dfdxs{k} = gradient(ffs{k})./dx{k};
plot(xx{k}, dfdxs{k}, 'b--', 'linewidth', 1.2)
end
hold off
Final results be expected like this.
something like that ?
clear all
clc
X = load('PDS_case.csv'); % input data
C_sort = sortrows(X,5);
Tri = C_sort(1:86,:);
data1 = sortrows(Tri,7);
PDS=(data1(:,7))';
data2=data1(:,8:66);
data4=data2';
Y=data4(:,1:86);
X=Y';
UU=[10, 50, 80, 120, 160, 200]; % these values linked with each color bar, should be make a color bar
r{1}=[X(1:60,:)];
r{2}=[X(61:68,:)];
r{3}=[X(69:76,:)];
r{4}=[X(77:79,:)];
r{5}=[X(80:81,:)];
r{6}=[X(82:85,:)];
figure
cm = colormap(jet(numel(r)));
cc = colorbar('vert');
set(cc,'Ticks',((1:numel(UU))/numel(UU))');
set(cc,'TickLabels',num2str(UU'));
hold on
for k = 1:numel(r)
% ... Then, Remove The 'NaN' Elements ...
rr = r{k};
rr = rr(~isnan(rr));
% [f ,x ] = ecdf(rr ); % for your version
[f ,x ] = homemade_ecdf(rr ); % my version with FEX : https://fr.mathworks.com/matlabcentral/fileexchange/32831-homemade-ecdf
[x ,ia ,ic ] = unique(x );
f = f(ia );
xx = linspace(min(x ),max(x ),100);
ff = interp1(x ,f ,xx );
% smoothing
ffs = smoothdata(ff , 'loess',40);
% compute gradient from spline fitted data (my choice)
dx = mean(diff(xx ));
% subplot(numel(rows)/2, 2, ci)
dfdxs = gradient(ffs )./dx ;
plot(xx , dfdxs , '-', 'linewidth', 2, 'Color',cm(k,:))
end
hold off
Thank a lot. I think now i very near to the expected results. I have modified your script a bit to actual condition and plot the results. But woundering, why i unable to set color bar as jet, secondly the tick bar are not required only few ont power termed be fine.
clear all
clc
X = load('PDS_case.csv'); % input data
C_sort = sortrows(X,5);
Tri = C_sort(1:86,:);
data1 = sortrows(Tri,7);
PDS=(data1(:,7))';
data2=data1(:,8:66);
data4=data2';
Y=data4(:,1:86);
X=Y';
UU=[2e-8, 4e-8, 6e-8, 8e-8, 1e-7, 3e-7]; % these values linked with each color bar, should be make a color bar
r{1}=[X(1:60,:)];
r{2}=[X(61:68,:)];
r{3}=[X(69:76,:)];
r{4}=[X(77:79,:)];
r{5}=[X(80:81,:)];
r{6}=[X(82:85,:)];
figure
cm = colormap(jet(numel(r)));
cc = colorbar('vert');
set(cc,'Ticks',((1:numel(UU))/numel(UU))');
set(cc,'TickLabels',num2str(UU'));
ylabel(cc,'PDS')
hold on
for k = 1:numel(r) % ... Then, Remove The 'NaN' Elements ...
ra = r{k};
ra = ra(~isnan(ra));
[f ,x ] = ecdf(ra);
[x ,ia ,ic ] = unique(x);
f = f(ia );
xx = linspace(min(x ),max(x ),100);
ff = interp1(x ,f ,xx );
ffs = smoothdata(ff , 'loess',10);
dx = mean(diff(xx ));
dfdxs = gradient(ffs )./dx ;
plot(xx , dfdxs , '-', 'linewidth', 1, 'Color',cm(k,:))
ylim([0, 8])
end
hold off
grid
Here is what i get
and the expected one should be like
hello again
1/ seems the expected curves are smoother compare to what we have now. You can probably increase the smoothing (driven by the value in smoothdata parameter)
ffs = smoothdata(ff , 'loess',10); % increase 10 =>20
2/ scale of the colorbar (ticks) : I don't know what are the values you specify in UU coming from ?

Sign in to comment.

More Answers (0)

Products

Tags

Asked:

on 29 Mar 2022

Commented:

on 1 Apr 2022

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!