How to plot a smooth curve for Empirical probability density ?
You are now following this question
- You will see updates in your followed content feed.
- You may receive emails, depending on your communication preferences.
An Error Occurred
Unable to complete the action because of changes made to the page. Reload the page to see its updated state.
Show older comments
0 votes
Hi everyone,
I require to plot an empirical probability density curve of a data set (184 rows, 59 columns). Initially, I pick one column to plot the results but, the EPD curve looks so wired. May someone suggest to me how can I get desired result (Attached). Data is also attached for reference.
clear all
clc
X = load('R_0.01T.csv'); % input data
SzX = size(X) % matrix size
r=[X(1,:)]; % first row of the data set
ra = r(~isnan(r)); % remove all nan enteries
[f,x,flo,fhi] = ecdf(ra);
WinLen = 5
dfdxs = smoothdata(gradient(f)./gradient(x), 'movmedian',WinLen); % ... Then, Smooth Them Again ...
aaa = smooth(dfdxs);
plot(x, aaa)
This is what i get

Here is the expected results.

Accepted Answer
Mathieu NOE
on 29 Mar 2022
hello
tried a few options , smoothdata and spline fit. In fact, I was thinking you wanted something super smooth so I first opted for the spline fit. But I saw that the expected plot was supposed to have some waviness , so I changed to smoothdata
NB as I don't have the toolbox for ecdf , I found an alternative on FEX for it https://fr.mathworks.com/matlabcentral/fileexchange/32831-homemade-ecdf
but I had to correct a bug (the corrected function is attaced fyi)
hope the code below can help you
clear all
clc
X = load('R_0.01T.csv'); % input data
SzX = size(X); % matrix size
r=X(1,:); % first row of the data set
ra = r(~isnan(r)); % remove all nan enteries
% [f,x,flo,fhi] = ecdf(ra);
[f,x] = homemade_ecdf(ra); % FEX : https://fr.mathworks.com/matlabcentral/fileexchange/32831-homemade-ecdf
% remove duplicates
[x,ia,ic] = unique(x);
f = f(ia);
% resample the data (interpolation) on linearly spaced points
xx = linspace(min(x),max(x),100);
ff = interp1(x,f,xx);
% smoothing
ffs = smoothdata(ff, 'loess',40);
% or spline fit ?
% Breaks interpolated from data (log x scale to get more points in the
% lower x range and less in the upper x range)
breaks = logspace(log10(min(x)),log10(max(x)),5); %
p = splinefit(xx,ff,breaks); %
ff2 = ppval(p,xx);
figure(1),
plot(x,f,'*',xx,ffs,'-',xx,ff2,'-')
legend('raw','smoothed','spline fit');
% compute gradient from spline fitted data (my choice)
dx = mean(diff(xx));
dfdxs = gradient(ffs)./dx;
figure(2),plot(xx, dfdxs)

14 Comments
Andi
on 29 Mar 2022
ppval is not work on 2021a version. However other function works perfectly:
clear all
clc
X = load('R_0.01T.csv'); % input data
SzX = size(X); % matrix size
r=X(1,:); % first row of the data set
ra = r(~isnan(r)); % remove all nan enteries
% [f,x,flo,fhi] = ecdf(ra);
[f,x] = ecdf(ra); % FEX : https://fr.mathworks.com/matlabcentral/fileexchange/32831-homemade-ecdf
% remove duplicates
[x,ia,ic] = unique(x);
f = f(ia);
% resample the data (interpolation) on linearly spaced points
xx = linspace(min(x),max(x),100);
ff = interp1(x,f,xx);
% % smoothing
ffs = smoothdata(ff, 'loess',40);
dx = mean(diff(xx));
dfdxs = gradient(ffs)./dx;
figure(2),plot(xx, dfdxs)
I not quiet well understand why we need to apply unique function.
Mathieu NOE
on 29 Mar 2022
hello again
if you need to resample or interpolate, the x data must be unique and monotonic
Mathieu NOE
on 29 Mar 2022
I am surprised you don't have ppval function in your release
that's part of the "basic" matlab package (polynomial functions)

Andi
on 29 Mar 2022
Even, I am using online. It should be ,....

Andi
on 29 Mar 2022
I attempted to generalize your script but it seems there are some problems. May you suggest some wayout :
clear all
clc
X = load('R_0.01T.csv');
X = load('R_0.01T.csv');
SzX = size(X)
r{1}=[X(1,:)];
r{2}=[X(2,:)];
r{3}=[X(3,:)];
for k = 1:numel(r)
ra{k} = r{k}(~isnan(r{k}));
[f{k},x{k}] = ecdf(ra{k});
[x{k},ia,ic] = unique(x{k});
f{k} = f(ia);
xx{k} = linspace(min(x{k}),max(x{k}),100);
ff{k} = interp1(x{k},f{k},xx{k});
ffs{k} = smoothdata(ff{k}, 'loess',40);
dx{k} = mean(diff(xx{k}));
dfdxs{k} = gradient(ffs{k})./dx{k};
end
figure
hold on
for k = 1:numel(r) % ... Then, Plot The End Result!
plot(xx{k}, dfdxs{k}, 'DisplayName',sprintf('event{%2d}',k))
end
hold off
grid
legend('Location','best')
Torsten
on 29 Mar 2022
Usually, ksdensity is used in cases like yours:
Mathieu NOE
on 29 Mar 2022
hello
maybe you are not obliged to store all computed datas in cell arrays
if the purpose is simply to make plots , there is a less complicated and memory intensive code :

clear all
clc
X = load('R_0.01T.csv'); % input data
SzX = size(X); % matrix size
rows = (1:5); % define here which rows to process
figure(2), hold on
for ci =1:numel(rows)
r=X(rows(ci),:); % row of the data set
r = r(~isnan(r)); % remove all nan enteries
% [f,x,flo,fhi] = ecdf(r);
[f,x] = homemade_ecdf(r); % FEX : https://fr.mathworks.com/matlabcentral/fileexchange/32831-homemade-ecdf
% remove duplicates
[x,ia,ic] = unique(x);
f = f(ia);
% resample the data (interpolation) on linearly spaced points
xx = linspace(min(x),max(x),100);
ff = interp1(x,f,xx);
% smoothing
ffs = smoothdata(ff, 'loess',40);
% compute gradient from spline fitted data (my choice)
dx = mean(diff(xx));
dfdxs = gradient(ffs)./dx;
plot(xx, dfdxs);
legend_str{ci} = (['Rows # :' num2str(ci)]);
end
legend(legend_str);
hold off
Torsten
on 29 Mar 2022
And the smoothed curves integrate to 1 over the interval ? Because they are supposed to be "pdf" curves.
Mathieu NOE
on 29 Mar 2022
seems so
i added one line in the for loop to check the values .
out = trapz(dfdxs)*dx % check if integral = 1
Got more or less a result in range 0.9 - 0.95 for the first 5 rows tested
X = load('R_0.01T.csv'); % input data
SzX = size(X); % matrix size
rows = (1:5); % define here which rows to process
figure(2), hold on
for ci =1:numel(rows)
r=X(rows(ci),:); % row of the data set
r = r(~isnan(r)); % remove all nan enteries
% [f,x,flo,fhi] = ecdf(r);
[f,x] = homemade_ecdf(r); % FEX : https://fr.mathworks.com/matlabcentral/fileexchange/32831-homemade-ecdf
% remove duplicates
[x,ia,ic] = unique(x);
f = f(ia);
% resample the data (interpolation) on linearly spaced points
xx = linspace(min(x),max(x),100);
ff = interp1(x,f,xx);
% smoothing
ffs = smoothdata(ff, 'loess',40);
% compute gradient from spline fitted data (my choice)
dx = mean(diff(xx));
dfdxs = gradient(ffs)./dx;
plot(xx, dfdxs);
legend_str{ci} = (['Rows # :' num2str(ci)]);
out = trapz(dfdxs)*dx % check if integral = 1
end
legend(legend_str);
hold off
Torsten
on 29 Mar 2022
Ok. Nevertheless, in place of the OP, I'd also try the "ksdensity" function from MATLAB to recover the PDF from the empirical data.
It seems especially designed for this purpose and should also be able to cope with noisy data.
Andi
on 30 Mar 2022
@Mathieu NOE the key issue is my data set is not have sane dimenions so i lefgt with only one choice to store in a cell array. I again attemed to plot the expected result using your script but not working. may you have a look:
clear all
clc
X = load('PDS_case.csv'); % input data
C_sort = sortrows(X,5);
Tri = C_sort(1:86,:);
data1 = sortrows(Tri,7);
PDS=(data1(:,7))';
data2=data1(:,8:66);
data4=data2';
Y=data4(:,1:86);
X=Y';
UU=(10, 50, 80, 120, 160, 200) % these values linked with each color bar, should be make a color bar
r{1}=[X(1:60,:)];
r{2}=[X(61:68,:)];
r{3}=[X(69:76,:)];
r{4}=[X(77:79,:)];
r{5}=[X(80:81,:)];
r{6}=[X(82:85,:)];
figure
hold on
for k = 1:numel(r) % ... Then, Remove The 'NaN' Elements ...
r{k} = r{k}(~isnan(r{k}));
[f{k},x{k}] = ecdf(r{k});
[x{k},ia{k},ic{k}] = unique(x);
[x{k},ia{k},ic{k}] = unique(x{k});
f{k} = f(ia{k});
xx{k} = linspace(min(x{k}),max(x{k}),100);
ff{k} = interp1(x{k,f{k,xx{k});
% smoothing
ffs{k} = smoothdata(ff{k}, 'loess',40)
% compute gradient from spline fitted data (my choice)
dx{k} = mean(diff(xx{k}));
subplot(numel(rows)/2, 2, ci)
dfdxs{k} = gradient(ffs{k})./dx{k};
plot(xx{k}, dfdxs{k}, 'b--', 'linewidth', 1.2)
end
hold off
Final results be expected like this.

Mathieu NOE
on 30 Mar 2022
something like that ?

clear all
clc
X = load('PDS_case.csv'); % input data
C_sort = sortrows(X,5);
Tri = C_sort(1:86,:);
data1 = sortrows(Tri,7);
PDS=(data1(:,7))';
data2=data1(:,8:66);
data4=data2';
Y=data4(:,1:86);
X=Y';
UU=[10, 50, 80, 120, 160, 200]; % these values linked with each color bar, should be make a color bar
r{1}=[X(1:60,:)];
r{2}=[X(61:68,:)];
r{3}=[X(69:76,:)];
r{4}=[X(77:79,:)];
r{5}=[X(80:81,:)];
r{6}=[X(82:85,:)];
figure
cm = colormap(jet(numel(r)));
cc = colorbar('vert');
set(cc,'Ticks',((1:numel(UU))/numel(UU))');
set(cc,'TickLabels',num2str(UU'));
hold on
for k = 1:numel(r)
% ... Then, Remove The 'NaN' Elements ...
rr = r{k};
rr = rr(~isnan(rr));
% [f ,x ] = ecdf(rr ); % for your version
[f ,x ] = homemade_ecdf(rr ); % my version with FEX : https://fr.mathworks.com/matlabcentral/fileexchange/32831-homemade-ecdf
[x ,ia ,ic ] = unique(x );
f = f(ia );
xx = linspace(min(x ),max(x ),100);
ff = interp1(x ,f ,xx );
% smoothing
ffs = smoothdata(ff , 'loess',40);
% compute gradient from spline fitted data (my choice)
dx = mean(diff(xx ));
% subplot(numel(rows)/2, 2, ci)
dfdxs = gradient(ffs )./dx ;
plot(xx , dfdxs , '-', 'linewidth', 2, 'Color',cm(k,:))
end
hold off
Thank a lot. I think now i very near to the expected results. I have modified your script a bit to actual condition and plot the results. But woundering, why i unable to set color bar as jet, secondly the tick bar are not required only few ont power termed be fine.
clear all
clc
X = load('PDS_case.csv'); % input data
C_sort = sortrows(X,5);
Tri = C_sort(1:86,:);
data1 = sortrows(Tri,7);
PDS=(data1(:,7))';
data2=data1(:,8:66);
data4=data2';
Y=data4(:,1:86);
X=Y';
UU=[2e-8, 4e-8, 6e-8, 8e-8, 1e-7, 3e-7]; % these values linked with each color bar, should be make a color bar
r{1}=[X(1:60,:)];
r{2}=[X(61:68,:)];
r{3}=[X(69:76,:)];
r{4}=[X(77:79,:)];
r{5}=[X(80:81,:)];
r{6}=[X(82:85,:)];
figure
cm = colormap(jet(numel(r)));
cc = colorbar('vert');
set(cc,'Ticks',((1:numel(UU))/numel(UU))');
set(cc,'TickLabels',num2str(UU'));
ylabel(cc,'PDS')
hold on
for k = 1:numel(r) % ... Then, Remove The 'NaN' Elements ...
ra = r{k};
ra = ra(~isnan(ra));
[f ,x ] = ecdf(ra);
[x ,ia ,ic ] = unique(x);
f = f(ia );
xx = linspace(min(x ),max(x ),100);
ff = interp1(x ,f ,xx );
ffs = smoothdata(ff , 'loess',10);
dx = mean(diff(xx ));
dfdxs = gradient(ffs )./dx ;
plot(xx , dfdxs , '-', 'linewidth', 1, 'Color',cm(k,:))
ylim([0, 8])
end
hold off
grid
Here is what i get

and the expected one should be like

Mathieu NOE
on 1 Apr 2022
hello again
1/ seems the expected curves are smoother compare to what we have now. You can probably increase the smoothing (driven by the value in smoothdata parameter)
ffs = smoothdata(ff , 'loess',10); % increase 10 =>20
2/ scale of the colorbar (ticks) : I don't know what are the values you specify in UU coming from ?
More Answers (0)
Categories
Find more on Create Plots on Maps in Help Center and File Exchange
Products
Tags
See Also
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!Select a Web Site
Choose a web site to get translated content where available and see local events and offers. Based on your location, we recommend that you select: .
You can also select a web site from the following list
How to Get Best Site Performance
Select the China site (in Chinese or English) for best site performance. Other MathWorks country sites are not optimized for visits from your location.
Americas
- América Latina (Español)
- Canada (English)
- United States (English)
Europe
- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)
- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom (English)
