How can I obtain the probability density function for a random discrete set of data and fit a custom distribution function?

I have random data and want to calculate its probability density function. I also want to fit a custom distribution function, $P(n,x) = \frac{n^n}{\Gamma(n)} x^{n-1} e^{-nx}$, where $\Gamma(n)$ is the gamma function of n, n is the distribution parameter, and x is the variable.
I have attached the data. Can you suggest a solution?

5 Comments

No, this is the custom single-parameter gamma distribution.
Assuming that P(n,x) is the probability density function, is it true that P(n,x) = 0 for x < 0?

Answers (2)

Use "histogram" to plot the empirical probability density function for your data and use "mle" to estimate n.
Why are so many data values repeated multiple times? Did you create them artificially?

8 Comments

No, these are measured data.
Can I write custom code to fit the custom distribution function without using any function from the MATLAB library?
Sure. Set up the likelihood function for your distribution and determine n such that the function is maximized.
Look up the paragraph "Continuous distribution, continuous parameter space" to see how to proceed.
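As a rough illustration of that recipe, here is a minimal sketch that maximizes the log-likelihood by hand. For the density in the question, the log-likelihood of N samples is N*(n*log(n) - gammaln(n)) + (n-1)*sum(log(x)) - n*sum(x). The synthetic data, the search bracket [0.01, 100], and the choice of a golden-section search are assumptions made for this example, not something from the thread; only core MATLAB functions are used.

```matlab
% Sketch: hand-rolled maximum likelihood fit for
%   P(n,x) = n^n/gamma(n) * x^(n-1) * exp(-n*x),  x > 0.
% Synthetic stand-in data (assumption): gamma samples with shape 4 and
% scale 1/4, built as scaled sums of exponentials. Replace with your data.
rng(0);
nTrue = 4;
x = sum(-log(rand(2000, nTrue)), 2) / nTrue;

N  = numel(x);
S1 = sum(log(x));   % sum of log(x_i); requires all x_i > 0
S2 = sum(x);        % sum of x_i

% Log-likelihood as a function of the single parameter n
logL = @(n) N*(n.*log(n) - gammaln(n)) + (n - 1)*S1 - n*S2;

% Golden-section search for the maximum on an assumed bracket [lo, hi]
lo = 0.01;
hi = 100;
phi = (sqrt(5) - 1)/2;
while (hi - lo) > 1e-8
    c = hi - phi*(hi - lo);   % lower interior point
    d = lo + phi*(hi - lo);   % upper interior point
    if logL(c) > logL(d)
        hi = d;               % maximum lies in [lo, d]
    else
        lo = c;               % maximum lies in [c, hi]
    end
end
nHat = (lo + hi)/2;
fprintf('Estimated n = %.4f\n', nHat);
```

The search relies on the log-likelihood having a single maximum in n, which holds for this density. Note also that this density has mean 1 for every n (shape n, scale 1/n), so it can only describe data that are scaled to have a mean close to 1.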
The equation in the question is a gamma distribution with b = 1/a using MATLAB's convention (doc link). For example, with a = 4:
x = xlsread('temp_data.xlsx');
histogram(x,'Normalization','pdf')
hold on
a = 4;
plot(0:.01:3,gampdf(0:.01:3,a,1/a))
I'm just showing this so you can properly input your estimated parameter into gampdf if you want to use that function.
Thank you @Paul. Can I use gamfit or fitdist for this problem with b = 1/a, so that I can get a goodness-of-fit parameter?
No, I don't think you can use gamfit or fitdist because they don't provide an option to constrain the value of b as 1/a. However, you can use mle as @Torsten already suggested in this answer.
But, I thought you wanted to write your own code without using any function from the Matlab library. Of course, if you write your code in Matlab at some point you'll be relying on Matlab functions, but I think I know (or thought I knew) what you mean.
I tried to fit the distribution using mle for a different set of data, but I am getting a very poor result. I attached the figure. Why does one set of data give a good result and another set give a poor result? Is it due to the nature of the data or something else?
x = xlsread('temp_data.xlsx');
pdf = @(x,n)n^n/gamma(n) * x.^(n-1) .* exp(-n*x);
[phat,pci] = mle(x,'pdf',pdf,'Start',1)
phat = 3.3776
pci = 2×1
3.1120 3.6431
hold on
h = histogram(x);
h.Normalization = 'pdf';
y = linspace(0,max(x),100);
plot(y,pdf(y,phat))
hold off
Thank you for your input. It works. How can I assess the goodness of fit of the distribution if I fit it using mle?
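One possible way to check the fit, sketched here under assumptions, is a one-sample Kolmogorov-Smirnov test against the fitted gamma distribution (shape nHat, scale 1/nHat). The synthetic data and the variable names are hypothetical; kstest, makedist, and mle come from the Statistics and Machine Learning Toolbox.

```matlab
% Sketch: Kolmogorov-Smirnov check of a fitted one-parameter gamma.
% Requires Statistics and Machine Learning Toolbox.
rng(0);
x = sum(-log(rand(2000, 4)), 2) / 4;   % synthetic stand-in data (assumption)

% Same custom density as in the mle call above
pdfFun = @(x, n) n^n/gamma(n) * x.^(n-1) .* exp(-n*x);
nHat = mle(x, 'pdf', pdfFun, 'Start', 1);

% Fitted distribution: gamma with shape nHat and scale 1/nHat
pd = makedist('Gamma', 'a', nHat, 'b', 1/nHat);

% h = 1 means the fitted distribution is rejected at the 5% level
[h, p, ksstat] = kstest(x, 'CDF', pd);
fprintf('KS statistic = %.4f, p = %.4f, reject = %d\n', ksstat, p, h);
```

Treat the p-value as approximate: n was estimated from the same data, and the test assumes continuous data, whereas the measured data here contain many repeated values. A binned test such as chi2gof may be a reasonable alternative in that case.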

Can you write custom code to fit a PDF to data? Yes. It is not truly that difficult, if you know what you are doing. HOWEVER, SHOULD you write custom code to fit a PDF using maximum likelihood estimation? NO. It already exists in MATLAB. Use that code. Writing your own code to do something already provided to you is a bad idea. If you don't have a clue what you are doing and don't have the skills for such things, then your result will be poor. Use code provided to you by professionals. Even then, you can have problems if you use the code incorrectly or give it bad data that does not fit the distribution, but at least you give yourself a chance of success.
Now, does your data have a problem? YES. It is not at all random looking. In fact, what you have is not even a set of samples from a distribution, but apparently a cumulative histogram of sorts.
x = xlsread('temp_data.xlsx');
plot(x)
In there I see what appears to be stairsteps in your "data". So these are not samples from a continuous distribution. For example, the first 54 points in that vector are identical. It is not clear how you generated this data, which you claim to have measured.
So first, you need to explain what you have in that data set. How you created it.
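To quantify the stairsteps rather than judging them from the plot, a short sketch like the following tallies how often each distinct value occurs. The vector here is a hypothetical stand-in, not the attached data.

```matlab
% Sketch: count how often each distinct value occurs in a data vector.
x = [0.2; 0.2; 0.2; 0.5; 0.5; 0.9];   % hypothetical stand-in values

[vals, ~, idx] = unique(x);      % distinct values and the group index of each sample
counts = accumarray(idx, 1);     % number of samples in each group

disp(table(vals, counts));
fprintf('%d distinct values among %d samples\n', numel(vals), numel(x));
```

A small number of distinct values relative to the sample count suggests the measurements were quantized or binned, which matters for any test that assumes continuous data.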

2 Comments

Actually, the data are droplet sizes of a spray. I measured the drop sizes using some software. The spray created identical drop sizes, which is why the data are repetitive.
I think writing one's own code to solve a problem is a great way to learn. The "professional" code can be used for comparison.

Release

R2023a

Asked:

on 8 Dec 2023

Commented:

on 14 Dec 2023
