Somewhat lengthy question on distribution fitting

My apologies in advance for what I expect to be a lengthy background section leading up to my question.
I'm working on a set of decision analysis methodologies for making choices about alternatives that arrive asynchronously over a time period. One of the methods requires a cumulative distribution function (CDF) for one aspect of the alternatives. In exploring the performance of the methods, I tried several different ways of generating the CDF for some sample data:
  1. Using an empirically-derived CDF generated by MatLab that precisely fits the observed data
  2. Using an approximation via a triangular distribution (since those are easy when data is scarce)
  3. Using an approximation that uses a "standard" distribution to fit the observed data
My question regards that third method. I have some sample data, and in my first shot at this I used the Arena Input Analyzer tool against three data sets. For two of the three it suggested distributions that performed almost as well as the empirically-derived exact fit. These were
  1. 25 + exponential(261)
  2. 127 + exponential(1030)
For the third data set, though, it suggested LogNormal(1.96, 3.23), which worked like a dog: using that CDF literally performed worse than just flipping a coin at each decision point.
So I figured I'd use MatLab to fit distributions and see if I got better results. For the one that Arena missed badly on, given the exact same text file of input data, MatLab suggested LogNormal(0.0185, 1.1458). Note the significantly different parameters. This worked like a champ, again as good as the empirical one. So I figured I'd go on with MatLab to fit the other two data sets. What MatLab suggested was
  1. LogNormal(5.2912, 0.8789)
  2. LogNormal(6.7327, 0.8078)
And these two were dogs! My suspicion is that it has something to do with that "offset" that you see in the Arena suggested distribution. MatLab seems to be trying to only fit to a "straight" distribution with no offset term like that.
So here's my question: is there a way to get MatLab to identify an offset term in examining a data set for distribution fitting?
Ideally my final methodology will just involve running a fit (if you have historical data to fit to) which I think is a fairly low bar. If you first have to examine the data and determine an appropriate offset manually and then adjust all the data to account for it, I think it's of less use.
I hope this makes sense, and that I haven't bored you to sleep yet. Any help would be greatly appreciated.
  4 Comments
Image Analyst on 21 Apr 2015
Nonetheless, I agree with Star - screenshots would help us visualize, even if it's just for one example set of data.
Jeremy Hendrix on 21 Apr 2015
Okay....here is a screenshot of the output I get from the Arena Input Analyzer
And using the MatLab Distribution Fitting Tool on the same data file
They choose significantly different binning. Arena estimates the distribution as 25 + expo(261), while MatLab returns LogNormal(5.2912, 0.8789).


Answers (3)

Image Analyst on 21 Apr 2015
I haven't used those functions. I've never heard of the Arena Input Analyzer. What toolbox are they in? Please list it below your question. Is it the stats toolbox or curve fitting toolbox or something else?
What does the histogram of your actual data look like? Is it more like the bars in the top plot (like an exponential decay) or in the bottom plot (like a log-normal or Poisson)?
You say: "is there a way to get MatLab to identify an offset term in examining a data set for distribution fitting?" Can you subtract the mean and then see this: http://www.mathworks.com/matlabcentral/answers/94272-how-do-i-constrain-a-fitted-curve-through-specific-points-like-the-origin-in-matlab
By the way, for what it's worth, here's an interesting File Exchange submission that has dozens of distributions: http://www.mathworks.com/matlabcentral/fileexchange/7309-randraw


Hannes Driessen on 20 May 2018
Your confusion arises from the fact that the parameters Matlab uses for a lognormal distribution are the mu and sigma of the underlying normal distribution. If you want to use those in Rockwell Arena, you'll first need to transform them into the mean and standard deviation of the lognormal distribution itself (https://en.wikipedia.org/wiki/Log-normal_distribution#Arithmetic_moments). Then you'll see that the parameters found with the Input Analyzer tool in Arena closely resemble the parameter estimates you get from Matlab.
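As a rough check of the point above, here is a minimal sketch using the arithmetic-moment formulas from the linked Wikipedia section: mean = exp(mu + sigma^2/2) and std = mean * sqrt(exp(sigma^2) - 1). It assumes that the LogNormal(0.0185, 1.1458) reported by Matlab in the question holds the mu and sigma of the underlying normal, and that Arena's LogNormal(1.96, 3.23) is the mean and standard deviation of the lognormal itself. The variable names are only illustrative.

```matlab
% Parameters of the underlying normal, as reported by Matlab in the question.
mu    = 0.0185;
sigma = 1.1458;

% Arithmetic mean and standard deviation of the lognormal variable itself.
lognMean = exp(mu + sigma^2/2);
lognStd  = lognMean * sqrt(exp(sigma^2) - 1);

fprintf('mean = %.2f, std = %.2f\n', lognMean, lognStd);
```

With these inputs the result comes out at roughly 1.96 and 3.24, which is close to the LogNormal(1.96, 3.23) that Arena suggested for the same data set. If the Statistics Toolbox is available, lognstat(mu, sigma) should give the same mean, together with the variance.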

Jeff Miller on 21 May 2018
Edited: Jeff Miller on 21 May 2018
You might be able to do a lot of what you want with the routines here: Cupid
To create a standard distribution with an offset, you would write something like this:
% Create exponential distribution with an offset of 100.
mydist=AddTrans(Exponential(.01),100);
Assuming your to-be-fitted data is in an array x, you could then get maximum likelihood estimates of the exponential rate and additive offset with:
mydist.EstML(x)
Actually, in the case of an exponential plus a constant, the MLE of the constant will always be the minimum value in the data set (perhaps minus a few eps to avoid numerical problems).
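For readers without Cupid, the remark above about the minimum can be sketched in base MATLAB. This is only an illustration under stated assumptions: the data are synthetic, built from evenly spaced quantiles of 25 + exponential(261) (the Arena fit quoted in the question, read as offset plus mean), and the names x, offsetHat, meanHat and rateHat are made up for the example.

```matlab
% Synthetic sample: evenly spaced quantiles of 25 + exponential(mean 261).
% (Assumed stand-in for real data; replace x with your own observations.)
n = 1000;
p = ((1:n)' - 0.5) / n;          % probabilities strictly inside (0,1)
x = 25 - 261 * log(1 - p);       % inverse CDF of the shifted exponential

% The MLE of the offset is the sample minimum; the MLE of the exponential
% mean is then the average distance above that minimum.
offsetHat = min(x);
meanHat   = mean(x - offsetHat);
rateHat   = 1 / meanHat;         % rate form, as in Exponential(.01) above

fprintf('offset = %.1f, mean = %.1f, rate = %.5f\n', offsetHat, meanHat, rateHat);
```

Because the offset estimate is just min(x), this particular case needs no manual inspection of the data: the shift can be computed and subtracted in one line before fitting the exponential part.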
