Error trying to fit data to gaussian function: Inf computed by model function, fitting cannot continue

I am trying to find out whether my data fit to a gaussian distribution or not and am using the fit function for that.
When I try to execute this line:
[~,gof]=fit((1:length(median(norm_spikes_flight_control,2,'omitnan')))', ...
median(norm_spikes_flight_control,2,'omitnan'),'gauss1');
I get the following error message:
Error using fit>iFit
Inf computed by model function, fitting cannot continue.
Try using or tightening upper and lower bounds on coefficients.
Error in fit (line 116)
[fitobj, goodness, output, convmsg] = iFit( xdatain, ydatain, fittypeobj, ...
I have attached the 3001x1 double that results from "median(norm_spikes_flight_control,2,'omitnan')" for one dataset that triggers the error and one dataset where the code above works just fine.
If anyone knows what is happening here and how I can track down my error, I would be extremely grateful.

Answers (1)

d1=readmatrix('data_working.txt');
d2=readmatrix('data_error.txt');
plot([d1 d2]), xlim([1 numel(d1)]), legend('Working','Error')
Neither of those looks anything at all like a single-peak Gaussian when plotted sequentially, which is what the fit() algorithm tries to match.
You can use the Statistics Toolbox function fitdist to consider the data as a whole, without regard to the given order; histfit does the same and shows a figure when done...
histfit(d1)
fitdist(d1,'normal')
ans =
  NormalDistribution

  Normal distribution
       mu = -0.0737158   [-0.0755425, -0.0718891]
    sigma =  0.0510356   [0.0497764, 0.0523606]
[mean(d1) std(d1)]
ans = 1×2
-0.0737 0.0510
The result, as you see, is simply the mean and standard deviation with error estimates.
histfit(d2)
What is the purpose behind trying to fit a Gaussian model to those data as shown originally? That model simply doesn't fit the data in that form and the histograms show the overall distributions are multi-mode and also far from being simply a single Gaussian distribution.
ADDENDUM
I forgot to add the look at what the fitted model actually did...
x=[1:numel(d1)].';
[f,gof]=fit(x,d1,'gauss1')
f =
General model Gauss1:
  f(x) = a1*exp(-((x-b1)/c1)^2)
Coefficients (with 95% confidence bounds):
  a1 = 0.01526   (-0.02586, 0.05637)
  b1 = 607.3     (559.1, 655.5)
  c1 = 21.89     (-46.23, 90.02)
gof = struct with fields:
         sse: 24.1150
     rsquare: -2.0862
         dfe: 2998
  adjrsquare: -2.0882
        rmse: 0.0897
plot(f,x,d1,'fit','residuals')
subplot(2,1,1), ylabel('D1'), xlim([1 numel(d1)])
legend location southwest
subplot(2,1,2), ylabel('Residuals'), xlim([1 numel(d1)])
legend location southwest
As the above shows, the fitted model is totally inadequate; an adjusted R-square of -2 is absurd. If one looks very carefully, one can see the fitted-model red line overlay the blue data at the peak around x=600; otherwise the model prediction is essentially zero. The plot of the model residuals is almost indistinguishable from the data plot, although the magnitude of the peak at x=607 is almost removed; everywhere else, farther from that location parameter value, the exponential term rapidly vanishes. The model simply doesn't represent the data at all in that form.
figure, hAx=subplot(2,1,1);
xlim([1 numel(x)])
plot(f)
xlim([1 numel(x)])
legend off
title('Model Full Range')
subplot(2,1,2)
xlim([400 800])
plot(f)
legend off
title('Model About Central Peak')
This illustrates the model output itself without the original data; it shows that the model is, indeed, a Gaussian, but that away from the one central location its value is essentially zero because the exponential vanishes.
Again, would need to know what the end purpose here is in order to have any idea of what might be feasible/make sense; simply throwing data at a model without exploratory analysis is exceedingly dangerous.

4 Comments

d=readmatrix('data_error.txt');
plot(d), xlim([1 numel(d)]), legend('Error')
x=[1:numel(d)].';
[f,gof]=fit(x,d,'gauss2')
f =
General model Gauss2:
  f(x) = a1*exp(-((x-b1)/c1)^2) + a2*exp(-((x-b2)/c2)^2)
Coefficients (with 95% confidence bounds):
  a1 = -0.07336  (-0.0742, -0.07251)
  b1 = 883.1     (876.3, 889.8)
  c1 = 559.7     (550.1, 569.2)
  a2 = -0.09688  (-0.09839, -0.09537)
  b2 = 2978      (2935, 3020)
  c2 = 1048      (1003, 1093)
gof = struct with fields:
         sse: 0.2379
     rsquare: 0.8834
         dfe: 2995
  adjrsquare: 0.8832
        rmse: 0.0089
plot(f,x,d,'fit','residuals')
subplot(2,1,1), ylabel('D'), xlim([1 max(x)])
legend location southwest
subplot(2,1,2), ylabel('Residuals'), xlim([1 max(x)])
legend location southwest
I got to wondering how it would do if I did try to model the larger patterns; then it doesn't do too badly. The residuals plot shows quite a lot of finer structure remaining, but the overall representation isn't too bad; the adjusted R-square is almost 0.9.
Now, whether it means anything or not is another question; just because you can do something doesn't mean you should.
Okay, first of all thank you very much for that very detailed answer!
It seems I did a poor job putting into text what I wanted to do.
So to explain better why I'm doing this:
These data show normalized neuronal activity from an animal performing a turn, and I want to find out whether that activity increases leading up to the turn. The data cover 500 ms before and after the highest velocity of the turn (x=1500). So I am not exactly trying to find a good fit for the data; I want to find out whether they roughly fit a Gaussian distribution, yes or no. Some of my data will fit quite well when a given neuron encodes turns, and data from neurons that do not will not fit at all. Does this explain it a bit better?
The part that confused me most was getting the error message only sometimes when trying the fit. Would I be right in the assumption that the reason I get that error comes from an absolutely terrible fit to the gaussian distribution?
That would be one conclusion to be drawn, yes. I haven't dug into the details of the solution algorithm as to the exact cause of the Inf being returned; somewhere it's dividing by zero, which most likely comes from underflow in the tails of the normal pdf being evaluated. But I'd contend that's certainly not a reliable test; it's simply a heuristic occurrence that depends upon some specific condition within a dataset.
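One way to follow the error message's own suggestion is to bound the coefficients so the solver can't wander into the region that produces the Inf. A minimal sketch, where the specific bound values are illustrative assumptions, not values derived from your data:

```matlab
% Hedged sketch: constrain the gauss1 coefficients so the width c1 stays
% away from zero and the center b1 stays inside the data range.
% The bound values here are illustrative assumptions for a 3001-sample trace.
opts = fitoptions('gauss1');
opts.Lower = [0    1    1];      % a1 >= 0, b1 >= 1, c1 >= 1
opts.Upper = [Inf  3001 3001];   % b1, c1 within the sample range
[f,gof] = fit(x, d1, 'gauss1', opts);
```

Whether a bounded fit is then meaningful is a separate question, but it should at least stop the solver from dying mid-run.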
For your case, given the sample data, I'd say neither is a normal distribution and that your analysis/observation of your data should be prepared to observe/find the multi-modal cases and identify them as opposed to a case that actually does show a normal-like distribution.
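If what you ultimately want is a yes/no screen per neuron, a more robust approach than relying on the error itself might be to wrap the fit in try/catch and judge each trace by its goodness of fit. A sketch, where the 0.5 cutoff is an arbitrary assumption you would need to calibrate against neurons you know do and don't encode turns:

```matlab
% Hedged sketch: screen one trace d (column vector) for a Gaussian-like
% peak. The adjusted R-square threshold of 0.5 is an assumed cutoff.
x = (1:numel(d)).';
try
    [f,gof] = fit(x, d, 'gauss1');
    encodesTurn = gof.adjrsquare > 0.5;
catch
    encodesTurn = false;   % fit failed outright (e.g., the Inf error)
end
```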
I <answered a previous Q? some time back> on testing for normality, I still haven't implemented an m-file to do the Shapiro-Wilk, however.
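Until a Shapiro-Wilk routine is in hand, the Statistics Toolbox does ship other normality tests that could serve in the meantime; a sketch, assuming the data file name from above:

```matlab
% Hedged sketch: built-in normality tests as stand-ins for Shapiro-Wilk.
% h = 1 means the test rejects normality at the default 5% level.
d = readmatrix('data_error.txt');
[h_lillie, p_lillie] = lillietest(d);   % Lilliefors test
[h_ad, p_ad] = adtest(d);               % Anderson-Darling test
```

Given the multi-modal histograms above, I'd expect both to reject normality for these datasets.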
Thank you again for taking the time with this!
I will give this some thought and discuss it with some people in our group to see how we want to tackle this in the future. Definitely something to keep in mind.
When it comes to the Shapiro-Wilk test, there is a function by Ahmed Ben Saïda in the community file exchange that I could use for testing for normality.


Release: R2023a
Asked: 10 Aug 2023
Commented: 14 Sep 2023
