356 views (last 30 days)

Show older comments

Hello all,

I need to fit a nonlinear model to several data sets simultaneously. The model has the same functional form for all sets, and the values of some model parameters are the same for all sets, but the value of at least one parameter is different for each set. For example, three exponential decay curves might have the same decay constant but a different amplitude for each data set. The global fit to all three curves would produce one decay constant and three amplitudes.

What is the best way to do this in MatLab? (Apologizing in advance for a novice question!)

ELELAB

Tom Lane
on 11 Dec 2012

Take a look at this example of a function that partitions the parameters into a set sharing values across all datasets and a set taking different values for different datasets. You may be able to modify this for your purposes.

function sharedparams

t = (0:10)';

T = [t; t; t];

Y = 3 + [exp(-t/2); 2*exp(-t/2); 3*exp(-t/2)] + randn(33,1)/10;

dsid = [ones(11,1); 2*ones(11,1); 3*ones(11,1)];

gscatter(T,Y,dsid)

X = [T dsid];

b = nlinfit(X,Y,@subfun,ones(1,5))

line(t,b(1)+b(2)*exp(-t/b(5)),'color','r')

line(t,b(1)+b(3)*exp(-t/b(5)),'color','g')

line(t,b(1)+b(4)*exp(-t/b(5)),'color','b')

function yfit = subfun(params,X)

T = X(:,1); % time

dsid = X(:,2); % dataset id

A0 = params(1); % same A0 for all datasets

A1 = params(2:4)'; % different A1 for each dataset

tau = params(5); % same tau

yfit = A0 + A1(dsid).*exp(-T/tau);

This uses nlinfit from the Statistics Toolbox, but you could use other fitting routines instead.

Tom Lane
on 27 Apr 2021 at 20:14

j dr
on 8 Feb 2011

Normaly you could identify both parameters in a fit using lsqcurvefit (least square fit) and a function of the type:

function F=myfun(x,xdata)

F=x(1)*exp(-1*x(2)*xdata)

then

[x,resnorm] = lsqcurvefit(@myfun,StartingValues,xdata,ydata,LowerBound,UpperBound,options);

x will contain amplitude and decay rate that best match your data starting at StartingValue between LowerBound and UpperBound.

You can then appreciate that your 3 datasets have roughly the same decay rate and different amplitudes, OR:

If you know in advance which parameter will be shared, You can use lsqcurvefit with a function that uses a global variable as your shared parameter.

function F=myfunFixedAmp(x,xdata)

global A

F=A*exp(-1*x(1)*xdata)

or a different finction for a fixed decay rate

function F=myfunFixedDecay(x,xdata)

global Alpha

F=x(1)*exp(-1*Alpha*xdata)

j dr
on 6 Apr 2011

Like I said in the top half of my post, if you use lsqcurvefit on each of the 3 curves independently, you would get:

A1,Alpha1 A2,Alpha2 A3,Alpha3

You could then verify if all "A"s are similar or all "Alpha"s are similar (by checking the standard deviation or the standard deviation divided by the mean to have a relative value of which of "A"s of "Alpha"s are more similar).

If this does not solve your problem I think you might need to restate it.

Richard Willey
on 6 Apr 2011

Hi Kenneth

Coincidentially, I did a webinar a couple weeks back that uses lsqcurvefit to solve just this type of problem.

You can download the code from http://www.mathworks.com/matlabcentral/fileexchange/30869-fitting-with-matlab-statistics-optimization-and-curve-fitting

(You want the analysis.m example)

The webinar was titled "Fitting with MATLAB: Statistics, Optimization, and Curve Fitting". The recorded webinar should be up on the Statistics Toolbox product page shortly

Yih Shan Yap
on 25 Nov 2016

DiRa
on 11 Dec 2012

Dear all, it's been some time now since the last answer was posted, but I'm having almost the same problem now. I thought the last answer from Richard would help me, but it actually didn't. So here's my problem:

I have about 200 2D-datasets, where the x-scale shows the time(t)-dependence.

I need to fit a nonlinear exponetial decay function onto each one of these 200 datasets, that looks in principal like this:

y = y0 + A1*exp(-t/tau1) + A2*exp(-t/tau2) + A3*exp(-t/tau3)

The important thing is now that tau1,tau2 and tau3 must be the same for all 200 datasets and y0, A1, A2 and A3 may vary, but all of them have to be fitted by a lsqcurvefit at once. So the variables tau have to be constant for all datasets in the end, but is an unknown variable during the lsqcurvefit and should be optimized by the fit.

To specify it again: I want to do a lsqcurvefit for the variables y0,A1,A2,A3,tau1,tau2,tau3 for every single 2D dataset and in addition i need to do a global fit for tau1,tau2,tau3.

Is there a nice and fast way to calculate this kind of fit in matlab?

When I tried to fully understand the analysis.m i had the problem that it is not running after downloading all files because the is missing a needed function or script.

Thanks alot!

Seth DeLand
on 11 Dec 2012

So if I understand correctly, you want to fit tau1, tau2, tau3 to all the data, but the other parameters are fit on an individual dataset basis? If so, I think this solution of breaking this up into a linear lsq problem embedded in a nonlinear lsq problem would work:

Identify tau1, tau2, tau3 for all of the data sets with lsqcurvefit. The other parameters y0, A1, A2, A3 can be identified on an individual basis inside the lsqcurvefit objective function (and this will be pretty fast because it's a linear least squares problem to solve for these).

Lsqcurvefit will be modifying the values for tau1, tau2, tau3, and then calling your objective function. Your objective function will:

1) Loop over the 200 datasets:

- solve a linear least-squares problem to identify y0,A1,A2,A3 for each dataset (this can be done with "x = A\b", which returns the least-squares soln for underdetermined systems: http://www.mathworks.com/help/matlab/ref/mldivide.html).
- calculate the value of your function at each point for each dataset: A*x

2) Combine all of the ydata from the 200 runs to be passed back to lsqcurvefit.

This approach may be kind of time consuming (it has to do 200 matrix solves each time the objective function is calculated), depending on the size of your datasets. But, it should be quicker and give you a better answer than trying to solve this as one large nonlinear curve fitting problem.

DiRa
on 12 Dec 2012

Dear Seth and Tom,

I want to thank you both for your fast response. I used Toms suggestion since my script was already looking like that in some parts.

Since I needed some lower and upper bounds I then used the lsqcurvefit. After fixing some memory problems it then worked perfectly.

Thanks alot again!

Yi
on 18 Apr 2013

@Kenneth: I understand precisely what your problem is and I am dealing with very similar situation: fitting 2D Gaussian function to hundreds of small pieces of ROI cut out from a large image, with the same global sigma, but different intensities and centroids. In fact in Igor, there is a built-in function called "global fit" that does the job. But honestly I don't know any elegant way to do the same in Matlab.

@Tom: Your solution will work for small set of data, like Kenneth's three exponential functions. But In my problem, I am dealing with greater than 500 sets of data at the same time, which would mean thousands of parameters to optimize at a time. But most of the parameters from different ROI don't "crosstalk", ie the Jacobian matrix is extremely spars. I am not sure if Matlab is smart enough to figure that out.

The workaround solution I can think of for large number of data sets like for my case is the E-M approach, which is to deal the common global parameter and other local parameters separately. But it can be slow too...

Julie
on 14 May 2014

Edited: Walter Roberson
on 29 May 2020

Dear fellow Matlab users I have a similar problem but can't seem to solve the issue with the above suggestions:

I am trying to use nlinfit to fit 14 sets of data to obtain a common exchange rate while a second scaling variable is also allowed to vary during the fit and should be different for each data set. I load a data file (f1) where the first row contains a 0 followed by my independent variables (24 of them) and each subsequent row has in column 1 a variable corresponding to the data in that row (ligand concentration) followed by the dependant variable (intensity) for the corresponding independent variables in row 1. I am getting the following error:

Error using nlinfit (line 122) Requires a vector second input argument.

Error in fit_lineshape_exchange (line 42) [beta,R]=nlinfit(x,y,@lineshape_exchange, beta0);

my function is below:

function f = lineshape_exchange(a)

%a(1) is k (the rate), a(2) is c (the weighting function allowing each

%spectrum to vary in intensity)

global K L v1 v2 Tp1 Tp2 w n

K=0.0000033;

Tp1=4.96;

Tp2=7.25;

v1=12.212;

v2=13.114;

f=0;

for j=1:n

t=1/(a(1*L*(1+K)));

p1=K*a(1)*L*t;

p2=1-p1;

W=v1-v2;

P=t*((1/Tp1*Tp2)-4*pi^2*w^2+pi^2*W^2)+(p1/Tp1)+(p2/Tp2);

Q=t*(2*pi*w-pi*W*(p1-p2));

R=2*pi*w*(1+t*((1/Tp1)+(1/Tp2)))+pi*W*t*((1/Tp1)-(1/Tp2)+pi*W*(p1-p2));

f=a(2)*(P*(1+t*(p2/Tp1 + p1/Tp2))+Q*R)/(P^2+R^2);

end

Silvina Pugliese
on 11 Jun 2018

Edited: Silvina Pugliese
on 11 Jun 2018

Hi everyone,

I have used the idea from Tom Lane here (thank you very much, Tom!) to make a code that is general to any number of datasets. The code as it is now, generates some datasets for testing, you will need to load your data and change the fitting function to your fitting model. Best regards, Silvina. Link: https://github.com/SilvinaPugliese/Global-fitting-in-MATLAB

Leonardo Scantamburlo
on 20 Dec 2018

Hi everyone!

I used the code from @Richard, thanks a lot!

My question is: i have 10 datasets at 10 different known temperatures, so i don't need to fit the temperature. Then I have an equation with 3 unkown parameters and i have to fit the 10 curves with a different temperature and the same unknown parameters. How can i modify the code?

Thanks a lot

Leonardo

Tom Lane
on 29 May 2020

My posted answer from 11-Dec-2012 shows how to deal with multiple datasets that are roughly similar. That isn't necessary, though. Here is a variation of that answer where the X values are not common across groups and aren't even of the same size.

function sharedparams

% Set up data so that Y is a function of T with a specific functional form,

% but there are three groups and one parameter varies across groups.

t1 = (0:10)'; t2 = (0:2:10)'; t3 = (2:9)';

T = [t1; t2; t3];

Y = 3 + [exp(-t1/2); 2*exp(-t2/2); 3*exp(-t3/2)] + randn(size(T))/10;

dsid = [ones(size(t1)); 2*ones(size(t2)); 3*ones(size(t3))];

gscatter(T,Y,dsid)

% Pack up the time and dataset id variables into X for later unpacking

X = [T dsid];

b = nlinfit(X,Y,@subfun,ones(1,5))

tt = linspace(0,10)';

line(tt,b(1)+b(2)*exp(-tt/b(5)),'color','r')

line(tt,b(1)+b(3)*exp(-tt/b(5)),'color','g')

line(tt,b(1)+b(4)*exp(-tt/b(5)),'color','b')

function yfit = subfun(params,X)

T = X(:,1); % unpack time from X

dsid = X(:,2); % unpack dataset id from X

A0 = params(1); % same A0 for all datasets

A1 = params(2:4)'; % different A1 for each dataset

tau = params(5); % same tau

yfit = A0 + A1(dsid).*exp(-T/tau);

Tom Lane
on 29 May 2020

Edited: Tom Lane
on 5 Jun 2020

My posted answer from 11-Dec-2012 shows how to deal with multiple datasets following roughly the same functional form with parameters that vary from one dataset to the other. That isn't necessary, though. Here is a variation of that answer where the functional forms differ across the datasets. They share some parameters, but others are distinct to each dataset.

function sharedparams2

% Set up data so that Y is a function of T with a different functional

% form across three groups. The groups share a common constant parameter

% and a constant time scale parameter, but otherwise follow a different

% functional relationship.

rng default

t1 = linspace(0,10,40)'; t2 = linspace(0,10,25)'; t3 = sort(2+7*rand(15,1));

T = [t1; t2; t3];

Y = [F1(t1,[3 2 1]); F2(t2,[3 1 1]); F3(t3,[3 2 1])];

Y = Y + randn(size(Y))/5;

dsid = [ones(size(t1)); 2*ones(size(t2)); 3*ones(size(t3))];

gscatter(T,Y,dsid)

% Pack up the time and dataset id variables into X for later unpacking

X = [T dsid];

b = nlinfit(X,Y,@subfun,ones(1,5))

tt = linspace(0,10)';

line(tt,F1(tt,b([1 2 5])),'color','r')

line(tt,F2(tt,b([1 3 5])),'color','g')

line(tt,F3(tt,b([1 4 5])),'color','b')

end

function yfit = subfun(params,X)

T = X(:,1); % unpack time from X

dsid = X(:,2); % unpack dataset id from X

A0 = params(1); % same A0 for all datasets

A1 = params(2:4)'; % different A1 for each dataset

tau = params(5); % same tau

yfit = zeros(size(T));

% Find separate groups and call the F function for each group

idx = (dsid==1);

yfit(idx) = F1(X(idx),[A0 A1(1) tau]);

idx = (dsid==2);

yfit(idx) = F2(X(idx),[A0 A1(2) tau]);

idx = (dsid==3);

yfit(idx) = F3(X(idx),[A0 A1(3) tau]);

end

% Below are the three functions for the three groups

function y = F1(x,b)

y = b(1) + b(2)*exp(-x/b(3));

end

function y = F2(x,b)

y = b(1) + b(2)*sin(x/b(3));

end

function y = F3(x,b)

y = b(1) + b(2)*cos(x/(2*b(3)));

end

Tai-Yen Chen
on 14 Apr 2021 at 13:25

@Alex Sha, thanks for the reply. but the a and b need to be a value between 0 and 1 though.

Tom Lane
on 9 Feb 2021

@Tai-Yen Chen, if you concatenate, then the sharing is built in. You don't have to account for different subsets of the data following different models.

I don't think anything I wrote easily extends to imposing constraints on the parameters. I don't have a good idea of how you might use nlinfit to do this. If it turns out the estimates satisfy the constraints, then you are all set. If not, then the constraints would place one or both of a and b on the boundary. You could perhaps regard that parameter as fixed and solve for the other. For example, if you get a=-0.1 try setting a=0 (omitting the term with a) and solve only for b. That could possibly give you an estimate and confidence interval for b under the assumption that a=0.

I know Optimization Toolbox has functions like fminunc that can impose constraints on the parameters. That would not produce confidence intervals, though. Perhaps you could use bootstrapping.

Jack Nolan
on 25 Feb 2021

I am trying to wright a MATLAB program to perfrom nonlinear regression accross multiple datasets (in a similiar manner). But in my question I have 3 unknown shared parameters and 1 known parameters whose value will be different for each dataset.

Unfortuanately I am getting an error (something to do with deviations between matrix dimiension of Y and the model function I provided). See below error. I don't see where I'm going wrong as I have followed your methodology exactly and have debugged each line without finding a solution. If you could point out my mistake I would realy appreciate it. Thanks.

ERROR:

Error using nlinfit (line 219)

MODELFUN must be a function that returns a vector of fitted values the same size as Y (160-by-1). The model function you provided returned a result that was 160-by-160.

One common reason for a size mismatch is using matrix operators (*, /, ^) in your function instead of the corresponding elementwise operators (.*, ./, .^).

Error in untitled (line 18)

b = nlinfit(X,Y,@subfun,ones(1,3))

CODE:

clc

clear all

close all

t = xlsread('data.xlsx', 'Sheet1', 'F4:F43'); % x data

y1 = xlsread('data.xlsx', 'Sheet1', 'G4:G43'); % y data 1

y2 = xlsread('data.xlsx', 'Sheet1', 'M4:M43'); % y data 1

y3 = xlsread('data.xlsx', 'Sheet1', 'AG4:AG43'); % y data 1

y4 = xlsread('data.xlsx', 'Sheet1', 'AM4:AM43'); % y data 1

T = [t; t; t; t] % vector of x data sets

Y = [y1; y2; y3; y4] % vector of y data sets

dsid = [ones(40,1); 2*ones(40,1); 3*ones(40,1);4*ones(40,1)] % ones(40,1) is a 40 x 1 array of 1's

gscatter(T,Y,dsid)

X = [T dsid] %%%%%%%%

tau = [63085.05; 1525.3; 1601.8; 62465.3]

b = nlinfit(X,Y,@subfun,ones(1,3))

line(t,b(1) * log(1 + t/tau(1)) + b(2) * log(1 + (t/tau(1))*(1/b(3))), 'color', 'r');

line(t,b(1) * log(1 + t/tau(2)) + b(2) * log(1 + (t/tau(2))*(1/b(3))), 'color', 'g');

line(t,b(1) * log(1 + t/tau(3)) + b(2) * log(1 + (t/tau(3))*(1/b(3))), 'color', 'b');

line(t,b(1) * log(1 + t/tau(4)) + b(2) * log(1 + (t/tau(4))*(1/b(3))), 'color', 'c');

function yfit = subfun(param,X)

T = X(:,1) % time

dsid = X(:,2); % dataset id

A1 = param(1); % unkown paramater (common to each dataset)

A2 = param(2); % unkown paramater (common to each dataset)

gamma = param(3); % unkown paramater (common to each dataset)

tau = [63085.05; 1525.3; 1601.8; 62465.28756]; %known paramter (unique to each dataset)

yfit = A1 * log(1 + T/tau(dsid)) + A2 * log(1 + (T/tau(dsid))*(1/gamma));

end

I also posted the question here:

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!