Perform nonlinear least-squares regression
grpData is a groupedData object specifying the data to fit. responseMap is a string or cell array of strings that maps model components to response data in grpData. estimatedInfo is an estimatedInfo object that defines the estimated parameters in the model sm.
If Statistics Toolbox™ is available, the function uses nlinfit for estimation as the default algorithm. If not, but Optimization Toolbox™ is available, fminunc is used. If both are unavailable, fminsearch is used.
If the model contains active doses and variants, they are applied during the simulation.
fitResults = sbiofit(sm,grpData,responseMap,estiminfo,dosing) uses the dosing information specified by a matrix of SimBiology dose objects dosing instead of using the active doses of the model sm if there is any.
fitResults = sbiofit(sm,grpData,responseMap,estiminfo,dosing,functionName) uses the estimation function specified by functionName. If the specified function is unavailable, a warning is issued and the first available default function is used.
This example uses the yeast heterotrimeric G protein model and experimental data reported by . For details about the model, see the Background section in Parameter Scanning, Parameter Estimation, and Sensitivity Analysis in the Yeast Heterotrimeric G Protein Cycle.
Load the G protein model.
Store the experimental data containing the time course for the fraction of active G protein as reported in the reference paper .
time = [0 10 30 60 110 210 300 450 600]'; GaFracExpt = [0 0.35 0.4 0.36 0.39 0.33 0.24 0.17 0.2]';
Create a groupedData object based on the experimental data.
tbl = table(time,GaFracExpt); grpData = groupedData(tbl);
Map the appropriate model component to the experimental data. In other words, indicate which species in the model corresponds to which response variable in the data. In this example, map the model parameter GaFrac to the experimental data variable GaFracExpt from grpData.
responseMap = 'GaFrac = GaFracExpt';
Use an estimatedInfo object to define the model parameter kGd as a parameter to be estimated.
estimatedParam = estimatedInfo('kGd');
Perform the parameter estimation.
fitResult = sbiofit(m1,grpData,responseMap,estimatedParam);
View the estimated parameter value of kGd.
ans = Name Estimate StandardError _____ ________ _____________ 'kGd' 0.11 0.00037969
SimBiology model, specified as a SimBiology model object. The active configset object of the model contains solver settings for simulation. Any active doses and variants are applied to the model during simulation unless specified otherwise using the dosing and variants input arguments, respectively.
Data to fit, specified as a groupedData object.
The name of the time variable must be defined in the IndependentVariableName property of grpData. For instance, if the time variable name is 'TIME', then specify it as follows.
grpData.Properties.IndependentVariableName = 'TIME';
If the data contains more than one group of measurements, the grouping variable name must be defined in the GroupVariableName property of grpData. For example, if the grouping variable name is 'GROUP', then specify it as follows.
grpData.Properties.GroupVariableName = 'GROUP';
A group usually refers to a set of measurements that represent a single time course, often corresponding to a particular individual or experimental condition.
Mapping information of model components to grpData, specified as a string or cell array of strings.
Each string is an equation-like expression, similar to assignment rules in SimBiology. It contains the name (or qualified name) of a quantity (species, compartment, or parameter) in the model sm, followed by the character '=' and the name of a variable in grpData. For clarity, white spaces are allowed between names and '='.
For example, if you have the concentration data 'CONC' in grpData for a species 'Drug_Central', you can specify it as follows.
responseMap = 'Drug_Central = CONC';
To unambiguously name a species, use the qualified name, which includes the name of the compartment. To name a reaction-scoped parameter, use the reaction name to qualify the parameter. If the name is not a valid MATLAB® variable name, surround it by square brackets such as [reaction 1].[parameter 1].
An error is issued if any (qualified) name matches two components of the same type. However, you can use a (qualified) name that matches two components of different types, and the function first finds the species with the given name, followed by compartments and then parameters.
Estimated parameters, specified as an estimatedInfo object or vector of estimatedInfo objects that defines the estimated parameters in the model sm and their initial estimates (optional). You can also specify parameter transformations if necessary. Supported transforms are log, logit, and probit. For details, see estimatedInfo object.
Dosing information, specified as an empty array or 2-D matrix of dose objects (ScheduleDose object or RepeatDose object). If empty, no doses are applied during simulation, even if the model has active doses. If not empty, the matrix must have a single row or one row per group in the data grpData. If it has a single row, same doses are applied to all groups during simulation. If it has multiple rows, each row is applied to a separate group, in the same order as the groups appear in grpData.
Multiple columns are allowed so that you can apply multiple dose objects to each group. Each column of doses must reference the same components in the model sm. Specifically, all doses (except for empty doses) in a column must have the same values for TargetName, DurationParameterName, and LagParameterName. If some groups require more doses than others, then fill in the matrix with dummy doses that are either default doses or empty doses.
A default dose has default values for all properties, except for the Name property. An empty dose has a dose amount of 0, thus having no effect on the model. Create a default dose as follows.
d1 = sbiodose('d1');
In addition to manually constructing dose objects, if grpData has dosing information, you can use the createDoses method to construct doses.
Estimation function name, specified as a string. Choices are as follows.
'nlinfit' (Statistics Toolbox is required.)
'fminunc' (Optimization Toolbox is required.)
'lsqcurvefit' (Optimization Toolbox is required.)
'lsqnonlin' (Optimization Toolbox is required.)
'patternserch' (Global Optimization Toolbox is required.)
'ga' (Global Optimization Toolbox is required.)
'pso' (Global Optimization Toolbox is required.)
If Statistics Toolbox is available, the function uses the nlinfit algorithm for estimation as the default. If not, but Optimization Toolbox is available, fminunc is used. If both are unavailable, fminsearch is used as the default.
Options specific to the estimation function, specified as a struct or optimoptions object.
optimset struct for fminsearch
optimoptions object for lsqcurvefit, lsqnonlin, fminunc, and pso
gaoptimset struct for ga
psoptimset struct for patternsearch
statset struct for nlinfit
See Default Options for Estimation Algorithms for more details and default options associated with each estimation function.
Variants, specified as an empty array or vector of variant objects. If empty, no variants are used even if the model has active variants.
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.Example: 'ErrorModel','constant','UseParallel',true specifies a constant error model and to run simulations in parallel during parameter estimation.
Error model used for estimation, specified as a string. There are four built-in error models. Each model defines the error using a standard mean-zero and unit-variance (Gaussian) variable e, simulation results f, and one or two parameters a and b.
Weights used for estimation, specified as a matrix of real positive weights, where the number of columns corresponds to the number of responses, or a function handle that accepts a vector of predicted response values and returns a vector of real positive weights.
If you specify an error model, you cannot use weights except for the constant error model. If neither the 'ErrorModel' or 'Weights' is specified, by default the function uses the constant error model with equal weights.
Fit option flag, specifying whether to fit each individual (false) or pool all individual data (true).
When true, the function performs fitting for all individuals or groups simultaneously using the same parameter estimates, and fitResults is a scalar results object.
When false, the function fits each group or individual separately using group- or individual-specific parameters, and fitResults is a vector of results objects with one result for each group.
Flag for parallel simulations during fitting, specified as true or false. If true and Parallel Computing Toolbox™ is available, the function runs simulations in parallel.
When 'Pooled' is set to false, the function fits each group separately using group-specific parameters, and fitResults is a vector of results objects with one results object for each group.
Simulation results, returned as a vector of SimData objects representing simulation results for each group or individual.
If the 'Pooled' option is false, then each simulation uses group-specific parameter values. If true, then all simulations use the same (population-wide) parameter values.
The states reported in simdata are the states that were included in the responseMap input argument as well as any other states listed in the StatesToLog property of the runtime options (RuntimeOptions) of the SimBiology model sm.
The function sbiofit uses maximum likelihood estimation (MLE) for fitting. Mathematically, the problem is typically formulated as the minimization of the negative of the logarithm of the likelihood function. For the constant error model, this is equivalent to the method of least-squares, which minimizes the sum of squares of the residuals. For other error models, the objective function has additional terms. When the error models are normally distributed, the negative log-likelihood can be written as follows.
|N||Number of experimental observations|
|yi||The ith experimental observation|
|Predicted value of the ith observation, which is a function of independent variables xi and estimated parameters p|
|Standard deviation of the ith observation. For instance, for the combined error model , the standard deviation is .|
When you use numeric weights or the weight function, the weights are assumed to be proportional to the variance of the error, i.e., where a is the constant error parameter. If you use weights, you cannot specify an error model except the constant error model.
Various parameter estimation methods have different requirements on the objective function. For some methods, the estimation of model parameters is performed independently of the estimation of the error model parameters. The following table summarizes the objective function F and any separate formulas used for the estimation of error model parameters. ri is the ith residual, a and b are error model parameters, and fi is the predicted value of the ith observation.
|Method||Supported Error Model||Objective Function||Error Parameter Estimation Function|
|Numeric weights (wi)|
|nlinfit||All||nlinfit also estimates error parameters.|
|All others||'combined'||Error parameters are included in minimization.|
|All others||'exponential'||, where is the standard deviation of the log-transformed ith observation.|
sbiofit uses the default options structure associated with nlinfit, except for:
sbiofit uses the default options structure associated with fminsearch, except for:
Requires Optimization Toolbox.
sbiofit uses the default options structure associated with lsqcurvefit, except for:
Requires Optimization Toolbox.
sbiofit uses the default options structure associated with lsqnonlin, except for:
Requires Global Optimization Toolbox.
sbioparamestim uses the default options structure associated with patternsearch, except for:
Requires Global Optimization Toolbox.
sbioparamestim uses the default options structure associated with ga, except for:
Requires Global Optimization Toolbox.sbioparamestim uses the following default options for the pso (particle swarm optimization) algorithm, except for:
See the Function Descriptions table from the sbioparamestim reference page for more details about this algorithm.
 Yi, T-M., Kitano, H., and Simon, M. (2003). A quantitative characterization of the yeast heterotrimeric G protein cycle. PNAS. 100, 10764–10769.
estimatedInfo object | fminsearch | fminunc | ga | groupedData object | LeastSquaresResults object | lsqcurvefit | lsqnonlin | nlinfit | NLINResults object | OptimResults object | patternsearch | sbiofitmixed