Accelerating the pace of engineering and science

# sbiofit

Perform nonlinear least-squares regression

## Syntax

• fitResults = sbiofit(sm,grpData,responseMap,estiminfo) example
• fitResults = sbiofit(sm,grpData,responseMap,estiminfo,dosing) example
• fitResults = sbiofit(sm,grpData,responseMap,estiminfo,dosing,functionName) example
• fitResults = sbiofit(sm,grpData,responseMap,estiminfo,dosing,functionName,options) example
• fitResults = sbiofit(sm,grpData,responseMap,estiminfo,dosing,functionName,options,variants) example
• fitResults = sbiofit(_,Name,Value) example
• [fitResults,simdata] = sbiofit(_) example

## Description

example

fitResults = sbiofit(sm,grpData,responseMap,estiminfo) estimates parameters of a SimBiology model sm using nonlinear least-squares regression.

grpData is a groupedData object specifying the data to fit. responseMap is a string or cell array of strings that maps model components to response data in grpData. estimatedInfo is an estimatedInfo object that defines the estimated parameters in the model sm.

If you have specified bound constraints in estiminfo, sbiofit uses the first available function among the following: fmincon, nlinfit, or fminsearch.

If there are no bound constraints, sbiofit uses the first available function among the following: nlinfit, fminunc, or fminsearch.

nlinfit requires Statistics Toolbox™, and fmincon and fminunc require Optimization Toolbox™.

By default, each group in grpData is fit separately, resulting in group-specific parameter estimates. If the model contains active doses and variants, they are applied during the simulation.

example

fitResults = sbiofit(sm,grpData,responseMap,estiminfo,dosing) uses the dosing information specified by a matrix of SimBiology dose objects dosing instead of using the active doses of the model sm if there is any.

example

fitResults = sbiofit(sm,grpData,responseMap,estiminfo,dosing,functionName) uses the estimation function specified by functionName. If the specified function is unavailable, a warning is issued and the first available default function is used.

example

fitResults = sbiofit(sm,grpData,responseMap,estiminfo,dosing,functionName,options) uses the additional options specified by options for the function functionName.

example

fitResults = sbiofit(sm,grpData,responseMap,estiminfo,dosing,functionName,options,variants) applies variant objects specified as variants instead of using any active variants of the model.

example

fitResults = sbiofit(_,Name,Value) uses additional options specified by one or more Name,Value pair arguments.

example

[fitResults,simdata] = sbiofit(_) also returns a vector of SimData objects simdata using any of the input arguments in the previous syntaxes.

 Note:   sbiofit unifies sbionlinfit and sbioparamestim estimation functions. Use sbiofit to perform nonlinear least-squares regression.sbiofit simulates the model using a SimFunction object, which automatically accelerates simulations by default. Hence it is not necessary to run sbioaccelerate before you call sbiofit.

## Examples

expand all

### Fit a One-Compartment Model to an Individual's PK Profile

Background

This example shows how to fit an individual's PK profile data to one-compartment model and estimate pharmacokinetic parameters.

Suppose you have drug plasma concentration data from an individual and want to estimate the volume of the central compartment and the clearance. Assume the drug concentration versus the time profile follows the monoexponential decline , where is the drug concentration at time t, is the initial concentration, and is the elimination rate constant that depends on the clearance and volume of the central compartment .

The synthetic data in this example was generated using the following model and parameters:

• One-compartment model with bolus dosing and first-order elimination

• Volume of the central compartment (Central) = 1.70 liter

• Clearance parameter (Cl_Central) = 0.55 liter/hour

• Constant error model

The data is stored as a table with variables Time and Conc that represent the time course of the plasma concentration of an individual after an intravenous bolus administration measured at 13 different time points. The variable units for Time and Conc are hour and milligram/liter, respectively.

```clear all
```
```plot(data.Time,data.Conc,'b+')
xlabel('Time');
ylabel('Drug Concentration');
```

Convert to groupedData Format

Convert the data set to a groupedData object, which is the required data format for the fitting function sbiofit for later use. A groupedData object also lets you set independent variable and group variable names (if they exist). Set the units of the Time and Conc variables. The units are optional and only required for the UnitConversion feature, which automatically converts matching physical quantities to one consistent unit system.

```gData = groupedData(data);
gData.Properties.VariableUnits = {'hour','milligram/liter'};
gData.Properties
```

groupedData automatically set the name of the IndependentVariableName property to the Time variable of the data.

```ans =

Description: ''
VariableDescriptions: {}
VariableUnits: {'hour'  'milligram/liter'}
DimensionNames: {'Row'  'Variable'}
UserData: []
RowNames: {}
VariableNames: {'Time'  'Conc'}
GroupVariableName: ''
IndependentVariableName: 'Time'

```

Construct a One-Compartment Model

Use the built-in PK library to construct a one-compartment model with bolus dosing and first-order elimination where the elimination rate depends on the clearance and volume of the central compartment. Use the configset object to turn on unit conversion.

```pkmd                    = PKModelDesign;
pkc1.DosingType         = 'Bolus';
pkc1.EliminationType    = 'linear-clearance';
pkc1.HasResponseVariable = true;
model                   = construct(pkmd);
configset               = getconfigset(model);
configset.CompileOptions.UnitConversion = true;
```

For details on creating compartmental PK models using the SimBiology® built-in library, see Create Pharmacokinetic Models.

Define Dosing

Define a single bolus dose of 10 milligram given at time = 0. For details on setting up different dosing schedules, see Doses.

```dose                = sbiodose('dose');
dose.TargetName     = 'Drug_Central';
dose.StartTime      = 0;
dose.Amount         = 10;
dose.AmountUnits    = 'milligram';
dose.TimeUnits      = 'hour';
```

Map Response Data to the Corresponding Model Component

The data contains drug concentration data stored in the Conc variable. This data corresponds to the Drug_Central species in the model. Therefore, map the data to Drug_Central as follows.

```responseMap = {'Drug_Central = Conc'};
```

Specify Parameters to Estimate

The parameters to fit in this model are the volume of the central compartment (Central) and the clearance rate (Cl_Central). In this case, specify log-transformation for these biological parameters since they are constrained to be positive. The estimatedInfo object lets you specify parameter transforms, initial values, and parameter bounds (optional).

```paramsToEstimate    = {'log(Central)','log(Cl_Central)'};
estimatedParams     = estimatedInfo(paramsToEstimate,'InitialValue',[1 1]);
```

Estimate Parameters

Now that you have defined one-compartment model, data to fit, mapped response data, parameters to estimate, and dosing, use sbiofit to estimate parameters. By default, sbiofit uses the constant error model and nlinfit estimation method, which requires Statistics Toolbox (TM). The results in this example are returned by the nlinfit method. If you do not have Statistics Toolbox, the default function will change, and your results may be slightly different when you run this example.

```nlinfitConst = sbiofit(model,gData,responseMap,estimatedParams,dose)
```
```nlinfitConst =

NLINResults with properties:

GroupName: [1x1 categorical]
Beta: [2x3 table]
ParameterEstimates: [2x3 table]
J: [13x2 double]
COVB: [2x2 double]
CovarianceMatrix: [2x2 double]
R: [13x1 double]
MSE: 0.0375
SSE: 0.4122
Weights: []
EstimatedParameterNames: {'Central'  'Cl_Central'}
ErrorModel: 'constant'
ErrorParameters: [1x1 table]
EstimationFunction: 'nlinfit'

```

Display Estimated Parameters and Plot Results

Notice the parameter estimates were not far off from the true values (1.70 and 0.55) that were used to generate the data. You may also try different error models to see if they could further improve the parameter estimates.

```nlinfitConst.ParameterEstimates
plot(nlinfitConst);
```
```ans =

Name        Estimate    StandardError
____________    ________    _____________

'Central'        1.6993     0.034822
'Cl_Central'    0.53358     0.019679

```

Use Different Error Models

Try three other supported error models (proportional, combination of constant and proportional error models, and exponential).

```nlinfitProp = sbiofit(model,gData,responseMap,estimatedParams,dose,...
'ErrorModel','proportional');
nlinfitExp  = sbiofit(model,gData,responseMap,estimatedParams,dose,...
'ErrorModel','exponential');
nlinfitComb = sbiofit(model,gData,responseMap,estimatedParams,dose,...
'ErrorModel','combined');
```

Compare Sum of Squared Errors (SSE)

Compare the SSE of each fit to see which error model best fits the data. However, only the constant and exponential error models provide the raw residuals, that is, observed minus fitted values. The proportional and combined error models provide weighted residuals. Thus to compare SSE of error models, first calculate the raw (unweighted) residuals for the proportional and combined error models.

The results returned by sbiofit contain fitted values at both experimental and solver time points. Use the fitted function of the results object to get all the simulated (fitted) data. Then use the resample function to extract only those fitted values at experimental time points.

```propSimData     = fitted(nlinfitProp);
combSimData     = fitted(nlinfitComb);
expTime         = gData.Time;
propSimData     = resample(propSimData,expTime);
combSimData     = resample(combSimData,expTime);
```

Calculate the sum of squared errors.

```nlinfitPropR    = gData.Conc - propSimData.Data;
nlinfitPropSSE  = sum(nlinfitPropR.^2);
nlinfitCombR    = gData.Conc - combSimData.Data;
nlinfitCombSSE  = sum(nlinfitCombR.^2);
```

Construct a table of SSEs for all error models. Use the reported SSE of the constant and exponential error models by accessing the SSE property of the corresponding results object. For the proportional and combined error models, use the calculated SSEs.

```ErrorModels     = {'Constant','Proportional','Exponential','Combined'}';
SSE             = [nlinfitConst.SSE,nlinfitPropSSE,nlinfitExp.SSE,nlinfitCombSSE]';
sseTable        = table(ErrorModels,SSE)
```
```sseTable =

ErrorModels        SSE
______________    _______

'Constant'         0.4122
'Proportional'     0.7176
'Exponential'     0.63299
'Combined'         0.4122

```

As shown in the table, the constant and combined error models have the lowest (and same) SSE. However, the constant error model is a better model since it uses only one error parameter (a) whereas the combined error model uses two (a and b).

In addition, notice that the value of b from the combined error model is close to zero, which also suggests the constant error model is better.

```nlinfitConst.ErrorParameters
```
```ans =

a
_______

0.19358

```
```nlinfitComb.ErrorParameters
```
```ans =

a           b
_______    __________

0.17807    4.2452e-08

```

Display Estimated Parameter Values

Show the estimated parameter values of each error model.

```allResults              = [nlinfitConst,nlinfitProp,nlinfitExp,nlinfitComb];
Error_Model             = cell(4,1);
Estimated_Central       = zeros(4,1);
Estimated_Cl_Central    = zeros(4,1);
t = table(Error_Model,Estimated_Central,Estimated_Cl_Central);
for i = 1:height(t)
t{i,1} = {allResults(i).ErrorModel};
t{i,2} = allResults(i).ParameterEstimates.Estimate(1);
t{i,3} = allResults(i).ParameterEstimates.Estimate(2);
end
t
```
```t =

Error_Model      Estimated_Central    Estimated_Cl_Central
______________    _________________    ____________________

'constant'        1.6993               0.53358
'proportional'    1.7986               0.50816
'exponential'     1.7872               0.51701
'combined'        1.6993               0.53358

```

Conclusion

This example showed how to estimate PK parameters, namely the volume of the central compartment and clearance parameter of an individual, by fitting the PK profile data to one-compartment model. You compared SSE and estimated parameter values of different error models to see which model best explained the data. Final fitted results suggested both the constant and combined error models provided the closest estimates to the parameter values used to generate the data. However, the constant error model is a better model since it uses one fewer parameter than the combined model.

### Fit a Two-Compartment Model to PK Profiles of Multiple Individuals

This example shows how to estimate pharmacokinetic parameters of multiple individuals using a two-compartment model.

Suppose you have drug plasma concentration data from three individuals that you want to use to estimate corresponding pharmacokinetic parameters, namely the volume of central and peripheral compartment (Central, Peripheral), the clearance rate (Cl_Central), and intercompartmental clearance (Q12). Assume the drug concentration versus the time profile follows the biexponential decline ${C}_{t}=A{e}^{-at}+B{e}^{-bt}$, where Ct is the drug concentration at time t, a and b are slopes for corresponding exponential declines.

The synthetic data set contains drug plasma concentration data measured in both central and peripheral compartments. The data was generated using a two-compartment model with an infusion dose and first-order elimination. These parameters were used for each individual.

CentralPeripheralQ12Cl_Central
Individual 11.900.680.240.57
Individual 22.106.050.360.95
Individual 31.704.210.460.95

The data is stored as a table with variables ID, Time, CentralConc, and PeripheralConc. It represents the time course of plasma concentrations measured at eight different time points for both central and peripheral compartments after a bolus dose.

```clear all
```

Convert the data set to a groupedData object which is the required data format for the fitting function sbiofit for later use. A groupedData object also lets you set independent variable and group variable names (if they exist). Set the units of the ID, Time, CentralConc, and PeripheralConc variables. The units are optional and only required for the UnitConversion feature, which automatically converts matching physical quantities to one consistent unit system.

```gData = groupedData(data);
gData.Properties.VariableUnits = {'','hour','milligram/liter','milligram/liter'};
gData.Properties
```
```ans =

Description: ''
VariableDescriptions: {}
VariableUnits: {1x4 cell}
DimensionNames: {'Row'  'Variable'}
UserData: []
RowNames: {}
VariableNames: {'ID'  'Time'  'CentralConc'  'PeripheralConc'}
GroupVariableName: 'ID'
IndependentVariableName: 'Time'

```

Create a trellis plot that shows the PK profiles of three individuals.

```sbiotrellis(gData,'ID','Time',{'CentralConc','PeripheralConc'});
```

Use the built-in PK library to construct a two-compartment model with infusion dosing and first-order elimination where the elimination rate depends on the clearance and volume of central compartment. Use the configset object to turn on unit conversion.

```pkmd                 = PKModelDesign;
pkc1.DosingType      = 'Infusion';
pkc1.EliminationType = 'linear-clearance';
pkc1.HasResponseVariable = true;
model                = construct(pkmd);
configset            = getconfigset(model);
configset.CompileOptions.UnitConversion = true;
```

Assume every individual receives an infusion dose at time = 0, with a total infusion amount of 100 mg at a rate of 50 mg/hour. For details on setting up different dosing strategies, see Doses.

```dose             = sbiodose('dose','TargetName','Drug_Central');
dose.StartTime   = 0;
dose.Amount      = 100;
dose.Rate        = 50;
dose.AmountUnits = 'milligram';
dose.TimeUnits   = 'hour';
dose.RateUnits   = 'milligram/hour';
```

The data contains measured plasma concentrations in the central and peripheral compartments. Map these variables to the appropriate model species, which are Drug_Central and Drug_Peripheral.

```responseMap = {'Drug_Central = CentralConc','Drug_Peripheral = PeripheralConc'};
```

The parameters to estimate in this model are the volumes of central and peripheral compartments (Central and Peripheral), intercompartmental clearance Q12, and clearance rate Cl_Central. In this case, specify log-transform for Central and Peripheral since they are constrained to be positive. The estimatedInfo object lets you specify parameter transforms, initial values, and parameter bounds (optional).

```paramsToEstimate   = {'log(Central)','log(Peripheral)','Q12','Cl_Central'};
estimatedParam     = estimatedInfo(paramsToEstimate,'InitialValue',[1 1 1 1]);
```

Fit the model to all of the data pooled together, that is, estimate one set of parameters for all individuals. By default (and also in this example), sbiofit uses the nlinfit estimation function, which requires Statistics Toolbox. If you do not have Statistics Toolbox, the default function will change, and your results may be slightly different when you run this example. To see which estimation function sbiofit used for the fitting, check the EstimationFunction property of the corresponding results object.

```pooledFit = sbiofit(model,gData,responseMap,estimatedParam,dose,'Pooled',true)
```
```pooledFit =

NLINResults with properties:

GroupName: []
Beta: [4x3 table]
ParameterEstimates: [4x3 table]
J: [24x4x2 double]
COVB: [4x4 double]
CovarianceMatrix: [4x4 double]
R: [24x2 double]
MSE: 6.6220
SSE: 291.3688
Weights: []
EstimatedParameterNames: {'Central'  'Peripheral'  'Q12'  'Cl_Central'}
ErrorModel: 'constant'
ErrorParameters: [1x1 table]
EstimationFunction: 'nlinfit'

```

Plot the fitted results versus the original data. Although three separate plots were generated, the data was fitted using the same set of parameters (that is, all three individuals had the same fitted line).

```plot(pooledFit);
```

Use the summary function to display the final estimated parameter values, including the error parameter estimate for the constant error model (the default error model).

```summary(pooledFit);
```

Estimate one set of parameters for each individual and see if there is any improvement in the parameter estimates. In this example, since there are three individuals, three sets of parameters are estimated.

```unpooledFit =  sbiofit(model,gData,responseMap,estimatedParam,dose,'Pooled',false);
```

Plot the fitted results versus the original data. Each individual was fitted differently (that is, each fittted line is unique to each individual) and each line appeared to fit well to individual data.

```plot(unpooledFit);
```

Display the summary of estimated parameters.

```summary(unpooledFit);
```

Display the fitted results of the first individual. The MSE was lower than that of the pooled fit. This is also true for the other two individuals.

```unpooledFit(1)
```
```ans =

NLINResults with properties:

GroupName: [1x1 categorical]
Beta: [4x3 table]
ParameterEstimates: [4x3 table]
J: [8x4x2 double]
COVB: [4x4 double]
CovarianceMatrix: [4x4 double]
R: [8x2 double]
MSE: 2.1380
SSE: 25.6559
Weights: []
EstimatedParameterNames: {'Central'  'Peripheral'  'Q12'  'Cl_Central'}
ErrorModel: 'constant'
ErrorParameters: [1x1 table]
EstimationFunction: 'nlinfit'

```

Generate a plot of the residuals over time to compare the pooled and unpooled fit results. The figure indicates unpooled fit residuals are smaller than those of pooled fit as expected. In addition to comparing residuals, other rigorous crieteria can be used to compare the fitted results.

```t = [gData.Time;gData.Time];
res_pooled = vertcat(pooledFit.R);
res_pooled = res_pooled(:);
res_unpooled = vertcat(unpooledFit.R);
res_unpooled = res_unpooled(:);
plot(t,res_pooled,'o','MarkerFaceColor','w','markerEdgeColor','b')
hold on
plot(t,res_unpooled,'o','MarkerFaceColor','b','markerEdgeColor','b')
refl = refline(0,0); % A reference line representing a zero residual
title('Residuals versus Time');
xlabel('Time');
ylabel('Residuals');
legend({'Pooled','Unpooled'});
```

This example showed how to perform pooled and unpooled fits using sbiofit. As illustrated, the unpooled fit accounts for variations due to the specific subjects in the study, and, in this case, the model fits better to the data. However, the pooled fit returns population-wide parameters. If you want to estimate population-wide parameters while considering individual variations, use sbiofitmixed.

### Estimate Category-Specific PK Parameters for Multiple Individuals

This example shows how to estimate category-specific (such as young versus old, male versus female), individual-specific, and population-wide parameters using PK profile data from multiple individuals.

Background

Suppose you have drug plasma concentration data from 30 individuals and want to estimate pharmacokinetic parameters, namely the volumes of central and peripheral compartment, the clearance, and intercompartmental clearance. Assume the drug concentration versus the time profile follows the biexponential decline , where is the drug concentration at time t, and and are slopes for corresponding exponential declines.

This synthetic data contains the time course of plasma concentrations of 30 individuals after a bolus dose (100 mg) measured at different times for both central and peripheral compartments. It also contains categorical variables, namely Sex and Age.

```clear
```

Convert to groupedData Format

Convert the data set to a groupedData object, which is the required data format for the fitting function sbiofit. A groupedData object also allows you set independent variable and group variable names (if they exist). Set the units of the ID, Time, CentralConc, PeripheralConc, Age, and Sex variables. The units are optional and only required for the UnitConversion feature, which automatically converts matching physical quantities to one consistent unit system.

```gData = groupedData(data);
gData.Properties.VariableUnits = {'','hour','milligram/liter','milligram/liter','',''};
gData.Properties
```

The IndependentVariableName and GroupVariableName properties have been automatically set to the Time and ID variables of the data.

```ans =

Description: ''
VariableDescriptions: {}
VariableUnits: {1x6 cell}
DimensionNames: {'Row'  'Variable'}
UserData: []
RowNames: {}
VariableNames: {1x6 cell}
GroupVariableName: 'ID'
IndependentVariableName: 'Time'

```

Visualize Data

Display the response data for each individual.

```sbiotrellis(gData,'ID','Time',{'CentralConc','PeripheralConc'});
```

Set Up a Two-Compartment Model

Use the built-in PK library to construct a two-compartment model with infusion dosing and first-order elimination where the elimination rate depends on the clearance and volume of the central compartment. Use the configset object to turn on unit conversion.

```pkmd                                    = PKModelDesign;
pkc1.DosingType                         = 'Bolus';
pkc1.EliminationType                    = 'linear-clearance';
pkc1.HasResponseVariable                = true;
model                                   = construct(pkmd);
configset                               = getconfigset(model);
configset.CompileOptions.UnitConversion = true;
```

For details on creating compartmental PK models using the SimBiology® built-in library, see Create Pharmacokinetic Models.

Define Dosing

Assume every individual receives a bolus dose of 100 mg at time = 0. For details on setting up different dosing strategies, see Doses.

```dose             = sbiodose('dose','TargetName','Drug_Central');
dose.StartTime   = 0;
dose.Amount      = 100;
dose.AmountUnits = 'milligram';
dose.TimeUnits   = 'hour';
```

Map the Response Data to Corresponding Model Components

The data contains measured plasma concentration in the central and peripheral compartments. Map these variables to the appropriate model components, which are Drug_Central and Drug_Peripheral.

```responseMap = {'Drug_Central = CentralConc','Drug_Peripheral = PeripheralConc'};
```

Specify Parameters to Estimate

Specify the volumes of central and peripheral compartments Central and Peripheral, intercompartmental clearance Q12, and clearance Cl_Central as parameters to estimate. The estimatedInfo object lets you optionally specify parameter transforms, initial values, and parameter bounds. Since both Central and Peripheral are constrained to be positive, specify a log-transform for each parameter.

```paramsToEstimate    = {'log(Central)', 'log(Peripheral)', 'Q12', 'Cl_Central'};
estimatedParam      = estimatedInfo(paramsToEstimate,'InitialValue',[1 1 1 1]);
```

Estimate Individual-Specific Parameters

Estimate one set of parameters for each individual by setting the 'Pooled' name-value pair argument to false.

```unpooledFit =  sbiofit(model,gData,responseMap,estimatedParam,dose,'Pooled',false);
```

Display Results

Plot the fitted results versus the original data for each individual (group).

```plot(unpooledFit);
```

For an unpooled fit, sbiofit always returns one results object for each individual.

Display the summary of estimated parameters for each individual.

```summary(unpooledFit);
```

Examine Parameter Estimates for Category Dependencies

Explore the unpooled estimates to see if there is any category-specific parameters, that is, if some parameters are related to one or more categories. If there are any category dependencies, it might be possible to reduce the number of degrees of freedom by estimating just category-specific values for those parameters.

First extract the ID and category values for each ID

```catParamValues = unique(gData(:,{'ID','Sex','Age'}));
```

Add variables to the table containing each parameter's estimate.

```allParamValues            = vertcat(unpooledFit.ParameterEstimates);
catParamValues.Central    = allParamValues.Estimate(strcmp(allParamValues.Name, 'Central'));
catParamValues.Peripheral = allParamValues.Estimate(strcmp(allParamValues.Name, 'Peripheral'));
catParamValues.Q12        = allParamValues.Estimate(strcmp(allParamValues.Name, 'Q12'));
catParamValues.Cl_Central = allParamValues.Estimate(strcmp(allParamValues.Name, 'Cl_Central'));
```

Plot estimates of each parameter for each category. gscatter requires Statistics Toolbox (TM). If you do not have Statistics Toolbox, use other alternative plotting functions such as plot.

```h           = figure;
ylabels     = {'Central','Peripheral','Cl\_Central','Q12'};
plotNumber  = 1;
for i = 1:4
thisParam = estimatedParam(i).Name;

% Plot for Sex category
subplot(4,2,plotNumber);
plotNumber  = plotNumber + 1;
gscatter(double(catParamValues.Sex), catParamValues.(thisParam), catParamValues.Sex);
ax          = gca;
ax.XTick    = [];
ylabel(ylabels(i));

% Plot for Age category
subplot(4,2,plotNumber);
plotNumber  = plotNumber + 1;
gscatter(double(catParamValues.Age), catParamValues.(thisParam), catParamValues.Age);
ax          = gca;
ax.XTick    = [];
ylabel(ylabels(i));
end
```

Based on the plot, it seems that young individuals tend to have higher volumes of central and peripheral compartments (Central, Peripheral) than old invididuals (that is, the volumes seem to be age-specific). In addition, males tend to have lower clearance rates (Cl_Central) than females but the opposite for the Q12 parameter (that is, the clearance and Q12 seem to be sex-specific).

Estimate Category-Specific Parameters

Use the 'CategoryVariableName' property of the estimatedInfo object to specify which category to use during fitting. Use 'Sex' as the group to fit for the clearrance Cl_Central and Q12 parameters. Use 'Age' as the group to fit for the Central and Peripheral parameters.

```estimatedParam(1).CategoryVariableName = 'Age';
estimatedParam(2).CategoryVariableName = 'Age';
estimatedParam(3).CategoryVariableName = 'Sex';
estimatedParam(4).CategoryVariableName = 'Sex';
categoryFit = sbiofit(model,gData,responseMap,estimatedParam,dose)
```

When fitting by category (or group), sbiofit always returns one results object, not one for each category level. This is because both male and female individuals are considered to be part of the same optimization using the same error model and error parameters, similarly for the young and old individuals.

```categoryFit =

NLINResults with properties:

GroupName: []
Beta: [8x5 table]
ParameterEstimates: [120x6 table]
J: [240x8x2 double]
COVB: [8x8 double]
CovarianceMatrix: [8x8 double]
R: [240x2 double]
MSE: 0.4365
SSE: 206.0170
Weights: []
EstimatedParameterNames: {'Central'  'Peripheral'  'Q12'  'Cl_Central'}
ErrorModel: 'constant'
ErrorParameters: [1x1 table]
EstimationFunction: 'nlinfit'

```

Plot Results

Plot the category-specific estimated results.

```plot(categoryFit);
```

For the Cl_Central and Q12 parameters, all males had the same estimates, and similarly for the females. For the Central and Peripheral parameters, all young individuals had the same estimates, and similarly for the old individuals.

Estimate Population-Wide Parameters

To better compare the results, fit the model to all of the data pooled together, that is, estimate one set of parameters for all individuals by setting the 'Pooled' name-value pair argument to true. The warning message tells you that this option will ignore any category-specific information (if they exist).

```pooledFit = sbiofit(model,gData,responseMap,estimatedParam,dose,'Pooled',true);
```
```Warning: You called SBIOFIT using the Pooled option. The CategoryVariableName
values of the ESTIMINFO input will be ignored.
```

Plot Results

Plot the fitted results versus the original data. Although a separate plot was generated for each individual, the data was fitted using the same set of parameters (that is, all individuals had the same fitted line).

```plot(pooledFit);
```

Compare Residuals

Compare residuals of CentralConc and PeripheralConc responses for each fit.

```t = gData.Time;
allResid(:,:,1) = pooledFit.R;
allResid(:,:,2) = categoryFit.R;
allResid(:,:,3) = vertcat(unpooledFit.R);

figure;
responseList = {'CentralConc', 'PeripheralConc'};
for i = 1:2
subplot(2,1,i);
oneResid = squeeze(allResid(:,i,:));
plot(t,oneResid,'o');
refline(0,0); % A reference line representing a zero residual
title(sprintf('Residuals (%s)', responseList{i}));
xlabel('Time');
ylabel('Residuals');
legend({'Pooled','Category-Specific','Unpooled'});
end
```

As shown in the plot, the unpooled fit produced the best fit to the data as it fit the data to each indivdual. This was expected since it used the most number of degrees of freedom. The category-fit reduced the number of degrees of freedom by fitting the data to two categories (sex and age). As a result, the residuals were larger than the unpooled fit, but still smaller than the population-fit, which estimated just one set of parameters for all individuals. The category-fit might be a good compromise between the unpooled and pooled fitting provided that any hierarchical model exists within your data.

### Estimate a Parameter from the Yeast G protein Model

This example uses the yeast heterotrimeric G protein model and experimental data reported by [1]. For details about the model, see the Background section in Parameter Scanning, Parameter Estimation, and Sensitivity Analysis in the Yeast Heterotrimeric G Protein Cycle.

`sbioloadproject gprotein`

Store the experimental data containing the time course for the fraction of active G protein as reported in the reference paper [1].

```time = [0 10 30 60 110 210 300 450 600]';
GaFracExpt = [0 0.35 0.4 0.36 0.39 0.33 0.24 0.17 0.2]';
```

Create a groupedData object based on the experimental data.

```tbl = table(time,GaFracExpt);
grpData = groupedData(tbl);```

Map the appropriate model component to the experimental data. In other words, indicate which species in the model corresponds to which response variable in the data. In this example, map the model parameter GaFrac to the experimental data variable GaFracExpt from grpData.

`responseMap = 'GaFrac = GaFracExpt';`

Use an estimatedInfo object to define the model parameter kGd as a parameter to be estimated.

`estimatedParam = estimatedInfo('kGd');`

Perform the parameter estimation.

`fitResult = sbiofit(m1,grpData,responseMap,estimatedParam);`

View the estimated parameter value of kGd.

`fitResult.ParameterEstimates`
```ans =

Name     Estimate    StandardError
_____    ________    _____________

'kGd'    0.11        0.00037969
```

## Input Arguments

expand all

### sm — SimBiology modelSimBiology model object

SimBiology model, specified as a SimBiology model object. The active configset object of the model contains solver settings for simulation. Any active doses and variants are applied to the model during simulation unless specified otherwise using the dosing and variants input arguments, respectively.

### grpData — Data to fitgroupedData object

Data to fit, specified as a groupedData object.

The name of the time variable must be defined in the IndependentVariableName property of grpData. For instance, if the time variable name is 'TIME', then specify it as follows.

`grpData.Properties.IndependentVariableName = 'TIME';`

If the data contains more than one group of measurements, the grouping variable name must be defined in the GroupVariableName property of grpData. For example, if the grouping variable name is 'GROUP', then specify it as follows.

`grpData.Properties.GroupVariableName = 'GROUP';`

A group usually refers to a set of measurements that represent a single time course, often corresponding to a particular individual or experimental condition.

 Note:   sbiofit uses the categorical function to identify groups. If any group values are converted to the same value by categorical, then those observations are treated as belonging to the same group. For instance, if some observations have no group information (that is, empty string), then categorical converts empty strings to , and these observations are treated as one group.

### responseMap — Mapping information of model components to grpDatastring | cell array of strings

Mapping information of model components to grpData, specified as a string or cell array of strings.

Each string is an equation-like expression, similar to assignment rules in SimBiology. It contains the name (or qualified name) of a quantity (species, compartment, or parameter) in the model sm, followed by the character '=' and the name of a variable in grpData. For clarity, white spaces are allowed between names and '='.

For example, if you have the concentration data 'CONC' in grpData for a species 'Drug_Central', you can specify it as follows.

`responseMap = 'Drug_Central = CONC';`

To unambiguously name a species, use the qualified name, which includes the name of the compartment. To name a reaction-scoped parameter, use the reaction name to qualify the parameter. If the name is not a valid MATLAB® variable name, surround it by square brackets such as [reaction 1].[parameter 1].

An error is issued if any (qualified) name matches two components of the same type. However, you can use a (qualified) name that matches two components of different types, and the function first finds the species with the given name, followed by compartments and then parameters.

### estiminfo — Estimated parametersestimatedInfo object | vector of estimatedInfo objects

Estimated parameters, specified as an estimatedInfo object or vector of estimatedInfo objects that defines the estimated parameters in the model sm, and other optional information such as their initial estimates, transformations, bound constraints, and categories. Supported transforms are log, logit, and probit.

If you do not specify Pooled name-value pair argument, sbiofit uses CategoryVariableName property of estiminfo to decide if parameters should be estimated for each individual, group, category, or all individuals as a whole. Use the Pooled option to override any CategoryVariableName values. For details about CategoryVariableName property, see estimatedInfo object.

 Note:   sbiofit uses the categorical function to identify groups or categories. If any group values are converted to the same value by categorical, then those observations are treated as belonging to the same group. For instance, if some observations have no group information (that is, empty string as a group value), then categorical converts empty strings to , and these observations are treated as one group.

### dosing — Dosing information[] | 2-D matrix of dose objects

Dosing information, specified as an empty array or 2-D matrix of dose objects (ScheduleDose object or RepeatDose object). If empty, no doses are applied during simulation, even if the model has active doses. If not empty, the matrix must have a single row or one row per group in the data grpData. If it has a single row, same doses are applied to all groups during simulation. If it has multiple rows, each row is applied to a separate group, in the same order as the groups appear in grpData.

Multiple columns are allowed so that you can apply multiple dose objects to each group. Each column of doses must reference the same components in the model sm. Specifically, all doses (except for empty doses) in a column must have the same values for TargetName, DurationParameterName, and LagParameterName. If some groups require more doses than others, then fill in the matrix with dummy doses that are either default doses or empty doses.

A default dose has default values for all properties, except for the Name property. An empty dose has a dose amount of 0, thus having no effect on the model. Create a default dose as follows.

`d1 = sbiodose('d1');`

In addition to manually constructing dose objects, if grpData has dosing information, you can use the createDoses method to construct doses.

### functionName — Estimation function namestring

Estimation function name, specified as a string. Choices are as follows.

• 'nlinfit' (Statistics Toolbox is required.)

• 'fminunc' (Optimization Toolbox is required.)

• 'fmincon' (Optimization Toolbox is required.)

• 'fminsearch'

• 'lsqcurvefit' (Optimization Toolbox is required.)

• 'lsqnonlin' (Optimization Toolbox is required.)

• 'patternserch' (Global Optimization Toolbox is required.)

• 'ga' (Global Optimization Toolbox is required.)

• 'particleswarm' (Global Optimization Toolbox is required.)

If the problem uses bound constraints, sbiofit uses the first available function among the following: fmincon, nlinfit, or fminsearch.

If there are no bound constraints, sbiofit uses the first available function among the following: nlinfit, fminunc, or fminsearch.

### options — Options specific to estimation functionstruct | optimoptions object

Options specific to the estimation function, specified as a struct or optimoptions object.

• optimset struct for fminsearch

• optimoptions object for lsqcurvefit, lsqnonlin, fminunc, and particleswarm

• gaoptimset struct for ga

• psoptimset struct for patternsearch

• statset struct for nlinfit

See Default Options for Estimation Algorithms for more details and default options associated with each estimation function.

### variants — Variants[] | vector of variant objects

Variants, specified as an empty array or vector of variant objects. If empty, no variants are used even if the model has active variants.

### Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'ErrorModel','constant','UseParallel',true specifies a constant error model and to run simulations in parallel during parameter estimation.

### 'ErrorModel' — Error model'constant' (default) | 'proportional' | 'combined' | 'exponential'

Error model used for estimation, specified as a string. There are four built-in error models. Each model defines the error using a standard mean-zero and unit-variance (Gaussian) variable e, simulation results f, and one or two parameters a and b.

• 'constant': $y=f+ae$

• 'proportional': $y=f+b|f|e$

• 'combined': $y=f+\left(a+b|f|\right)e$

• 'exponential': $y=f\ast \mathrm{exp}\left(ae\right)$

 Note:   If you specify an error model, you cannot specify weights except for the constant error model.

### 'Weights' — Weightsmatrix | function handle

Weights used for estimation, specified as a matrix of real positive weights, where the number of columns corresponds to the number of responses, or a function handle that accepts a vector of predicted response values and returns a vector of real positive weights.

If you specify an error model, you cannot use weights except for the constant error model. If neither the 'ErrorModel' or 'Weights' is specified, by default the function uses the constant error model with equal weights.

### 'Pooled' — Fit option flagfalse (default) | true

Fit option flag, specifying whether to fit each individual (false) or pool all individual data (true).

When true, the function performs fitting for all individuals or groups simultaneously using the same parameter estimates, and fitResults is a scalar results object.

When false, the function fits each group or individual separately using group- or individual-specific parameters, and fitResults is a vector of results objects with one result for each group.

 Note:   Use this option to override any CategoryVariableName values of estiminfo.

### 'UseParallel' — Flag for parallel simulationsfalse (default) | true

Flag for parallel simulations during fitting, specified as true or false. If true and Parallel Computing Toolbox™ is available, the function runs simulations in parallel.

expand all

### fitResults — Estimation resultsOptimResults object | NLINResults object | Vector of results objects

Estimation results, returned as a OptimResults object or NLINResults object or a vector of these objects. Both results objects are subclasses of the LeastSquaresResults object.

If the function uses nlinfit, then fitResults is a NLINResults object. Otherwise, fitResults is an OptimResults object.

When 'Pooled' is set to false, the function fits each group separately using group-specific parameters, and fitResults is a vector of results objects with one results object for each group.

When 'Pooled' is set to true, the function performs fitting for all individuals or groups simultaneously using the same parameter estimates, and fitResults is a scalar results object.

When 'Pooled' is not used, and CategoryVariableName values of estiminfo are all <none>, fitResults is a single result object. This is the same behavior as setting 'Pooled' to true.

When 'Pooled' is not used, and CategoryVariableName values of estiminfo are all <GroupVariableName>, fitResults is a vector of results objects. This is the same behavior as setting 'Pooled' to false.

In all other cases, fitResults is a scalar object containing estimated parameter values for different groups or categories specified by CategoryVariableName.

### simdata — Simulation resultsVector of SimData objects

Simulation results, returned as a vector of SimData objects representing simulation results for each group or individual.

If the 'Pooled' option is false, then each simulation uses group-specific parameter values. If true, then all simulations use the same (population-wide) parameter values.

The states reported in simdata are the states that were included in the responseMap input argument as well as any other states listed in the StatesToLog property of the runtime options (RuntimeOptions) of the SimBiology model sm.

expand all

### Objective Functions

The function sbiofit uses maximum likelihood estimation (MLE) for fitting. Mathematically, the problem is typically formulated as the minimization of the negative of the logarithm of the likelihood function. For the constant error model, this is equivalent to the method of least-squares, which minimizes the sum of squares of the residuals. For other error models, the objective function has additional terms. When the error models are normally distributed, the negative log-likelihood can be written as follows.

$\begin{array}{l}-\mathrm{ln}ℒ=\sum _{i}^{N}\frac{{\left({y}_{i}-f\left({x}_{i};p\right)\right)}^{2}}{2{\sigma }_{i}^{2}}+\sum _{i}^{N}\mathrm{ln}\sqrt{2\pi {\sigma }_{i}^{2}}\\ \end{array}$

 $ℒ$ Log likelihood N Number of experimental observations yi The ith experimental observation $f\left({x}_{i};p\right)$ Predicted value of the ith observation, which is a function of independent variables xi and estimated parameters p ${\sigma }_{i}$ Standard deviation of the ith observation. For instance, for the combined error model $y=f+\left(a+b|f|\right)e$, the standard deviation is $\sigma =a+b|f|$.

When you use numeric weights or the weight function, the weights are assumed to be proportional to the variance of the error, i.e., ${\sigma }_{i}^{2}={a}^{2}\cdot {w}_{i}$ where a is the constant error parameter. If you use weights, you cannot specify an error model except the constant error model.

Various parameter estimation methods have different requirements on the objective function. For some methods, the estimation of model parameters is performed independently of the estimation of the error model parameters. The following table summarizes the objective function F and any separate formulas used for the estimation of error model parameters. ri is the ith residual, a and b are error model parameters, and fi is the predicted value of the ith observation.

MethodSupported Error ModelObjective FunctionError Parameter Estimation Function
lsqnonlin

lsqcurvefit

'constant'$\stackrel{\to }{F}={r}_{i}$${a}^{2}=\frac{1}{N}\sum \frac{{r}_{i}^{2}}{{w}_{i}}$
lsqnonlin

lsqcurvefit

Numeric weights (wi)$\stackrel{\to }{F}=\frac{{r}_{i}}{\sqrt{{w}_{i}}}$${a}^{2}=\frac{1}{N}\sum \frac{{r}_{i}^{2}}{{w}_{i}}$
lsqnonlin

lsqcurvefit

'exponential'$\stackrel{\to }{F}=\mathrm{ln}{r}_{i}$${a}^{2}=\frac{1}{N}\sum \frac{{r}_{i}^{2}}{{w}_{i}}$
nlinfitAll$\stackrel{\to }{F}={f}_{i}$nlinfit also estimates error parameters.
All others'constant'$F=-\mathrm{ln}ℒ$${a}^{2}=\frac{1}{N}\sum \frac{{r}_{i}^{2}}{{w}_{i}}$
All others'proportional'$F=-\mathrm{ln}ℒ$${b}^{2}=\frac{1}{N}{\sum }^{\text{​}}{\left({r}_{i}}{{f}_{i}}\right)}^{2}$
All others'combined'$F=-\mathrm{ln}ℒ$Error parameters are included in minimization.
All others'exponential'$F=\sum _{i}^{N}\frac{{\left(\mathrm{ln}{y}_{i}-\mathrm{ln}f\left({x}_{i};p\right)\right)}^{2}}{2{a}^{2}}+\sum _{i}^{N}\mathrm{ln}\sqrt{2\pi {a}^{2}}$, where $a$ is the standard deviation of the log-transformed ith observation.${a}^{2}=\frac{1}{N}\sum \frac{{r}_{i}^{2}}{{w}_{i}}$
All othersWeights$F=-\mathrm{ln}ℒ$${a}^{2}=\frac{1}{N}\sum \frac{{r}_{i}^{2}}{{w}_{i}}$

### Default Options for Estimation Algorithms

FunctionDefault Options
nlinfit

sbiofit uses the default options structure associated with nlinfit, except for:

 FunValCheck = 'off' DerivStep = max(eps^(1/3), min(1e-4,solverOptions.RelativeTolerance)), where solverOptions property corresponds to the model's active configset object.
fminunc

sbiofit uses the default options structure associated with fminunc, except for:

```Display          = 'off';
TolFun           = 1e-6*[Initial objective function value]
Algorithm        = 'quasi-newton'
FinDiffRelStep   = max(eps^(1/3), SolverOptions.RelativeTolerance)
TypicalX         = 1e-6*[Initial estimates]```
fminsearch

sbiofit uses the default options structure associated with fminsearch, except for:

 Display = 'off' TolFun = 1e-6 * f0, where f0 is the initial value of objective function.
lsqcurvefit

Requires Optimization Toolbox.

sbiofit uses the default options structure associated with lsqcurvefit, except for:

 Display = 'off' TolFun = 1e-6 * norm(f0), where f0 is the initial value of objective function. FinDiffRelStep = max(eps^(1/3), min(1e-4,solverOptions.RelativeTolerance)) , where solverOptions property corresponds to the model's active configset object. TypicalX = 1e-6 * x0, where x0 is an array of transformed initial estimates.
lsqnonlin

Requires Optimization Toolbox.

sbiofit uses the default options structure associated with lsqnonlin, except for:

 Display = 'off' TolFun = 1e-6 * norm(f0), where f0 is the initial value of objective function. FinDiffRelStep = max(eps^(1/3), min(1e-4,solverOptions.RelativeTolerance)) , where solverOptions property corresponds to the model's active configset object. TypicalX = 1e-6 * x0, where x0 is an array of transformed initial estimates.
patternsearch

Requires Global Optimization Toolbox.

sbioparamestim uses the default options structure associated with patternsearch, except for:

 Display = 'off' TolFun = 1e-6 * f0, where f0 is the initial value of the objective function. TolMesh = 1.0e-3 Cache = 'on' MeshAccel = 'on'
ga

Requires Global Optimization Toolbox.

sbioparamestim uses the default options structure associated with ga, except for:

 Display = 'off' TolFun = 1e-6 * f0, where f0 is the initial value of objective function. MutationFcn = @mutationadaptfeasible
particleswarm

Requires Global Optimization Toolbox.

sbiofit uses the following default options for the particleswarm algorithm, except for:
 Display = 'off' TolFun = 1e-6 * f0, where f0 is the initial value of objective function.

## References

[1] Yi, T-M., Kitano, H., and Simon, M. (2003). A quantitative characterization of the yeast heterotrimeric G protein cycle. PNAS. 100, 10764–10769.