estimate

Class: arima

Estimate ARIMA or ARIMAX model parameters

Syntax

EstMdl = estimate(Mdl,y)
[EstMdl,EstParamCov,logL,info] = estimate(Mdl,y)
[EstMdl,EstParamCov,logL,info] = estimate(Mdl,y,Name,Value)

Description

EstMdl = estimate(Mdl,y) uses maximum likelihood to estimate the parameters of the ARIMA(p,D,q) model Mdl given the observed univariate time series y. EstMdl is an arima model that stores the results.

[EstMdl,EstParamCov,logL,info] = estimate(Mdl,y) additionally returns EstParamCov, the variance-covariance matrix associated with estimated parameters, logL, the optimized loglikelihood objective function, and info, a data structure of summary information.

[EstMdl,EstParamCov,logL,info] = estimate(Mdl,y,Name,Value) estimates the model with additional options specified by one or more Name,Value pair arguments.

Input Arguments

Mdl — ARIMA or ARIMAX model
arima model

ARIMA or ARIMAX model, specified as an arima model returned by arima or estimate.

estimate treats non-NaN elements in Mdl as equality constraints and does not estimate the corresponding parameters.

y — Single path of response data
numeric column vector

Single path of response data to which the model is fit, specified as a numeric column vector. The last observation of y is the latest.

Data Types: double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

'AR0' — Initial estimates of nonseasonal autoregressive coefficients
numeric vector

Initial estimates of the nonseasonal autoregressive coefficients for the ARIMA model, specified as the comma-separated pair consisting of 'AR0' and a numeric vector.

The number of coefficients in AR0 must equal the number of lags associated with nonzero coefficients in the nonseasonal autoregressive polynomial, ARLags.

By default, estimate derives initial estimates using standard time series techniques.

Data Types: double

'Beta0' — Initial estimates of regression coefficients
numeric vector

Initial estimates of regression coefficients for the regression component, specified as the comma-separated pair consisting of 'Beta0' and a numeric vector.

The number of coefficients in Beta0 must equal the number of columns of X.

By default, estimate derives initial estimates using standard time series techniques.

Data Types: double

'Constant0' — Initial ARIMA model constant estimate
scalar

Initial ARIMA model constant estimate, specified as the comma-separated pair consisting of 'Constant0' and a scalar.

By default, estimate derives initial estimates using standard time series techniques.

Data Types: double

'Display' — Command Window display option
'params' (default) | 'diagnostics' | 'full' | 'iter' | 'off' | cell vector of strings

Command Window display option, specified as the comma-separated pair consisting of 'Display' and a string or cell vector of strings.

Set Display using any combination of values in this table.

Value            estimate Displays
'diagnostics'    Optimization diagnostics
'full'           Maximum likelihood parameter estimates, standard errors, t statistics, iterative optimization information, and optimization diagnostics
'iter'           Iterative optimization information
'off'            Nothing in the Command Window
'params'         Maximum likelihood parameter estimates, standard errors, and t statistics

For example,

  • To run a simulation where you are fitting many models, and therefore want to suppress all output, use 'Display','off'.

  • To display all estimation results and the optimization diagnostics, use 'Display',{'params','diagnostics'}.

Data Types: char | cell
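
As a minimal sketch of the first case above, the following loop fits several candidate models and suppresses all Command Window output. The simulated series, the candidate orders, and the variable names (TrueMdl, maxLag, logL) are illustrative assumptions, not part of the estimate interface.

```matlab
% Sketch: fit several AR(p) candidates quietly and keep only the loglikelihoods.
rng(1);                                   % for reproducibility
TrueMdl = arima('AR',0.5,'Constant',0,'Variance',1);   % illustrative data generator
y = simulate(TrueMdl,200);                % simulated response (column vector)

maxLag = 3;                               % hypothetical largest AR order to try
logL = zeros(maxLag,1);
for p = 1:maxLag
    Mdl = arima(p,0,0);                   % all parameters NaN, so all are estimated
    % 'Display','off' prints nothing in the Command Window
    [~,~,logL(p)] = estimate(Mdl,y,'Display','off');
end
```

After the loop, logL(p) holds the optimized loglikelihood of the AR(p) fit.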

'DoF0' — Initial estimate of t-distribution degrees-of-freedom parameter
10 (default) | positive scalar

Initial estimate of the t-distribution degrees-of-freedom parameter, specified as the comma-separated pair consisting of 'DoF0' and a positive scalar. DoF0 must exceed 2.

Data Types: double

'E0' — Presample innovations
numeric column vector

Presample innovations that have mean 0 and provide initial values for the ARIMA(p,D,q) model, specified as the comma-separated pair consisting of 'E0' and a numeric column vector.

E0 must contain at least Mdl.Q rows. If you use a conditional variance model, such as a garch model, then the software might require more than Mdl.Q presample innovations.

If E0 contains extra rows, then estimate uses the latest Mdl.Q presample innovations. The last row contains the latest presample innovation.

By default, estimate sets the necessary presample innovations to 0.

Data Types: double

'MA0' — Initial estimates of nonseasonal moving average coefficients
numeric vector

Initial estimates of nonseasonal moving average coefficients for the ARIMA(p,D,q) model, specified as the comma-separated pair consisting of 'MA0' and a numeric vector.

The number of coefficients in MA0 must equal the number of lags associated with nonzero coefficients in the nonseasonal moving average polynomial, MALags.

By default, estimate derives initial estimates using standard time series techniques.

Data Types: double

'Options' — Optimization options
optimoptions optimization controller | optimset optimization controller

Optimization options, specified as the comma-separated pair consisting of 'Options' and an optimoptions or optimset optimization controller. For details on altering the default values of the optimizer, see optimoptions, optimset, or fmincon in Optimization Toolbox™.

For example, suppose that you want to change the constraint tolerance to 1e-6. Set Options = optimoptions(@fmincon,'TolCon',1e-6,'Algorithm','sqp'), and then pass Options into estimate using 'Options',Options.

By default, estimate uses the same default options as fmincon, except Algorithm = sqp and TolCon = 1e-7.
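
The example in the preceding paragraphs can be sketched end to end as follows. The simulated series and the model order are illustrative assumptions. The optimoptions call is the one given above.

```matlab
% Sketch: pass a custom optimization controller to estimate.
rng(1);                                   % for reproducibility
y = simulate(arima('AR',0.5,'Constant',0,'Variance',1),300);  % illustrative data

Mdl = arima(1,0,0);                       % model to fit
% Constraint tolerance of 1e-6 with the SQP algorithm, as described above
Options = optimoptions(@fmincon,'TolCon',1e-6,'Algorithm','sqp');

% 'Display','off' is set explicitly, so it overrides display settings in Options
[EstMdl,~,~,info] = estimate(Mdl,y,'Options',Options,'Display','off');
exitflag = info.exitflag;                 % optimizer exit flag (see fmincon)
```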

'SAR0' — Initial estimates of seasonal autoregressive coefficients
numeric vector

Initial estimates of seasonal autoregressive coefficients for the ARIMA(p,D,q) model, specified as the comma-separated pair consisting of 'SAR0' and a numeric vector.

The number of coefficients in SAR0 must equal the number of lags associated with nonzero coefficients in the seasonal autoregressive polynomial, SARLags.

By default, estimate derives initial estimates using standard time series techniques.

Data Types: double

'SMA0' — Initial estimates of seasonal moving average coefficients
numeric vector

Initial estimates of seasonal moving average coefficients for the ARIMA(p,D,q) model, specified as the comma-separated pair consisting of 'SMA0' and a numeric vector.

The number of coefficients in SMA0 must equal the number of lags associated with nonzero coefficients in the seasonal moving average polynomial, SMALags.

By default, estimate derives initial estimates using standard time series techniques.

Data Types: double

'V0' — Presample conditional variances
numeric column vector with positive entries

Presample conditional variances that provide initial values for any conditional variance model, specified as the comma-separated pair consisting of 'V0' and a numeric column vector with positive entries.

The software requires V0 to have at least the number of observations required to initialize the variance model. If the number of rows in V0 exceeds the number necessary, then estimate only uses the latest observations. The last row contains the latest observation.

If the variance of the model is constant, then V0 is unnecessary.

By default, estimate sets the necessary presample conditional variances to the average of the squared inferred residuals.

Data Types: double

'Variance0' — Initial estimates of variances of innovations
positive scalar | cell vector of positive scalars

Initial estimates of variances of innovations for the ARIMA(p,D,q) model, specified as the comma-separated pair consisting of 'Variance0' and a positive scalar or a cell vector of positive scalars. If Variance0 is a cell vector, then the conditional variance model must recognize the parameter names as valid coefficients.

By default, estimate derives initial estimates using standard time series techniques.

Data Types: double | cell

'X' — Exogenous predictors
matrix

Exogenous predictors in the regression model, specified as the comma-separated pair consisting of 'X' and a matrix.

The columns of X are separate, synchronized time series, with the last row containing the latest observations.

If you do not specify Y0, then the number of rows of X must be at least numel(y) + Mdl.P. Otherwise, the number of rows of X must be at least the length of y.

If the number of rows of X exceeds the number necessary, then estimate uses the latest observations and synchronizes X with the response series y.

By default, estimate does not estimate the regression coefficients regardless of their presence in Mdl.

Data Types: double
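
The row requirement on X when Y0 is not specified can be checked before calling estimate. The following is a minimal sketch with random placeholder data. The variable names requiredRows and XTrim are assumptions for illustration, and the manual trimming only mirrors the documented behavior of using the latest observations.

```matlab
% Sketch: verify that X has enough rows and align it with y by hand.
rng(1);                          % for reproducibility
Mdl = arima(2,1,1);              % Mdl.P is 3 (two AR lags plus one difference)
y = randn(100,1);                % placeholder response series
X = randn(110,3);                % placeholder predictors with surplus early rows

requiredRows = numel(y) + Mdl.P; % rows needed when Y0 is not specified
assert(size(X,1) >= requiredRows,'X has too few rows for this model.');

% Keep only the latest rows so the last row of X lines up with the last element of y
XTrim = X(end-requiredRows+1:end,:);
```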

'Y0' — Presample response data
numeric column vector

Presample response data that provides initial values for the ARIMA(p,D,q) model, specified as the comma-separated pair consisting of 'Y0' and a numeric column vector.

Y0 is a column vector with at least Mdl.P rows. If the number of rows in Y0 exceeds Mdl.P, estimate only uses the latest Mdl.P observations. The last row contains the latest observation.

By default, estimate uses backward forecasting (backcasting) to obtain the necessary number of presample observations.

Data Types: double

    Notes  

    • NaNs indicate missing values, and estimate removes them. The software merges the presample data (E0, V0, and Y0) separately from the effective sample data (X and y), then uses list-wise deletion to remove any NaNs. Removing NaNs in the data reduces the sample size, and can also create irregular time series.

    • estimate assumes that you synchronize the response and exogenous predictors such that the last (latest) observation of each occurs simultaneously. The software also assumes that you synchronize the presample series similarly.

    • If you specify a value for Display, then it takes precedence over the specifications of the optimization options Diagnostics and Display. Otherwise, estimate honors all selections related to the display of optimization information in the optimization options.

Output Arguments

EstMdl — Model containing parameter estimates
arima model

Model containing parameter estimates, returned as an arima model. estimate uses maximum likelihood to calculate all parameter estimates not constrained by Mdl (that is, all parameters in Mdl that you set to NaN).

EstParamCov — Variance-covariance matrix of maximum likelihood estimates
matrix

Variance-covariance matrix of maximum likelihood estimates of model parameters known to the optimizer, returned as a matrix.

The rows and columns contain the covariances of the parameter estimates. The standard errors of the parameter estimates are the square roots of the entries along the main diagonal.

The rows and columns associated with any parameters held fixed as equality constraints contain 0s.

estimate uses the outer product of gradients (OPG) method to perform covariance matrix estimation.

estimate orders the parameters in EstParamCov as follows:

  • Constant

  • Nonzero AR coefficients at positive lags

  • Nonzero SAR coefficients at positive lags

  • Nonzero MA coefficients at positive lags

  • Nonzero SMA coefficients at positive lags

  • Regression coefficients (when you specify X in estimate)

  • Variance parameters (scalar for constant-variance models, vector of additional parameters otherwise)

  • Degrees of freedom (t innovation distribution only)

Data Types: double
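
As a minimal sketch of how the standard errors follow from EstParamCov, the following fits an AR(1) model with no fixed parameters to simulated data. The data and the names se and tStat are illustrative assumptions. Because nothing is held fixed, info.X and the diagonal of EstParamCov both follow the ordering listed above (constant, AR coefficient, variance).

```matlab
% Sketch: recover standard errors and t statistics from the covariance matrix.
rng(1);                                   % for reproducibility
y = simulate(arima('AR',0.5,'Constant',0,'Variance',1),300);  % illustrative data

ToEst = arima(1,0,0);                     % constant, AR{1}, and variance all estimated
[EstMdl,EstParamCov,logL,info] = estimate(ToEst,y,'Display','off');

se = sqrt(diag(EstParamCov));             % standard errors: square roots of the diagonal
tStat = info.X(:) ./ se;                  % t statistics for the final estimates
```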

logL — Optimized loglikelihood objective function value
scalar

Optimized loglikelihood objective function value, returned as a scalar.

Data Types: double

info — Summary information
structure array

Summary information, returned as a structure.

Field       Description
exitflag    Optimization exit flag (see fmincon in Optimization Toolbox)
options     Optimization options controller (see optimoptions and fmincon in Optimization Toolbox)
X           Vector of final parameter estimates
X0          Vector of initial parameter estimates

For example, you can display the vector of final estimates by typing info.X in the Command Window.

Data Types: struct

Examples

Estimate ARIMA Model Parameters Without Initial Values

Fit an ARMA(2,1) model to simulated data.

Simulate 500 data points from the ARMA(2,1) model

$${y_t} = 0.5{y_{t - 1}} - 0.3{y_{t - 2}} + {\varepsilon _t} +
0.2{\varepsilon _{t - 1}},$$

where $\varepsilon_{t}$ follows a Gaussian distribution with mean 0 and variance 0.1.

Mdl = arima('AR',{0.5,-0.3},'MA',0.2,...
	'Constant',0,'Variance',0.1);

rng(5); % For reproducibility
y = simulate(Mdl,500);

The simulated data is stored in the column vector y.

Specify an ARMA(2,1) model with no constant and unknown coefficients and variance.

ToEstMdl = arima(2,0,1);
ToEstMdl.Constant = 0
ToEstMdl = 

    ARIMA(2,0,1) Model:
    --------------------
    Distribution: Name = 'Gaussian'
               P: 2
               D: 0
               Q: 1
        Constant: 0
              AR: {NaN NaN} at Lags [1 2]
             SAR: {}
              MA: {NaN} at Lags [1]
             SMA: {}
        Variance: NaN

Fit the ARMA(2,1) model to y.

EstMdl = estimate(ToEstMdl,y);

The result is a new arima model called EstMdl. estimate displays the parameter estimates, their standard errors, and t statistics in the Command Window. The estimates in EstMdl resemble the parameter values that generated the simulated data (AR coefficients 0.5 and -0.3, MA coefficient 0.2, and variance 0.1).

Estimate ARIMA Model Parameters Using Initial Values

Fit an integrated ARIMA(1,1,1) model to the daily close of the NASDAQ Composite Index.

Load the NASDAQ data included with the toolbox. Extract the first 1500 observations of the Composite Index (January 1990 to December 1995).

load Data_EquityIdx
nasdaq = DataTable.NASDAQ(1:1500);

Specify an ARIMA(1,1,1) model for fitting.

Mdl = arima(1,1,1);

The model is nonseasonal, so you can use shorthand syntax.

Fit the model to the first half of the data.

EstMdl = estimate(Mdl,nasdaq(1:750));
 
    ARIMA(1,1,1) Model:
    --------------------
    Conditional Probability Distribution: Gaussian

                                  Standard          t     
     Parameter       Value          Error       Statistic 
    -----------   -----------   ------------   -----------
     Constant       0.223399      0.184177        1.21296
        AR{1}       0.114341      0.119438       0.957323
        MA{1}       0.127637      0.119251        1.07032
     Variance        18.9833      0.689994        27.5122

The result is a new arima model (EstMdl). The estimated parameters, their standard errors, and $t$ statistics display in the Command Window.

Use the estimated parameters as initial values for fitting the second half of the data.

con0 = EstMdl.Constant;
ar0 = EstMdl.AR{1};
ma0 = EstMdl.MA{1};
var0 = EstMdl.Variance;

[EstMdl2,EstParamCov2,logL2,info2] = estimate(Mdl,...
   nasdaq(751:end),'Constant0',con0,'AR0',ar0,...
   'MA0',ma0,'Variance0',var0);
 
    ARIMA(1,1,1) Model:
    --------------------
    Conditional Probability Distribution: Gaussian

                                  Standard          t     
     Parameter       Value          Error       Statistic 
    -----------   -----------   ------------   -----------
     Constant       0.611451      0.326752         1.8713
        AR{1}      -0.150708      0.117818       -1.27916
        MA{1}       0.385685      0.109055         3.5366
     Variance        36.4933       1.22699        29.7422

The parameter estimates are stored in the info data structure. Display the final parameter estimates.

info2.X
ans =

    0.6115
   -0.1507
    0.3857
   36.4933

Estimate ARIMAX Model Parameters Without Initial Values

Fit an ARIMAX model to a simulated time series without specifying initial values for the response or the parameters.

Define the ARIMAX(2,1,1) model

$$(1 - 0.5L + 0.3{L^2}){(1 - L)^1}{y_t} = 1.5{x_{1,t}} + 2.6{x_{2,t}} - 0.3{x_{3,t}} + {\varepsilon _t} + 0.2{\varepsilon _{t - 1}}$$

to eventually simulate a time series of length 500, where $\varepsilon_{t}$ follows a Gaussian distribution with mean 0 and variance 0.1.

Mdl = arima('AR',{0.5,-0.3},'MA',0.2,'D',1,...
    'Constant',0,'Variance',0.1,'Beta',[1.5 2.6 -0.3]);
T = 500;

Simulate three stationary AR(1) series and presample values:

$$\begin{array}{*{20}{c}}
{{x_{1,t}} = 0.1{x_{1,t - 1}} + {\eta _{1,t}}}\\
{{x_{2,t}} = 0.2{x_{2,t - 1}} + {\eta _{2,t}}}\\
{{x_{3,t}} = 0.3{x_{3,t - 1}} + {\eta _{3,t}},}
\end{array}$$

where $\eta_{i,t}$ follows a Gaussian distribution with mean 0 and variance 0.01 for i = {1,2,3}.

numObs = Mdl.P + T;
MdlX1 = arima('AR',0.1,'Constant',0,'Variance',0.01);
MdlX2 = arima('AR',0.2,'Constant',0,'Variance',0.01);
MdlX3 = arima('AR',0.3,'Constant',0,'Variance',0.01);
X1 = simulate(MdlX1,numObs);
X2 = simulate(MdlX2,numObs);
X3 = simulate(MdlX3,numObs);
Xmat = [X1 X2 X3];

The simulated exogenous predictors are stored in the numObs-by-3 matrix Xmat.

Simulate 500 data points from the ARIMAX(2,1,1) model.

y = simulate(Mdl,T,'X',Xmat);

The simulated response is stored in the column vector y.

Create an ARIMA(2,1,1) model with known 0-valued constant and unknown coefficients and variance.

ToEstMdl = arima(2,1,1);
ToEstMdl.Constant = 0
ToEstMdl = 

    ARIMA(2,1,1) Model:
    --------------------
    Distribution: Name = 'Gaussian'
               P: 3
               D: 1
               Q: 1
        Constant: 0
              AR: {NaN NaN} at Lags [1 2]
             SAR: {}
              MA: {NaN} at Lags [1]
             SMA: {}
        Variance: NaN

ToEstMdl is an ARIMA(2,1,1) model. estimate changes this designation to ARIMAX(2,1,1) when you pass the exogenous predictors into the X argument. estimate estimates all parameters with the value NaN in ToEstMdl.

Fit the ARIMAX(2,1,1) model to y including regression matrix Xmat.

EstMdl = estimate(ToEstMdl,y,'X',Xmat);
 
    ARIMAX(2,1,1) Model:
    ---------------------
    Conditional Probability Distribution: Gaussian

                                  Standard          t     
     Parameter       Value          Error       Statistic 
    -----------   -----------   ------------   -----------
     Constant              0         Fixed          Fixed
        AR{1}       0.416338     0.0460672        9.03763
        AR{2}      -0.274052     0.0406445       -6.74266
        MA{1}       0.334598     0.0572075        5.84885
        Beta1         1.4194      0.142422        9.96619
        Beta2        2.54199      0.133102        19.0981
        Beta3      -0.287669       0.14035       -2.04965
     Variance      0.0967773    0.00579104        16.7115

EstMdl is a new arima model designated as ARIMAX(2,1,1) since exogenous predictors enter the model. The estimates in EstMdl resemble the parameter values that generated the simulated data.

Estimate ARIMAX Model Parameters Using Initial Values

Fit an ARIMAX model to a time series specifying initial values for the response and the parameters.

The Credit Defaults data set contains four variables:

  • Default rate on investment-grade corporate bonds (IGD)

  • Percentage of investment-grade bond issuers first rated 3 years ago (AGE)

  • One-year-ahead forecast of the change in corporate profits, adjusted for inflation (CPF)

  • Spread between corporate bond yields and those of comparable government bonds (SPR)

Assume that an ARIMAX(1,0,0) model is appropriate to fit IGD using AGE, CPF, and SPR as exogenous predictors. Load the Credit Defaults data set. Assign the response IGD to y. Assign the predictors AGE, CPF, and SPR to the matrix X.

load Data_CreditDefaults
X = Data(:,[1 3:4]);
T = size(X,1);
y = Data(:,5);

The response and exogenous predictor series should be stationary before you continue. If your response is not stationary, then specify the degree of integration in the arima statement. If your exogenous predictors are not stationary, then you must difference them using diff. The series in this example are stationary, which keeps the focus on the example's main purpose.

Separate the initial values from the main response and exogenous predictors. Choose initial values for the regression coefficients Beta0.

y0 = y(1);
yEst = y(2:T);
XEst = X(2:end,:);
Beta0 = [0.5 0.5 0.5];

y0 initializes the response series, and yEst is the main response series for estimation. XEst is the main exogenous predictor matrix for estimation.

Specify the model Mdl to fit to the data.

Mdl = arima(1,0,0);

Fit the model to the data and specify the initial values.

EstMdl = estimate(Mdl,yEst,'X',XEst,...
    'Y0',y0,'Beta0',Beta0);
 
    ARIMAX(1,0,0) Model:
    ---------------------
    Conditional Probability Distribution: Gaussian

                                  Standard          t     
     Parameter       Value          Error       Statistic 
    -----------   -----------   ------------   -----------
     Constant      -0.204768      0.266078      -0.769582
        AR{1}     -0.0173111      0.565618     -0.0306057
        Beta1      0.0239329     0.0218416        1.09574
        Beta2     -0.0124602    0.00749915       -1.66155
        Beta3      0.0680874      0.074504       0.913876
     Variance     0.00539462    0.00224392         2.4041

Tip

Suppose EstParamCov is an estimated parameter covariance matrix returned by estimate. The software sets the variances and covariances of parameters fixed during estimation to 0. Enter this command to count the number of free parameters (numParams) in a fitted model.

numParams = sum(any(EstParamCov))

This command counts the number of columns (or equivalently, rows) with any nonzero values.
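
One common use of this count is in information criteria. The following is a minimal sketch, assuming simulated data and using aicbic from Econometrics Toolbox. The model, the data, and the choice to fix the constant are illustrative only.

```matlab
% Sketch: count free parameters and use the count in AIC and BIC.
rng(1);                                   % for reproducibility
y = simulate(arima('AR',0.5,'Constant',0,'Variance',1),300);  % illustrative data

ToEst = arima(1,0,0);
ToEst.Constant = 0;                       % held fixed, so its row and column are 0s
[~,EstParamCov,logL] = estimate(ToEst,y,'Display','off');

numParams = sum(any(EstParamCov));        % counts AR{1} and the variance only
[aic,bic] = aicbic(logL,numParams,numel(y));
```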

References

[1] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.

[2] Enders, W. Applied Econometric Time Series. Hoboken, NJ: John Wiley & Sons, 1995.

[3] Greene, W. H. Econometric Analysis. 3rd ed. Upper Saddle River, NJ: Prentice Hall, 1997.

[4] Hamilton, J. D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
