Each regression function has a specific operation. This section
shows how to use these functions to perform specific types of regressions.
To illustrate use of the functions for various regressions, "typical"
usage is shown with optional arguments kept to a minimum. For a typical
regression, you estimate model parameters and residual covariance
matrices with the `mle`

functions and estimate the
standard errors of model parameters with the `std`

functions.
The regressions "without missing data" essentially ignore
samples with any missing values, and the regressions "with
missing data" ignore samples with every value missing.

Multivariate normal regression, or MVNR, is the "standard" implementation of the regression functions in Financial Toolbox™ software.

Estimate Parameters

[Parameters, Covariance] = mvnrmle(Data, Design);

Estimate Standard Errors

StdParameters = mvnrstd(Data, Design, Covariance);

Estimate Parameters

[Parameters, Covariance] = ecmmvnrmle(Data, Design);

Estimate Standard Errors

StdParameters = ecmmvnrstd(Data, Design, Covariance);

Least-squares regression, or LSR, sometimes called ordinary least-squares or multiple linear regression, is the simplest linear regression model. It also enjoys the property that, independent of the underlying distribution, it is a best linear unbiased estimator (BLUE).

Given *m* = `NumSamples`

observations,
the typical least-squares regression model seeks to minimize the objective
function

$$\sum _{k=1}^{m}{\left({Z}_{k}-{H}_{k}b\right)}^{T}\left({Z}_{k}-{H}_{k}b\right)},$$

which, within the maximum likelihood framework of the multivariate
normal regression routine `mvnrmle`

, is equivalent
to a single-iteration estimation of just the parameters to obtain `Parameters`

with
the initial covariance matrix `Covariance`

held fixed
as the identity matrix. In the case of missing data, however, the
internal algorithm to handle missing data requires a separate routine `ecmlsrmle`

to
do least-squares instead of multivariate normal regression.

Estimate Parameters

[Parameters, Covariance] = mvnrmle(Data, Design, 1);

Estimate Standard Errors

StdParameters = mvnrstd(Data, Design, Covariance);

Estimate Parameters

[Parameters, Covariance] = ecmlsrmle(Data, Design);

Estimate Standard Errors

StdParameters = ecmmvnrstd(Data, Design, Covariance);

Given* m *= `NUMSAMPLES`

observations,
the typical covariance-weighted least squares, or CWLS, regression
model seeks to minimize the objective function

$$\sum _{k=1}^{m}{\left({Z}_{k}-{H}_{k}b\right)}^{T}{C}_{0}\left({Z}_{k}-{H}_{k}b\right)$$

with fixed covariance *C*_{0}.

In most cases, *C*_{0} is
a diagonal matrix. The inverse matrix $$W={C}_{0}^{-1}$$ has diagonal elements that can
be considered relative "weights" for each series. Thus,
CWLS is a form of weighted least squares with the weights applied
across series.

Estimate Parameters

[Parameters, Covariance] = mvnrmle(Data, Design, 1, [], [], [], Covar0);

Estimate Standard Errors

StdParameters = mvnrstd(Data, Design, Covariance);

Estimate Parameters

[Parameters, Covariance] = ecmlsrmle(Data, Design, [], [], [], [], Covar0);

Estimate Standard Errors

StdParameters = ecmmvnrstd(Data, Design, Covariance);

An ad hoc form of least squares that has surprisingly good properties for misspecified or nonnormal models is known as feasible generalized least squares, or FGLS. The basic procedure is to do least-squares regression and then to do covariance-weighted least-squares regression with the resultant residual covariance from the first regression.

Estimate Parameters

[Parameters, Covariance] = mvnrmle(Data, Design, 2, 0, 0);

or (to illustrate the FGLS process explicitly)

[Parameters, Covar0] = mvnrmle(Data, Design, 1); [Parameters, Covariance] = mvnrmle(Data, Design, 1, [], [], [], Covar0);

Estimate Standard Errors

StdParameters = mvnrstd(Data, Design, Covariance);

Estimate Parameters

[Parameters, Covar0] = ecmlsrmle(Data, Design); [Parameters, Covariance] = ecmlsrmle(Data, Design, [], [], [], [], Covar0);

Estimate Standard Errors

StdParameters = ecmmvnrstd(Data, Design, Covariance);

Given a multivariate normal regression model in standard form
with a `Data`

matrix and a `Design`

array,
it is possible to convert the problem into a seemingly unrelated regression
(SUR) problem by a simple transformation of the `Design`

array. The
main idea of SUR is that instead of having a common parameter vector
over all data series, you have a separate parameter vector associated
with each separate series or with distinct groups of series that,
nevertheless, share a common residual covariance. It is this ability
to aggregate and disaggregate series and to perform comparative tests
on each design that is the power of SUR.

To make the transformation, use the function `convert2sur`

, which converts a standard-form
design array into an equivalent design array to do SUR with a specified
mapping of the series into `NUMGROUPS`

groups. The
regression functions are used in the usual manner, but with the SUR
design array instead of the original design array. Instead of having `NUMPARAMS`

elements,
the SUR output parameter vector has `NUMGROUPS`

of
stacked parameter estimates, where the first `NUMPARAMS`

elements
of `Parameters`

contain parameter estimates associated
with the first group of series, the next `NUMPARAMS`

elements
of `Parameters`

contain parameter estimates associated
with the second group of series, and so on. If the model has only
one series, for example, `NUMSERIES`

= 1, then the
SUR design array is the same as the original design array since SUR
requires two or more series to generate distinct parameter estimates.

Given `NUMPARAMS`

parameters and `NUMGROUPS`

groups
with a parameter vector (`Parameters`

) with ```
NUMGROUPS
* NUMPARAMS
```

elements from any of the regression routines,
the following MATLAB^{®} code fragment shows how to print a table
of SUR parameter estimates with rows that correspond to each parameter
and columns that correspond to each group or series:

fprintf(1,'Seemingly Unrelated Regression Parameter Estimates\n'); fprintf(1,' %7s ',' '); fprintf(1,' Group(%3d) ',1:NumGroups); fprintf(1,'\n'); for i = 1:NumParams fprintf(1,' %7d ',i); ii = i; for j = 1:NumGroups fprintf(1,'%12g ',Param(ii)); ii = ii + NumParams; end fprintf(1,'\n'); end fprintf(1,'\n');

Form an SUR Design

DesignSUR = convert2sur(Design, Group);

Estimate Parameters

[Parameters, Covariance] = mvnrmle(Data, DesignSUR);

Estimate Standard Errors

StdParameters = mvnrstd(Data, DesignSUR, Covariance);

Form a SUR Design

DesignSUR = convert2sur(Design, Group);

Estimate Parameters

[Parameters, Covariance] = ecmmvnrmle(Data, DesignSUR);

Estimate Standard Errors

StdParameters = ecmmvnrstd(Data, DesignSUR, Covariance);

Without missing data, you can estimate the mean of your `Data`

with
the function `mean`

and the covariance with the function `cov`

.
Nevertheless, the function `ecmnmle`

does this for
you if it detects an absence of missing values. Otherwise, it uses
the ECM algorithm to handle missing values.

Estimate Parameters

[Mean, Covariance] = ecmnmle(Data);

Estimate Standard Errors

StdMean = ecmnstd(Data, Mean, Covariance);

This section provides a few pointers to handle various technical and operational difficulties that might occur.

If samples are ignored, the number of samples used in the estimation
is less than `NumSamples`

. Clearly the actual number
of samples used must be sufficient to obtain estimates. In addition,
although the model parameters `Parameters`

(or mean
estimates `Mean`

) are unbiased maximum likelihood
estimates, the residual covariance estimate `Covariance`

is
biased. To convert to an unbiased covariance estimate, multiply `Covariance`

by

$$\text{Count}/\left(\text{Count}-1\right),$$

where `Count`

is the actual number of samples
used in the estimation with `Count`

≤ `NumSamples`

. None of the regression
functions perform this adjustment.

The regression functions, particularly the estimation functions,
have several requirements. First, they must have consistent values
for `NumSamples`

, `NumSeries`

, and `NumParams`

. As
a rule, the multivariate normal regression functions require

$$\text{Count}\times \text{NumSeries}\le \mathrm{max}\left\{\text{NumParams},\text{}\text{NumSeries}\times \left(\text{NumSeries}+1\right)/2\right\}$$

and the least-squares regression functions require

$$\text{Count}\times \text{NumSeries}\le \text{NumParams},$$

where `Count`

is the actual number of samples
used in the estimation with

$$\text{Count}\le \text{NumSamples}\text{.}$$

Second, they must have enough nonmissing values to converge. Third, they must have a nondegenerate covariance matrix.

Although some necessary and sufficient conditions can be found in the references, general conditions for existence and uniqueness of solutions in the missing-data case, do not exist. Nonconvergence is usually due to an ill-conditioned covariance matrix estimate, which is discussed in greater detail in Nonconvergence.

Since worst-case convergence of the ECM algorithm is linear,
it is possible to execute hundreds and even thousands of iterations
before termination of the algorithm. If you are estimating with the
ECM algorithm regularly with regular updates, you can use prior estimates
as initial guesses for the next period's estimation. This approach
often speeds up things since the default initialization in the regression
functions sets the initial parameters **b** to
zero and the initial covariance **C** to
be the identity matrix.

Other ad hoc approaches are possible although most approaches
are problem-dependent. In particular, for mean and covariance estimation,
the estimation function `ecmnmle`

uses a function `ecmninit`

to
obtain an initial estimate.

Simultaneous estimates for parameters **b** and
covariances **C** require **C** to be positive-definite. So, the general
multivariate normal regression routines require nondegenerate residual
errors. If you are faced with a model that has exact results, the
least-squares routine `ecmlsrmle`

still works, although
it provides a least-squares estimate with a singular residual covariance matrix.
The other regression functions fail.

Although the regression functions are robust and work for most
"typical" cases, they can fail to converge. The main
failure mode is an ill-conditioned covariance matrix, where failures
are either soft or hard. A soft failure wanders endlessly toward a
nearly singular covariance matrix and can be spotted if the algorithm
fails to converge after about 100 iterations. If `MaxIterations`

is
increased to 500 and display mode is initiated (with no output arguments),
a typical soft failure looks like this.

This case, which is based on 20 observations of five assets
with 30% of data missing, shows that the log-likelihood goes linearly
to infinity as the likelihood function goes to 0. In this case, the
function converges but the covariance matrix is effectively singular
with a smallest eigenvalue on the order of machine precision (`eps`

).

For the function `ecmnmle`

, a hard error looks
like this:

> Inecmninitat 60 Inecmnmleat 140 ??? Error using ==> ecmnmle Full covariance not positive-definite in iteration 218.

From a practical standpoint, if in doubt, test your residual
covariance matrix from the regression routines to ensure that it is
positive-definite. This is important because a soft error has a matrix
that appears to be positive-definite but actually has a near-zero-valued
eigenvalue to within machine precision. To do this with a covariance
estimate `Covariance`

, use `cond(Covariance)`

,
where any value greater than `1/eps`

should be considered
suspect.

If either type of failure occurs, however, note that the regression routine is indicating that something is probably wrong with the data. (Even with no missing data, two time series that are proportional to one another produce a singular covariance matrix.)

This example illustrates how to use the missing data algorithms for portfolio optimization and for valuation. This example works with five years of daily total return data for 12 computer technology stocks, with 6 hardware and 6 software companies. The example estimates the mean and covariance matrix for these stocks, forms efficient frontiers with both a naïve approach and the ECM approach, and compares results.

You can run the example directly with `ecmtechdemo.m`

.

Load the following data file:

`load ecmtechdemo`

This file contains these three quantities:

`Assets`

is a cell array of the tickers for the 12 stocks in the example.`Data`

is a`1254`

-by-`12`

matrix of 1254 daily total returns for each of the 12 stocks.`Dates`

is a`1254`

-by-`1`

column vector of the dates associated with the data.

The time period for the data extends from April 19, 2000 to April 18, 2005.

The sixth stock in

`Assets`

is Google (GOOG), which started trading on August 19, 2004. So, all returns before August 20, 2004 are missing and represented as`NaNs`

. Also, Amazon (AMZN) had a few days with missing values scattered throughout the past five years.A naïve approach to the estimation of the mean and covariance for these 12 assets is to eliminate all days that have missing values for any of the 12 assets. Use the function

`ecmninit`

with the`nanskip`

option to do this.`[NaNMean, NaNCovar] = ecmninit(Data,'nanskip');`

Contrast the result of this approach with using all available data and the function

`ecmnmle`

to compute the mean and covariance. First, call`ecmnmle`

with no output arguments to establish that enough data is available to obtain meaningful estimates.ecmnmle(Data);

The following figure shows that, even with almost 87% of the Google data being

`NaN`

values, the algorithm converges after only four iterations.Estimate the mean and covariance as computed by

`ecmnmle`

.[ECMMean, ECMCovar] = ecmnmle(Data)

ECMMean = 0.0008 0.0008 -0.0005 0.0002 0.0011 0.0038 -0.0003 -0.0000 -0.0003 -0.0000 -0.0003 0.0004 ECMCovar = 0.0012 0.0005 0.0006 0.0005 0.0005 0.0003 0.0005 0.0024 0.0007 0.0006 0.0010 0.0004 0.0006 0.0007 0.0013 0.0007 0.0007 0.0003 0.0005 0.0006 0.0007 0.0009 0.0006 0.0002 0.0005 0.0010 0.0007 0.0006 0.0016 0.0006 0.0003 0.0004 0.0003 0.0002 0.0006 0.0022 0.0005 0.0005 0.0006 0.0005 0.0005 0.0001 0.0003 0.0003 0.0004 0.0003 0.0003 0.0002 0.0006 0.0006 0.0008 0.0007 0.0006 0.0002 0.0003 0.0004 0.0005 0.0004 0.0004 0.0001 0.0005 0.0006 0.0008 0.0005 0.0007 0.0003 0.0006 0.0012 0.0008 0.0007 0.0011 0.0016 ECMCovar (continued) 0.0005 0.0003 0.0006 0.0003 0.0005 0.0006 0.0005 0.0003 0.0006 0.0004 0.0006 0.0012 0.0006 0.0004 0.0008 0.0005 0.0008 0.0008 0.0005 0.0003 0.0007 0.0004 0.0005 0.0007 0.0005 0.0003 0.0006 0.0004 0.0007 0.0011 0.0001 0.0002 0.0002 0.0001 0.0003 0.0016 0.0009 0.0003 0.0005 0.0004 0.0005 0.0006 0.0003 0.0005 0.0004 0.0003 0.0004 0.0004 0.0005 0.0004 0.0011 0.0005 0.0007 0.0007 0.0004 0.0003 0.0005 0.0006 0.0004 0.0005 0.0005 0.0004 0.0007 0.0004 0.0013 0.0007 0.0006 0.0004 0.0007 0.0005 0.0007 0.0020

Given estimates for the mean and covariance of asset returns derived from the naïve and ECM approaches, estimate portfolios, and associated expected returns and risks on the efficient frontier for both approaches.

[ECMRisk, ECMReturn, ECMWts] = portopt(ECMMean',ECMCovar,10); [NaNRisk, NaNReturn, NaNWts] = portopt(NaNMean',NaNCovar,10);

Plot the results on the same graph to illustrate the differences.

figure(gcf) plot(ECMRisk,ECMReturn,'-bo','MarkerFaceColor','b','MarkerSize', 3); hold on plot(NaNRisk,NaNReturn,'-ro','MarkerFaceColor','r','MarkerSize', 3); title('\bfMean-Variance Efficient Frontiers under Various Assumptions'); legend('ECM','NaN','Location','SouthEast'); xlabel('\bfStd. Dev. of Returns'); ylabel('\bfMean of Returns'); hold off

Clearly, the naïve approach is optimistic about the risk-return trade-offs for this universe of 12 technology stocks. The proof, however, lies in the portfolio weights. To view the weights, enter

Assets ECMWts NaNWts

which generates

>> Assets ans = 'AAPL' 'AMZN' 'CSCO' 'DELL' 'EBAY' 'GOOG' >> ECMWts ans = 0.0358 0.0011 -0.0000 0.0000 0.0000 0.0989 0.0654 0.0110 0.0000 0.0000 0.0000 0.1877 0.0923 0.0194 0.0000 0.0000 0.0000 0.2784 0.1165 0.0264 0.0000 -0.0000 0.0000 0.3712 0.1407 0.0334 -0.0000 0 0.0000 0.4639 0.1648 0.0403 0.0000 0 -0.0000 0.5566 0.1755 0.0457 0.0000 -0.0000 -0.0000 0.6532 0.1845 0.0509 0.0000 0.0000 -0.0000 0.7502 0.1093 0.0174 -0.0000 0.0000 0 0.8733 0 0 -0.0000 0.0000 0 1.0000 >> NaNWts ans = -0.0000 0.0000 -0.0000 0.1185 0.0000 0.0522 0.0576 -0.0000 -0.0000 0.1219 0.0000 0.0854 0.1248 -0.0000 -0.0000 0.0952 -0.0000 0.1195 0.1969 -0.0000 -0.0000 0.0529 -0.0000 0.1551 0.2690 -0.0000 -0.0000 0.0105 0.0000 0.1906 0.3414 0.0000 -0.0000 -0.0000 -0.0000 0.2265 0.4235 0.0000 -0.0000 -0.0000 -0.0000 0.2639 0.5245 0.0000 -0.0000 -0.0000 -0.0000 0.3034 0.6269 -0.0000 -0.0000 -0.0000 -0.0000 0.3425 1.0000 -0.0000 -0.0000 0.0000 -0.0000 0 Assets (continued) 'HPQ' 'IBM' 'INTC' 'MSFT' 'ORCL' 'YHOO' ECMWts (continued) 0.0535 0.4676 0.0000 0.3431 -0.0000 0.0000 0.0179 0.3899 -0.0000 0.3282 0.0000 -0.0000 0 0.3025 -0.0000 0.3074 0.0000 -0.0000 0.0000 0.2054 -0.0000 0.2806 0.0000 0.0000 0.0000 0.1083 -0.0000 0.2538 -0.0000 0.0000 0.0000 0.0111 -0.0000 0.2271 -0.0000 0.0000 0.0000 0.0000 -0.0000 0.1255 -0.0000 0.0000 0.0000 0 -0.0000 0.0143 -0.0000 -0.0000 0.0000 -0.0000 -0.0000 -0.0000 -0.0000 0.0000 0.0000 -0.0000 -0.0000 -0.0000 -0.0000 0.0000 NaNWts (continued) 0.0824 0.1779 0.0000 0.5691 -0.0000 0.0000 0.1274 0.0460 0.0000 0.5617 -0.0000 -0.0000 0.1674 -0.0000 0.0000 0.4802 0.0129 -0.0000 0.2056 -0.0000 0.0000 0.3621 0.0274 -0.0000 0.2438 -0.0000 0.0000 0.2441 0.0419 -0.0000 0.2782 -0.0000 0.0000 0.0988 0.0551 -0.0000 0.2788 -0.0000 0.0000 -0.0000 0.0337 -0.0000 0.1721 -0.0000 0.0000 -0.0000 -0.0000 -0.0000 0.0306 -0.0000 0.0000 0.0000 0 -0.0000 0 0.0000 0.0000 -0.0000 -0.0000 -0.0000

The naïve portfolios in

`NaNWts`

tend to favor Apple Computer (AAPL), which happened to do well over the period from the Google IPO to the end of the estimation period, while the ECM portfolios in`ECMWts`

tend to underweight Apple Computer and to recommend increased weights in Google relative to the naïve weights.To evaluate the impact of estimation error and, in particular, the effect of missing data, use

`ecmnstd`

to calculate standard errors. Although it is possible to estimate the standard errors for both the mean and covariance, the standard errors for the mean estimates alone are usually the main quantities of interest.`StdMeanF = ecmnstd(Data,ECMMean,ECMCovar,'fisher');`

Calculate standard errors that use the data-generated Hessian matrix (which accounts for the possible loss of information due to missing data) with the option

`HESSIAN`

.`StdMeanH = ecmnstd(Data,ECMMean,ECMCovar,'hessian');`

The difference in the standard errors shows the increase in uncertainty of estimation of asset expected returns due to missing data. This can be viewed by entering:

Assets StdMeanH' StdMeanF' StdMeanH' - StdMeanF'

The two assets with

`missing data`

, AMZN and GOOG, are the only assets to have differences due to missing information.

`convert2sur`

| `ecmlsrmle`

| `ecmlsrobj`

| `ecmmvnrfish`

| `ecmmvnrfish`

| `ecmmvnrmle`

| `ecmmvnrobj`

| `ecmmvnrstd`

| `ecmmvnrstd`

| `ecmnfish`

| `ecmnhess`

| `ecmninit`

| `ecmnmle`

| `ecmnobj`

| `ecmnstd`

| `mvnrfish`

| `mvnrmle`

| `mvnrobj`

| `mvnrstd`

Was this topic helpful?