This example shows how to perform panel data analysis using `mvregress`. First, a fixed effects model with concurrent correlation is fit by ordinary least squares (OLS) to some panel data. Then, the estimated error covariance matrix is used to get panel corrected standard errors for the regression coefficients.

**Load sample data.**

Navigate to the folder containing sample data. Load the sample panel data.

    cd(matlabroot)
    cd('help/toolbox/stats/examples')
    load('panelData')

The dataset array, `panelData`, contains yearly observations on eight cities for six years. This is simulated data.

**Define variables.**

The first variable, `Growth`, measures economic growth (the response variable). The second and third variables are city and year indicators, respectively. The last variable, `Employ`, measures employment (the predictor variable).

    y = panelData.Growth;
    city = panelData.City;
    year = panelData.Year;
    x = panelData.Employ;

**Plot data grouped by category.**

To look for potential city-specific fixed effects, create a box plot of the response grouped by city.

```
figure()
boxplot(y,city)
xlabel('City')
```

There do not appear to be any systematic differences in the mean response among cities.

**Plot data grouped by a different category.**

To look for potential year-specific fixed effects, create a box plot of the response grouped by year.

```
figure()
boxplot(y,year)
xlabel('Year')
```

There is some evidence of systematic differences in the mean response between years.

**Format response data.**

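Let $$y_{ij}$$ denote the response for city *j* = 1,...,*d* in year *i* = 1,...,*n*, and let $$x_{ij}$$ denote the corresponding value of the predictor variable. In this example, *n* = 6 and *d* = 8.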

Consider fitting a year-specific fixed effects model with a constant slope and concurrent correlation among cities in the same year,

$$y_{ij}=\alpha_{i}+\beta_{1}x_{ij}+\epsilon_{ij},\quad i=1,\dots ,n,\quad j=1,\dots ,d,$$

where $$\epsilon_{i}=(\epsilon_{i1},\dots ,\epsilon_{id})'\sim MVN(0,\Sigma )$$. The concurrent correlation accounts for any unmeasured, time-static factors that might impact growth similarly for some cities. For example, cities with close spatial proximity might be more likely to have similar economic growth.

To fit this model using `mvregress`, reshape the response data into an *n*-by-*d* matrix.

    n = 6;
    d = 8;
    Y = reshape(y,n,d);
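The `reshape` call relies on MATLAB's column-major ordering, so it only produces the intended layout if the rows of the data are sorted by city, with the six yearly observations for each city stored consecutively. That ordering is an assumption inferred from the indexing used in this example. A minimal sketch with a toy vector (the names `yToy` and `YToy` are hypothetical) shows where each observation lands:

```matlab
% Toy response vector: entry k stands in for the k-th row of the data.
% Assumed ordering: city 1 years 1-6, then city 2 years 1-6, and so on.
n = 6;                      % number of years
d = 8;                      % number of cities
yToy = (1:n*d)';
YToy = reshape(yToy,n,d);   % row i holds year i, column j holds city j

% Year 2 for city 3 comes from linear index (3-1)*n + 2 = 14.
YToy(2,3)

% Row i of YToy matches the stride-n indexing yToy(i:n:end).
isequal(YToy(2,:)',yToy(2:n:end))
```

If the data were instead sorted by year, the matching call would be `reshape(y,d,n)'`.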

**Format design matrices.**

Create a length-*n* cell array of *d*-by-*K* design matrices. For this model, there are *K* = 7 parameters (*n* = 6 year-specific intercept terms and a slope).

Suppose the vector of parameters is arranged as

$$\beta =\left(\begin{array}{c}\alpha_{1}\\ \alpha_{2}\\ \vdots \\ \alpha_{6}\\ \beta_{1}\end{array}\right).$$

In this case, the first design matrix for year 1 looks like

$$X\left\{1\right\}=\left(\begin{array}{ccccc}1& 0& \cdots & 0& {x}_{11}\\ 1& 0& \cdots & 0& {x}_{12}\\ \vdots & \vdots & \cdots & 0& \vdots \\ 1& 0& \cdots & 0& {x}_{18}\end{array}\right),$$

and the second design matrix for year 2 looks like

$$X\left\{2\right\}=\left(\begin{array}{cccccc}0& 1& 0& \cdots & 0& {x}_{21}\\ 0& 1& 0& \cdots & 0& {x}_{22}\\ \vdots & \vdots & 0& \cdots & 0& \vdots \\ 0& 1& 0& \cdots & 0& {x}_{28}\end{array}\right).$$

The design matrices for the remaining 4 years are similar.
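One compact way to write the general pattern, using notation that is introduced here only for illustration: let $$\mathbf{1}_{d}$$ be a *d*-vector of ones, $$e_{i}$$ the *i*th column of the (*K* − 1)-by-(*K* − 1) identity matrix, and $$x_{i}=(x_{i1},\dots ,x_{id})'$$ the predictor values for year *i*. Then

$$X\left\{i\right\}=\left(\begin{array}{cc}\mathbf{1}_{d}e_{i}' & x_{i}\end{array}\right),\quad y_{i}=X\left\{i\right\}\beta +\epsilon_{i},\quad i=1,\dots ,n,$$

where $$y_{i}=(y_{i1},\dots ,y_{id})'$$. Row *j* of this system reads $$y_{ij}=\alpha_{i}+\beta_{1}x_{ij}+\epsilon_{ij}$$, which is the model stated earlier.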

    K = 7;
    N = n*d;
    X = cell(n,1);
    for i = 1:n
        x0 = zeros(d,K-1);
        x0(:,i) = 1;
        X{i} = [x0,x(i:n:N)];
    end

**Fit the model.**

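Fit the model using ordinary least squares (OLS). The covariance-weighted least squares algorithm, `'cwls'`, reduces to OLS when the weighting matrix is left at its default, the identity matrix.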

    [b,sig,E,V] = mvregress(X,Y,'algorithm','cwls');
    b

    b =

       41.6878
       26.1864
      -64.5107
       11.0924
      -59.1872
       71.3313
        4.9525
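Because the fit is ordinary least squares, the same kind of coefficient vector can be obtained by stacking the design matrices and solving one least-squares problem. The sketch below illustrates this on synthetic data (`xToy`, `bTrue`, and `yStack` are hypothetical stand-ins, not the sample panel data), so its numbers do not match the output above.

```matlab
% Synthetic stand-in data (hypothetical values, not the sample panel data).
rng(0)                                  % for reproducibility
n = 6; d = 8; K = 7; N = n*d;
xToy = 10*rand(N,1);
X = cell(n,1);
for i = 1:n
    x0 = zeros(d,K-1);
    x0(:,i) = 1;                        % indicator column for year i
    X{i} = [x0,xToy(i:n:N)];
end
bTrue = [1;2;3;4;5;6;0.5];              % assumed true coefficients
XX = cell2mat(X);                       % rows ordered year by year
yStack = XX*bTrue + 0.1*randn(N,1);     % stacked responses in the same row order

bOLS = XX\yStack                        % least-squares solution of the stacked system
```

For the example data, the response vector in the matching row order would be `reshape(Y',N,1)`, because `cell2mat(X)` stacks all cities for year 1 first, then year 2, and so on.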

**Plot fitted model.**

    xx = linspace(min(x),max(x));
    axx = repmat(b(1:K-1),1,length(xx));
    bxx = repmat(b(K)*xx,n,1);
    yhat = axx + bxx;

    figure()
    hPoints = gscatter(x,y,year);
    hold on
    hLines = plot(xx,yhat);
    for i=1:n
        set(hLines(i),'color',get(hPoints(i),'color'));
    end
    hold off

The model with year-specific intercepts and common slope appears to fit the data quite well.

**Residual correlation.**

Plot the residuals, grouped by year.

```
figure()
gscatter(year,E(:),city)
ylabel('Residuals')
```

The residual plot suggests concurrent correlation is present. For example, cities 1, 2, 3, and 4 are consistently above or below average as a group in any given year. The same is true for the collection of cities 5, 6, 7, and 8. As seen in the exploratory plots, there are no systematic city-specific effects.
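One way to quantify this pattern is to inspect the correlation between cities across years. The sketch below uses synthetic residuals rather than the fitted `E`; the matrix `EToy` and its two-group structure are hypothetical, built only to mimic the behavior described above. `corrcoef` treats each column as a variable, so an *n*-by-*d* residual matrix yields a *d*-by-*d* correlation matrix.

```matlab
rng(1)                       % for reproducibility
n = 6; d = 8;
% Hypothetical residuals: cities 1-4 share one yearly shock, cities 5-8 another.
shockA = randn(n,1);
shockB = randn(n,1);
EToy = [repmat(shockA,1,4), repmat(shockB,1,4)] + 0.2*randn(n,d);

R = corrcoef(EToy);          % d-by-d correlation between cities
withinA = R(1:4,1:4);        % correlations among cities 1-4
mean(withinA(~eye(4)))       % average off-diagonal correlation within the group
```

With the fitted model, the corresponding call would be `corrcoef(E)`. With only six years per city, such correlations are noisy and are best read alongside the residual plot.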

**Panel corrected standard errors.**

Use the estimated error variance-covariance matrix to compute panel corrected standard errors for the regression coefficients.

    XX = cell2mat(X);
    S = kron(eye(n),sig);
    Vpcse = inv(XX'*XX)*XX'*S*XX*inv(XX'*XX);
    se = sqrt(diag(Vpcse))

    se =

        9.3750
        8.6698
        9.3406
        9.4286
        9.5729
        8.8207
        0.1527
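In matrix notation, the computation has the familiar sandwich form. Writing $$X$$ for the stacked design matrix `XX`, $$\hat{\Sigma}$$ for the estimated error covariance `sig`, and $$I_{n}$$ for the *n*-by-*n* identity matrix (symbols chosen here to mirror the code), the panel corrected covariance matrix is

$$\hat{V}_{PCSE}={\left(X'X\right)}^{-1}X'\left(I_{n}\otimes \hat{\Sigma}\right)X{\left(X'X\right)}^{-1},$$

and the standard errors are the square roots of its diagonal elements. The Kronecker product $$I_{n}\otimes \hat{\Sigma}$$ is block diagonal with one copy of $$\hat{\Sigma}$$ per year, which matches the year-by-year row order of $$X$$ and encodes correlation among cities within a year but none across years.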
