MATLAB Examples

Medium / Long Term Energy Forecasting with Econometrics

Goal - Produce a reliable med term forecasting model for Energy Demand

Challenge - current med/long term models in Australia are regression based. These models are now incorrect as demand has been decreasing. An econometrics based model is dynamic and hence can capture the up & down swings that may occur in the future.

Current model is Univariate, ARIMA/GARCH

Data - Data used is historical Monthly Demand data for NSW along with Sydney Temperature.

David Willingham david.willingham@mathworks.com.au May, 2014

Contents

Load Monthly Aggregated Data

% Uncomment if you wish to show date pre-processing, which aggregates the data to monthly
% takes about 100s
% ds = '31-Jan-1999';
% dE = '30-Jun-2014';
%
% tic
% TEcon = data_prep(ds,dE);
% toc

% Loading in historical monthly Load Data
load T
T(end,:) = []; %removing last row as it's incomplete
F = T.MonthlyTotalDemand;

D = [num2str(T.Months),repmat('/',length(T.Months),1),num2str(T.Years)];
D = datenum(D,'mm/yyyy');

plot(D,F)
dynamicDateTicks
ylabel('Demand')
xlabel('Dates')
title('Historical Monthly Demand for NSW')

% Split data into training and test data set.
Fin = F(1:end-12);
Fout = F(end-11:end);
Datein = D(1:end-12);
Dateout = D(end-11:end);

Fit the trend

try and fit using Curve Fitting toolbox highlights it flaws with the downturn in demand

% show cftool fitting 6th order sum of sin
% cftool(Datein,Fin) %6th order sum of sin
cftool('cfit.sfit') %presaved 6th order sum of sin
mdl = createFit(Datein,Fin) %6th order sum of sin

% try again but only using data up to the downturn
% then see what happens when it's applied to the rest of the data set.

% cftool(Datein(1:110),Fin(1:110)) %6th order sum of sin
mdl = createFit1(Datein(1:110),Fin(1:110)); %6th order sum of sin

figure
% plot(D,F)
Fin2 = mdl(Datein);
plot(Datein,Fin,'ro')
hold on
plot(Datein,Fin2)
dynamicDateTicks
ylabel('Demand')
xlabel('Dates')
legend('actual demand','forecast demand','Location','best')
title('Demand Forecast based on 6th order sum of sin')
mdl = 
     General model Sin6:
     mdl(x) =  
                    a1*sin(b1*x+c1) + a2*sin(b2*x+c2) + a3*sin(b3*x+c3) + 
                    a4*sin(b4*x+c4) + a5*sin(b5*x+c5) + a6*sin(b6*x+c6)
       where x is normalized by mean 7.327e+05 and std 1524
     Coefficients (with 95% confidence bounds):
       a1 =    1.74e+08  (-1.565e+13, 1.565e+13)
       b1 =      0.7564  (-1312, 1313)
       c1 =       1.735  (-596, 599.4)
       a2 =   1.614e+08  (-1.565e+13, 1.565e+13)
       b2 =      0.7851  (-1371, 1372)
       c2 =      -1.393  (-620.7, 617.9)
       a3 =   1.296e+05  (-1.124e+06, 1.384e+06)
       b3 =         3.8  (-22, 29.6)
       c3 =       1.157  (-13.91, 16.22)
       a4 =   5.957e+04  (-6.299e+05, 7.49e+05)
       b4 =       6.233  (-6.758, 19.23)
       c4 =       -1.19  (-5.115, 2.736)
       a5 =   6.154e+05  (5.319e+05, 6.989e+05)
       b5 =       52.46  (52.32, 52.6)
       c5 =      -2.496  (-2.631, -2.361)
       a6 =   6.684e+05  (5.846e+05, 7.521e+05)
       b6 =       26.18  (26.05, 26.31)
       c6 =     -0.4139  (-0.5404, -0.2875)

Unit Root Tests for Stationarity

Augmented Dickey-Fuller test for unit root H0: Non-Stationary process H1: Stationary process

Note, the example here uses the ADFTEST function in its simplest form using the default number of lags (0), the default model (AR) and the default test statistic (t1). Besides customizing this test, there exists a variety of alternative tests:

  • pptest: Phillips-Perron test
  • kpsstest: KPSS test for trend stationarity
[H,pValue] = adftest(diff(F));
if (H == false)
    disp(['The null hypothesis cannot be rejected with a pValue of ', ...
        num2str(pValue)])
    disp('The data does not show enough evidence of stationarity')
else
    disp(['The null hypothesis can be rejected with a pValue of ', ...
        num2str(pValue)])
    disp('The data shows strong evidence of stationarity')
end
Warning: Test statistic #1 below tabulated critical values:
minimum p-value = 0.001 reported. 
The null hypothesis can be rejected with a pValue of 0.001
The data shows strong evidence of stationarity

Test for ARCH Effects (Residual Autocorrelation)

Ljung-Box Q-test test for residual autocorrelation H0: series of residuals exhibits no autocorrelation H1: series of residuals exhibits some autocorrelation

Note, the example here uses the LBQTEST on the squared residuals. An alternative may be archtest, the Engle test for residual heteroscedasticity, for which you have to determine a suitable number of lags to draw valid inferences from it.

[H,pValue] = lbqtest(diff(F));
if (H == false)
    disp(['The null hypothesis cannot be rejected with a pValue of ', ...
        num2str(pValue)])
    disp('The data does not show enough evidence of autocorrelation')
else
    disp(['The null hypothesis can be rejected with a pValue of ', ...
        num2str(pValue)])
    disp('The data shows strong evidence of autocorrelation')
end
The null hypothesis can be rejected with a pValue of 0
The data shows strong evidence of autocorrelation

Identifying a suitable model

Now that we have verified to have a stationary, autocorrelated time series, we still have to identify a suitable model structure.

By computing the ACF and PACF of the stationary time series and of its squares, one can identify the orders of a first candidate model. The following rules of thumb may be useful:

  • If the PACF of the series displays a sharp cutoff and/or the lag-1 autocorrelation is positive, i.e. if the series appears slightly "underdifferenced", then consider adding an AR term to the model. The lag at which the PACF cuts off is the indicated number of AR terms.
  • If the ACF of the series displays a sharp cutoff and/or the lag-1 autocorrelation is negative, i.e. if the series appears slightly "overdifferenced", then consider adding an MA term to the model. The lag at which the ACF cuts off is the indicated number of MA terms.
  • It is possible for an AR term and an MA term to cancel each other's effects, so if a mixed AR-MA model seems to fit the data, also try a model with one fewer AR term and one fewer MA term.
figure
subplot(2,2,1)
autocorr(diff(F),12);
subplot(2,2,2)
parcorr(diff(F),12);
subplot(2,2,3)
autocorr(diff(F).^2,12);
subplot(2,2,4)
parcorr(diff(F).^2,12);

% There is difficulty looking for ARIMA/GARCH effects, lets consider adding
% a seasonal filter and look at the underlying trend

Seasonality Filter

Do the same process again, adding a 12 pt seasonal filter

%12 pt seasonal filter
pts = 12;
win = ones(pts,1)./pts;
F2 = convn(F,win,'same'); %applying filter
F3 = F2(6:end-6); %removing moving avg window edges

figure
plot(D,F)
hold on
plot(D,F2,'r')
dynamicDateTicks
ylabel('Demand')
xlabel('Dates')
title('Historical Monthly Demand for NSW')
legend('Original','Filtered Data')

ARIMA/GARCH effects in filtered (de seaonsonlised data)

Let's now repeat the process and look for ARIMA/GARCH effects

% Show There is evidence of GARCH effects
[H,pValue] = adftest(diff(F3));
if (H == false)
    disp(['The null hypothesis cannot be rejected with a pValue of ', ...
        num2str(pValue)])
    disp('The data does not show enough evidence of stationarity')
else
    disp(['The null hypothesis can be rejected with a pValue of ', ...
        num2str(pValue)])
    disp('The data shows strong evidence of stationarity')
end

[H,pValue] = lbqtest(diff(F3));
if (H == false)
    disp(['The null hypothesis cannot be rejected with a pValue of ', ...
        num2str(pValue)])
    disp('The data does not show enough evidence of autocorrelation')
else
    disp(['The null hypothesis can be rejected with a pValue of ', ...
        num2str(pValue)])
    disp('The data shows strong evidence of autocorrelation')
end
% taking 1st difference for 1 integrated term (I)
figure
subplot(2,2,1)
autocorr(diff(F3),12); %AR term difficult to determine, lets use 1 for now
subplot(2,2,2)
parcorr(diff(F3),12); %shows MA term of 1
subplot(2,2,3)
autocorr(diff(F3).^2,12); % shows P Garch term of lag 1
subplot(2,2,4)
parcorr(diff(F3).^2,12); % Shows Q Garch term of lag 1

% It shows there is an ARIMA model of at least 1,1,1
% It shows there is a GARCH model of 1,1
Warning: Test statistic #1 below tabulated critical values:
minimum p-value = 0.001 reported. 
The null hypothesis can be rejected with a pValue of 0.001
The data shows strong evidence of stationarity
The null hypothesis can be rejected with a pValue of 0
The data shows strong evidence of autocorrelation

Build and forecast ARIMA/GARCH model

garchMdl = garch(1,1);
% garchMdl = garch(0,1); %show even with Garch terms, not much difference
% in error
mdl2 = arima('D',1,'Seasonality',12,'MALags',1,'SMALags',1,'Variance',garchMdl); %there is a montly trend
fit = estimate(mdl2,Fin);
[yf,ymse] = forecast(fit,12,'Y0',Fin);
[ysim] = simulate(fit,12,'Y0',Fin,'NumPaths',5)
figure
plot(D,F)
hold on
plot(Dateout,Fout,'m')
plot(Dateout,yf,'r');
plot(Dateout,yf+1.96*sqrt(ymse),'r:')
plot(Dateout,yf-1.96*sqrt(ymse),'r:')
% plot(Dateout,ysim) %show montecarl simulations if desired
dynamicDateTicks

% Error calculation looking 1 year forward
error = yf - Fout;
errorpct =abs(error)./Fout*100;
MAE = mean(abs(error));
MAPE = mean(errorpct(~isinf(errorpct)));
title(['Long Term Energy Forecast NSW - Mean Error ',num2str(MAPE),'%'])
xlabel('Date')
ylabel('Total Monthly Demand')
legend('Historical Demand','Actual Demand','Predicted Demand','95% conf bound','95% conf bound')
dynamicDateTicks
% observations - model captures seasonal aspect well
% misses overall declining trend
 
    ARIMA(0,1,1) Model Seasonally Integrated with Seasonal MA(1):
    ---------------------------------------------------------------
    Conditional Probability Distribution: Gaussian

                                  Standard          t     
     Parameter       Value          Error       Statistic 
    -----------   -----------   ------------   -----------
     Constant        -3205.9       15601.5      -0.205486
        MA{1}      -0.272396             0              0
       SMA{1}      -0.272396     0.0576038       -4.72878
 
 
    GARCH(1,1) Conditional Variance Model:
    ----------------------------------------
    Conditional Probability Distribution: Gaussian

                                  Standard          t     
     Parameter       Value          Error       Statistic 
    -----------   -----------   ------------   -----------
     Constant    7.55219e+09    0.00220816    3.42013e+12
     GARCH{1}       0.913404     0.0139327        65.5582
      ARCH{1}      0.0324951     0.0143506        2.26436
ysim =
   1.0e+07 *
    1.2257    1.2355    1.2986    1.2741    1.2038
    1.2514    1.2091    1.2439    1.2700    1.2224
    1.1931    1.2169    1.2229    1.2539    1.2254
    0.9624    1.0561    1.0654    1.0634    0.9060
    1.0283    0.9893    1.0229    1.0616    0.9764
    1.0087    0.9854    1.0741    1.0551    0.9508
    1.0891    1.0082    1.0156    1.0882    1.0277
    1.2056    1.1002    1.1298    1.2153    1.1230
    1.0377    0.9424    0.9912    1.0334    0.9233
    1.0990    1.0681    1.0955    1.1184    1.0623
    1.1052    0.9738    1.0214    1.1046    0.9750
    1.1331    1.0843    1.1171    1.2382    1.0679

Do the same again, this time stop just before the downturn in energy

% Split data into training and test data set.
F2 = F(1:128);
D2 = D(1:128);
Fin = F2(1:end-12);
Fout = F2(end-11:end);
Datein = D2(1:end-12);
Dateout = D2(end-11:end);

% garchMdl = garch(1,1);

% need to do different Garch model as Garch(1,1) errors out
garchMdl = garch(0,1);
% in error
mdl2 = arima('D',1,'Seasonality',12,'MALags',1,'SMALags',1,'Variance',garchMdl); %there is a montly trend
fit = estimate(mdl2,Fin);
[yf,ymse] = forecast(fit,12,'Y0',Fin);
figure
plot(D2,F2)
hold on
plot(Dateout,Fout,'m')
plot(Dateout,yf,'r');
plot(Dateout,yf+1.96*sqrt(ymse),'r:')
plot(Dateout,yf-1.96*sqrt(ymse),'r:')
dynamicDateTicks

% Error calculation looking 1 year forward
error = yf - Fout;
errorpct =abs(error)./Fout*100;
MAE = mean(abs(error));
MAPE = mean(errorpct(~isinf(errorpct)));
title(['Long Term Energy Forecast NSW - Mean Error ',num2str(MAPE),'%'])
xlabel('Date')
ylabel('Total Monthly Demand')
legend('Historical Demand','Actual Demand','Predicted Demand','95% conf bound','95% conf bound')
dynamicDateTicks
% observations - model captures seasonal aspect well
% misses overal declining trend and error goes up
Warning: Rank deficient, rank = 1, tol =  1.348921e+01. 
Warning: Lower bound constraints are active; standard errors may be inaccurate. 
 
    ARIMA(0,1,1) Model Seasonally Integrated with Seasonal MA(1):
    ---------------------------------------------------------------
    Conditional Probability Distribution: Gaussian

                                  Standard          t     
     Parameter       Value          Error       Statistic 
    -----------   -----------   ------------   -----------
     Constant        2817.66       14549.1       0.193665
        MA{1}      -0.388308     0.0550042       -7.05961
       SMA{1}      -0.388308     0.0550451       -7.05436
 
 
    GARCH(0,0) Conditional Variance Model:
    ----------------------------------------
    Conditional Probability Distribution: Gaussian

                                  Standard          t     
     Parameter       Value          Error       Statistic 
    -----------   -----------   ------------   -----------
     Constant    1.33401e+11     0.0032421    4.11466e+13

Backtesting results

% Creating BackTest that:
% 1. Rebuild's the model everymonth
% 2. Forecasts ahead 12 months.
% 3. Compares to actual results
% 4. Averages the 12 month monthly error's

steps = length([108:1:length(D)]);

count = 1;
h = waitbar(count/steps,'Please Wait') ;
tic
for i = [108:1:length(D)]
    waitbar( count/steps,h)
    Ttemp = F(1:i,:);
    Tdates = D(1:i,:);
    % Split data into training and test data set.
    Fin = Ttemp(1:end-12,:);
    Find = Tdates(1:end-12,:);
    Fout = Ttemp(end-11:end,:);
    Foutd = Tdates(end-11:end,:);

    garchMdl = garch(0,1);
    mdl3 = arima('D',1,'Seasonality',12,'MALags',1,'SMALags',1,'Variance',garchMdl); %there is a montly trend
    fit = estimate(mdl3,Fin,'display','off');
    [yf,ymse] = forecast(fit,12,'Y0',Fin);

    error = yf - Fout;
    errorpct =abs(error)./Fout*100;
    MAEbtest(i,:) = mean(abs(error));
    MAPEbtest(i,:) = mean(errorpct(~isinf(errorpct)));

    count = count+1;
end
toc
close(h)
Warning: Rank deficient, rank = 1, tol =  5.777667e+01. 
Warning: Rank deficient, rank = 1, tol =  5.549453e+01. 
Warning: Rank deficient, rank = 1, tol =  5.809548e+01. 
Warning: Rank deficient, rank = 1, tol =  5.885973e+01. 
Warning: Rank deficient, rank = 1, tol =  2.300511e+01. 
Warning: Rank deficient, rank = 1, tol =  5.948340e+01. 
Warning: Rank deficient, rank = 1, tol =  1.951760e+01. 
Warning: Rank deficient, rank = 1, tol =  6.046812e+01. 
Warning: Rank deficient, rank = 1, tol =  4.080914e+01. 
Warning: Rank deficient, rank = 1, tol =  6.158290e+01. 
Warning: Rank deficient, rank = 1, tol =  9.615351e+01. 
Warning: Rank deficient, rank = 1, tol =  6.190033e+01. 
Warning: Rank deficient, rank = 1, tol =  5.726661e+01. 
Warning: Rank deficient, rank = 1, tol =  6.194501e+01. 
Warning: Rank deficient, rank = 1, tol =  6.546999e+01. 
Warning: Rank deficient, rank = 1, tol =  5.112819e+01. 
Warning: Rank deficient, rank = 1, tol =  3.937122e+01. 
Warning: Rank deficient, rank = 1, tol =  4.876619e+01. 
Warning: Rank deficient, rank = 1, tol =  4.737291e+01. 
Warning: Rank deficient, rank = 1, tol =  5.888880e+01. 
Warning: Rank deficient, rank = 1, tol =  8.298769e+01. 
Warning: Rank deficient, rank = 1, tol =  6.774759e+01. 
Warning: Rank deficient, rank = 1, tol =  6.747853e+01. 
Warning: Rank deficient, rank = 1, tol =  6.811806e+01. 
Warning: Rank deficient, rank = 1, tol =  2.030700e+02. 
Warning: Rank deficient, rank = 1, tol =  7.843788e+01. 
Warning: Rank deficient, rank = 1, tol =  1.348921e+01. 
Warning: Rank deficient, rank = 1, tol =  3.050921e+01. 
Warning: Rank deficient, rank = 1, tol =  6.778595e+01. 
Warning: Rank deficient, rank = 1, tol =  1.247101e+02. 
Warning: Rank deficient, rank = 1, tol =  9.592531e+01. 
Warning: Rank deficient, rank = 1, tol =  4.407777e+01. 
Warning: Rank deficient, rank = 1, tol =  5.302340e+01. 
Warning: Rank deficient, rank = 1, tol =  8.969668e+01. 
Warning: Rank deficient, rank = 1, tol =  1.992132e+02. 
Warning: Rank deficient, rank = 1, tol =  1.915360e+02. 
Warning: Rank deficient, rank = 1, tol =  5.442397e+01. 
Warning: Rank deficient, rank = 1, tol =  2.843012e+02. 
Warning: Rank deficient, rank = 1, tol =  6.690010e+02. 
Warning: Rank deficient, rank = 1, tol =  3.140100e+02. 
Warning: Rank deficient, rank = 1, tol =  1.477979e+02. 
Warning: Rank deficient, rank = 1, tol =  2.769302e+01. 
Warning: Rank deficient, rank = 1, tol =  4.423201e+01. 
Warning: Rank deficient, rank = 1, tol =  1.604503e+02. 
Warning: Rank deficient, rank = 1, tol =  1.028534e+02. 
Warning: Rank deficient, rank = 1, tol =  7.439498e+01. 
Warning: Rank deficient, rank = 1, tol =  1.251831e+02. 
Warning: Rank deficient, rank = 1, tol =  9.861621e+01. 
Warning: Rank deficient, rank = 1, tol =  1.108615e+02. 
Warning: Rank deficient, rank = 1, tol =  8.482719e+01. 
Warning: Rank deficient, rank = 1, tol =  5.269826e+01. 
Warning: Rank deficient, rank = 1, tol =  1.005678e+02. 
Warning: Rank deficient, rank = 1, tol =  2.156968e+02. 
Warning: Rank deficient, rank = 1, tol =  4.760483e+02. 
Warning: Rank deficient, rank = 1, tol =  3.652522e+02. 
Warning: Rank deficient, rank = 1, tol =  1.875755e+02. 
Warning: Rank deficient, rank = 1, tol =  1.097258e+02. 
Warning: Rank deficient, rank = 1, tol =  1.900944e+02. 
Warning: Rank deficient, rank = 1, tol =  1.455059e+02. 
Warning: Rank deficient, rank = 1, tol =  6.520350e+01. 
Warning: Rank deficient, rank = 1, tol =  1.731317e+02. 
Warning: Rank deficient, rank = 1, tol =  2.088591e+02. 
Warning: Rank deficient, rank = 1, tol =  2.816248e+02. 
Warning: Rank deficient, rank = 1, tol =  1.922926e+02. 
Warning: Rank deficient, rank = 1, tol =  1.816824e+02. 
Warning: Rank deficient, rank = 1, tol =  1.484791e+02. 
Warning: Rank deficient, rank = 1, tol =  4.686741e+02. 
Warning: Rank deficient, rank = 1, tol =  5.389766e+02. 
Warning: Rank deficient, rank = 1, tol =  5.483263e+02. 
Warning: Rank deficient, rank = 1, tol =  5.631838e+02. 
Warning: Rank deficient, rank = 1, tol =  3.212220e+02. 
Warning: Rank deficient, rank = 1, tol =  4.718760e+02. 
Warning: Rank deficient, rank = 1, tol =  3.922035e+02. 
Warning: Rank deficient, rank = 1, tol =  6.800673e+02. 
Warning: Rank deficient, rank = 1, tol =  4.395927e+02. 
Warning: Rank deficient, rank = 1, tol =  8.096288e+02. 
Warning: Rank deficient, rank = 1, tol =  7.132119e+02. 
Warning: Rank deficient, rank = 1, tol =  7.740225e+02. 
Warning: Rank deficient, rank = 1, tol =  2.622484e+02. 
Warning: Rank deficient, rank = 1, tol =  1.783237e+02. 
Warning: Rank deficient, rank = 1, tol =  6.341032e+02. 
Warning: Rank deficient, rank = 1, tol =  3.138847e+02. 
Warning: Rank deficient, rank = 1, tol =  4.849516e+02. 
Warning: Rank deficient, rank = 1, tol =  7.822079e+02. 
Elapsed time is 139.340660 seconds.

Plotting Error

I = find(MAPEbtest ~= 0);
derror_btest = D(I);
mape_btest = MAPEbtest(I);
figure
plot(derror_btest,mape_btest)
dynamicDateTicks
title(['Backtesting Results Mean Abs Perc Error Pct - ',num2str(mean(mape_btest)),'%'])
ylabel('Percentage')
xlabel('Date')