Code covered by the BSD License  

Highlights from
Data Analysis with MATLAB for Excel Users

image thumbnail

Data Analysis with MATLAB for Excel Users

by

 

09 Mar 2007 (Updated )

Materials for public seminar of the same name (slides and demos)

Energy Forecasting

Energy Forecasting

This demo showcases visualization and analysis (heavy statistics) for forecasting energy usage based on historical data. We have access to hour-by-hour utility usage for the month of January, including information on the day of the week and the Heating Degree Days (defined as 65 minus Average Temperature) of each day. Using this information, we will come up with an algorithm for forecasting future energy usage based on parameters such as day-type, forecasted temperature, and time of day.

Contents

Clear the workspace

clear all;
close all;
set(0, 'defaultFigureWindowStyle', 'docked');

Load Data

Load data from worksheet

fileName   = 'January.xls';
sheetName  = 'Total'  ;
energyData = xlsread(fileName, sheetName  , 'B2:Y32');
DayType    = xlsread(fileName, 'Weather'  , 'B2:B32');
HDD        = xlsread(fileName, 'Weather'  , 'C2:C32'); %Heating Degree Days
Temp       = xlsread(fileName, 'Weather'  , 'D2:D32');

numDays   = size(energyData,1);
numHours  = size(energyData,2);

allDays   = {'Monday', 'Tuesday', 'Wednesday', 'Thursday', ...
  'Friday', 'Saturday', 'Sunday'};

Visualization

Quick visualization of data (in 3D and contour)

myCreateFigure(energyData);

Statistics - Examine Distribution

In order to do some statistics, I am going to determine the distribution type of this data set.

% Create an average day
averageDayProfile = mean(energyData);

% Expand to an average month
averageMonth = meshgrid(averageDayProfile, 1:31);

% Look at difference between average month
res = energyData-averageMonth;
myDistFit(res);

Visualize Confidence Intervals

From the distribution analysis, I will conclude that the data set has a normal distribution. With this information, we can perform more in-depth statistics based on normal distribution.

The following custom plotting routine takes data as an input, calculate the estimated mean and 95% confidence interval of the mean, and plots the the results.

% Mean, Standard Deviation, and Confidence Intervals
[meanD,stdevD,ciD] = normfit(energyData );    % across days
[meanH,stdevH,ciH] = normfit(energyData');    % across hours
plotCI(energyData,meanD,stdevD,ciD,meanH,stdevH,ciH);

Observations

As you can see, the daily profile seems to have a very tight confidence interval, suggesting that the general trend throughout the day is similar from day to day.

Average Day Effects

By looking at average profiles for each day of the week, we can make some observations on daily trends.

for id = 1:7  % Monday thru Sunday
  dat(id,:) = mean(energyData(DayType==id,:),1); %#ok
end
figure;
plot(dat');
xlabel('hours');
ylabel('system load (MW)');
legend(allDays{:}, 'Location', 'Best');

Observations

We can see that the morning energy spike is not prominent on the weekends. Also, Mondays tend to have more usage throughout the day, and Saturdays have the lowest usage.

Energy Usage Predictions

Forecast usage based on time of day and day of week.

dayOfWeek = 'Monday';
timeOfDay = 15      ;   % 3PM

% call a function for estimating energy usage
estimateSingleDay(energyData, DayType, dayOfWeek, timeOfDay);
sysLoad =

   84.6506


sysLoadCI =

   81.5431
   87.7581

Correlation with Heating Degree Days

We have a single piece of information per day regarding the temperature of each day (Heating Degree Days). Let's try to see if this information can be used as an indicator for any of the statistics of the profile. We can determine this by examining the correlation coefficients between HDD and the various statistical parameters.

% calculate other statistics

medianH = median(energyData, 2    );
minH    = min   (energyData, [], 2);
maxH    = max   (energyData, [], 2);
rangeH  = range (energyData, 2    );

% Calculate a matrix of correlation coefficients and a matrix of p-values.
% We're only interested in the first row (or column) of the matrices, since
% we want to see how "hdd" correlates to the other variables
[R,P] = corrcoef([HDD,meanH',medianH,minH,maxH,rangeH,stdevH']);

% Display on screen
fprintf('      ');fprintf('%7s ', 'mean', 'median', 'min', 'max', 'range', 'stdev');fprintf('\n');
fprintf('corr: ');fprintf('%7.4f ', R(2:end,1));fprintf('\n');
fprintf('p-val:');fprintf('%7.4f ', P(2:end,1));fprintf('\n');
         mean  median     min     max   range   stdev 
corr:  0.1853  0.1453  0.1389  0.3772  0.3774  0.1850 
p-val: 0.3183  0.4353  0.4560  0.0365  0.0364  0.3191 

Observations

Correlation of 1 means perfect positive correlation. The result seems to show that Heating Degree Days is a good indicator of the maximum system load for that day. It has relatively high correlation and is significant.

Conclusion

We explored historical energy usage data to develop a forecasting system. Statistics and visualizations revealed that there are usage trends throughout the day, and the trends seem to depend on the day of the week. This knowledge can be used for a gross forecast of the energy usage. We also found that there are some correlations between the Heating Degree Days (HDD) and certain statistical parameters of the profile. We may use this information to fine tune our forecast.

Further analysis may provide more insights to developing this forecasting system.

Code Cleanup

set(0, 'defaultFigureWindowStyle', 'remove');

Contact us