How to code Categorical Variables in NARX neural network data input?

I am working on predicting electricity demand (load), and several of my inputs to the Neural Network Time Series (NARX) app are categorical: months (12 categories, spelled out January through December), days of the week (seven categories, 1 through 7), and hours in each day (1 through 24). When I load my Excel data table and assign my variables as "Inputs", MATLAB is not able to read and display my categorical variable "Months" because the values are spelled out January through December. Should I write a simple line of code such as the one below, or is there a different way to flag those variables as categorical for NARX neural networks? I prefer not to convert Months into 1-12 as Matlab will assume some scale (Month 12 is higher than Month 6, etc). Thank you in advance!
T.HE = categorical(T.HE);
T.MONTH = categorical(T.MONTH);
T.WEEKDAY = categorical(T.WEEKDAY);
  3 Comments
SK on 3 Jan 2020
Hi Awesmm, thank you for your willingness to help.
T is my Excel table, with the variables as columns. Three of the variables are HE (hour ending), MONTH (spelled out January, February, etc.) and WEEKDAY (numeric, 1 through 7).
When I code these variables as "Categorical" in the command line using the following code:
>> T.HE = categorical(T.HE); T.MONTH = categorical(T.MONTH);T.WEEKDAY = categorical(T.WEEKDAY);
>> y = T.Actual_Wind;
>> x = T(:,3:end);
>> input = tonndata(table2array(x),false, false);
I am getting the following errors....
Error using table2array (line 37)
Unable to concatenate the specified table variables.
Caused by:
Error using categorical/cat (line 52)
Unable to concatenate a double array and a categorical array.
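The error arises because table2array has to place every selected variable in one homogeneous array, and a double column cannot be concatenated with a categorical one. A minimal sketch for finding the columns that block the conversion, using a made-up three-row table (the Temp variable and all values are hypothetical, not taken from the original data):

```matlab
% Hypothetical miniature of T: two numeric columns and one text column.
T = table([1; 2; 3], {'January'; 'February'; 'January'}, [10.5; 11.2; 9.8], ...
    'VariableNames', {'HE', 'MONTH', 'Temp'});
T.MONTH = categorical(T.MONTH);

% table2array(T) would raise the concatenation error at this point,
% because T mixes double and categorical variables.

% Flag the numeric variables; only those can go through table2array together.
isNum = varfun(@isnumeric, T, 'OutputFormat', 'uniform');
numericPart = table2array(T(:, isNum));               % 3x2 double
blockedNames = T.Properties.VariableNames(~isNum);    % {'MONTH'}
```

The variables listed in blockedNames need a numeric encoding of their own before they can be joined with numericPart.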
Walter Roberson on 3 Jan 2020
You will not be able to proceed with the MathWorks tools and will need to write your own. The MathWorks tools can only work with data that is all (orderable) numeric, or all categorical, or all cell array of character vectors.
Even if you were to switch to all categorical you would have challenges: when you concatenate categorical arrays, the individual ranges lose their identity and a new categorical array is created that combines all of the categories, renumbering elements. The neural network would have no way of knowing that the second column could not simultaneously hold Tuesday and March, for example.
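The merging of category sets described above can be seen on a tiny example. This is only an illustrative sketch with made-up values; the variable names are hypothetical:

```matlab
months = categorical({'March'; 'April'});
days   = categorical({'Tuesday'; 'Monday'});

combined = [months, days];          % 2x2 categorical array
allCats  = categories(combined);    % single category list shared by both columns

% The day column now carries the month names in its category list as well.
monthIsValidForDays = ismember('March', categories(combined(:, 2)));
```

After concatenation there is no per-column category set any more, so nothing marks the second column as "weekdays only".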
However as I touched on in my Answer, I think you are making a mistake in trying to make the entries unordered. When you make them unordered you are saying that the second day of February has more predictive power for load on the second day of August than the first day of August has for the second day of August.

Accepted Answer

Walter Roberson on 3 Jan 2020
T.MONTH_C = categorical(T.MONTH, {'January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'}, 'ordinal', false);
T.HE_C = categorical(T.HE, 1:24, {'01:00', '02:00', '03:00', '04:00', ....... '24:00'}, 'ordinal', false);
T.WEEKDAY_C = categorical(T.WEEKDAY, 1:7, {'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'}, 'ordinal', false);
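The 24 hour labels do not have to be typed by hand. A small sketch, assuming the labels follow the same two-digit 'HH:00' pattern used above (the HE values here are hypothetical sample data):

```matlab
% Build the 24 labels '01:00' through '24:00' programmatically.
hourLabels = arrayfun(@(h) sprintf('%02d:00', h), 1:24, 'UniformOutput', false);

HE   = [1; 13; 24];                          % hypothetical hour-ending values
HE_C = categorical(HE, 1:24, hourLabels);    % unordered by default
```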
"I prefer not to convert Months into 1-12 as Matlab will assume some scale (Month 12 is higher than Month 6, etc)"
I do not know what part of the world you live in, but in the part of the world that I live in, the electrical demands of adjacent calendar months are strongly correlated. The relationship between the demands for January and February is much stronger than the relationship between the demands for January and June.
  2 Comments
SK on 3 Jan 2020
Walter: I appreciate your comments a lot. To make sure I understand you correctly, are you recommending that I recode ALL my categorical variables into zeros and ones (binary) by creating 12 additional columns for months, 24 columns for hours and 7 columns for days of the week? Is there no other way of letting the Neural Network (NARX) know that these are "Categorical Variables"? I am struggling to understand why it is not as easy to declare categorical variables in Neural Networks as it is in regression-based Decision Tree models such as Bag or LSBoost.
P.S. Apologies for responding to your comments late. I do appreciate your help. Thx!
Walter Roberson on 3 Jan 2020
If I understand correctly, you can pass in a categorical array. However, all entries in the array would draw from the same categorization so you would not be able to create one column of the array that was restricted to weekdays and another column that was restricted to month.
If you were to switch to the one-of-N 0/1 representation then you would be able to combine that with non-categorical columns.
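A minimal sketch of the one-of-N (0/1) representation combined with an ordinary numeric column. All values and variable names below are hypothetical; starting from a categorical column such as T.MONTH_C, double(T.MONTH_C) would give the same kind of integer codes:

```matlab
% Hypothetical sample rows.
monthCodes   = [1; 2; 12; 6];          % month index, 1..12
weekdayCodes = [3; 7; 1; 5];           % weekday index, 1..7
temperature  = [4.2; 5.1; 3.3; 21.7];  % ordinary numeric predictor

% Implicit expansion compares each code with every category index,
% giving one 0/1 column per category.
monthOneHot   = double(monthCodes == 1:12);     % 4x12
weekdayOneHot = double(weekdayCodes == 1:7);    % 4x7

% All-numeric matrix: 12 + 7 + 1 = 20 columns.
X = [monthOneHot, weekdayOneHot, temperature];

% With the Deep Learning Toolbox, X can then be converted as in the question:
% inputs = tonndata(X, false, false);
```

Because every column of X is double, the concatenation problem reported for table2array does not occur.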
However as I have indicated above, I think that you are making a mistake.
There are several different kinds of electrical load. The ones such as cooking statistically peak around the same time every weekday, possibly a different peak time on Sunday.
The ones such as laundry tend to be more cyclic with an irregular period depending on family size and age (e.g. I tend to do laundry on Sunday but families with small children might need laundry every day or two). You can try day-of-week predictions for this kind but the correlation might not be so strong.
Then there is electricity for heating and cooling. The correlations for those are strong between adjacent calendar days: the weather tomorrow will not be all that different (on average) from the weather today. Very few places fluctuate randomly between +30C and -30C on a daily basis; instead the +30C highs tend to cluster and the -30C lows tend to cluster.
In the part of the world that I live in, the highest electricity demands are in February because that is our coldest month and we have to heat a lot. There is also a notable July peak due to the need for cooling.
There are other parts of the world where the peak for the year is reliably local Summer, because of the strong cooling requirements.
These building heating and cooling requirements based upon weather are the biggest predictors by far of electricity load in many places, and you will be making a mistake to convert all of your date information into unordered categorical because the seasonal hints are ordered.
Within stretches short enough to have much the same weather, you do get weekday-based and time-of-day cycles, with industrial use peaking during "working hours" for some industries (others work all night too), and non-heating residential use peaking at evening meal time (and again a little later for dishwasher use). So some cyclic analysis is good, but you need to know what you are analyzing.
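One way to keep the seasonal ordering while also treating December and January as neighbours is a sine/cosine encoding of the month number. This technique is not proposed anywhere in this thread; it is offered only as a hedged sketch of an ordered, cyclic alternative, and all names are hypothetical:

```matlab
monthNum = (1:12)';                    % January = 1, ..., December = 12
theta    = 2*pi*(monthNum - 1)/12;     % position on the yearly circle
monthXY  = [sin(theta), cos(theta)];   % 12x2, one row per month

% Euclidean distance between two months in the encoded space.
monthDist = @(a, b) norm(monthXY(a, :) - monthXY(b, :));

dDecJan = monthDist(12, 1);   % adjacent across the year boundary
dJanFeb = monthDist(1, 2);    % adjacent within the year
dJanJul = monthDist(1, 7);    % half a year apart
```

In this encoding adjacent months are equally close everywhere on the circle, and months half a year apart are the farthest from each other, which matches the seasonal argument above better than either 1-12 codes or unordered categories.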

More Answers (1)

SK on 3 Jan 2020
Walter, can you please elaborate on your comment: "If you were to switch to the one-of-N 0/1 representation then you would be able to combine that with non-categorical columns." If I understand you correctly, you are suggesting that I should not create 12 additional column vectors for months but instead recode them as 1/12 for Jan, 2/12 for Feb....... and 1 for Dec (12/12)? Is this correct? In that case I save effort and space instead of creating tons of additional column vectors. If not, please spell out your answer. Thanks!
Re: energy peaks... we are managing the Nat Gas Combined Cycle plant out in California, CAISO area. It's tricky, as most of my models are breaking. So I decided to try a Neural Network to predict the Wind and Sun generation load given the weather forecast (cloud coverage, temperature, wind speed, solar radiation, humidity, etc.).
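The single-column scaling asked about above (1/12 for January through 12/12 for December) is a different encoding from one-of-N, not a shortcut for it. A small sketch contrasting the two on hypothetical rows:

```matlab
monthNum = [1; 2; 6; 12];            % hypothetical rows: Jan, Feb, Jun, Dec

% Scaled coding: a single ordered column.
monthScaled = monthNum / 12;         % 1/12, 2/12, 6/12, 12/12

% One-of-N coding: one 0/1 column per month, no ordering implied.
monthOneHot = double(monthNum == 1:12);   % 4x12

widthScaled = size(monthScaled, 2);  % number of input columns used
widthOneHot = size(monthOneHot, 2);
```

The scaled column is compact but tells the network that June lies between February and December; the one-of-N columns carry no such ordering.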
  4 Comments
Walter Roberson on 4 Jan 2020
Yes, that makes sense. Version 2 corresponds to using unordered categories, and Version 1 corresponds to using ordered categories.

Release

R2019b
