# Non-linear regression

Asked by Yasmine Tamimi on 19 Dec 2011

Hey everyone,

I want to do long-term load forecasting using a genetic algorithm (GA). The first step is to come up with a model. In one of the papers, the objective function is a tenth-order polynomial:

obj= c10*x.^10 + c9*x.^9 + c8*x.^8 + c7*x.^7 + c6*x.^6 + c5*x.^5 + c4*x.^4 + c3*x.^3 + c2*x.^2 + c1*x.^1 + c0*x.^0;

In order to make the obj function ready for the GA I need to estimate the coefficients.

The rest of my code is as follows:

>> f = @(c,x) 1 + c(1)*x.^1 + c(2)*x.^2 + c(3)*x.^3 + c(4)*x.^4 + c(5)*x.^5 + c(6)*x.^6 + c(7)*x.^7 + c(8)*x.^8 + c(9)*x.^9 + c(10)*x.^10;

>> cfit = nlinfit(xdata,ydata,f,c)

All the data I have are the years from 1982 to 1991 and the corresponding demand in each year.

I don't understand nlinfit very well. What am I supposed to put in place of xdata, ydata, and c?

Any help will be appreciated.

Answer by Image Analyst on 19 Dec 2011

Why do you call that non-linear regression? It's just a regular polynomial and it's linear in the coefficients, c. You don't have c(6)^2 or log(c(5)) or anything non-linear like that. It's just c(#) to the first power multiplied by the x to some power. Because your x are non-linear does not make it non-linear regression. All your c's are linear so it's linear regression. So you can simply use polyfit() and simplify your life.
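A minimal sketch of that suggestion, assuming the ten yearly demand values the asker posts later in this thread. The names `demand`, `xc`, and `p` are illustrative, and centering the years before fitting is an assumption of this sketch (to keep the fit well conditioned), not something the answer prescribes:

```matlab
% Years and demand as posted by the asker further down the thread.
years  = 1982:1991;
demand = [1702 2344 2097 2313 2588 2885 4341 4779 5251 5721];

% Center the years so that powers of ~2000 do not make the fit
% ill-conditioned (an assumption of this sketch).
xc = years - mean(years);

% Linear least squares for a 2nd-order polynomial; p holds the
% coefficients from the highest power down to the constant term.
p = polyfit(xc, demand, 2);

% Evaluate the fitted polynomial at the training years.
fitted    = polyval(p, xc);
residuals = demand - fitted;
```

The leading coefficient `p(1)` comes out near 41.66, in line with the `c1` value the asker reports below, since shifting x does not change the highest-order coefficient. A different order only needs a different third argument to `polyfit`.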

Walter Roberson on 19 Dec 2011

A 10th order polynomial has at most 9 turning points (maxima and minima combined) and at most 10 real zeros, which _I_ would not consider to be "oscillating wildly". But it will indeed rapidly go off to + or - infinity.

Image Analyst on 20 Dec 2011

In examples I've seen, the points "in between" two "training" points tend to move farther away from a straight line between the two training points as the order of the equation grows larger. That's what I meant. And those in-between points are a lot more sensitive to slight changes in the training points' locations.

Walter Roberson on 20 Dec 2011

Hmmm, that's probably provable, too -- though it could plausibly be the case that if the zeros were carefully positioned at decreasing intervals that at least one measure of the swing might decrease.... Yes, indeed, I have just constructed a sequence whose zeros do not change, but whose maximums swing less and less as the length of the sequence increases. Certainly, though, in my first trial series the maximums increased distinctly as the length of the sequence increased.
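The point about in-between values and sensitivity can be explored with a short experiment. This is only a sketch under stated assumptions: it uses the demand values posted later in the thread, centers the years for conditioning, and the 1% nudge to one data point is an arbitrary choice for illustration.

```matlab
years  = 1982:1991;
demand = [1702 2344 2097 2313 2588 2885 4341 4779 5251 5721];
xc = years - mean(years);          % centered years (sketch assumption)

p2 = polyfit(xc, demand, 2);       % low-order least-squares fit
p9 = polyfit(xc, demand, 9);       % order 9 interpolates all 10 points

% Largest miss at the training years: essentially zero for order 9.
maxErr9 = max(abs(polyval(p9, xc) - demand));

% Evaluate both fits halfway between training years and one year
% past the data (xNext corresponds to 1992).
xMid  = xc(1:end-1) + 0.5;
xNext = xc(end) + 1;
mid2  = polyval(p2, xMid);    mid9  = polyval(p9, xMid);
next2 = polyval(p2, xNext);   next9 = polyval(p9, xNext);

% Nudge one training value by 1% and refit, to see how far the
% in-between values move for each order.
bumped = demand;
bumped(5) = 1.01 * bumped(5);
shift2 = max(abs(polyval(polyfit(xc, bumped, 2), xMid) - mid2));
shift9 = max(abs(polyval(polyfit(xc, bumped, 9), xMid) - mid9));
```

Comparing `mid2` with `mid9`, `next2` with `next9`, and `shift2` with `shift9` gives a concrete view of how the two orders behave between and beyond the training points.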

Answer by Richard Willey on 19 Dec 2011

I'd strongly suggest that you watch a webinar titled "Electricity Load and Price Forecasting with MATLAB". The webinar is available at: http://www.mathworks.com/company/events/webinars/wbnr51423.html

All of the code and the data sets are available on MATLAB Central.

This webinar shows two different ways to model the demand for electric power. The first is based on a neural network. The second uses bagged decision trees. The code also includes safeguards to protect against overfitting.

I'm also going to point you at a blog posting that I wrote on data-driven fitting. If you are primarily worried about interpolation, you might find this a useful alternative to high-order polynomials:

http://blogs.mathworks.com/loren/2011/01/13/data-driven-fitting/

## 1 Comment

Yasmine Tamimi on 19 Dec 2011

Thanks a lot, Richard, for the webinar, but I have to use GA instead of NN and long-term forecasting instead of short-term.

Answer by Greg Heath on 19 Dec 2011

You definitely do not want a high order polynomial for prediction.

Check out Richard's references.

Greg

Answer by Yasmine Tamimi on 19 Dec 2011

Actually, the prediction went wrong. It seems I am getting an error when running the GA! So I reduced the order to 2 and I still get the same error! Here's my code:

FIRST M-FILE:

format long e

f = @(c,x) c(1)*x.^2 + c(2)*x.^1 + c(3)*x.^0; % element-wise powers so x can be a vector

% I have 20 data points for both the years and the load but I should use the first 10 to calculate the coefficients and the other 10 should be predicted using ga:

years = [1982 1983 1984 1985 1986 1987 1988 1989 1990 1991];

load = [1702 2344 2097 2313 2588 2885 4341 4779 5251 5721];

% c1 = 4.165909091390513e+001;

% c2 = -1.650490772918446e+005;

% c3 = 1.634786853371606e+008;

SECOND M-FILE:

%% The objective function
function y = load_forecast(x)

c1 = 4.165909091390513e+001;

c2 = -1.650490772918446e+005;

c3 = 1.634786853371606e+008;

y = c1*x(1)^2 + c2*x(2)^1 + c3;

THIRD M-FILE:

GenomeLength = 2; % Number of variables in the fitness function

LB = zeros(1,2); % Lower bound

UB = ones(1,2); % Upper bound

Bound = [LB;UB];

% options structure

options = gaoptimset('Vectorized','on','PopulationType','bitstring','CreationFcn',@int_pop,'MutationFcn',{@mutationuniform,0.04},...
'CrossoverFcn',{@crossoverscattered,0.8}, 'PopInitRange' ,Bound, 'Display','Iter','StallGenL',100,'Generations',150, ...
'PopulationSize',50);

AND THE ERROR THAT I GET IS:

??? Reference to non-existent field 'Verbosity'.
Error in ==> gacommon at 79
[Iterate.x,Aineq,bineq,Aeq,beq,lb,ub,msg,exitFlag] = ...
Error in ==> ga at 269
[x,fval,exitFlag,output,population,scores,FitnessFcn,nvars,Aineq,bineq,Aeq,beq,lb,ub, ...
Error in ==> ga_load_forecast at 27
[X,FVAL] =ga(@load_forecast,2,[],[],[],[],LB,UB,[],options);

FINAL QUESTION: shouldn't the data that I have be incorporated into the fitness function or the GA in one way or another?

Really, any help is appreciated.

Yasmine Tamimi on 23 Dec 2011

Yes, although the code seems correct, it gave me answers far from the actual values!

Yasmine Tamimi on 23 Dec 2011

Here is the link to the paper. I was trying to reproduce part of their findings (a 2nd-order polynomial using their fitness function), but I used data from another paper because I didn't have access to reference [6], where they got their data:
http://www.waset.org/journals/waset/v6/v6-32.pdf

Yasmine Tamimi on 23 Dec 2011

Sorry, I forgot to post my fitness function (ff). Here it is:

k = 0.0001; % k is a scaling constant

Actual_Load1 = [1702 2344 2097 2313 2588 2885 4341 4779 5251 5721];
T = [1982 1983 1984 1985 1986 1987 1988 1989 1990 1991]; % Years
t = 10; % number of years

% Here x holds the coefficients of the second-order polynomial equation,
% and we want to find their optimal values such that the error (residual) is minimized
sum = 0;

for i = 1:t
sum_residual = abs(((x(1)*(T(i))^2) + (x(2)*T(i)) + x(3))- Actual_Load1(i));
sum = sum_residual;
end

y = 1 + (k * sum_residual);
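One thing worth noting about the loop above: as written, `sum` is overwritten on each pass rather than accumulated, and `y` is computed from `sum_residual`, so only the residual for the last year (1991) reaches the fitness value. A hedged sketch of an accumulating version follows, assuming the intent is the sum of absolute residuals over all ten years. The name `load_fitness` is hypothetical, and the function header is not shown in the post, so an anonymous function is used here.

```matlab
k = 0.0001;   % scaling constant, as in the post above
Actual_Load1 = [1702 2344 2097 2313 2588 2885 4341 4779 5251 5721];
T = 1982:1991;   % years

% x(1), x(2), x(3) are the candidate polynomial coefficients.
% Summing over all years lets every data point contribute to the
% fitness, which is how the data enters the GA.
load_fitness = @(x) 1 + k * sum(abs(x(1)*T.^2 + x(2)*T + x(3) - Actual_Load1));

% Sanity check: with all-zero coefficients the residuals are the raw
% loads, which sum to 34021, so y0 is 1 + 0.0001*34021 = 4.4021.
y0 = load_fitness([0 0 0]);
```

With the Global Optimization Toolbox available, a handle like this could be passed to the solver as `ga(load_fitness, 3)`, since there are three coefficients to search over. Avoiding a variable named `sum` also keeps the built-in `sum` function usable.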

Answer by Richard Willey on 20 Dec 2011

For what it's worth, I just took a very quick look at the data set that you provided.

```
years = [1982 1983 1984 1985 1986 1987 1988 1989 1990 1991];
load = [1702 2344 2097 2313 2588 2885 4341 4779 5251 5721];
```

You can fit the years 1988 to 1991 with an almost perfectly straight line. In a similar fashion, you can fit the years 1984 to 1987 with another straight line. In both cases the R^2 is over 0.995.
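The two straight-line fits can be checked with a few lines of base MATLAB. This is a sketch rather than the code used for the answer: the variable `demand` stands in for `load` (to avoid shadowing the built-in `load` function), and `lineFit` and `r2` are helper names introduced here.

```matlab
years  = 1982:1991;
demand = [1702 2344 2097 2313 2588 2885 4341 4779 5251 5721];

% Straight-line fit over the selected years; centering the years keeps
% polyfit well conditioned (a choice made for this sketch).
lineFit = @(idx) polyfit(years(idx) - mean(years(idx)), demand(idx), 1);

% R^2 = 1 - (residual sum of squares) / (total sum of squares).
r2 = @(idx) 1 - ...
    sum((demand(idx) - polyval(lineFit(idx), years(idx) - mean(years(idx)))).^2) / ...
    sum((demand(idx) - mean(demand(idx))).^2);

late  = years >= 1988;                   % 1988 to 1991
early = years >= 1984 & years <= 1987;   % 1984 to 1987

r2_late  = r2(late);
r2_early = r2(early);

% Slopes of the two segments, in demand units per year.
pLate  = lineFit(late);
pEarly = lineFit(early);
```

Both `r2_late` and `r2_early` come out above 0.995, consistent with the figure quoted above. The slopes `pLate(1)` and `pEarly(1)` differ noticeably, which is why a single straight line does not fit both segments.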

I really don't understand the approach that you're taking... I feel like you're trying to force Genetic Algorithms into the solution space regardless of whether this is warranted.

Given that you're primarily interested in using GA, there's one last resource that I'd recommend looking at:

The "Global Optimization with MATLAB Products" webinar provides a very good introduction to GA. You can watch it at: http://www.mathworks.com/company/events/webinars/wbnr43346.html?seq=1