Non-linear regression

Yasmin Tamimi on 19 Dec 2011
Hey everyone,
I want to do long-term load forecasting using a GA. The first step is to come up with a model. In one of the papers, the objective function is a tenth-order polynomial:
obj= c10*x.^10 + c9*x.^9 + c8*x.^8 + c7*x.^7 + c6*x.^6 + c5*x.^5 + c4*x.^4 + c3*x.^3 + c2*x.^2 + c1*x.^1 + c0*x.^0;
In order to make the obj function ready for the GA I need to estimate the coefficients.
The rest of my code is as follows:
>> f = @(c,x) 1 + c(1)*x.^1 + c(2)*x.^2 + c(3)*x.^3 + c(4)*x.^4 + c(5)*x.^5 + c(6)*x.^6 + c(7)*x.^7 + c(8)*x.^8 + c(9)*x.^9 + c(10)*x.^10;
>> cfit = nlinfit(xdata,ydata,f,c)
All the data I have are the years from 1982 to 1991 and the corresponding demand in each year.
I don't understand nlinfit very well: what am I supposed to put in place of xdata, ydata and c?
Any help will be appreciated.

Accepted Answer

Image Analyst on 19 Dec 2011
Why do you call that non-linear regression? It's just a regular polynomial, and it's linear in the coefficients, c. You don't have c(6)^2 or log(c(5)) or anything non-linear like that. It's just c(#) to the first power multiplied by x to some power. The fact that x is raised to higher powers does not make it non-linear regression. All your c's enter linearly, so it's linear regression. So you can simply use polyfit() and simplify your life.
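As a minimal sketch of the polyfit route, using the ten data points posted later in this thread: the variable names and the choice of degree 2 are assumptions for illustration only. Requesting the third output mu makes polyfit center and scale the years, which helps keep the fit well conditioned when x is around 2000.

```matlab
% Years and demand as posted in this thread
years  = 1982:1991;
demand = [1702 2344 2097 2313 2588 2885 4341 4779 5251 5721];

% Degree 2 is an illustrative choice, not a recommendation.
% With three outputs, polyfit fits against the centered and scaled
% variable (years - mu(1))/mu(2), where mu = [mean(years); std(years)].
deg = 2;
[p, S, mu] = polyfit(years, demand, deg);

% Evaluate the fit at the original years; S and mu go back into polyval
fitted    = polyval(p, years, S, mu);
residuals = demand - fitted;

% Coefficients are in descending powers of the scaled variable
disp(p)
fprintf('RMS residual: %.1f\n', sqrt(mean(residuals.^2)));
```

Note that the coefficients in p apply to the scaled variable, not to the raw year, so they should always be evaluated through polyval with the same mu.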
  9 Comments
Walter Roberson on 20 Dec 2011
Hmmm, that's probably provable, too -- though it could plausibly be the case that if the zeros were carefully positioned at decreasing intervals that at least one measure of the swing might decrease.... Yes, indeed, I have just constructed a sequence whose zeros do not change, but whose maximums swing less and less as the length of the sequence increases. Certainly, though, in my first trial series the maximums increased distinctly as the length of the sequence increased.
Ho Nam Ernest Yim on 4 Apr 2018
Are there any other methods I can use to compare performance across methods? I have used nlinfit and lsqcurvefit, and from what I found, fitnlm and lsqnonlin are the same as the above methods. I have also looked into other methods such as ridge, robust and polyfit, but none of them fits the case that lsqcurvefit handles, as in lsqcurvefit(fun,x0,xdata,ydata), i.e. the nonlinear case. Please help me =( I have been looking at this for a while.

More Answers (4)

Richard Willey on 19 Dec 2011
I'd strongly suggest that you watch a webinar titled "Electricity Load and Price Forecasting with MATLAB". The webinar is available at: http://www.mathworks.com/company/events/webinars/wbnr51423.html
All of the code and the data sets are available on MATLAB Central.
This webinar shows two different ways to model the demand for electric power. The first is based on a neural network. The second uses bagged decision trees. The code also includes safeguards to protect against overfitting.
I'm also going to point you at a blog posting that I wrote on data-driven fitting. If you are primarily worried about interpolation, you might find this a useful alternative to high-order polynomials.
  1 Comment
Yasmin Tamimi on 19 Dec 2011
Thanks a lot, Richard, for the webinar, but I have to use GA instead of NN, and long-term forecasting instead of short-term.


Greg Heath on 19 Dec 2011
You definitely do not want a high order polynomial for prediction.
Check out Richard's references.
Greg
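A rough way to see the concern about high-order polynomials, sketched with the data posted elsewhere in this thread: a degree-9 polynomial has ten coefficients, so it can pass through all ten points, yet its predictions outside the data range can differ sharply from those of a low-order fit. The variable names and the forecast years are assumptions for illustration.

```matlab
% Years and demand as posted in this thread
years  = 1982:1991;
demand = [1702 2344 2097 2313 2588 2885 4341 4779 5251 5721];

% Request mu so polyfit centers and scales the years; this keeps the
% high-order fit numerically usable
[p2, ~, mu2] = polyfit(years, demand, 2);
[p9, ~, mu9] = polyfit(years, demand, 9);   % 10 coefficients for 10 points

% In-sample, the degree-9 polynomial reproduces the data almost exactly
maxErr9 = max(abs(polyval(p9, years, [], mu9) - demand));
fprintf('Largest in-sample error, degree 9: %g\n', maxErr9);

% Out-of-sample, compare what each model predicts a few years ahead
future = 1992:1995;                          % illustrative forecast years
pred2  = polyval(p2, future, [], mu2);
pred9  = polyval(p9, future, [], mu9);
disp([future; pred2; pred9])
```

A near-zero in-sample error says nothing about forecast quality here; comparing the last two rows of the displayed matrix is the more informative check.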

Yasmin Tamimi on 19 Dec 2011
Actually the prediction went wrong. It seems I am getting an error while running the GA! So I reduced the order to 2, and I still get the same error! Here's my code:
FIRST M-FILE:
format long e
f = @(c,x) c(1)*x^2 + c(2)*x^1 + c(3)*x^0;
% I have 20 data points for both the years and the load but I should use the first 10 to calculate the coefficients and the other 10 should be predicted using ga:
years = [1982 1983 1984 1985 1986 1987 1988 1989 1990 1991];
load = [1702 2344 2097 2313 2588 2885 4341 4779 5251 5721];
estimated_coefficients = polyfit(years,load,2);
% c1 = 4.165909091390513e+001;
% c2 = -1.650490772918446e+005;
% c3 = 1.634786853371606e+008;
SECOND M-FILE:
%% The objective function
function y = load_forecast(x)
c1 = 4.165909091390513e+001;
c2 = -1.650490772918446e+005;
c3 = 1.634786853371606e+008;
y = c1*x(1)^2 + c2*x(2)^1 + c3;
THIRD M-FILE:
FitnessFcn=@load_forecast;
GenomeLength = 2; % Number of variables in the fitness function
LB = zeros(1,2); % Lower bound
UB = ones(1,2); % Upper bound
Bound = [LB;UB];
% options structure
options = gaoptimset('Vectorized','on','PopulationType','bitstring','CreationFcn',@int_pop,'MutationFcn',{@mutationuniform,0.04},...
    'CrossoverFcn',{@crossoverscattered,0.8}, 'PopInitRange' ,Bound, 'Display','Iter','StallGenL',100,'Generations',150, ...
    'PopulationSize',50);
[X,FVAL] =ga(@load_forecast,2,[],[],[],[],LB,UB,[],options);
AND THE ERROR THAT I GET IS:
??? Reference to non-existent field 'Verbosity'.
Error in ==> gacommon at 79
[Iterate.x,Aineq,bineq,Aeq,beq,lb,ub,msg,exitFlag] = ...
Error in ==> ga at 269
[x,fval,exitFlag,output,population,scores,FitnessFcn,nvars,Aineq,bineq,Aeq,beq,lb,ub, ...
Error in ==> ga_load_forecast at 27
[X,FVAL] =ga(@load_forecast,2,[],[],[],[],LB,UB,[],options);
FINAL QUESTION: shouldn't the data that I have be incorporated into the fitness function or the GA in one way or another?
Any help is really appreciated.
  12 Comments
Yasmin Tamimi on 23 Dec 2011
Here is the link to the paper. I was trying to reproduce part of their findings (2nd-order polynomial, using their fitness function):
But I used the data from another paper because I didn't have access to reference [6], where they got their data from:
http://www.waset.org/journals/waset/v6/v6-32.pdf
I hope this helps you understand what I was doing.
Yasmin Tamimi on 23 Dec 2011
Sorry, I forgot to post my fitness function. Here it is:
function y = load_forecast(x)
k = 0.0001; % k is a scaling constant
Actual_Load1 = [1702 2344 2097 2313 2588 2885 4341 4779 5251 5721];
T = [1982 1983 1984 1985 1986 1987 1988 1989 1990 1991]; % Years
t = 10; % number of years
% Here x holds the coefficients of the second-order polynomial,
% and we want to find their optimal values such that the error (residual) is minimized
total_residual = 0; % named to avoid shadowing the built-in sum
for i = 1:t
    residual = abs(((x(1)*(T(i))^2) + (x(2)*T(i)) + x(3))- Actual_Load1(i));
    total_residual = total_residual + residual; % accumulate over all years
end
y = 1 + (k * total_residual);
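A fitness function of this kind should add up the residual over all ten years, not only the last one. The following is a minimal vectorized sketch under stated assumptions: the centering of the years, the names tc, fitness, lb and ub, and the bound values are all hypothetical choices for illustration, and the scaling 1 + k*(...) is dropped because it does not change which coefficients minimize the error. The ga call is only attempted if the Global Optimization Toolbox is on the path.

```matlab
% Data posted in this thread
T = 1982:1991;
actualLoad = [1702 2344 2097 2313 2588 2885 4341 4779 5251 5721];

% Assumption: center the years so the three coefficients have
% comparable magnitudes, which makes the search space easier to bound
tc = T - mean(T);

% Fitness = sum of absolute residuals over ALL years for coefficients c,
% where c(1), c(2), c(3) multiply tc.^2, tc and 1 respectively
fitness = @(c) sum(abs(c(1)*tc.^2 + c(2)*tc + c(3) - actualLoad));

% Sanity check: least-squares coefficients should score far better
% than an arbitrary guess such as all zeros
cLS = polyfit(tc, actualLoad, 2);
fprintf('fitness at polyfit coefficients: %.1f\n', fitness(cLS));
fprintf('fitness at zeros:                %.1f\n', fitness([0 0 0]));

% Hypothetical bounds on the three coefficients (illustrative values)
lb = [-1000 -1000     0];
ub = [ 1000  1000 10000];

% 3 is the number of coefficients being searched for
if exist('ga', 'file')
    cGA = ga(fitness, 3, [], [], [], [], lb, ub);
    disp(cGA)
end
```

Because the data are captured inside the function handle, nothing else needs to be passed to ga. Comparing fitness(cGA) with fitness(cLS) gives a quick indication of whether the GA has converged to something reasonable.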


Richard Willey on 20 Dec 2011
For what it's worth, I just took a very quick look at the data set that you provided.
years = [1982 1983 1984 1985 1986 1987 1988 1989 1990 1991];
load = [1702 2344 2097 2313 2588 2885 4341 4779 5251 5721];
You can fit the years 1988 to 1991 with an almost perfectly straight line. Similarly, you can fit the years 1984 to 1987 with another straight line. In both cases the R^2 is over .995.
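The R^2 figures can be checked with a short sketch like the one below. The loop structure and variable names are assumptions for illustration; R^2 is computed as 1 - SSE/SST for a straight-line polyfit on each four-year segment.

```matlab
years  = [1982 1983 1984 1985 1986 1987 1988 1989 1990 1991];
demand = [1702 2344 2097 2313 2588 2885 4341 4779 5251 5721];

% The two four-year segments mentioned above
segments = {1984:1987, 1988:1991};
r2 = zeros(1, numel(segments));

for k = 1:numel(segments)
    idx = ismember(years, segments{k});
    x = years(idx) - mean(years(idx));   % center the years for conditioning
    y = demand(idx);

    p = polyfit(x, y, 1);                % straight-line fit
    sse = sum((y - polyval(p, x)).^2);   % residual sum of squares
    sst = sum((y - mean(y)).^2);         % total sum of squares
    r2(k) = 1 - sse/sst;

    fprintf('R^2 for %d-%d: %.4f\n', segments{k}(1), segments{k}(end), r2(k));
end
```

Both values come out above 0.995, roughly 0.9951 for 1984 to 1987 and 0.9997 for 1988 to 1991.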
I really don't understand the approach that you're taking... I feel like you're trying to force Genetic Algorithms into the solution space regardless of whether this is warranted.
Given that you're primarily interested in using GA, there's one last resource that I'd recommend looking at:
The webinar "Global Optimization with MATLAB Products" provides a very good introduction to GA. You can watch it at: http://www.mathworks.com/company/events/webinars/wbnr43346.html?seq=1
All of the code is available for download from MATLAB Central.
  2 Comments
Yasmin Tamimi on 20 Dec 2011
Thanks a lot for the webinar. I am done with the part of estimating the coefficients and building my model.
My only problem now is writing the fitness function for the GA!
Alex on 25 Sep 2012
Yasmin, were you able to solve this? What did you do?
