Extrapolation: negative values obtained

tzaloupas on 20 Feb 2013
Dear all,
I have the following bimonthly data on prices
clear all
A={ 'DJ 2009' [ 0] [ 0] [ 0] [ 0]
'FM 2009' [ 0] [ 0] [ 0] [ 0]
'AM 2009' [ 0] [ 0] [ 0] [ 0]
'JJ 2009' [ 0] [ 0] [ 0] [ 0]
'AS 2009' [ 0] [ 0] [ 0] [ 0]
'ON 2009' [ 0] [ 0] [ 0] [ 0]
'DJ 2010' [ 0] [ 0] [ 0] [ 0]
'FM 2010' [ 0] [ 0] [ 0] [ 0]
'AM 2010' [ 0] [ 0] [ 0] [ 0]
'JJ 2010' [ 0] [ 0] [ 0] [ 0]
'AS 2010' [ 0] [ 0] [ 0] [ 0]
'ON 2010' [ 0] [ 0] [ 0] [ 0]
'DJ 2011' [ 0] [ 0] [ 0] [ 0]
'FM 2011' [ 0] [ 0] [ 0] [ 0]
'AM 2011' [ 0] [ 0] [ 0] [ 0]
'JJ 2011' [8.8618] [12.6597] [6.5630] [5.7126]
'AS 2011' [8.8365] [12.6236] [4.7730] [3.6514]
'ON 2011' [8.2443] [11.7776] [0.7600] [0.7436] };
I need to convert these data to monthly.
So, I change the format of the date column to
datenew={'12/2008'
'1/2009'
'2/2009'
'3/2009'
'4/2009'
'5/2009'
'6/2009'
'7/2009'
'8/2009'
'9/2009'
'10/2009'
'11/2009'
'12/2009'
'1/2010'
'2/2010'
'3/2010'
'4/2010'
'5/2010'
'6/2010'
'7/2010'
'8/2010'
'9/2010'
'10/2010'
'11/2010'
'12/2010'
'1/2011'
'2/2011'
'3/2011'
'4/2011'
'5/2011'
'6/2011'
'7/2011'
'8/2011'
'9/2011'
'10/2011'
'11/2011'};
and I linearly interpolate the in-between values using the following code:
kk=[];
xi = datenum(datenew, 'mm/yyyy');
values=A(:,2:4);
for c = 1:size(values,2)
k = interp1(xi(1:2:end),cell2mat(values(:,c)),xi);
kk=[kk k];
end
And I obtain
kk =
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
4.3583 6.2261 3.2277
8.8618 12.6597 6.5630
8.8494 12.6419 5.6827
8.8365 12.6236 4.7730
8.5355 12.1937 2.7336
8.2443 11.7776 0.7600
NaN NaN NaN
I need to extrapolate the NaN values in the last row. So, I wrote this code:
ll=size(xi,1);
mm=[];
for c = 1:size(kk,2)
idxi= ~isnan(kk(:,c));
lk = interp1(xi(idxi), kk(idxi,c),xi,'linear','extrap');
mm=[mm lk];
end
where
mm =
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
4.3583 6.2261 3.2277
8.8618 12.6597 6.5630
8.8494 12.6419 5.6827
8.8365 12.6236 4.7730
8.5355 12.1937 2.7336
8.2443 11.7776 0.7600
7.9433 11.3477 -1.2794
My problem is that this extrapolation makes some prices negative (-1.2794).
How could I rectify this problem? One option is to delete the last row, which I do not want to do as I would lose an observation.
Any code provided is most welcome.
Thanks

Answers (3)

Babak on 20 Feb 2013
Edited: Babak on 20 Feb 2013
The last data point comes out negative because the last column is decreasing, so even if you take the last 4-5 points you still get a negative extrapolated result. You need to include the 3.2277 point, and even the data before it, in your extrapolation. That still does not guarantee a correct result. I would do it myself rather than use interp1: take the last 6 data points, fit a 5th-order polynomial to them, and evaluate it at the next point. In general, this kind of extrapolation takes the last k data points and fits a polynomial of order k-1 to find the (k+1)st point. Here is an example for the last column:
y1 = 3.2277;
y2 = 6.5630;
y3 = 5.6827;
y4 = 4.7730;
y5 = 2.7336;
y6 = 0.7600;
% y7 is unknown
Since the spacing is even,
x1 = 0; x2 = 1; x3 = 2; x4 = 3; x5 = 4; x6 = 5; x7 = 6;
Now, given the six points (xi,yi), i = 1,2,...,6, find a 5th-order polynomial and compute its value at x7, which gives y7. I use polyfit to do this:
x = [x1 x2 x3 x4 x5 x6];
y = [y1 y2 y3 y4 y5 y6];
p = polyfit(x,y,5);
y7 = polyval(p,x7); % result is 9.9258
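The same steps can be applied to every price column at once. The sketch below is only an illustration: it hard-codes the last six rows of kk from the question, and the names Y, xNext, yNext, w and yCheck are made up for this example. It also cross-checks polyfit against the closed form that holds for evenly spaced points (setting the 6th forward difference to zero), which reproduces the 9.9258 quoted above for the last column.

```matlab
% Last six known rows of kk from the question (one column per price series).
Y = [4.3583  6.2261 3.2277
     8.8618 12.6597 6.5630
     8.8494 12.6419 5.6827
     8.8365 12.6236 4.7730
     8.5355 12.1937 2.7336
     8.2443 11.7776 0.7600];

x     = (0:5)';   % evenly spaced positions, as in the answer
xNext = 6;        % position of the missing month

yNext = zeros(1, size(Y, 2));
for c = 1:size(Y, 2)
    p = polyfit(x, Y(:, c), 5);     % 6 points -> exact 5th-order polynomial
    yNext(c) = polyval(p, xNext);   % extrapolated value for column c
end

% Cross-check for evenly spaced points:
% y7 = -y1 + 6*y2 - 15*y3 + 20*y4 - 15*y5 + 6*y6
w      = [-1 6 -15 20 -15 6];
yCheck = w * Y;                     % 1-by-3, should match yNext

disp(yNext)
disp(yCheck)
```

Comparing yNext with the last known row is a quick sanity check on whether the extrapolated values are plausible as prices.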
tzaloupas on 21 Feb 2013
It seems that the polyfit approach does not always work, as I again got negative values:
[ 0.0039]
[ 0.0025]
[ 0.0011]
[-2.7705e-004]
Is there any way to avoid having negative values?
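Two simple options that are not proposed anywhere in this thread, shown only as a sketch: floor the extrapolated value at zero, or extrapolate the logarithm of the prices and transform back, which cannot return a negative number. The example uses the last two known values of the third series from the question; xKnown, xNew and the other variable names are hypothetical, and the log variant assumes the points used are strictly positive (it would fail on the zero rows).

```matlab
% Last two known monthly values of the third series, from the question.
yKnown = [2.7336 0.7600];
xKnown = [1 2];    % hypothetical evenly spaced positions
xNew   = 3;        % position of the month to extrapolate

% Plain linear extrapolation, which can go below zero.
yLin = interp1(xKnown, yKnown, xNew, 'linear', 'extrap');

% Option 1: floor the result at zero.
yClamped = max(yLin, 0);

% Option 2: extrapolate log-prices and transform back.
% Requires yKnown > 0; the result is always positive.
yLog = exp(interp1(xKnown, log(yKnown), xNew, 'linear', 'extrap'));

fprintf('linear: %.4f  clamped: %.4f  log-linear: %.4f\n', yLin, yClamped, yLog);
```

Which option is appropriate depends on what the prices represent: clamping simply hides the downward trend, while the log version assumes the price decays by a constant ratio per step.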

Babak on 21 Feb 2013
Here is the whole concept of polynomial curve fitting:
A first-order polynomial is a line, y = a*x + b, which has only 2 parameters, a and b. These two parameters are determined when the line goes through 2 points. So, if 2 points are given, you can run a line through them and find a and b.
A 2nd-order polynomial, on the other hand, is y = a*x^2 + b*x + c, which has 3 parameters, a, b, and c. These 3 parameters are determined by 3 points, (x1,y1), (x2,y2), (x3,y3).
An nth-order polynomial has n+1 coefficients (constant parameters), which are determined by the n+1 points that the polynomial is supposed to go through.
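As a small illustration of the point count, the sketch below fits a line through 2 points and a parabola through 3 points with polyfit and checks that each polynomial passes through its points. The positions 3, 4, 5 are arbitrary evenly spaced values chosen for this example; the y values are the last three entries of the third column in the question.

```matlab
% Two points determine a line y = a*x + b.
xLine = [4 5];
yLine = [2.7336 0.7600];
pLine = polyfit(xLine, yLine, 1);   % pLine = [a b]

% Three points determine a parabola y = a*x^2 + b*x + c.
xQuad = [3 4 5];
yQuad = [4.7730 2.7336 0.7600];
pQuad = polyfit(xQuad, yQuad, 2);   % pQuad = [a b c]

% Both fits are exact: the residuals at the given points are
% zero up to rounding error.
resLine = polyval(pLine, xLine) - yLine;
resQuad = polyval(pQuad, xQuad) - yQuad;

disp(max(abs(resLine)))
disp(max(abs(resQuad)))
```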
For your problem, you can fit a polynomial of any order, but usually a higher-order one follows the given points more closely. Your data in the last column look like this:
3.2277
6.5630
5.6827
4.7730
2.7336
0.7600
xxxxxx
After the 2nd point, the data decrease from 6.5630 to 0.7600. So if you take only the last 5 points and fit a polynomial through them (6.5630 to 0.7600), the polynomial will tend to continue that decrease, and the extrapolated value is likely to be smaller than 0.7600 as well. This is why I am suggesting that you use all 6 points, from 3.2277 to 0.7600, to fit the polynomial: the first element, 3.2277, is smaller than 6.5630, so the fit captures the earlier rise and the extrapolated value should be more accurate. If you have more points before 3.2277 and fit higher-order polynomials to them, you may get a better result.
A first-order polynomial (a line) does not give you the right answer.
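To see how much the extrapolated value depends on the number of points used, it may help to repeat the fit with the last 2, 3, ..., 6 points of the last column. This is only a sketch for checking that sensitivity, assuming evenly spaced positions 0 to 5 as above; nPts and yNext are names made up for the example.

```matlab
% Last column of the data, at evenly spaced positions 0..5.
y = [3.2277 6.5630 5.6827 4.7730 2.7336 0.7600];
x = 0:5;

nPts  = 2:6;                  % number of trailing points used in each fit
yNext = zeros(size(nPts));
for k = 1:numel(nPts)
    idx = (numel(y) - nPts(k) + 1):numel(y);    % indices of the last nPts(k) points
    p = polyfit(x(idx), y(idx), nPts(k) - 1);   % exact fit of order nPts(k)-1
    yNext(k) = polyval(p, 6);                   % value at the next position
end

% One row per choice: [points used, extrapolated value]
disp([nPts(:) yNext(:)])
```

If the printed values differ widely from one row to the next, that is a sign the extrapolation is not well determined by the data alone.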

Sean de Wolski on 21 Feb 2013
