Regression analysis for two stocks

harsh on 10 Jun 2012
All,
I am trying to separate data into a "training set" and a "test set" for a co-integration analysis of two stocks, GLD and GDX. The analysis calls a function named "ols" for the linear regression, which I grabbed from spatial-econometrics.com (it is in the "Regress" folder once the file is downloaded). I have attached the main code that I am using for the analysis and the ols code used for the linear regression, but I am having a hard time figuring out how to have the analysis talk with the ols to make the code work.
The first code is the main part, the analysis.
% written by:
% Ernest Chan
%
% Author of “Quantitative Trading:
% How to Start Your Own Algorithmic Trading Business”
%
% ernest@epchan.com
% www.epchan.com
clear; % make sure previously defined variables are erased.
[num, txt]=xlsread('GLD'); % read a spreadsheet named "GLD.xls" into MATLAB.
tday1=txt(2:end, 1); % the first column (starting from the second row) is the trading days in format mm/dd/yyyy.
tday1=datestr(datenum(tday1, 'mm/dd/yyyy'), 'yyyymmdd'); % convert the format into yyyymmdd.
tday1=str2double(cellstr(tday1)); % convert the date strings first into cell arrays and then into numeric format.
adjcls1=num(:, end); % the last column contains the adjusted close prices.
[num, txt]=xlsread('GDX'); % read a spreadsheet named "GDX.xls" into MATLAB.
tday2=txt(2:end, 1); % the first column (starting from the second row) is the trading days in format mm/dd/yyyy.
tday2=datestr(datenum(tday2, 'mm/dd/yyyy'), 'yyyymmdd'); % convert the format into yyyymmdd.
tday2=str2double(cellstr(tday2)); % convert the date strings first into cell arrays and then into numeric format.
adjcls2=num(:, end); % the last column contains the adjusted close prices.
[tday, idx1, idx2]=intersect(tday1, tday2); % find the intersection of the two data sets, and sort them in ascending order
cl1=adjcls1(idx1);
cl2=adjcls2(idx2);
trainset=1:252; % define indices for training set
testset=trainset(end)+1:length(tday); % define indices for test set
% determines the hedge ratio on the trainset
results=ols(cl1(trainset), cl2(trainset)); % use regression function
hedgeRatio=results.beta;
spread=cl1-hedgeRatio*cl2; % spread = GLD - hedgeRatio*GDX
plot(spread(trainset));
figure;
plot(spread(testset));
figure;
spreadMean=mean(spread(trainset)); % mean of spread on trainset
spreadStd=std(spread(trainset)); % standard deviation of spread on trainset
zscore=(spread - spreadMean)./spreadStd; % z-score of spread
longs=zscore<=-2; % buy spread when its value drops below -2 standard deviations.
shorts=zscore>=2; % short spread when its value rises above 2 standard deviations.
exits=abs(zscore)<=1; % exit any spread position when its value is within 1 standard deviation of its mean.
positions=NaN(length(tday), 2); % initialize positions array
positions(shorts, :)=repmat([-1 1], [length(find(shorts)) 1]); % short entries
positions(longs, :)=repmat([1 -1], [length(find(longs)) 1]); % long entries
positions(exits, :)=zeros(length(find(exits)), 2); % exit positions
positions=fillMissingData(positions); % ensure existing positions are carried forward unless there is an exit signal
cl=[cl1 cl2]; % combine the 2 price series
dailyret=(cl - lag1(cl))./lag1(cl);
pnl=sum(lag1(positions).*dailyret, 2);
sharpeTrainset=sqrt(252)*mean(pnl(trainset(2:end)))./std(pnl(trainset(2:end))) % the Sharpe ratio on the training set should be about 2.3
sharpeTestset=sqrt(252)*mean(pnl(testset))./std(pnl(testset)) % the Sharpe ratio on the test set should be about 1.5
plot(cumsum(pnl(testset)));
save example3_6_positions positions; % save positions file for checking look-ahead bias.
Running it gives this error:
??? Undefined function or method 'tdis_inv' for input arguments of type 'double'.
Error in ==> ols at 64
tcrit=-tdis_inv(.025,nobs);
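The error is raised by the tcrit line in ols, which calls tdis_inv; that function is not defined in ols.m itself. In the LeSage toolbox download it appears to live in a folder other than "Regress" (likely "distrib"), so adding the whole unzipped toolbox to the path with addpath(genpath(...)) may be enough. Note also that tcrit only feeds results.bint, while the script uses only results.beta. The sketch below uses made-up prices (not real GLD/GDX data) and a hypothetical 4-day training set to show that, for one regressor and no intercept, the hedge ratio can be computed directly with the backslash operator, which solves the same least-squares problem as ols without any toolbox.

```matlab
% Toy prices for illustration only (not real GLD/GDX data)
cl2 = [10; 11; 12; 13; 14];                % "GDX" leg
cl1 = 2*cl2 + [0.1; -0.1; 0.1; -0.1; 0];   % "GLD" leg, roughly 2x "GDX"
trainset = 1:4;                            % hypothetical short training window

% No-intercept least-squares fit: the same beta that
% ols(cl1(trainset), cl2(trainset)) returns in results.beta
hedgeRatio = cl2(trainset) \ cl1(trainset);

% Spread over the full sample, as in the original script
spread = cl1 - hedgeRatio*cl2;
```

In the real script this would replace the two lines that call ols and read results.beta.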
The second code is the ols function for the linear regression.
function results=ols(y,x)
% PURPOSE: least-squares regression
%---------------------------------------------------
% USAGE: results = ols(y,x)
% where: y = dependent variable vector (nobs x 1)
% x = independent variables matrix (nobs x nvar)
%---------------------------------------------------
% RETURNS: a structure
% results.meth = 'ols'
% results.beta = bhat (nvar x 1)
% results.tstat = t-stats (nvar x 1)
% results.bstd = std deviations for bhat (nvar x 1)
% results.yhat = yhat (nobs x 1)
% results.resid = residuals (nobs x 1)
% results.sige = e'*e/(n-k) scalar
% results.rsqr = rsquared scalar
% results.rbar = rbar-squared scalar
% results.dw = Durbin-Watson Statistic
% results.nobs = nobs
% results.nvar = nvars
% results.y = y data vector (nobs x 1)
% results.bint = (nvar x2 ) vector with 95% confidence intervals on beta
%---------------------------------------------------
% SEE ALSO: prt(results), plt(results)
%---------------------------------------------------
% written by:
% James P. LeSage, Dept of Economics
% University of Toledo
% 2801 W. Bancroft St,
% Toledo, OH 43606
% jlesage@spatial-econometrics.com
%
% Barry Dillon (CICG Equity)
% added the 95% confidence intervals on bhat
if (nargin ~= 2); error('Wrong # of arguments to ols');
else
[nobs nvar] = size(x); [nobs2 junk] = size(y);
if (nobs ~= nobs2); error('x and y must have same # obs in ols');
end;
end;
results.meth = 'ols';
results.y = y;
results.nobs = nobs;
results.nvar = nvar;
if nobs < 10000
[q r] = qr(x,0);
xpxi = (r'*r)\eye(nvar);
else % use Cholesky for very large problems
xpxi = (x'*x)\eye(nvar);
end;
results.beta = xpxi*(x'*y);
results.yhat = x*results.beta;
results.resid = y - results.yhat;
sigu = results.resid'*results.resid;
results.sige = sigu/(nobs-nvar);
tmp = (results.sige)*(diag(xpxi));
sigb=sqrt(tmp);
results.bstd = sigb;
tcrit=-tdis_inv(.025,nobs);
results.bint=[results.beta-tcrit.*sigb, results.beta+tcrit.*sigb];
results.tstat = results.beta./(sqrt(tmp));
ym = y - mean(y);
rsqr1 = sigu;
rsqr2 = ym'*ym;
results.rsqr = 1.0 - rsqr1/rsqr2; % r-squared
rsqr1 = rsqr1/(nobs-nvar);
rsqr2 = rsqr2/(nobs-1.0);
if rsqr2 ~= 0
results.rbar = 1 - (rsqr1/rsqr2); % rbar-squared
else
results.rbar = results.rsqr;
end;
ediff = results.resid(2:nobs) - results.resid(1:nobs-1);
results.dw = (ediff'*ediff)/sigu; % durbin-watson
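To keep ols.m intact without tdis_inv, one option is to replace its tcrit line with a t quantile computed from base MATLAB's betaincinv (tinv from the Statistics Toolbox is another option, if installed). The anonymous function tquantile below is a hypothetical stand-in, not the toolbox's tdis_inv; it relies on the identity that, for a Student t variable T with nu degrees of freedom, P(|T| > |x|) equals the regularized incomplete beta function at nu/(nu+x^2) with parameters nu/2 and 1/2. The original line passes nobs as the degrees of freedom; the usual choice for OLS coefficient intervals is nobs - nvar, which the sketch uses.

```matlab
% Hypothetical stand-in for tdis_inv: inverse CDF (quantile) of Student's t
% with nu degrees of freedom, built on base-MATLAB betaincinv.
% Solves 2*min(p,1-p) = I_z(nu/2, 1/2) for z = nu/(nu + x^2), then recovers x.
tquantile = @(p, nu) sign(p - 0.5) .* ...
    sqrt(nu .* (1 ./ betaincinv(2 .* min(p, 1 - p), nu ./ 2, 0.5) - 1));

% Possible replacement for the tcrit line inside ols.m
nobs = 252; nvar = 1;                   % example sizes: 252-day training set, one regressor
tcrit = -tquantile(0.025, nobs - nvar); % positive 97.5% critical value
```

With nu = 1 the t distribution is Cauchy, so tquantile(0.025, 1) should equal -tan(0.475*pi), about -12.706, which is a quick sanity check.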
I am very new to MATLAB. I am pretty sure I need to modify the "If" statement to fit the main analysis, but I am having a hard time figuring out the proper syntax. After three days of trying, I figured I'd try the forums.
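The if statements in ols only validate the inputs, pick a numerical method and guard a division, so they should not need changing: ols(cl1(trainset), cl2(trainset)) passes two column vectors of the same length, which satisfies the checks. The script talks to ols simply by calling it, which works as long as ols.m is in the current folder or on the MATLAB path. The same applies to lag1 and fillMissingData, which the script also calls but which are not shown in the post. The sketch below uses an anonymous function and a loop as hypothetical stand-ins for those two helpers, assuming lag1 shifts a series down one row (padding with NaN) and fillMissingData carries the last non-NaN position forward, and applies them to made-up z-scores.

```matlab
% Hypothetical stand-in for lag1: shift rows down one day, pad the first row with NaN
lag1 = @(x) [NaN(1, size(x, 2)); x(1:end-1, :)];

% Toy z-scores (illustration only)
zscore = [0; -2.5; -1.5; -0.5; 2.1; 1.5; 0.8];
longs  = zscore <= -2;      % long spread entries
shorts = zscore >= 2;       % short spread entries
exits  = abs(zscore) <= 1;  % exit signals

positions = NaN(numel(zscore), 2);
positions(shorts, :) = repmat([-1 1], nnz(shorts), 1);  % short spread: short GLD, long GDX
positions(longs, :)  = repmat([1 -1], nnz(longs), 1);   % long spread: long GLD, short GDX
positions(exits, :)  = 0;                               % flat on exit

% Hypothetical stand-in for fillMissingData: carry the previous row into NaN rows
for t = 2:size(positions, 1)
    missing = isnan(positions(t, :));
    positions(t, missing) = positions(t - 1, missing);
end
```

Rows before the first signal stay NaN after the fill, and NaN positions make the corresponding pnl entries NaN; setting those rows to 0 (flat) beforehand is one way to avoid that.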
Sargondjani on 11 Jun 2012
I think most people have trouble understanding what you are trying to do. The code is long and not easy to read, and these sentences are hard to understand:
"I am having a hard time figuring out how to have the analysis talk with the ols to make the code work."
"I need to modify the "If" statement to fit the main analysis"
So please give a more precise summary of the procedure (just the crucial lines of code) and of the problem with it.
