Regression analysis in Matlab

How can I fit a model to predict a response variable (y) from a set of regressor variables (x1, x2, x3, x4, x5, x6)? The model may or may not be linear. A sample of the simulation data:
x1=[263,268,273,278,283,288,293,298,303,308,313,318,263,268,273,278,283,288,293];
x2=[323,333,343,353,363,373,343,423,433,473,323,443,463,493,353,363,383,403,453];
x3=[10,20,50,40,20,10,30,40,50,40,30,20,20,10,20,30,40,40,20];
x4=[0.83,0.88,0.77,0.83,0.84,0.87,0.71,0.84,0.63,0.69,0.83,0.50,0.88,0.83,0.97,0.83,0.96,0.83,0.78];
x5=[0.00101325,1.01325,0.000101325,0.101325,1.01325,0.000101325,0.101325,0.0101325,0.000101325,0.101325,0.0101325,0.000101325,0.101325,0.0101325,0.000101325,0.00101325,0.101325,1.01325,1.01325];
x6=[0.05,0.06,0.06,0.07,0.08,0.07,0.09,0.1,0.06,0.05,0.04,0.08,0.09,0.1,0.07,0.06,0.06,0.08,0.05];
y=[257.98,262.99,268.05,273.17,278.35,283.59,288.9,294.29,299.75,305.3,310.93,316.64,258.22,263.23,268.29,273.4,278.58,283.82,289.12];
Please advise me.
T. Aseri

1 Comment

If you have the Statistics Toolbox, see
doc regress
Without it, see
doc slash % NB: the backslash operator '\'
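For a tall design matrix X (more rows than columns), X\y returns the least-squares solution, i.e. the b that minimizes norm(X*b - y). A minimal base-Matlab sketch on made-up data (not the question's), assuming an intercept column is wanted; the last line checks the normal equations X'*(y - X*b) = 0, which every least-squares solution satisfies:

```matlab
% Made-up data for illustration: 8 observations, intercept plus two regressors
x = [1; 2; 3; 4; 5; 6; 7; 8];
z = [0.5; 0.1; 0.9; 0.3; 0.7; 0.2; 0.8; 0.4];
y = [3.1; 4.2; 6.8; 7.1; 9.6; 10.2; 13.1; 13.5];
X = [ones(size(x)) x z];   % design matrix; the intercept column must be explicit
b = X\y;                   % least-squares coefficients, no toolbox needed
r = y - X*b;               % residuals
% Least-squares optimality: residuals are orthogonal to every column of X
normalEqErr = norm(X'*r) / norm(X'*y);
disp(b')
```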


Answers (2)

dpb on 28 Dec 2013
Edited: dpb on 29 Dec 2013
Now that I have Matlab open, to amplify on the above...
Stat Toolbox ...
>> b1=regress(y',[x1' x2' x3' x4' x5' x6'])'
b1 =
1.0102 -0.0005 -0.0090 -6.8343 -0.2722 -13.6140
Base Matlab backslash operator...
>> b2=[[x1' x2' x3' x4' x5' x6']\y']'
b2 =
1.0102 -0.0005 -0.0090 -6.8343 -0.2722 -13.6140
>>
Remarkable similarity, wot? :)
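The two agree because both solve the same least-squares problem. For a non-square X, backslash uses a QR factorization, and regress also works through a QR factorization internally as far as I know. The sketch below, on made-up data, shows an explicit economy-size QR solve and the textbook normal-equations solve both reproducing X\y:

```matlab
% Made-up data for illustration only
x = (1:6)';
y = [1.9; 4.1; 6.2; 7.8; 10.1; 12.0];
X = [ones(6,1) x];
bBackslash = X\y;              % what the base-Matlab answer does
[Q, R] = qr(X, 0);             % economy-size QR: X = Q*R, R is 2x2 upper triangular
bQR = R \ (Q'*y);              % solve R*b = Q'*y by back substitution
bNormal = (X'*X) \ (X'*y);     % normal equations (less robust numerically)
diffQR = norm(bBackslash - bQR);
diffNormal = norm(bBackslash - bNormal);
```

The QR route is generally preferred because forming X'*X squares the condition number of the problem.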
Now, as you might expect, the Toolbox solution has some more interesting outputs...
>> [b,bint,r]=regress(y',[x1' x2' x3' x4' x5' x6']);
>> [b bint]
ans =
1.0102 0.9968 1.0237
-0.0005 -0.0085 0.0075
-0.0090 -0.0404 0.0223
-6.8343 -9.6170 -4.0516
-0.2722 -1.2170 0.6726
-13.6140 -37.7253 10.4972
>> sqrt(sum(r.*r)/length(r))
ans =
0.6206
>> [b,bint,r]=regress(y',[x1' x2' x4']);
>> [b bint]
ans =
1.0095 0.9980 1.0210
-0.0024 -0.0091 0.0043
-7.2257 -9.7197 -4.7316
>> sqrt(sum(r.*r)/length(r))
ans =
0.6663
>> [b,bint,r]=regress(y',[x1' x4']);
>> sqrt(sum(r.*r)/length(r))
ans =
0.6786
>>
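The figure of merit printed above, sqrt(sum(r.*r)/length(r)), is the plain root-mean-square residual. It divides by n no matter how many coefficients were fitted, so it always favours the larger of two nested models. Dividing by the residual degrees of freedom n - p instead gives the usual estimate of the error standard deviation, which does penalize extra terms. A minimal sketch on made-up data:

```matlab
% Made-up data for illustration; substitute your own X and y
x = (1:10)';
y = 2 + 3*x + [0.1; -0.2; 0.05; 0.3; -0.1; 0.2; -0.3; 0.15; -0.05; -0.15];
X = [ones(size(x)) x];              % intercept + one regressor
b = X\y;
r = y - X*b;                        % residuals
[n, p] = size(X);                   % n observations, p fitted coefficients
rmsPlain = sqrt(sum(r.*r)/n);       % same formula as in the session above
rmsDf    = sqrt(sum(r.*r)/(n - p)); % degrees-of-freedom corrected
```

Applying the correction to the three RMS values above (n = 19, no intercept, so p = 6, 3 and 2) gives roughly 0.750, 0.726 and 0.717, so by this measure the two-regressor fit comes out best of the three.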
Judging from the confidence intervals on the estimated coefficients, only a few of the variables are significant, and a much more parsimonious model is possible with essentially the same SSE as blindly including all six.
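A quick way to read the bint output: a coefficient is significant at the interval's level (95% by default in regress) when its interval excludes zero. This sketch hardcodes the six-regressor, no-intercept intervals from the session above and flags x1 and x4, the pair kept in the last fit:

```matlab
% 95% intervals copied from the six-regressor regress output above (no intercept)
bint = [ 0.9968    1.0237
        -0.0085    0.0075
        -0.0404    0.0223
        -9.6170   -4.0516
        -1.2170    0.6726
       -37.7253   10.4972];
names = {'x1','x2','x3','x4','x5','x6'};
% Interval lies entirely above or entirely below zero
excludesZero = bint(:,1) > 0 | bint(:,2) < 0;
disp(names(excludesZero))
```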
Your mission, should you choose to accept it, is to complete the analysis and judiciously choose the overall best model. You'll note that I have not considered any interaction terms.
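An interaction term is just one more column in the design matrix, the element-wise product of the two regressors. A minimal sketch using the question's x1, x4 and y with an intercept; no result is claimed here, and since adding a column can never increase the raw residual, judge the interaction by its coefficient interval or a degrees-of-freedom-aware measure rather than by the RMS alone:

```matlab
% x1, x4 and y from the question
x1=[263,268,273,278,283,288,293,298,303,308,313,318,263,268,273,278,283,288,293];
x4=[0.83,0.88,0.77,0.83,0.84,0.87,0.71,0.84,0.63,0.69,0.83,0.50,0.88,0.83,0.97,0.83,0.96,0.83,0.78];
y=[257.98,262.99,268.05,273.17,278.35,283.59,288.9,294.29,299.75,305.3,310.93,316.64,258.22,263.23,268.29,273.4,278.58,283.82,289.12];
Xmain = [ones(numel(x1),1) x1' x4'];   % intercept + main effects
Xint  = [Xmain x1'.*x4'];              % plus the x1*x4 interaction column
bMain = Xmain\y';
bInt  = Xint\y';
rmsMain = sqrt(mean((y' - Xmain*bMain).^2));
rmsInt  = sqrt(mean((y' - Xint*bInt).^2));
fprintf('RMS residual: main effects %.4f, with interaction %.4f\n', rmsMain, rmsInt);
```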
ADDENDUM:
Oversight: the above doesn't include the intercept term. Write the model as
b1=regress(y',[ones(size(x1')) x1' x2' x3' x4' x5' x6'])'
or similarly to include it.
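Why the intercept column matters: without it the fitted line is forced through the origin. On made-up, noise-free data y = 5 + 2*x, the model with a column of ones recovers [5; 2] exactly, while the no-intercept fit returns the slope sum(x.*y)/sum(x.^2) = 185/55 (about 3.36) and leaves large residuals:

```matlab
% Made-up, noise-free data with a nonzero intercept: y = 5 + 2*x
x = (1:5)';
y = 5 + 2*x;
bNo   = x\y;                      % model y = b*x (through the origin)
bWith = [ones(size(x)) x]\y;      % model y = b0 + b1*x
rNo   = y - x*bNo;                % residuals without intercept
rWith = y - [ones(size(x)) x]*bWith;
```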

4 Comments

Dear dpb, I am really grateful to you. I will check your suggestion and get back to you soon. Thank you for your support.
T. Aseri
BTW, if you do have the Statistics Toolbox, look at
doc regstats
that does much of the work of computing the ancillary statistics needed.
I do wish TMW (The MathWorks) would take the last step of providing a nicely formatted table as an option, à la SAS and its ilk.
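To see what some of those ancillary statistics are, here is a hand-rolled base-Matlab sketch on made-up data computing coefficient standard errors, t statistics, two-sided p-values and R-squared for a linear model. It illustrates the textbook formulas and is not regstats' own implementation; the variable names are illustrative. Note that regstats adds the constant term itself, whereas here it is included explicitly:

```matlab
% Made-up data for illustration only
x = (1:8)';
y = [2.1; 3.9; 6.2; 7.8; 10.1; 12.2; 13.8; 16.1];
X = [ones(size(x)) x];             % intercept column included explicitly
[n, p] = size(X);
b   = X\y;                         % coefficients
r   = y - X*b;                     % residuals
dfe = n - p;                       % residual degrees of freedom
mse = (r'*r)/dfe;                  % estimate of the error variance
[~, R] = qr(X, 0);                 % X = Q*R, so inv(X'*X) = inv(R)*inv(R)'
Rinv = R \ eye(p);
se  = sqrt(mse * sum(Rinv.^2, 2)); % coefficient standard errors
t   = b ./ se;                     % t statistics
% Two-sided p-values P(|T| > |t|), T ~ t(dfe), via the regularized
% incomplete beta function (base Matlab, no toolbox needed)
pval = betainc(dfe ./ (dfe + t.^2), dfe/2, 0.5);
rsq  = 1 - (r'*r) / sum((y - mean(y)).^2);   % R-squared
```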
OBTW, NB: I neglected to include an intercept term in the preceding; see the ADDENDUM to the previous answer. regstats handles this automagically, but regress and the backslash operator need the constant term coded explicitly in the model.
Yes, I do have the Statistics Toolbox and I am working with it. I first need to learn it; then I will be able to choose the best-fitting model with the fewest regressors by performing all the needed tests. Thank you for your valuable support; I'll be in touch.
Here is the problem: I've entered all the data in column format, each column with the same number of rows (6696):


dpb on 1 Jan 2014
Edited: dpb on 1 Jan 2014
NB: you created a Matlab dataset object named Datas, so you must use dot notation to reference the variables it contains. (BTW, although Matlab doesn't care what a variable is named, "data" is plural in Latin; the singular is "datum". Common US English usage has blurred this terribly.)
Use
Datas.Properties.VarNames
to see the variable names in the Datas object; then you get the actual data by using
Datas.VarName
where "VarName" is the name of the particular variable. Assuming the Excel sheet has column headings with the names you've used above, something like
X=[ones(length(Datas),1) Datas.Ta Datas.Tabs ... Datas.eabs];
would appear to be correct. If there are no headers, the default variable name 'Var1' will have been assigned, and Var1 will be a single array, which is somewhat simpler to reference:
b=regress(Datas.Var1(:,7), [ones(length(Datas),1) Datas.Var1(:,1:6)]);
Again, note that with regress you must specify the constant term in the model explicitly.
Since you say you have the Statistics Toolbox, I recommend reverting to regstats to get the additional statistics you'll want/need to evaluate the quality of the model directly.
See
doc dataset % and related for details on using the dataset object
Alternatively, of course, you could use one of the other methods of reading in the file (xlsread comes to mind) and return the data in a base Matlab array. That would obviate all the dataset machinery, which may not be of much real use for your present purposes.
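A minimal sketch of the plain-array route, assuming the same layout as the Var1 example above (columns 1-6 regressors, column 7 response, one row per case). The file name in the comment is hypothetical, and a made-up, noise-free array stands in for the xlsread result so the sketch runs on its own:

```matlab
% Stand-in for:  A = xlsread('yourfile.xlsx');   ('yourfile.xlsx' is hypothetical)
k = (1:10)';
A = [k sin(k) cos(k) k.^2 sqrt(k) log(k)];       % six made-up regressor columns
A(:,7) = 1 + A*[0.5; -1; 2; 0.1; 3; -0.7];       % noise-free response for the demo
n = size(A,1);
X = [ones(n,1) A(:,1:6)];   % explicit intercept column, as regress also requires
b = X\A(:,7);               % regress(A(:,7), X) should give the same coefficients
```

Because the demo response has no noise, b should recover [1; 0.5; -1; 2; 0.1; 3; -0.7] to rounding error.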

Asked: 28 Dec 2013
Edited: dpb on 1 Jan 2014
