Which regression method may give better R^2 values on this scattered data?

I need to use linear regression to estimate a linear relation.
The data is a 22x5 matrix: 22 samples, each with 5 parameters (features). I need to estimate the coefficients a1 to a5 in this equation:
X * A = Y
(X is 22x5, A is 5x1, and Y is 22x1), where the Y values are the experimental (target) values.
I have done this using the fmincon function in MATLAB:
% constants used in the program
Number_of_Columns = 5;
Number_of_Samples = 22;
% bank_22 holds the 22x5 feature values, plus the target in a 6th column
saved_file = 'bank_22.mat';
load(saved_file);
X = zeros(Number_of_Samples , Number_of_Columns);
Y = zeros(Number_of_Samples , 1);
for sample = 1 : Number_of_Samples
    for col = 1 : Number_of_Columns
        X(sample , col) = bank_22{sample , col}; % Feature matrix (known inputs)
    end
    Y(sample , 1) = bank_22{sample , 6}; % Target vector (i.e. experimental values)
end
tol1 = 1e-12;
options = optimset('TolFun' , tol1 , 'TolCon' , tol1 , 'TolX' , tol1 , ...
'MaxIter' , 1e3 , 'MaxFunEvals' , 1e6);
% x = fmincon(fun , x0 , A , b , Aeq , beq , lb , ub , nonlcon , options)
[A , fval] = fmincon(@(A)projMin(A , X , Y) , zeros(Number_of_Columns , 1) , ...
[] , [] , [] , [] , zeros(Number_of_Columns , 1) , ...
ones(Number_of_Columns , 1) , [] , options);
(Here projMin is a user-defined objective function that measures the difference between X*A and Y in the 1-norm, and the coefficients a1 to a5 are constrained to lie between 0 and 1.)
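The definition of projMin is not shown in the question. A minimal sketch of what such an objective might look like, assuming it simply returns the 1-norm of the residual X*A - Y, is given below. The anonymous function and the names X_demo, Y_demo and A_demo are hypothetical and the data is made up for illustration only.

```matlab
% Hypothetical stand-in for the user-defined projMin (the original is not shown).
% Assumption: the objective is the 1-norm of the residual X*A - Y.
projMin = @(A, X, Y) norm(X*A - Y, 1);

% Tiny made-up data set, for illustration only (not the bank_22 data).
X_demo = [1 0; 0 1; 1 1];
A_demo = [0.25; 0.5];
Y_demo = X_demo * A_demo;

% At the exact coefficients the residual is zero ...
err_exact = projMin(A_demo, X_demo, Y_demo);
% ... and at A = 0 the objective equals sum(abs(Y_demo)).
err_zero = projMin(zeros(2, 1), X_demo, Y_demo);
```

An objective of this form can be passed to fmincon exactly as in the call above, i.e. as @(A)projMin(A , X , Y). Note that the 1-norm is not smooth where a residual is zero, which may slow down a gradient-based solver.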
After obtaining the coefficients, I cross-validated the results by using the fitted values of a1 to a5 to predict the vector Y.
The maximum error and the standard deviation are fine, but the calculated R-squared is low. So I wonder which other method would be a reasonable option for finding this linear relation.
I appreciate your helpful comments.
P.S. For reference, see the result of cross-validation, i.e. predicted vs. target values.
Update: Sorry for the confusion! In the following scatterplot, each small circle marks a (target, predicted) pair, while the line drawn on the graph is the ideal linear relation, with R^2 = 1.0.
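For reference, the question does not say how R^2 was calculated. A minimal sketch, assuming the usual definition R^2 = 1 - SS_res / SS_tot, is shown below; Y_target and Y_pred are made-up values, not the data from the question.

```matlab
% Made-up target and predicted values, for illustration only.
Y_target = [1; 2; 3; 4];
Y_pred   = [1.1; 1.9; 3.2; 3.8];

SS_res = sum((Y_target - Y_pred).^2);          % residual sum of squares
SS_tot = sum((Y_target - mean(Y_target)).^2);  % total sum of squares about the mean
R2 = 1 - SS_res / SS_tot;                      % coefficient of determination
```

Under this definition R^2 compares the residuals with the spread of the target values about their mean, so it can be low even when the maximum error looks acceptable, for example if the targets vary little or if the model has no intercept term.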

Accepted Answer

Sean de Wolski on 17 Jul 2015
Edited: Sean de Wolski on 17 Jul 2015
Have you tried using one of the Machine Learning Regression techniques on it?
doc fitlm
doc stepwiselm
doc fitrtree
doc fitensemble
  3 Comments
Trevos16 on 17 Jul 2015
I just made an update on the question, and added the last paragraph to clarify the cross-validation graph.
Sean de Wolski on 20 Jul 2015
fitlm performs a multilinear regression, and stepwiselm performs a multilinear regression that tries to remove predictors that don't add value.
Regression trees are not linear regressions, but they are tools for fitting data that are sometimes superior to linear regression when linear regression is not well posed for the data set.
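A rough sketch of how these functions might be called is shown below. It requires the Statistics and Machine Learning Toolbox and uses made-up data (X_demo, Y_demo and the coefficient vector are assumptions for illustration, not the bank_22 data). Note that these fits are unconstrained and, for fitlm and stepwiselm, include an intercept by default, so they do not enforce the 0-to-1 bounds used in the question.

```matlab
% Requires the Statistics and Machine Learning Toolbox.
% Made-up data for illustration only (not the bank_22 data).
rng(0);                                  % reproducible random numbers
X_demo = rand(22, 5);
Y_demo = X_demo * [0.2; 0.4; 0.1; 0.7; 0.5] + 0.01 * randn(22, 1);

% Multilinear regression (intercept included by default).
mdl_lin = fitlm(X_demo, Y_demo);

% Stepwise regression: starts from a constant model and adds or removes
% linear terms; 'Verbose', 0 suppresses the step-by-step display.
mdl_step = stepwiselm(X_demo, Y_demo, 'constant', ...
    'Upper', 'linear', 'Verbose', 0);

% Regression tree (not a linear model).
mdl_tree = fitrtree(X_demo, Y_demo);

R2_lin  = mdl_lin.Rsquared.Ordinary;     % R^2 of the linear fit
R2_step = mdl_step.Rsquared.Ordinary;    % R^2 of the stepwise fit
Y_tree  = predict(mdl_tree, X_demo);     % tree predictions on the training data
```

With only 22 samples, R^2 computed on the training data will tend to be optimistic, so it is worth checking the fit on held-out samples as well.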
