Support Vector Regression

version 1.3 (3.45 KB) by

A MATLAB implementation of Support Vector Regression (SVR).

Support Vector Regression is a powerful function approximation technique based on statistical learning theory. The method is extremely robust and provides excellent generalization performance while still being able to capture complex relationships in the input data.
This implementation uses the Optimization Toolbox's sqp solver to minimize the ε-insensitive empirical risk functional plus a regularization term, yielding the support vectors and their weights.
The purpose of the submission is to provide a "bottom-up" implementation that demonstrates how Support Vector Regression can be implemented in the MATLAB language and to allow the user to experiment with different kernel functions and optimization training algorithms at a low level.
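The ε-insensitive loss at the heart of this risk functional can be illustrated in a couple of lines (a minimal sketch for orientation, not part of the submission itself):

```matlab
% Residuals inside the tube |r| <= epsilon cost nothing;
% larger residuals are penalised linearly.
eps_loss = @(r, epsilon) max(abs(r) - epsilon, 0);

r = -2:1:2;            % some example residuals
eps_loss(r, 1)         % returns [1 0 0 0 1]
```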

TODO:
- Add more kernel functions
- Make the training independent of the optimization toolbox

Comments and Ratings (63)

Charles

Exciting, as I am new to SVM and wish to try applying it to Forex markets. I see there is plenty of literature out there, so thank you for the code.

Ronald Clark

Hi Ruocheng,

I am aware of this issue but haven't had the time to update the files.

I'll update the code soon with some new features too!

As of 2015, MATLAB does have a standard implementation, with documentation available here, which you can use: https://www.mathworks.com/help/stats/support-vector-machine-regression.html

Regards,
Ronnie

Ruocheng Guo

Hi Ronnie,

Your code is not producing the correct values for alpha, as you don't have a constraint forcing z[1:ntrain] = z[ntrain+1:2*ntrain] when alpha2 == 0, or z[1:ntrain] = -z[ntrain+1:2*ntrain] when alpha1 == 0. You need to add a constraint for this.

Also, the second part of the upper bound should be c*ones(ntrain,1) instead of 2*c*ones(ntrain,1): since each entry is either alpha1 or alpha2, it should lie in the range 0 to c.

Ronald Clark

Hi Nico M, the main difference is that my script is completely self contained (easy to learn from and adapt for research purposes).

Nico M

Dear Ronnie Clark,

Thanks for sharing.
Because I've just started assessing what MATLAB files are available on SVM, I was wondering: what are the differences with the built-in MATLAB class 'CompactRegressionSVM'?

Thanks in advance,
Nico

Su Wutyi Hnin

Dear Ronnie Clark,
Thank you for the new optimization script.
I would like to ask one thing.
After we optimize the three parameters, I don't know how to set those parameters in the PREDICT function.
Or is there another function for that?

Justin Igwe

Hello @Ronald, your code seems helpful because its predictions follow exactly the same trend as my targets. Unfortunately, the predicted values are all around 20% less than the actual values, which pushes the prediction MSE above 3500. How can I correct this error? Thanks

Ben Tarrahi

Hi Tom, try making the target vector (y) zero mean; I tested the same data and the prediction is almost perfect.
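The trick Ben describes can be sketched as follows (a sketch only: the svr_trainer call and the predict method are assumed from the rest of this thread, and the hyperparameter values are placeholders):

```matlab
% Centre the targets before training, then add the mean back to the
% predictions afterwards. The svr_trainer/predict interface and the
% parameter values are assumptions, not part of this comment.
ymean  = mean(ydata);
svrobj = svr_trainer(xdata, ydata - ymean, 400, 0.01, 'gaussian', 0.5);
ypred  = svrobj.predict(xdata) + ymean;   % undo the centring
```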

Tom

I used the data (x=[1 2 3 4 5 6 7 8 9 10]', y=[1 2 3 4 5 6 7 8 9 10]'), x is for both training and predicting. The predicted result is [-1.7568 -0.7587 0.2414 1.2389 2.2402 3.2354 4.2407 5.2294 6.2448 7.2202]
Why is the predicted result always about 2.7 lower than the actual value? Any idea?

Su Wutyi Hnin

I would like to see a MATLAB code example for a hybrid genetic algorithm combined with support vector regression.

CHlin

Hi Ronald Clark, please could you advise how to use my data to train the SVR? My data is reliability data which is a function of time,
for example
R(t1)=.9, R(t2)=.89, ..., R(t30)=.97
and t1=.4, t2=.7, t3=1.5, ..., t30=9
So I was wondering what the inputs x and y should be. Could R(ti) be x and ti be y?
Thank you very much

Ronald Clark

Hi yzc233, do you have the optimization toolbox?

Hi ats, the easiest way to find the parameters is to do a grid search over a range of values.

ats

Thanks for your awesome work.
I am wondering, how could we specify the optimal value for Cost and Gamma?

yzc233

I get: Undefined function 'optimoptions' for input arguments of type 'char'. Any idea why?

Hi Clark,
Thanks for the wonderful code. I would like to know how you analyse the figures produced by the code, and what the best way of cross-validation is to choose the best C and epsilon.
thanks
Patrick

Ronald Clark

The code as you gave it works for me:

xdata =[198 187 184 178 166 150 144 145 181 ];
ydata =183;
c=400 ;
epsilon=0.00000025
kernel='gaussian'
varargin=0.5

-----------------

svrobj = svr_trainer(xdata,ydata,400,0.000000025,'gaussian',0.5);

Please try the command as above ^ and let me know what happens. Also, check that you have not accidentally made xdata or ydata a cell array.

You would obviously need more than one training point to give useful predictions!
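For reference, a call with more than one training point might look like the sketch below (the svr_trainer signature is taken from this thread; the predict method and the parameter values are assumptions):

```matlab
% Hypothetical multi-point training run against the svr_trainer
% interface used in this thread.
xdata  = (1:20)';                          % one training sample per row
ydata  = sin(0.3 * xdata);                 % one target per sample
svrobj = svr_trainer(xdata, ydata, 400, 0.01, 'gaussian', 0.5);
ypred  = svrobj.predict(xdata);            % assumed predict interface
```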

Tak120

Hello Clark,
Thanks for sharing this file,
I tested this code for time series prediction, so I set:
xdata =[198 187 184 178 166 150 144 145 181 ];
ydata =183;
c=400
epsilon=0.00000025
kernel='gaussian'
varargin=0.5

but I get this: Undefined operator '-' for input arguments of type 'cell'.

Error in svr_trainer>@(x,y)exp(-lambda*(norm(x.feature-y.feature,2)^2)) (line 26)
        kernel_function = @(x,y) exp(- lambda * (norm(x.feature-y.feature,2)^2));

Error in svr_trainer (line 54)
    M = arrayfun(kernel_function,xi,xj);

Can you please help me to solve this.
Thank you!

Ronnie Clark

Hi Purushottam

I think it is a problem with your input data (hence varargin).

Could you give an example of the input you are supplying?

Hi, thanks for sharing this file.
I am getting errors in optimoptions and varargin. What is the solution for this type of error?

Hi, thanks for sharing this file. Do you have any suggestion on how to modify your code so it can take multiple input features? Thanks!

Hi, thanks for sharing this file.
I am working on an activity recognition project and I would like to use your file for that. Is it possible?

Also, can I enter my train and test data of accelerometer files into your code to classify the activities?

thanks and regards
Fatimah mohammed

Said Hassan

Hi Clark,
Thanks for sharing the file.
I'd like to know if this submission can be used to predict concentrations from absorbance data?
my e-mail: said.hassan@pharma.cu.edu.eg

Aladdin

Dear Ronnie Clark,

Thank you so much for the shared function for svm (svr_trainer). I find it very helpful. However, I have two queries about it:

1/ Does the function handle complex numbers?
2/ I have the following data:
    x_train: 2x1 (row vector)
    y_train: 2x72 (matrix with 2 columns and 72 rows)
How can I input this data? The function keeps telling me "Dimensions of matrices being concatenated are not consistent."

Please I will highly appreciate your answer since this is related to my PhD defense and I really need to finish it as soon as possible.

Thank you in advance, Looking forward to hearing from you.

My Email: adjouama@outlook.com

Best.

Ronald Clark

Hi Atiyo, yes you can have a multi-dimensional output, but it won't model dependencies between the individual output variables and the input, i.e. it will treat the output as a single element.
----------
Hi Royi, you can just use a linear kernel, however you need to have some means of evaluating similarity so you do need some kernel.

Royi Avital

Hi Ronnie,
Simple question: what if I don't want to apply any kernel to the data (or choose the identity function as the kernel)?

It seems I have to select a Kernel in your code.

Is there an issue with having no Kernel?
Is there no sense in going Kernel free?

Thank You.

Dear Ronnie Clark,
Thanks for your file exchange. I would like to ask you if the code is suitable for multi-output regression? If not, then how can I adapt it to a MIMO situation?

Hui Song

Dear Ronnie Clark, thanks for your file exchange
I want to know what the outputs of the training and the prediction are, respectively.
Thanks.

Dear Ronnie Clark, thanks for your file exchange

I have been experiencing a problem with the prediction function. When I predict outputs from the training inputs it gives great results, but when I use data other than the training inputs it gives the same output for every point, which is a wrong prediction. Can you help me out with this?

Thiago

Dear Ronnie Clark,
thanks for sharing with us your code. I see you have been maintaining this for a while now. I have been experiencing a strange behavior: every fit I make with your function gives me an additional shift. You can generate a very simple dataset (1d straight line for instance) and when you plot the prediction against the original data you'll see this (vertical) shift. My guess is that b is either ill-defined or being lost somewhere. Sorry I couldn't debug further.

If you can check this, it'd be fantastic!
Th

julienD

Hi,
Thank you for the file. How do you choose the epsilon and c parameters?

amelite

I have the same question about the f/2 at the end of the file. The predicted result seems to be half of the ground truth data.

Nur

Hi,
I have a problem: the function 'optimoptions' is undefined.

Zheng

Thank you, OP. But be careful using the code; there are some bugs, for example H=0.5%[M...] and f=f/2. The most critical one is that after augmenting the variable space, there should be a constraint between the variables.

mostafa

Hi
I want MATLAB code for stock price prediction using support vector regression.
Can you help me?

Charles Zhao

Hi,

I have been looking at the kernels available in your code, and the implementation of the tanh kernel puzzles me. You have it as:

prod(tanh(a*x.feature'*y.feature+c))

when it should be:

tanh(a*x.feature*y.feature'+c)

(since x.feature is a row vector)?

Have I misunderstood the nature of the kernel? Should prod only be there for spline?
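For comparison, the form of the sigmoid (tanh) kernel suggested in this comment can be sketched as below (the names a, c, x and y are illustrative and not taken from the submission):

```matlab
% Standard sigmoid kernel: a scalar value for a pair of row vectors.
a = 0.01; c = 1;
sigmoid_kernel = @(x, y) tanh(a * (x * y') + c);

k = sigmoid_kernel([1 2 3], [4 5 6]);   % x*y' = 32, so k = tanh(1.32)
```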

Minho Xu

Hi Clark,
I have a question about the f/2 at the end of the file. Why do we take half of the prediction?
Can you please explain? Thank you.

Ronnie Clark

c is the cost associated with the training errors and epsilon defines when the errors start to be penalised (ie. the 'insensitive region'). There is really no hard rule for setting these values - they're typically found using an exhaustive search that gives the best cross-validation rates.

For more information take a look at http://www.svms.org/parameters/
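The exhaustive search mentioned above can be sketched as a simple grid search (a sketch under assumptions: the svr_trainer call and predict method follow the interface used elsewhere in this thread, and xtrain/ytrain/xval/yval are a hypothetical hold-out split standing in for full cross-validation):

```matlab
% Grid search over c and epsilon, keeping the pair with the lowest
% validation error. All variable names here are illustrative.
c_grid   = 10.^(-1:3);          % candidate costs
eps_grid = 10.^(-4:-1);         % candidate insensitive-region widths
best_err = inf;
for c = c_grid
    for eps = eps_grid
        svrobj = svr_trainer(xtrain, ytrain, c, eps, 'gaussian', 0.5);
        err = mean((svrobj.predict(xval) - yval).^2);   % validation MSE
        if err < best_err
            best_err = err; best_c = c; best_eps = eps;
        end
    end
end
```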

Matthias

I'm new to SVR. What are c and epsilon? What are reasonable ranges (maybe relative to the range of your y)?

Ronnie Clark

That is strange. Could you please give a small sample of the data you're using (ie. All_Train', All_Target', varargin...)?

Coo Boo

Hi
Thanks for your file exchange.
I tried the new version and I got the following warning and error message:
Warning: Cannot use sparse matrices with sqp algorithm: converting to full.
> In fmincon at 477
  In svr_trainer at 49
  In Train_FL_SVR_Reg at 40
Error using .*
Matrix dimensions must agree.
Error in svr_trainer/W (line 91)
    cost = sum(alpha.*ydata - epsilon*abs(alpha));
Error in fmincon (line 640)
      initVals.f = feval(funfcn{3},X,varargin{:});
Error in svr_trainer (line 49)
    alpha = fmincon(@W, alpha0, [],[],Aeq, beq, lb, ub,[],options);
Error in Train_FL_SVR_Reg (line 40)
net = svr_trainer(All_Train',All_Target',c,epsilon,kernel,varargin);
Caused by:
    Failure in initial user-supplied objective function evaluation. FMINCON
    cannot continue.

Thanks for your help.

Ronnie Clark

Hi Omari. Yes, on the surface it will work just like with ANN.

Omari

Hi Ron, today is my first day reading about SVR. I work on prediction of thermal stresses with ANN, and now I want to work with SVR. I would like to ask if it is possible to predict thermal stresses with SVR as with ANN? I work with about 15 inlets. Thanks in advance Ron, regards

Everton

That's right. I misunderstood the results, sorry.
Thanks!

Ronald Clark

Hi Everton

The XOR you posted does work. I get:

[0,0] -> 8.3824e-04 (ie. 0.00083824)
[1,0] -> 0.9991
[0,1] -> 0.9991
[1,1] -> 8.3824e-04

It's not exact, but very close.

Everton

Hi
Is SVR supposed to work on a simple XOR example? What am I doing wrong? Thanks

epsilon=0.000025;
c=40000;
xdata = [ 0 0 ; 0 1 ; 1 0; 1 1 ]';
ydata = [ 0; 1 ; 1 ; 0 ]';
[~, alpha0,b0] = svr(xdata,ydata,[],[],[],c,epsilon);
result = svr(xdata,[], [ 0 ; 0 ], alpha0, b0,[],[]);
disp(result) % 8.3789e-04 expected: 0

Ronnie Clark

Hi Ion

They are (n,1) where n is the dimensionality of your data :)

Ion Vasile

Hi Ronnie!

  I want to ask you: what are the dimensions of xdata and ydata?

  Thank you

Ronnie Clark

Yes you are correct, this implementation uses the substitution alpha=alpha1-alpha2, abs(alpha)=alpha1+alpha2 and alpha1*alpha2=0 which makes performing the optimization easier. This formulation of the dual problem is well documented in the literature so I do believe it is correct - see for example "An Introduction to Support Vector Machines and Other Kernel-based Learning Methods (28 March 2000) by Nello Cristianini, John Shawe-Taylor", pg 116.

Secondly, yes, using a matrix formulation for the objective function will be faster - I just haven't had time to implement it.

Regards!
Ronnie

Xingxin

Your dual problem formulation doesn't seem correct. There should be two Lagrange multipliers, alpha1 and alpha2, associated with the lower and upper bound constraints in the primal problem. The objective function for the dual problem is:
-0.5*(alpha1-alpha2)'*K*(alpha1-alpha2) + (alpha1-alpha2)'*y - epsilon*(alpha1+alpha2)'*1
where 0 <= alpha1, alpha2 <= C, and (alpha1-alpha2)'*1 = 0.

In your svr function, it seems you changed the variable alpha=alpha1-alpha2 and used abs(alpha) for alpha1+alpha2. This does not look correct.

I am also stuck using matlab optimization tools such as fmincon or quadprog to implement SVR where 2 arguments are in the objective function.

Also, I notice the objective function is calculated with for-loops. This is much slower than using matrix operations.

Rahul

When I use this I get the error "Undefined function 'svr' for input arguments of type 'double'". Please can anyone tell me the solution? I'd also appreciate the solution being sent to 'rahulmekala@gmail.com'.

Rahul

Can you tell me whether I can use this code for the fisheriris data?
I have tried but I couldn't get a result.
What do I have to do now?
Please tell me, urgently.

tayari

Hello again,
how do you set the values xdata, ydata, X1 and Y1?

Ronnie Clark

Hi

It's the number of training vectors.

tayari

Hello,
please, what is "ntrain"?
Thank you

Ronnie Clark

Ok, sunspot.dat contains yearly sunspot activity data measured by the Wolf relative sunspot number.

So, assuming you have the data x(n), x(n-1)...x(1) and what you want to forecast is x(n+1), you have to decide how you want to forecast this value ie. what data do you have that could possibly give you an accurate prediction? Now, as long-term sunspot activity is typically cyclical, a possible set of input data (xdata) to use might be a vector of past values x(n), x(n-1)...

The output (ydata) is then the desired sunspot activity forecast, x(n+1) which in this case would be a single value.

However, using only endogenous variables as the input (ie. past sunspot values) will probably not give an accurate prediction of the amplitudes of the sunspot activity. A better idea is to augment the input data (xdata) with exogenous variables that are correlated with the sunspot activity, such as solar wind data, magnetic storm activity, flare activity or radio and X-ray emission data, with the goal of allowing the SVR to model the relationship between these variables and the relative sunspot number and hopefully achieving a more accurate forecast.
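The lagged-input construction described above can be sketched as follows (a sketch only: the lag order p and the variable names are illustrative):

```matlab
% Build a lagged input matrix from the sunspot series: each row of
% xdata holds the p most recent values, and ydata holds the value to
% forecast next.
load sunspot.dat                 % ships with MATLAB: [year, Wolf number]
y = sunspot(:, 2);
p = 4;                           % number of past values per input vector
n = numel(y) - p;
xdata = zeros(n, p);
for i = 1:n
    xdata(i, :) = y(i : i+p-1)'; % the p values preceding sample i+p
end
ydata = y(p+1 : end);            % the value to forecast
```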

There is a small example file in the submission which shows how to use the function. Here is a bit more detailed explanation of the parameters:

alpha - the weight of each support vector (generated by the function on training)
beta - a linear constant (like an offset in the SVR model; also generated on training)
c - the cost of the 'training errors' (user parameter that must be set)
epsilon - the magnitude of the 'insensitive' region (user parameter that must be set)

Hope that helps!

David Franco

Hello again Ronnie,

But, for example, if I have a series of past values (x(n), x(n-1), x(n-2)...), will this series be xdata or ydata? Could you please explain the input arguments xdata, ydata, x, alpha, b, c, epsilon to me? Or maybe help me make a test file for time series forecasting, e.g. with sunspot.dat (it is included in MATLAB)? I am a newbie in MATLAB.

Thank you very much sir!

Ronnie Clark

Yes sure you can! Time series forecasting doesn't necessarily need a 1D input though - you can also use
1) A number of features extracted from the time series as the input or,
2) A series of past values as the input eg. x(n), x(n-1), x(n-2)....

David Franco

Hello,

Can I use this code to make a regression with an 1D input?
For example: time series forecasting.

Thank you!

Ronnie Clark

Thank you!

Yes, it should be possible to use fmincon. I will have a look and update the file shortly. I'm also planning on adding more kernel functions and cleaning up the code.

Ronnie,

       Awesome function. Would it be possible to upload a version that doesn't require the Optimization Toolbox? I know you utilize multivariate constrained nonlinear optimization. I tried to merge John D'Errico's fmincon minimization into this but it didn't work out right.

Mehmet Pilgir

Updates

1.3

- Now using convex quadratic programming to train the SVR
- More kernels added

1.2

- Greatly simplified the use of the function
- Vectorized some operations
- Added more kernels

1.1

Updated the description

MATLAB Release
MATLAB 8.0 (R2012b)
