Cross-validation: anonymous function handle with toolbox classifiers

Hi everyone,
I'd like to use the MATLAB cross-validation function (crossval) with a random forest classification toolbox (specifically http://code.google.com/p/randomforest-matlab/). According to the documentation (http://www.mathworks.com/help/toolbox/stats/crossval.html), predfun must be a function that returns the predictions for a set of test data XTEST. So, following that syntax, I wrote this function:
classf= @(XTRAIN,ytrain,XTEST) classRF_predict(XTEST,classRF_train(XTRAIN,ytrain,1000));
This function takes XTEST as input, along with the model itself, which is trained on XTRAIN and ytrain. The problem comes when I run the cross-validation; I get the following error message:
cvMCR = crossval('mcr',X,y,'predfun',classf)
Error using crossval>evalFun (line 465)
The function
'@(XTRAIN,ytrain,XTEST)classRF_predict(XTEST,classRF_train(XTRAIN,ytrain,1000))'
generated the following error:
Cannot concatenate a double array and a nominal array.
Error in crossval>getLossVal (line 502)
funResult = evalFun(funorStr,arg(1:end-1));
Error in crossval (line 401)
[funResult,outarg] = getLossVal(i, nData, cvp, data, predfun);
I'd really appreciate any help.
Regards!

Answers (4)

Ilya on 26 Apr 2012
I think you've hit a bug in the crossval function. My guess is that classRF_predict returns numeric labels and crossval does not process them correctly for the 'mcr' criterion. A workaround is to convert the class labels returned by classRF_predict to the nominal type:
classf= @(XTRAIN,ytrain,XTEST) nominal(classRF_predict(XTEST,classRF_train(XTRAIN,ytrain,1000)));
and then call crossval in the same way as before:
cvMCR = crossval('mcr',X,y,'predfun',classf)
Alternatively, you could use the other signature of crossval:
vals = crossval(fun,X,y)
and define
fun = @(Xtrain,Ytrain,Xtest,Ytest) mean(Ytest ~= classRF_predict(Xtest,classRF_train(Xtrain,Ytrain,1000)));
In this case, since you are comparing the true and predicted labels yourself, you can keep them numeric.
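For readers without randomforest-matlab, here is a runnable sketch of both crossval signatures. The Statistics Toolbox function classify is swapped in for the random forest, and the fisheriris data from the crossval documentation example is used. Here the labels are a cell array of strings, so the hand-written loss uses strcmp rather than ~= (which is not defined for cell arrays), which illustrates why the comparison has to match the label type.

```matlab
% Stand-in classifier: classify (discriminant analysis) replaces the
% randomforest-matlab calls so this runs with only the Statistics Toolbox.
load fisheriris                  % provides meas (150x4) and species (150x1 cellstr)
X = meas;
y = species;

% Signature 1: predfun returns predicted labels; crossval computes 'mcr'.
classf = @(XTRAIN, ytrain, XTEST) classify(XTEST, XTRAIN, ytrain);
cvMCR = crossval('mcr', X, y, 'predfun', classf);

% Signature 2: fun computes the per-fold error itself. The labels are
% cellstr, so strcmp is used for the comparison instead of ~=.
fun = @(Xtr, Ytr, Xte, Yte) mean(~strcmp(Yte, classify(Xte, Xtr, Ytr)));
vals = crossval(fun, X, y);      % one error rate per fold (10-fold by default)
cvErr = mean(vals);

fprintf('predfun/mcr estimate: %.3f, fun estimate: %.3f\n', cvMCR, cvErr);
```

The two estimates come from different random partitions, so they need not be identical.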
Let me know if either solution works for you.

Ilya on 26 Apr 2012
I am not an expert on the randomforest-matlab package, so my advice could be off. I find two things in your post worth investigating:
  1. It is unusual that XTEST is the first input to classRF_predict(XTEST,classRF_train(XTRAIN,ytrain,1000)). Usually the trained model is the first argument.
  2. Make sure that the array of class labels, y, that you pass to crossval has the same type as the labels returned by classRF_predict.
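A minimal sketch of the check in point 2. The values of y and yfit below are hypothetical stand-ins; in practice y is the label array passed to crossval and yfit is what classRF_predict returns for some test rows.

```matlab
% Hypothetical stand-ins for the labels passed to crossval and the predictions.
y    = nominal([1; 2; 2; 1]);   % labels stored as nominal
yfit = [1; 2; 1; 1];            % numeric predictions

% Report both types and flag a mismatch before any comparison is attempted.
fprintf('class(y) = %s, class(yfit) = %s\n', class(y), class(yfit));
sameType = strcmp(class(y), class(yfit));
if ~sameType
    disp('Label types differ: convert one side (e.g. with nominal) before comparing.');
end
```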

Cristobal on 26 Apr 2012
Ilya,
Have you used crossval with an external toolbox? If so, could you give me an example?
It's a bit strange, because the original two-step usage is:
model = classRF_train(XTRAIN,ytrain);
yfit = classRF_predict(XTEST,model);
As you can see, I just put the first line into the second.
And yes, I'm sure that classRF_predict returns labels (yfit) in the same domain as ytrain.
I suspect something goes wrong at the first point, where I nest one function call inside the other, but I can't figure it out.
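A sketch of the same two steps written as a named prediction function instead of a nested anonymous function. It assumes the randomforest-matlab package is on the MATLAB path and that classRF_train and classRF_predict take the arguments used in this thread; the name rfPredict is hypothetical. With the calls on separate lines, you can set a breakpoint and inspect class(ytrain) and class(yfit) when crossval reports a type error.

```matlab
function yfit = rfPredict(XTRAIN, ytrain, XTEST)
% RFPREDICT  Hypothetical predfun wrapper for crossval (save as rfPredict.m).
%   Assumes randomforest-matlab's classRF_train/classRF_predict are on the
%   path, with the argument order used in this thread.

% Train a forest of 1000 trees on the training fold.
model = classRF_train(XTRAIN, ytrain, 1000);

% Predict labels for the test fold.
yfit = classRF_predict(XTEST, model);

% Return the labels as nominal, following the nominal() workaround from the
% first answer, so the 'mcr' comparison does not mix double and nominal.
yfit = nominal(yfit);
end
```

It would be passed as a handle in the same call: cvMCR = crossval('mcr',X,y,'predfun',@rfPredict)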
Regards

Cristobal on 26 Apr 2012
I think I'm on to something. I noticed that the problem was in the input parameters. The example from http://www.mathworks.com/help/toolbox/stats/crossval.html works with the classify function and the fisheriris data set, where the target labels are of cell data type. So I tried casting the input of the anonymous function to double, int8, string... Looking at the error message, I saw that at some point it fails when comparing nominal data with double data (which is not supported). Line 410 of crossval.m makes this comparison. To make it work, I hijacked the original line
temploss = sum(outarg ~= funResult);
to
temploss = sum(double(outarg) ~= funResult);
forcing the outarg variable to be double.
I really don't know if there's a simpler way to solve this problem. I don't think it's the best solution, but it works.
  1 Comment
Ilya on 27 Apr 2012
Did you see my answer above?
You can modify crossval if you'd like, but in that case use
temploss = sum(outarg ~= nominal(funResult));
That way you can continue using crossval with labels of any type. With your change, you can only use crossval with function handles that return labels of type double.
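A small sketch of why converting the predictions with nominal(...) is safer than converting outarg with double(...). The label values are hypothetical, and outarg is assumed to be nominal, as described in this thread. Applied to a nominal array, double() returns its internal level codes (1, 2, ...), not the original numbers, so 0/1 labels are compared as 1/2.

```matlab
% Hypothetical labels: outarg as nominal true labels, funResult as numeric predictions.
outarg    = nominal([0; 1; 1; 0]);   % true labels 0/1, levels '0' and '1'
funResult = [0; 1; 0; 0];            % predicted labels; only row 3 is wrong

% double() gives level codes [1;2;2;1], so every row counts as a mismatch.
lossDouble = sum(double(outarg) ~= funResult);

% Converting the predictions to nominal compares the labels themselves.
lossNominal = sum(outarg ~= nominal(funResult));

fprintf('double(outarg): %d errors, nominal(funResult): %d errors\n', ...
        lossDouble, lossNominal);
```

With integer labels 1, 2, ..., k that are all present, the level codes coincide with the labels and both lines give the same count.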
