I can't import the database file. I think my MATLAB can only import JDBC files. I use Mac OS, by the way.
I would appreciate any help; I am new at this.
Michelle-
These results are entirely consistent with how classification trees work. Rescaling each input by multiplying it by a nonzero coefficient, even a different coefficient for each input, should have no effect on the tree.
For exactly why, I'd recommend Breiman's book (which is referenced in the doc), but the short answer is this: for each predictor, the tree sorts the observations and tries a candidate split in each gap between consecutive distinct values. It then selects the split that gives the "best" value of the splitting criterion (and that's an entirely different discussion). Scaling a predictor scales the candidate split points along with the data, but it does not change which observations fall on each side of any split, so the criterion values, and therefore the chosen splits, stay the same.
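To make the sort-and-try-each-gap description concrete, here is a minimal Python sketch (not MATLAB, and not TreeBagger's actual implementation). It assumes Gini impurity as the splitting criterion and midpoints as the candidate thresholds; the helper names and the random data are hypothetical. It checks that multiplying the predictor by a positive constant sends exactly the same observations to each side of the chosen split, while the threshold itself is simply rescaled.

```python
import numpy as np

def gini(labels):
    """Gini impurity of an array of class labels (0 for an empty array)."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return 1.0 - float(np.sum(p ** 2))

def best_split(x, y):
    """Search one predictor for the split with the lowest weighted Gini impurity.

    Sorts the distinct values of x, tries a threshold at the midpoint of each
    gap, and returns (threshold, left_mask). Returns (None, None) if x is constant.
    """
    values = np.unique(x)                      # sorted distinct values
    if values.size < 2:
        return None, None                      # no gaps, so no candidate split
    best_score, best_t, best_left = np.inf, None, None
    for t in (values[:-1] + values[1:]) / 2:   # one candidate per gap
        left = x < t
        n_left = int(left.sum())
        score = (n_left * gini(y[left]) + (y.size - n_left) * gini(y[~left])) / y.size
        if score < best_score:                 # ties keep the first (leftmost) gap
            best_score, best_t, best_left = score, t, left
    return best_t, best_left

# Hypothetical data: one noisy predictor and binary labels.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = (x + 0.5 * rng.normal(size=50) > 0).astype(int)

t, left = best_split(x, y)
for c in [100.0, 0.717, 3.107]:
    t_c, left_c = best_split(c * x, y)
    # Same observations go left; the threshold is just multiplied by c.
    print(c, np.array_equal(left_c, left), np.isclose(t_c, c * t))
```

Because the criterion only ever sees which labels land on each side, any positive rescaling leaves every candidate's score, and hence the winner, unchanged.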
As an example, suppose we have a simple set of observations where the predictor has been measured at 1, 2, 4, and 10. The tree will try splits at the midpoints 1.5, 3, and 7. Let's say that the "best" split is at 7.
Now rescale this input by multiplying it by 100 (or any other positive coefficient). With a factor of 100, the tree tries splits at 150, 300, and 700, and it will still select the split at 700, which separates exactly the same observations as before ({1, 2, 4} versus {10}). Rescaling doesn't change anything.
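The arithmetic in this example can be checked with a few lines of Python (a sketch; `candidate_splits` is a hypothetical helper that returns the midpoints between consecutive distinct values):

```python
import numpy as np

def candidate_splits(x):
    """Midpoints between consecutive distinct values of x."""
    v = np.unique(x)
    return (v[:-1] + v[1:]) / 2

x = np.array([1.0, 2.0, 4.0, 10.0])
print(candidate_splits(x).tolist())        # [1.5, 3.0, 7.0]
print(candidate_splits(100 * x).tolist())  # [150.0, 300.0, 700.0]

# The split at 7 (or at 700 after rescaling) separates the same observations.
print((x < 7).tolist())                    # [True, True, True, False]
print((100 * x < 700).tolist())            # [True, True, True, False]
```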
If, however, we were to cleverly create _new_ predictors from a well-chosen combination (linear or otherwise) of our existing predictors, that certainly could change the tree's performance. For instance, add a 6th predictor to your X equal to the Altman-weighted sum of your original columns (X*coeff'); then you might get some interesting results.
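A minimal Python sketch of that idea follows. It uses random, hypothetical data in place of the real accounting matrix, and labels that by construction depend on the weighted sum. `stump_error` is a hypothetical helper that scores the best single threshold split by misclassification rate, a simpler criterion than a real tree uses. On data like this, one split on the new sixth column separates the classes, whereas rescaling an original column leaves its best split exactly as good (or as bad) as before.

```python
import numpy as np

# Altman's coefficients, as quoted in the question.
coeff = np.array([0.717, 0.847, 3.107, 0.42, 0.998])

# Hypothetical stand-in for the 5-column accounting matrix X.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))

# Sixth predictor: the coefficient-weighted sum of the five columns.
score = X @ coeff
X6 = np.column_stack([X, score])           # shape (200, 6)

# Hypothetical labels that depend on the weighted sum.
y = (score > np.median(score)).astype(int)

def stump_error(x, y):
    """Lowest training misclassification rate of one threshold split on x."""
    v = np.unique(x)
    best = 1.0
    for t in (v[:-1] + v[1:]) / 2:
        pred = (x > t).astype(int)
        # Allow either orientation of the split.
        err = min(np.mean(pred != y), np.mean(pred == y))
        best = min(best, err)
    return best

for j in range(X6.shape[1]):
    print(f"column {j + 1}: stump error {stump_error(X6[:, j], y):.3f}")
```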
Hi Michael,
I've noticed a very curious result from TreeBagger and was wondering whether you have had experience with this.
I pass in a matrix X, which has 5 columns of various accounting data, and Y, a vector of credit ratings. I have about 3000 rows of data.
I understand that the bankruptcy research community has done extensive work to determine optimal coefficients for predicting bankruptcy. I thought it would be interesting to see how these coefficients affect the bagging results, so I created a vector, coeff, and multiplied each row of X element-wise by coeff (that is, I scaled each column of X by its own coefficient).
For example, Altman's coefficients are [0.717 0.847 3.107 0.42 0.998].
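In NumPy terms (an illustrative sketch; the 3-row `X` is a hypothetical stand-in), multiplying each row element-wise by coeff is the same as multiplying column j by coeff[j], i.e. right-multiplying by diag(coeff). In MATLAB the equivalent is `X .* coeff` on releases with implicit expansion, or `bsxfun(@times, X, coeff)` on older ones. Either way, each column is only scaled by a single number.

```python
import numpy as np

coeff = np.array([0.717, 0.847, 3.107, 0.42, 0.998])   # Altman's coefficients
X = np.arange(15, dtype=float).reshape(3, 5)            # hypothetical stand-in for X

scaled = X * coeff      # broadcasting: each row is multiplied element-wise by coeff
assert np.allclose(scaled, X @ np.diag(coeff))          # same as scaling by a diagonal matrix
assert np.allclose(scaled[:, 2], 3.107 * X[:, 2])       # column 3 scaled by its own coefficient
```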
Curiously, varying the coefficients has NO effect on the out-of-bag error (oobError). I've run exhaustive loops varying all of the coefficients to track this down.
It seems that, in the limit where a coefficient goes to zero, there should be an effect.
Thanks for any insights.
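On the zero-limit intuition: in exact arithmetic, any nonzero coefficient preserves the ordering of a column's values (a negative one reverses it), so the set of two-way splits a threshold can make is unchanged. Only at exactly zero does the column become constant and offer no split at all, so the effect is a jump at zero rather than a gradual trend. The Python sketch below checks this on the toy values from the example (`partitions` is a hypothetical helper); it does not explore coefficients so tiny that floating-point underflow starts merging distinct values.

```python
import numpy as np

def partitions(x):
    """All two-way splits of observation indices that a threshold on x can make.

    Each split is stored as an unordered pair {left, right}, so a reversed
    ordering (negative coefficient) yields the same splits.
    """
    v = np.unique(x)
    result = set()
    for t in (v[:-1] + v[1:]) / 2:
        left = frozenset(np.flatnonzero(x < t).tolist())
        right = frozenset(np.flatnonzero(x >= t).tolist())
        result.add(frozenset([left, right]))
    return result

x = np.array([1.0, 2.0, 4.0, 10.0])
reference = partitions(x)
for c in [100.0, 1e-3, 1e-6, -2.0, 0.0]:
    p = partitions(c * x)
    print(f"c = {c:g}: {len(p)} candidate splits, same as c = 1: {p == reference}")
```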