Thank you for checking. The reason I ask is that there are cases where the cardinality of the data is unknown before clustering. I could first check the cardinality of the data and then use k <= cardinality, but I thought it would be great if your code could handle that situation.

My input data X is a set of one-dimensional scalars whose values are taken from a finite discrete set S, e.g., S = {1,2,3,4,5}. When I run yael_kmeans with K > |S|, it looks like yael_kmeans goes into an infinite loop. Do you have any idea how to fix it?

I released the code for the new regression forest method described in "Growing Regression Forests by Classification: Applications to Object Pose Estimation," European Conference on Computer Vision (ECCV), 2014. The method outperforms various regression methods, including traditional binary regression tree forests as well as Support Vector Regression and Kernel Partial Least Squares Regression. The code, written in MATLAB, is available here:
http://www.kotahara.com/download-k-clusters-regression-forest.html

Comment only

29 Apr 2014

Boosted Binary Regression Trees
Boosted Binary Regression Trees is a powerful regression method which can handle vector targets.

Hi Hossein,
I think the implementation is correct, though it's not the most efficient. The update procedures in this code can be shown to be equivalent to those proposed in "Updating Mean and Variance Estimates: An Improved Method." You can also find them on Wikipedia (en.wikipedia.org/wiki/Algorithms_for_calculating_variance).
The update procedures in the above method are:
SSE_{N} = SSE_{N-1} + (x_N - \bar x_{N-1})(x_N - \bar x_N)
Expanding the second term of the RHS:
(x_N - \bar x_{N-1})(x_N - \bar x_N) = (x_N - \bar x_N + \bar x_N - \bar x_{N-1})(x_N - \bar x_N) = (x_N - \bar x_N)^2 + (x_N - \bar x_N)(\bar x_N - \bar x_{N-1})
The procedure in my code is
SSE_{N} = SSE_{N-1} + (\bar x_N - \bar x_{N-1})^2 * (N-1) + ( x_N - \bar x_N )^2
So now we want to show
(x_N - \bar x_N )(\bar x_N - \bar x_{N-1} ) = (\bar x_N - \bar x_{N-1})^2 * (N-1)
First divide both sides by (\bar x_N - \bar x_{N-1}) (if \bar x_N = \bar x_{N-1}, both sides are already zero). Now we need to show (\bar x_N - \bar x_{N-1})(N-1) = x_N - \bar x_N.
We can show this using the update procedure for the mean, \bar x_N = \bar x_{N-1} + (x_N - \bar x_{N-1}) / N: it gives \bar x_N - \bar x_{N-1} = (x_N - \bar x_{N-1}) / N, and hence x_N - \bar x_N = x_N - \bar x_{N-1} - (x_N - \bar x_{N-1})/N = (x_N - \bar x_{N-1})(N-1)/N = (\bar x_N - \bar x_{N-1})(N-1).
The proof for sseRight should be done similarly.
As I said, this is not as efficient as the original procedure, so I will consider modifying the code.
Thanks.

Comment only

14 Feb 2014


Hi Kota,
Are you sure that the implementation of findBestSplit is correct? I have doubts about the SSE update, specifically lines 129 and 130:
for( int k = 0; k < targetDim; k++ ){
    sseLeft  = sseLeft
             + ( aveLeft[k] - aveLeftPre[k] ) * ( aveLeft[k] - aveLeftPre[k] ) * ( sizeLeft - 1 )
             + ( target(sortByValue[j].idx,k) - aveLeft[k] ) * ( target(sortByValue[j].idx,k) - aveLeft[k] );
    sseRight = sseRight
             - ( aveRightPre[k] - aveRight[k] ) * ( aveRightPre[k] - aveRight[k] ) * sizeRight
             - ( target(sortByValue[j].idx,k) - aveRightPre[k] ) * ( target(sortByValue[j].idx,k) - aveRightPre[k] );
}
I understand the procedure, but I think the update rule is not correct.
Thanks

Comment only

13 Feb 2014


Comment only