Dataset condensation and distance function optimization in KNN classifier

Hi,
I am working on a decision system for stocks. I have a lot of data: time series for 6,000 stocks, with 10 years of daily data on various metrics related to valuation, price momentum, street estimates, intrinsic business quality, etc.
From my reading, it sounds like a KNN classifier is the easiest and best type of framework for me to focus on (after considering NNs, decision trees, etc.). However, the toolboxes that MATLAB provides seem to lack two important components that I would need: data condensation and some way of optimizing the distance function.
I Googled a few things and found that the "Hart" algorithm (condensed nearest neighbor, "CNN") is often used for condensation. I also found this link, which seems to be the kind of thing I need: http://mirlab.org/jang/matlab/toolbox/machineLearning/help/dsCondense_help.html#2. Unfortunately, this code doesn't seem to be freely available.
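To make clear what I mean by condensation, here is a minimal sketch of Hart's rule as I understand it. It is written in Python rather than MATLAB only because that was quick for me to test. The function names `nearest_label` and `condense` and the toy data are my own, it assumes plain Euclidean distance with 1-NN, and it is not the mirlab `dsCondense` code.

```python
import math

def nearest_label(store, x):
    """Label of the stored point closest to x (1-NN, Euclidean distance)."""
    closest = min(store, key=lambda item: math.dist(item[0], x))
    return closest[1]

def condense(points, labels):
    """Hart-style condensation (my reading of it, not a reference implementation).

    Start with one point in the store, then repeatedly sweep the training set
    and add every point the current store misclassifies. Stop when a full
    sweep adds nothing. Returns the indices of the kept points.
    """
    kept = [0]
    store = [(points[0], labels[0])]
    changed = True
    while changed:
        changed = False
        for i, (x, y) in enumerate(zip(points, labels)):
            if i in kept:
                continue
            if nearest_label(store, x) != y:
                kept.append(i)
                store.append((x, y))
                changed = True
    return kept

# Toy data: two well-separated clusters (made up for illustration).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = [0, 0, 0, 1, 1, 1]
kept = condense(points, labels)
print(kept)  # prints [0, 3]
```

On this toy set one point per cluster is enough, so six points condense to two. With my real data I would expect the kept set to concentrate near the class boundaries, and the result depends on the order in which the points are visited.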
For optimizing distance functions, there seems to be more freely available code online, such as http://www.cs.cmu.edu/~liuy/distlearn.htm and http://homepage.tudelft.nl/19j49/Matlab_Toolbox_for_Dimensionality_Reduction.html.
Does anybody know where I can find good code to accomplish condensing and distance function optimization? Any other comments on the general approach would be greatly appreciated.
THANK YOU!
Regards, Mike

Answers (0)
