Feature selection using clustering
I have to select features using a clustering method, Ward's algorithm.
Short description of the dataset: 16000 records, each with 5400 features (float). I work on a subset because working on the full set causes an out-of-memory error.
From reading the MATLAB docs, it seems quite easy:
X = load('subset.data');
Y = pdist(X);
Z = linkage(Y,'ward');
T = cluster(Z,'maxclust',2); % I set 2 clusters because my dataset has 2 classes of objects, but now I'm not sure that is right.
% PCA visualization
[W, pc] = pca(X); % pca replaces the older princomp; W = coefficients, pc = scores
scatter(pc(:,1),pc(:,2),10,T,'filled')
And now I don't know what to do next. How can I select features? I now think that instead of Y = pdist(X) it should be Y = pdist(X'), because I want to cluster the features and then select some of them, right? The problem is that Y = pdist(X') causes an out-of-memory error. I would be grateful for an answer on whether my way of thinking is correct.
Thank you in advance!
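To make the idea above concrete, here is a minimal sketch of clustering the features (the columns of X) instead of the records. It is only an illustration under assumptions: the data is synthetic and small, standing in for subset.data; the number of feature clusters k is an arbitrary choice; and picking the feature closest to each cluster centroid is just one possible selection rule, not something taken from the docs.

```matlab
% Sketch only: synthetic data stands in for subset.data (an assumption).
rng(0);                           % reproducible random numbers
nRecords  = 200;                  % illustrative sizes, not the real ones
nFeatures = 30;
X = randn(nRecords, nFeatures);

k = 5;                            % number of feature clusters (arbitrary)

% pdist treats each ROW as one observation, so transposing X makes every
% feature an observation and gives feature-to-feature distances.
Xz = zscore(X);                   % standardize so scale does not dominate
Y  = pdist(Xz');                  % Euclidean distances between features
Z  = linkage(Y, 'ward');          % Ward's minimum-variance linkage
T  = cluster(Z, 'maxclust', k);   % T(j) is the cluster label of feature j

% Pick one representative feature per cluster: the one closest to the
% cluster centroid (hypothetical selection rule, for illustration).
labels   = unique(T);
selected = zeros(1, numel(labels));
for c = 1:numel(labels)
    idx      = find(T == labels(c));            % features in this cluster
    centroid = mean(Xz(:, idx), 2);             % mean profile of the cluster
    d        = sum(bsxfun(@minus, Xz(:, idx), centroid).^2, 1);
    [~, best] = min(d);                         % closest feature to centroid
    selected(c) = idx(best);
end

Xreduced = X(:, selected);        % records x selected features
```

Here T has one entry per feature, and Xreduced keeps one column per feature cluster. On the memory side, the output of pdist holds n*(n-1)/2 values for n observations, so its size depends on which dimension is treated as observations. The linkage documentation also appears to describe a 'savememory' option for the form that takes the data matrix directly, which may be worth checking for large inputs.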