Feature selection using clustering

Kamil on 28 Apr 2011
I have to select features using a clustering method, Ward's algorithm.
Short description of the dataset: 16000 records with 5400 features (float) each. I work on a subset because the full set causes an out-of-memory error.
Following the MATLAB docs, the clustering itself is quite easy:
X = load('subset.data');
Y = pdist(X);
Z = linkage(Y,'ward');
T = cluster(Z,'maxclust',2); % I set 2 clusters because my dataset has 2 classes of objects, but now I'm not sure that is right.
% PCA visualization
[W, pc] = princomp(X);
scatter(pc(:,1),pc(:,2),10,T,'filled')
And now I don't know what to do next. How can I select features? I now think that instead of Y = pdist(X) it should be Y = pdist(X'), because I want clusters of features and then to select some of them, right? The problem is that Y = pdist(X') causes an out-of-memory error. I would be grateful for an answer on whether my way of thinking is correct.
Thank you in advance!
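For reference, here is a minimal sketch of the feature-clustering idea described above. It is one possible reading of "cluster the features, then select some of them", not an established recipe. It uses small synthetic data in place of subset.data. The sizes, the number of clusters k, the z-score standardization, and the rule of keeping the feature closest to each cluster mean are all assumptions made for illustration. The helper pdistMB is hypothetical and only estimates the size of the condensed distance vector that pdist returns: roughly 1 GB for 16000 rows, versus roughly 117 MB for 5400 rows.

```matlab
% Synthetic stand-in for the real data (hypothetical sizes, not the poster's file)
rng(0);                              % reproducible example
nRecords  = 200;
nFeatures = 30;
X = rand(nRecords, nFeatures);

% Estimated size in MB of the condensed distance vector from pdist (doubles)
pdistMB = @(n) n*(n-1)/2 * 8 / 1e6;
fprintf('pdist over 16000 rows: %.1f MB\n', pdistMB(16000));
fprintf('pdist over 5400 rows : %.1f MB\n', pdistMB(5400));

% Cluster features: pdist treats each ROW as one object, so pass X transposed
Xz = zscore(X);                      % standardize columns (assumption)
Yf = pdist(Xz');                     % Euclidean distances between features
Zf = linkage(Yf, 'ward');
k  = 5;                              % number of feature clusters (assumption; tune it)
Tf = cluster(Zf, 'maxclust', k);

% Keep one representative per cluster: the feature closest to the cluster mean
labels   = unique(Tf);
selected = zeros(numel(labels), 1);
for i = 1:numel(labels)
    idx      = find(Tf == labels(i));            % features in this cluster
    centroid = mean(Xz(:, idx), 2);              % mean feature profile
    d        = sum(bsxfun(@minus, Xz(:, idx), centroid).^2, 1);
    [~, best] = min(d);
    selected(i) = idx(best);
end

Xreduced = X(:, selected);           % records x selected features
```

Here k controls how many features survive, so it is unrelated to the 2 classes of objects used when clustering records. If pdist(X') still runs out of memory on the real data, it may help to check whether the transpose copy or single versus double storage is the limiting factor, but that depends on the machine and the MATLAB version.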

Answers (0)
