Code covered by the BSD License  

from Statistical Learning Toolbox by Dahua Lin
Functions for statistical learning, pattern recognition and computer vision, covering many topics.

Description of slkmeans
slkmeans

PURPOSE

SLKMEANS Performs K-Means Clustering on samples

SYNOPSIS

function [means, labels] = slkmeans(X, varargin)
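
A call following this synopsis might look like the sketch below. The synthetic data and the chosen property values are illustrative assumptions; the property names ('K', 'maxiter', 'verbose') are the ones documented for this function. The call is guarded so that the script also runs when sltoolbox is not on the path.

```matlab
% Two synthetic 2-D blobs, one sample per column (d = 2, n = 200).
X = [randn(2, 100), bsxfun(@plus, randn(2, 100), [6; 6])];

% Guarded call: skipped when slkmeans.m cannot be found on the path.
if exist('slkmeans', 'file') == 2
    [means, labels] = slkmeans(X, 'K', 2, 'maxiter', 50, 'verbose', false);
    disp(size(means));      % means has one column per cluster center
else
    disp('slkmeans not found on the path; skipping the call');
end
```

Properties are passed as name/value pairs after X; any property that is left out keeps its default value.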

DESCRIPTION

SLKMEANS Performs K-Means Clustering on samples

 $ Syntax $
   - [means, labels] = slkmeans(X, ...)

 $ Arguments $
   - X:        the d x n sample matrix, one sample per column
   - means:    the center (mean) vectors of the clusters
   - labels:   the label of the cluster to which each sample belongs

 $ Description $
   - [means, labels] = slkmeans(X, ...) performs K-means clustering on
     the data X, with each column representing a sample. Let k be the
     number of clusters and d the sample dimension. In the output,
     means is the d x k matrix whose columns are the cluster centers,
     and labels indicates which cluster each sample belongs to. You
     can specify the following additional properties.

     \*
     \t  Table 1. Clustering properties
     \h    name        &     description
          'K'          & The number of initial clusters, default = 3.
          'init_means' & The initial values of cluster centers. 
                         (default = [], that is random draw)
          'clsfunc'    & The function for classifying samples given
                         the means of clusters. It can be one of the
                         following strings:
                         1. 'normal' (default): use slmetric_pw for
                            distance calculation;
                         2. 'samplewise': classify samples one-by-one;
                         3. 'ann': classify samples using annsearch
                            (annsearch is required).
                         Alternatively, clsfunc can be a function handle
                         with the syntax labels = f(centers, data).
          'maxiter'    & The maximum number of iterations (default = 100)
          'annthres'   & The threshold of center annealing
                         (default = 0)
          'weights'    & The weights of the samples
                         (default = [])
          'verbose'    & Whether to show dynamic information in the 
                         procedure (default = true)
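
The function-handle form of 'clsfunc' can be illustrated with a small sketch. The function below, classify_cityblock, is a hypothetical name that is not part of sltoolbox. It follows the documented syntax labels = f(centers, data) and assigns each sample to the center with the smallest city-block (L1) distance. It assumes that centers is d x K, that data is d x n, and that a 1 x n row of labels is expected, which matches what the built-in classifiers in the source listing return.

```matlab
function labels = classify_cityblock(centers, data)
% CLASSIFY_CITYBLOCK  Hypothetical custom classifier for 'clsfunc'.
%   centers: d x K matrix, one cluster center per column
%   data:    d x n matrix, one sample per column
%   labels:  1 x n row vector; labels(j) is the index of the nearest center

K = size(centers, 2);
n = size(data, 2);
dists = zeros(K, n);
for i = 1 : K
    % city-block distance from center i to every sample
    dists(i, :) = sum(abs(bsxfun(@minus, data, centers(:, i))), 1);
end

% for each sample (column), pick the row index of the smallest distance
[~, labels] = min(dists, [], 1);
```

Assuming sltoolbox is on the path, the handle would be passed as slkmeans(X, 'K', 4, 'clsfunc', @classify_cityblock). Only the assignment rule changes; the estimation step in the source still takes the mean of each cluster.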

 $ History $
   - Created by Dahua Lin on Oct 7th, 2005
   - Modified by Dahua Lin on Apr 24th, 2006
       - Upgraded the function to build on sltoolbox v4
       - Added the clsfunc property, so that the user can customize
         the behaviour of the classification step according to the context.
   - Modified by Dahua Lin on Aug 28, 2006
       - Rebased on the new framework function slkmeansex
       - Incorporated support for center annealing
   - Modified by Dahua Lin on Sep 14th, 2006
       - Used sllabelinds to increase the efficiency of gathering the
         samples in the same cluster in the estimation step.

CROSS-REFERENCE INFORMATION

This function calls:
  • annsearch ANNSEARCH Approximate Nearest Neighbor Search
  • slkmeansex SLKMEANSEX Performs Generalized K-means
  • slmetric_pw SLMETRIC_PW Compute the metric between column vectors pairwisely
  • slmean SLMEAN Compute the mean vector of samples
  • slignorevars SLIGNOREVARS Ignores the input variables
  • sllabelinds SLLABELINDS Extract indices corresponding to specified labels
  • slparseprops SLPARSEPROPS Parses input parameters
This function is called by:

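SUBFUNCTIONS

  • function centers = kmeans_est(centers, X, K, weights, labels)
  • function labels = kmeans_classify(centers, X, n, fh_classify)
  • function centers = kmeans_anneal(centers, inds_discard)
  • function L = classify_normal(centers, data)
  • function L = classify_samplewise(centers, data)
  • function L = classify_ann(centers, data)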

SOURCE CODE

function [means, labels] = slkmeans(X, varargin)
%SLKMEANS Performs K-Means Clustering on samples
%
% $ Syntax $
%   - [means, labels] = slkmeans(X, ...)
%
% $ Arguments $
%   - X:        the d x n sample matrix, one sample per column
%   - means:    the center (mean) vectors of the clusters
%   - labels:   the label of the cluster to which each sample belongs
%
% $ Description $
%   - [means, labels] = slkmeans(X, ...) performs K-means clustering on
%     the data X, with each column representing a sample. Let k be the
%     number of clusters and d the sample dimension. In the output,
%     means is the d x k matrix whose columns are the cluster centers,
%     and labels indicates which cluster each sample belongs to. You
%     can specify the following additional properties.
%
%     \*
%     \t  Table 1. Clustering properties
%     \h    name        &     description
%          'K'          & The number of initial clusters, default = 3.
%          'init_means' & The initial values of cluster centers.
%                         (default = [], that is random draw)
%          'clsfunc'    & The function for classifying samples given
%                         the means of clusters. It can be one of the
%                         following strings:
%                         1. 'normal' (default): use slmetric_pw for
%                            distance calculation;
%                         2. 'samplewise': classify samples one-by-one;
%                         3. 'ann': classify samples using annsearch
%                            (annsearch is required).
%                         Alternatively, clsfunc can be a function handle
%                         with the syntax labels = f(centers, data).
%          'maxiter'    & The maximum number of iterations (default = 100)
%          'annthres'   & The threshold of center annealing
%                         (default = 0)
%          'weights'    & The weights of the samples
%                         (default = [])
%          'verbose'    & Whether to show dynamic information in the
%                         procedure (default = true)
%
% $ History $
%   - Created by Dahua Lin on Oct 7th, 2005
%   - Modified by Dahua Lin on Apr 24th, 2006
%       - Upgraded the function to build on sltoolbox v4
%       - Added the clsfunc property, so that the user can customize
%         the behaviour of the classification step according to the context.
%   - Modified by Dahua Lin on Aug 28, 2006
%       - Rebased on the new framework function slkmeansex
%       - Incorporated support for center annealing
%   - Modified by Dahua Lin on Sep 14th, 2006
%       - Used sllabelinds to increase the efficiency of gathering the
%         samples in the same cluster in the estimation step.
%

%% parse and verify input arguments
if ndims(X) ~= 2
    error('sltoolbox:invaliddims', 'X should be a 2D matrix');
end

% default property values, passed to slparseprops together with varargin
opts.K = 3;
opts.init_means = [];
opts.clsfunc = 'normal';
opts.maxiter = 100;
opts.annthres = 0;
opts.weights = [];
opts.verbose = true;
opts = slparseprops(opts, varargin{:});

n = size(X, 2);     % number of samples (one per column)

% resolve clsfunc to a handle with the syntax labels = f(centers, data)
if ischar(opts.clsfunc)
    switch opts.clsfunc
        case 'normal'
            fh_classify = @classify_normal;
        case 'samplewise'
            fh_classify = @classify_samplewise;
        case 'ann'
            fh_classify = @classify_ann;
        otherwise
            error('sltoolbox:invalidarg', ...
                'Invalid clsfunc option %s', opts.clsfunc);
    end
elseif isa(opts.clsfunc, 'function_handle')
    fh_classify = opts.clsfunc;
else
    error('sltoolbox:invalidarg', ...
        'clsfunc can be either a string or a function handle');
end


%% Perform K-means

% slot functions handed to the framework function slkmeansex
estfunctor = {@kmeans_est};
clsfunctor = {@kmeans_classify, fh_classify};
annfunc = @kmeans_anneal;

[means, labels] = slkmeansex(X, n, estfunctor, clsfunctor, ...
    'K', opts.K, ...
    'init_centers', opts.init_means, ...
    'maxiter', opts.maxiter, ...
    'annthres', opts.annthres, ...
    'annfunc', annfunc, ...
    'weights', opts.weights, ...
    'verbose', opts.verbose);



%% Core slot functions

% Estimation step: recompute each center from the samples assigned to it.
function centers = kmeans_est(centers, X, K, weights, labels)

d = size(X, 1);
if isempty(centers)
    centers = zeros(d, K);
end

Inds = sllabelinds(labels, 1:K);
for i = 1 : K
    si = Inds{i};

    % a cluster without samples keeps its current center
    if ~isempty(si)
        curX = X(:, si);
        if isempty(weights)
            curw = [];
        else
            curw = weights(si);
        end
        centers(:, i) = slmean(curX, curw);
    end
end


% Classification step: delegate to the selected classifier; n is unused.
function labels = kmeans_classify(centers, X, n, fh_classify)

slignorevars(n);
labels = fh_classify(centers, X);


% Annealing step: remove the centers marked for discarding.
function centers = kmeans_anneal(centers, inds_discard)

centers(:, inds_discard) = [];




%% The functions for classifying samples to clusters

% Computes all center-to-sample distances at once, then takes the
% minimum along dimension 1 to get one label per sample.
function L = classify_normal(centers, data)

dists = slmetric_pw(centers, data, 'eucdist');
[md, L] = min(dists, [], 1);
slignorevars(md);

% Same rule as classify_normal, but processes one sample at a time.
function L = classify_samplewise(centers, data)

n = size(data, 2);
L = zeros(1, n);
for i = 1 : n
    curdists = slmetric_pw(centers, data(:, i), 'eucdist');
    [md, p] = min(curdists);
    slignorevars(md);   % kept inside the loop: md is undefined when n == 0
    L(i) = p;
end

% Nearest-center lookup through annsearch; the result is returned as a row.
function L = classify_ann(centers, data)

L = annsearch(centers, data, 1);
L = L(:)';
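
The two core steps in the listing, classification (classify_normal) and estimation (kmeans_est), can be sketched without any toolbox dependency. The function below is illustrative only: kmeans_step_sketch is a hypothetical name, the distance computation stands in for slmetric_pw, and the weighted average stands in for slmean, whose exact weighting convention is assumed here rather than taken from its source.

```matlab
function [centers, labels] = kmeans_step_sketch(centers, X, weights)
% KMEANS_STEP_SKETCH  One assignment + update pass of K-means (illustrative).
%   centers: d x K current centers, one per column
%   X:       d x n samples, one per column
%   weights: 1 x n positive sample weights, or [] for equal weights

K = size(centers, 2);
n = size(X, 2);
if isempty(weights)
    weights = ones(1, n);
end

% Classification step: squared Euclidean distance between every center
% and every sample, expanded as |c|^2 - 2*c'*x + |x|^2 (a K x n matrix).
d2 = bsxfun(@plus, sum(centers .^ 2, 1)', sum(X .^ 2, 1)) - 2 * (centers' * X);
[~, labels] = min(d2, [], 1);

% Estimation step: weighted mean of the samples assigned to each center.
for i = 1 : K
    si = find(labels == i);
    if ~isempty(si)                 % an empty cluster keeps its center
        w = weights(si);
        centers(:, i) = (X(:, si) * w(:)) / sum(w);
    end
end
```

Repeating this step, for example until the labels stop changing or a maximum iteration count is reached, gives a basic K-means loop; in the listing that loop is delegated to slkmeansex.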

Generated on Wed 20-Sep-2006 12:43:11 by m2html © 2003
