Code covered by the BSD License  

from Statistical Learning Toolbox by Dahua Lin
Functions for statistical learning, pattern recognition and computer vision, covering many topics.

slkmeansex

PURPOSE

SLKMEANSEX Performs Generalized K-means

SYNOPSIS

function [centers, labels, info] = slkmeansex(X, n, estfunctor, clsfunctor, varargin)

DESCRIPTION

SLKMEANSEX Performs Generalized K-means

 $ Syntax $
   - [centers, labels] = slkmeansex(X, n, estfunctor, clsfunctor, ...)
   - [centers, labels, info] = slkmeansex(X, n, estfunctor, clsfunctor, ...)

 $ Arguments $
   - X:            the samples to be clustered
   - n:            the number of samples
   - estfunctor:   the functor to estimate the means (centers), as follows:
                   centers = estfunc(centers, X, K, weights, labels, ...)
                   When the input centers is empty, it performs the initial
                   estimation; otherwise, it updates the given centers.
                   It should ignore samples whose labels are zero or
                   negative.
   - clsfunctor:   the functor to classify samples:
                   labels = clsfunc(centers, X, n, ...)
                   It should produce a 1 x n row vector.
   - centers:      the cluster centers
   - labels:       a 1 x n row vector indicating which center each
                   sample belongs to
   - info:         information on the iteration process
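To make the two functor contracts above concrete, here is a sketch of a hypothetical estfunctor/clsfunctor pair for plain Euclidean K-means. It assumes (the toolbox does not prescribe this) that X is a d x n matrix with one sample per column, that centers is a d x K matrix, and that weights is either [] or a 1 x n row vector. Only anonymous functions are used so the snippet runs as a script; the helper names (wcol, wmat, oldc, meanest, sqd, argmin_cols) are invented for this example. Passing the handles through slkmeansex is not shown, since this page does not describe how slevalfunctor unpacks a functor.

```matlab
% Hypothetical functors for Euclidean K-means (illustrative, not sltoolbox code).
% Assumed layout: X is d x n (one sample per column), centers is d x K.

% ---- estfunctor: centers = estfunc(centers, X, K, weights, labels) ----
% Weights as an n x 1 column; an empty weight vector becomes all ones.
wcol = @(w, n) [w(:); ones(n * isempty(w), 1)];
% n x K matrix of per-cluster sample weights. Samples with labels <= 0
% match no cluster index 1..K, so their rows are all zero (ignored).
wmat = @(labels, w, K) bsxfun(@times, ...
    double(bsxfun(@eq, labels(:), 1:K)), wcol(w, numel(labels)));
% Previous centers, or a d x K zero matrix for the initial estimation.
oldc = @(C, d, K) [C, zeros(d, K * isempty(C))];
% Weighted mean per cluster; a cluster with zero total weight keeps its
% previous center, as the Remarks require when annthres is 0.
meanest = @(S, cw, Cold) ...
    bsxfun(@times, bsxfun(@rdivide, S, max(cw, realmin)), double(cw > 0)) + ...
    bsxfun(@times, Cold, double(cw == 0));
estfunc = @(C, X, K, w, labels) meanest(X * wmat(labels, w, K), ...
    sum(wmat(labels, w, K), 1), oldc(C, size(X, 1), K));

% ---- clsfunctor: labels = clsfunc(centers, X, n) ----
% K x n matrix of squared Euclidean distances ||c_k - x_i||^2.
sqd = @(C, X) bsxfun(@plus, sum(C.^2, 1)', ...
    bsxfun(@minus, sum(X.^2, 1), 2 * (C' * X)));
% Row index of the first minimum in each column, as a 1 x n row vector.
argmin_cols = @(D) arrayfun(@(j) find(D(:, j) == min(D(:, j)), 1), 1:size(D, 2));
clsfunc = @(C, X, n) argmin_cols(sqd(C, X));

% ---- usage: one seed sample per cluster, then one update ----
X = [0 0.2 5 5.1; 0 0.1 5 4.9];          % 2 x 4, two obvious groups
C0 = estfunc([], X, 2, [], [1 0 2 0]);   % initial estimate from the seeds
labels = clsfunc(C0, X, 4);              % [1 1 2 2]
C1 = estfunc(C0, X, 2, [], labels);      % cluster means, about [0.1 5.05; 0.05 4.95]
```

Computing the distances through the expansion |c|^2 - 2c'x + |x|^2 keeps the classifier a single matrix product, which is the usual choice for Euclidean K-means in MATLAB.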

 $ Description $
   - [centers, labels] = slkmeansex(X, n, estfunctor, clsfunctor, ...)
     is a generalized version of K-means. It iteratively estimates the
     centers from the clustered samples and re-clusters the samples
     according to the centers.
     You can specify the following properties:
       - 'K':              the initial number of clusters
                           (default = 3)
       - 'init_centers':   the initial centers (default = [], in which
                           case K randomly chosen samples seed the
                           initial estimation)
       - 'maxiter':        the maximum number of iterations
                           (default = 100)
       - 'annthres':       the annealing threshold: when the total sample
                           weight of a center falls below annthres times
                           the average weight per center (total weight / K),
                           the center is discarded. (default = 0)
       - 'annfunc':        the function to discard a set of centers,
                           required when annthres > 0:
                           centers = annfunc(centers, inds_discard);
       - 'weights':        the weights of the samples, a 1 x n row vector
                           (default = [], i.e. all samples weighted equally)
       - 'verbose':        whether to show progress information
                           (default = true)
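The iteration described above can be sketched as a plain loop. This is a simplified, self-contained illustration, not the toolbox implementation (slkmeansex delegates the loop to slreevallearn, as its source code shows): weights and annealing are left out, the samples are scalars, the functors are hypothetical 1-D stand-ins, and the loop stops once no label changes, mirroring the convergence test in kmeansex_cmp.

```matlab
% Simplified generalized K-means loop (illustrative; omits weights and
% annealing). Samples and centers are scalars held in row vectors.
X = [1.0 1.2 0.8 5.0 5.3 4.9 9.1 8.8];   % 1 x n samples
n = numel(X);
K = 3;
maxiter = 100;
centers = [0 4.5 6];                      % plays the role of 'init_centers'

% Hypothetical clsfunctor: index of the nearest center for each sample.
argmin_cols = @(D) arrayfun(@(j) find(D(:, j) == min(D(:, j)), 1), 1:size(D, 2));
clsfunc = @(C, X, n) argmin_cols(abs(bsxfun(@minus, C(:), X)));

% Hypothetical estfunctor: mean of each cluster (weights ignored here).
% Labels <= 0 match no cluster; an empty cluster keeps its old center.
onehot = @(labels, K) double(bsxfun(@eq, labels(:), 1:K));   % n x K
meanfix = @(S, cnt, C) (S ./ max(cnt, 1)) .* (cnt > 0) + C .* (cnt == 0);
estfunc = @(C, X, K, w, labels) ...
    meanfix(X * onehot(labels, K), sum(onehot(labels, K), 1), C);

labels = clsfunc(centers, X, n);                    % initial classification
for iter = 1:maxiter
    centers = estfunc(centers, X, K, [], labels);   % re-estimate centers
    newlabels = clsfunc(centers, X, n);             % re-classify samples
    nchanged = sum(newlabels ~= labels);
    labels = newlabels;
    fprintf('K = %d: %d / %d changed\n', K, nchanged, n);
    if nchanged == 0                                % converged
        break;
    end
end
```

With these starting centers the sample at 5.3 switches from the third to the second cluster in the first pass, and the second pass changes nothing, so the loop ends after two iterations.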

 $ Remarks $
   - X and centers can take any form that conforms to the specified
     functors.

   - If init_centers is specified, K should equal the number of initial
     centers.

   - If annthres is 0, no center is discarded, even if some centers have
     no supporting samples during the process. The estfunctor should then
     keep those centers unchanged.
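A small worked example of the annealing step follows. It reproduces the test in kmeansex_est from the source code: the sample weights are summed per center, the threshold is annthres * sum(cw) / K (annthres times the average weight per center), and the centers below it are handed to annfunc. Two assumptions are made for illustration: accumarray stands in for the toolbox's sllabeledsum, and annfunc is a hypothetical implementation that deletes columns of a d x K center matrix.

```matlab
% Worked example of the annealing step (illustrative). The threshold rule
% mirrors kmeansex_est; accumarray replaces sltoolbox's sllabeledsum.
labels   = [1 1 1 2 2 2 2 3 1 2];      % current labels of n = 10 samples
w        = ones(1, 10);                % unweighted samples
K        = 3;
annthres = 0.5;
centers  = [0 5 9; 0 5 9];             % d x K centers (d = 2)

% Hypothetical annfunc: delete the given columns of a d x K matrix.
annfunc = @(C, inds) C(:, setdiff(1:size(C, 2), inds));

cw = accumarray(labels(:), w(:), [K 1])';   % total weight per center: [4 5 1]
wthres = annthres * sum(cw) / K;            % 0.5 * 10 / 3, about 1.67
inds_ann = find(cw < wthres);               % only center 3 falls below it
if ~isempty(inds_ann)
    centers = annfunc(centers, inds_ann);
    K = K - numel(inds_ann);
end
```

In slkmeansex the estimation step returns right after discarding, so the surviving centers are not re-estimated in that step; the samples are first re-classified against them.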

 $ History $
   - Created by Dahua Lin, on Aug 28, 2006
   - Modified by Dahua Lin, on Aug 30, 2006
       - utilize slevalfunctor and slsharedisp
   - Modified by Dahua Lin, on Aug 31, 2006
       - based on slreevallearn

CROSS-REFERENCE INFORMATION

This function is called by:
  • slkmeans: SLKMEANS Performs K-Means Clustering on samples

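SUBFUNCTIONS

  • function models = kmeansex_est(models, data, labels, estfunctor, opts)
  • function labels = kmeansex_eval(models, data, labels, clsfunctor)
  • function isconverged = kmeansex_cmp(models_prev, models, labels_prev, labels)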

SOURCE CODE

function [centers, labels, info] = slkmeansex(X, n, estfunctor, clsfunctor, varargin)
%SLKMEANSEX Performs Generalized K-means
%
% $ Syntax $
%   - [centers, labels] = slkmeansex(X, n, estfunctor, clsfunctor, ...)
%   - [centers, labels, info] = slkmeansex(X, n, estfunctor, clsfunctor, ...)
%
% $ Arguments $
%   - X:            the samples to be clustered
%   - n:            the number of samples
%   - estfunctor:   the functor to estimate the means (centers), as follows:
%                   centers = estfunc(centers, X, K, weights, labels, ...)
%                   When the input centers is empty, it performs the initial
%                   estimation; otherwise, it updates the given centers.
%                   It should ignore samples whose labels are zero or
%                   negative.
%   - clsfunctor:   the functor to classify samples:
%                   labels = clsfunc(centers, X, n, ...)
%                   It should produce a 1 x n row vector.
%   - centers:      the cluster centers
%   - labels:       a 1 x n row vector indicating which center each
%                   sample belongs to
%   - info:         information on the iteration process
%
% $ Description $
%   - [centers, labels] = slkmeansex(X, n, estfunctor, clsfunctor, ...)
%     is a generalized version of K-means. It iteratively estimates the
%     centers from the clustered samples and re-clusters the samples
%     according to the centers.
%     You can specify the following properties:
%       - 'K':              the initial number of clusters
%                           (default = 3)
%       - 'init_centers':   the initial centers (default = [], in which
%                           case K randomly chosen samples seed the
%                           initial estimation)
%       - 'maxiter':        the maximum number of iterations
%                           (default = 100)
%       - 'annthres':       the annealing threshold: when the total sample
%                           weight of a center falls below annthres times
%                           the average weight per center (total weight / K),
%                           the center is discarded. (default = 0)
%       - 'annfunc':        the function to discard a set of centers,
%                           required when annthres > 0:
%                           centers = annfunc(centers, inds_discard);
%       - 'weights':        the weights of the samples, a 1 x n row vector
%                           (default = [], i.e. all samples weighted equally)
%       - 'verbose':        whether to show progress information
%                           (default = true)
%
% $ Remarks $
%   - X and centers can take any form that conforms to the specified
%     functors.
%
%   - If init_centers is specified, K should equal the number of initial
%     centers.
%
%   - If annthres is 0, no center is discarded, even if some centers have
%     no supporting samples during the process. The estfunctor should then
%     keep those centers unchanged.
%
% $ History $
%   - Created by Dahua Lin, on Aug 28, 2006
%   - Modified by Dahua Lin, on Aug 30, 2006
%       - utilize slevalfunctor and slsharedisp
%   - Modified by Dahua Lin, on Aug 31, 2006
%       - based on slreevallearn
%

%% parse and verify input arguments

if nargin < 4
    raise_lackinput('slkmeansex', 4);
end

opts.K = 3;
opts.init_centers = [];
opts.maxiter = 100;
opts.annthres = 0;
opts.annfunc = [];
opts.weights = [];
opts.verbose = true;
opts = slparseprops(opts, varargin{:});

if opts.K > n
    error('sltoolbox:rterror', ...
        'The initial K is larger than the number of samples');
end

if opts.annthres > 0
    if isempty(opts.annfunc)
        error('sltoolbox:invalidarg', ...
            'You should specify annfunc when annthres > 0');
    end
end

% the weights, if given, must form a 1 x n row vector
w = opts.weights;
if ~isempty(w)
    if ~isequal(size(w), [1 n])
        error('sltoolbox:sizmismatch', ...
            'The weights should be a 1 x n row vector');
    end
end


%% Initialization

slsharedisp_attach('slkmeansex', 'show', opts.verbose);

slsharedisp('Initialize K-Means');

if isempty(opts.init_centers)
    % seed: K randomly chosen samples get labels 1..K; all other samples
    % get label 0, which the estfunctor ignores
    initcinds = randsample(n, opts.K);
    labels = zeros(1, n);
    labels(initcinds) = 1:opts.K;

    K = opts.K;
    centers = slevalfunctor(estfunctor, [], X, K, w, labels);
else
    K = opts.K;
    centers = opts.init_centers;
end

slsharedisp_incindent;
slsharedisp('initial K = %d', K);
slsharedisp_decindent;

labels = slevalfunctor(clsfunctor, centers, X, n);


%% Updating

slsharedisp('Update K-Means');
slsharedisp_incindent;

km_estfunctor = {@kmeansex_est, estfunctor, opts};
km_evalfunctor = {@kmeansex_eval, clsfunctor};
km_cmpfunctor = {@kmeansex_cmp};

models = {centers, K};
data = {X, n, w};
[models, labels, info] = slreevallearn(models, labels, data, ...
    km_estfunctor, km_evalfunctor, km_cmpfunctor, ...
    'iter', {'maxiter', opts.maxiter, 'titlebreak', false}, 'isrecorded', false);

centers = models{1};

slsharedisp_decindent;
slsharedisp_detach;

%% Core functions

% models = {centers, K}
% data = {X, n, w}

function models = kmeansex_est(models, data, labels, estfunctor, opts)

X = data{1};
w = data{3};
centers = models{1};
K = models{2};

% annealing: discard the centers whose total sample weight is below
% annthres times the average weight per center; the samples are then
% re-classified against the remaining centers in the next evaluation
if ~isempty(centers) && opts.annthres > 0
    if isempty(w)
        w = ones(1, length(labels));
    end
    cw = sllabeledsum(w, labels, 1:K);
    wthres = opts.annthres * sum(cw) / K;
    if any(cw < wthres)
        inds_ann = find(cw < wthres);
        centers = feval(opts.annfunc, centers, inds_ann);
        K = K - length(inds_ann);

        models = {centers, K};
        return;
    end
end

centers = slevalfunctor(estfunctor, centers, X, K, w, labels);
models = {centers, K};


function labels = kmeansex_eval(models, data, labels, clsfunctor)

X = data{1};
n = data{2};
centers = models{1};

slignorevars(labels);

labels = slevalfunctor(clsfunctor, centers, X, n);


function isconverged = kmeansex_cmp(models_prev, models, labels_prev, labels)

K_prev = models_prev{2};
K = models{2};
n = length(labels);

slsharedisp_attach('kmeansex_cmp');

% converged when K is unchanged and no sample changed its label
isconverged = false;
if K == K_prev
    nchanged = sum(labels ~= labels_prev);
    slsharedisp('K = %d: %d / %d changed', K, nchanged, n);

    if nchanged == 0
        isconverged = true;
    end
else
    slsharedisp('K = %d ==> %d', K_prev, K);
end

slsharedisp_detach();
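The default seeding step (used when init_centers is empty) can be illustrated in isolation: K distinct samples receive labels 1..K, every other sample gets label 0, and because the estfunctor must ignore non-positive labels, each initial center is estimated from a single seed sample. The sketch below is an assumption-laden stand-in, not toolbox code: it uses randperm(n, K) in place of randsample (which belongs to the Statistics Toolbox) and a plain Euclidean mean over a d x n matrix as the estfunctor.

```matlab
% Sketch of the seeding step used when 'init_centers' is empty.
n = 10;
K = 3;
X = rand(2, n);                     % d x n samples (d = 2), assumed layout

initcinds = randperm(n, K);         % K distinct sample indices
labels = zeros(1, n);
labels(initcinds) = 1:K;            % seeds get 1..K, all others stay 0

% Assumed Euclidean estfunctor: mean of the samples labelled k. Rows of M
% for label-0 samples are all zero, so those samples are ignored.
M = double(bsxfun(@eq, labels(:), 1:K));          % n x K indicator
centers = bsxfun(@rdivide, X * M, sum(M, 1));     % d x K initial centers
```

With a mean-based estimator the initial centers are simply the seed samples themselves, so the quality of the start depends entirely on which K samples are drawn.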

Generated on Wed 20-Sep-2006 12:43:11 by m2html © 2003
