image thumbnail

Rapid Centroid Estimation

by

 

10 Sep 2012 (Updated )

An efficient particle swarm approach for rapid optimization of cluster centroids.

RCE( in, k, varargin )
function [labels, centroids, history] = RCE( in, k, varargin )
%RCE Rapid Centroid Estimation (RCE) clustering.
%   IDX = RCE(X, K) partitions the points in the N-by-P data matrix X
%   into K clusters.  This partition minimizes the sum, over all clusters, of
%   the within-cluster sums of point-to-cluster-centroid distances.
%   
%   Rows of X correspond to features, columns correspond to variables.  
%   By default, RCE uses Euclidean distances.
%
%   RCE treats NaNs as missing data, and ignores any rows of X that
%   contain NaNs.
%
%   [IDX, C] = RCE(X, K) returns the K cluster centroid locations in
%   the K-by-P matrix C.
%
%   [IDX, C, s] = RCE(X, K) returns the search history s of the algorithm
%
%   [ ... ] = RCE(..., 'PARAM1',val1, 'PARAM2',val2, ...) specifies
%   optional parameter name/value pairs to control the iterative algorithm
%   used by RCE.  Parameters are:
%
%   'Distance' - Distance measure, in P-dimensional space, that RCE
%      should minimize with respect to.  Choices are:
%          'sqEuclidean'  - Squared Euclidean distance (the default)
%          'sEuclidean'   - Standardized Euclidean distance
%          'sqsEuclidean' - Squared Standardized Euclidean distance
%          'cityblock'    - Sum of absolute differences, a.k.a. L1 distance
%          'cosine'       - One minus the cosine of the included angle
%                           between points (treated as vectors)
%          'correlation'  - One minus the sample correlation between points
%                           (treated as sequences of values)
%          'hamming'      - Percentage of bits that differ (only suitable
%                           for binary data)
%          'minkowski'    - Minkowski Distance. Default value for 'P' is 2.
%                           assign argument 'P' and a value to alter.
%          'chebychev'    - Chebychev distance function
%          'spearman'     - Spearman distance
%          'jaccard'      - Jaccard Distance
%
%   Other Parameters:
%          'P'        - Set the P value for minkowski distance, default is 2
%          'subsprob' - Set the substitution probability (for RCE+), default is 0.00
%          'swarm'    - Set the number of groups in the swarm (for swarm
%                       RCE), default is 1 (normal RCE)
%          'MaxStagnation' - Set the maximum iteration to stagnate before 
%                       particle reset (for RCE^r), default is 0
%          'fun'      - defines the minimizing fitness function. The RCE 
%                       abstract fitness function should obey the syntax of
%                       fit = function_name(distance_matrix,labels)
%                       the default fitness function used in RCE is the sum 
%                       of intracluster distances normalized by data volume.
%          'Display'  - Level of display output.  Choices are 'off', (the
%                       default) and 'on'.
%          'MaxIter'  - Maximum number of iterations allowed.  Default is 100.
%          'dxmax'    - Set the maximum dx
%
%
%   Example:
%
%       X = [randn(2,50)+ones(2,50), randn(2,50)-ones(2,50)];
%       figure;
%       [cidx, ctrs, s] = RCE(X, 2, 'Distance','minkowski','P',8, ...
%                             'Display','on', ...
%                             'swarm',3, ...
%                             'subsprob',0.02,'dxmax',0.1, 'maxiter',500);
%       disp('cluster centroids');
%       cidx
%       disp('cluster centroids'); 
%       ctrs
%       figure;
%       silhouette(X',cidx,'sqEuclidean');
%
%   See also KMEANS, LINKAGE, CLUSTERDATA, SILHOUETTE.
%
%   RCE is proposed as a derivate of PSC algorithm with radically reduced 
%   time complexity. The quality of the results produced are similar to PSC.
%   The advantage of RCE is RCE has lesser time taken per iteration and
%   faster convergence [1-4].
%
%   There are 3 main components in RCE:
%   1. Cognitive       : RCE particles is attracted to a closest data. 
%   2. Social          : Each data attracts RCE particles to the coordinate 
%                        that an RCE particle has been closest to
%   3. Self-organizing : Each data points in the cluster tries to pull the
%                        corresponding RCE particle to cluster centre of mass
%
%   RCE operates inside a globally interdependent neighborhood [1-2]. 
%   In the neighborhood, a slight change in the position of a particle triggers 
%   a global reaction involving every other particle in the swarm. This 
%   reaction naturally occurs to preserve the gravitational equilibrium in 
%   the neighborhood. A particle in RCE is a centroid prototype. An RCE group 
%   encodes a possible solution to the clustering problem.
%   
%   Recent updates in RCE includes:
%       - the new lighter white noise update scheme
%       - substitution, 
%       - particle reset, 
%       - swarm
%
%   The new white noise update scheme reduces the frequency and volume of
%   the random number generated per iteration [4]. Compared to the previously
%   proposed update scheme [1-2], the number of randoms generated in white
%   noise update scheme is independent of the number of particles and the
%   number of groups.
%
%   Substitution strategy allows RCE particle to distrupt an achieved
%   equilibrium position by moving towards other particles. [3]
%
%   Particle reset reinitialize the x and dx of every particle in the group
%   when stagnation counter exceeds MaxStagnation [4]   
%
%   Swarm strategy allows multiple groups of RCE to work in paralel. 
%   In the swarm strategy each group will share its knowledge to other RCE 
%   group. [3]
%
%   An increase of computational complexity is apparent in the current swarm 
%   strategy. One should consider the computational load that may result from adding 
%   more groups in the swarm.
%
% References:
%sqseuclidean
%   [1] M. Yuwono, S.W. Su, B. Moulton, H. Nguyen, "Method for increasing 
%       the computational speed of an unsupervised learning approach for 
%       data clustering", in Proc 2012 IEEE Congress on Evolutionary Computation, 
%       2012, pp.2957-2964. 
%   [2]	M. Yuwono, S.W. Su, B. Moulton, H. Nguyen, "Fast unsupervised learning 
%       method for rapid estimation of cluster centroids", in Proc 2012 IEEE 
%       Congress on Evolutionary Computation, 2012, pp.889-896. 
%   [3] M. Yuwono, S.W. Su, B. Moulton, & H. Nguyen, "Optimization Strategies 
%       for Rapid Centroid Estimation", in Proc 34rd Annual International 
%       Conference of the IEEE EMBS, San Diego, 2012, pp.6212-6215. 
%   [4] M. Yuwono, S.W. Su, B. Moulton, & H. Nguyen, "Rapid Centroid 
%       Estimation and its Variants", unpublished
%
%   Copyright 2011-2012 Mitchell Yuwono.
%   $Revision: 1.0.5 $  $Date: 2012/11/24 15:19:00 $
    
    % check number of arguments
    if nargin < 2
        error(message('stats:rce:TooFewInputs'));
    end

    % remove missing data
    wasnan = isnan(in);
    wasnan = logical(sum(wasnan,1));
    in = in(:,wasnan==0);
    
    hadNaNs = any(wasnan);
    
    if hadNaNs
        warning(message('stats:rce:MissingDataRemoved'));
    end

    % parse input arguments
    
    pnames = {'distance' 'maxiter' 'display' 'subsprob' 'P' 'swarm' 'dxmax' 'fun' 'maxstagnation'};
    dflts =  {'euclidean' 100      'off' 0.00  2 1 0.01 @wcdist 0};
    
    parser = inputParser;
    for i = 1:length(pnames)
        parser.addParamValue(lower(pnames{i}),lower(dflts{i}));
    end
    parser.parse(varargin{:});
    r = parser.Results;
    
    
    distNames = {'euclidean','sqeuclidean','seuclidean','sqseuclidean','cityblock','minkowski','chebychev','cosine','correlation','spearman','hamming','jaccard','sqpearson'};
        
    validatestring(r.distance,distNames);
    
    %% basic memory
    maxiter = r.maxiter;
    fitness_change = [];
    display = r.display;
    max_stagnation = r.maxstagnation;
    substitution_probability = r.subsprob;
    
    dxmax = r.dxmax;
    w = 0.9;
    dimension = size(in,1);
    volume = size(in,2);    
    
    std_in = std(in,[],2);
    mean_in = mean(in,2);
    tmp2 = std_in*ones(1,volume);
    
    norm_in = bsxfun(@rdivide, bsxfun(@minus,in,mean_in), std_in);
    omega = [min(in,[],2), max(in,[],2)];
    
    if(dimension > 2)
        randomProjectionMatrix = normrnd(0,0.2,2,size(in,1));        
    else
        randomProjectionMatrix = eye(size(in,1));  
    end 
    
    %% initialize groups
    
    particle_trajectory = [];
    nm = r.swarm;
    collective_minimum = zeros(dimension,k*nm);
    
    fit = nan(nm,maxiter);
    fun = r.fun;
       
    labels = zeros(nm,volume);
    
    ni = zeros(nm,k);
    
    dx = zeros(dimension,volume,nm);
    x = zeros(dimension,k,nm);
    p = zeros(dimension, volume, k, nm);
    g = zeros(dimension,volume,nm);
    p_dist = inf(k,volume,nm);
    g_dist = inf(nm,volume);
    distance_matrix = p_dist;
    %fit = inf(r.swarm,1);    
    distance = r.distance;
    p_value = r.p;
    
    omega_l = omega(:,1)*ones(1,k);
    omega_h = omega(:,2)*ones(1,k);
    for m = 1:nm
        x(:,:,m) = unifrnd(omega_l,omega_h);
        p_dist(:,:,m) = distmat(in,x(:,:,m),distance,mean_in,std_in,p_value,norm_in);
        for i = 1:k
            p(:,:,i,m) = x(:,i,m)*ones(1,volume);
        end        
        fit(m,1) = inf;
    end
    
    M = x;
    M_labels = labels;
    M_fit = inf(m,1);
    cnt = zeros(m,1);
    Ym = inf;
    
    %% update iterations
    
    t0 = tic;
    for it = 1:maxiter        
        in_w = in + unifrnd(-tmp2,tmp2);
        for m = 1:r.swarm
            %% calculate distance matrix            
            xm = x(:,:,m);
            distance_matrix(:,:,m) = distmat(in,x(:,:,m),distance,mean_in,std_in,p_value,norm_in);
            
            %% update personal best
            smaller_distance = distance_matrix(:,:,m)<p_dist(:,:,m);
            
            for i = 1:k
                tmp = x(:,i,m) * ones(1,volume);
                pbest_id_i = smaller_distance(i,:);
                p_dist(i,pbest_id_i,m) = distance_matrix(i,pbest_id_i,m);
                p(:,pbest_id_i,i,m) = tmp(:,pbest_id_i);  
            end
            
            %% update global best
            
            [Y, I] = min(distance_matrix(:,:,m),[],1);
            labels(m,:) = I;
            
            for i = 1:k
                ni(m,i) = nnz(labels(m,:) == i);
            end
            
            id = Y<g_dist(m,:);
    
            if(nnz(id) > 0)
                g_dist(m,id) = Y(id);
                g(:,id,m) = xm(:,I(id));
            end
            
            %% update minimum
                                    
            fit_t = fun(distance_matrix(:,:,m),labels(m,:));
            if(fit_t < M_fit(m))        
                M_fit(m) = fit_t;
                M(:,:,m) = xm;
                M_labels(m,:) = labels(m,:);
                cnt(m) = 0;
            else
                cnt(m) = cnt(m) + 1;
            end
            
            fit(m,it) = M_fit(m);
            
            
            %% update collective minimum
            
            if(nm > 1)
                if(~isequal(collective_minimum(:,(1+(m-1)*k):(m*k)), M(:,:,m)))
                    collective_minimum (:,(1+(m-1)*k):(m*k)) = M(:,:,m);
                end
                
                cm_dist = distmat(collective_minimum,x(:,:,m),distance,mean_in,std_in,p_value);
                [~, cm_I] = min(cm_dist,[],1);                
            end
            
            %% update position
            
            for i = 1:k
                C = logical(ismembc2(labels(m,:),i));
                
                if(nm > 1)
                    K = logical(ismembc2(cm_I,i));
                else
                    K = 0;
                end
                
                N = nnz(C);
                
                if (N>0)
                    gbest = g(:,C,m);
                    pbest = p(:,C,m);
                    
                    y = in_w(:,C);
                                        
                    delta = bsxfun(@minus,pbest,x(:,i,m));
                    
                    delta(isnan(delta)) = 0;
                    cognitive = sum(delta,2)/N;       

                    delta = bsxfun(@minus,gbest,x(:,i,m));       
                    delta(isnan(delta)) = 0;
                    social = sum(delta,2)/N;

                    delta = bsxfun(@minus,y,x(:,i,m));
                    delta(isnan(delta)) = 0;
                    self_org = sum(delta,2)/N;
                    

                    if (nnz(K)>0)
                        cm = collective_minimum(:,K);
                        delta = bsxfun(@minus,cm,x(:,i,m));
                        delta(isnan(delta)) = 0;
                        cm = sum(delta,2)/nnz(K);
                    else
                        cm = 0;
                    end

                else
                    
                    [~,I_win] = max(ni(m,:));
                    social = x(:,I_win,m)-x(:,i,m);
                    cognitive = 0;
                    self_org = 0;
                    cm = 0;
                end
                
                %% calcualte dx
                dx(:,i,m) = w*dx(:,i,m) +(cognitive + social + self_org + cm)/12;
                
                
                vsign = sign(dx(:,i,m));
                dx(:,i,m) = max(abs(dx(:,i,m)),dxmax*omega(:,1));
                dx(:,i,m) = min(abs(dx(:,i,m)),dxmax*omega(:,2));
                dx(:,i,m) = vsign.*dx(:,i,m);

                %% update x
                x(:,i,m)= x(:,i,m) + dx(:,i,m);    
                
            end 
            
            %% substitution
            Is = rand([1 k]) < substitution_probability;

            if(nnz(Is)>0)
                tmp = randi(k,[1,k]);
                for i = 1:k
                    if(Is(i) == 1)
                        x(:,i,m) = x(:,tmp(i),m) + normrnd(0, std_in);
                        dx(:,i,m) = 0;
                    end
                end
            end
            
            %% particle reset
            if(max_stagnation > 0 && cnt(m) > max_stagnation)                          
                if(strcmp(display,'on'))
                    disp(['[t=' num2str(toc(t0), '%10.3f') 's]' ' particle reset on group ' num2str(m)]);
                end
                
                cnt(m) = 0;
                x(:,:,m) = unifrnd(omega_l,omega_h);
                dx(:,:,m) = 0;                
            end
            
        end
        
        %% inertia weight
        w = w*0.95;
        
        %% swarm best
        [Ym_t, Im]= min(M_fit);
        
        
        if(Ym_t < Ym)
            Ym = Ym_t;
            fitness_change = [fitness_change, it];
            
            %% if display on, plot swarm
            if(strcmp(display,'on'))
                disp(['[t=' num2str(toc(t0), '%10.3f') 's]' ' iteration ' num2str(it) ' fitness=' num2str(Ym)]);                
                particle_trajectory = [particle_trajectory; sort(M(:,:,Im),2)];               
                subplot(4,1,[1:3]);
                plot_swarm(in, M_labels(Im,:), sort(M(:,:,Im),2), particle_trajectory, randomProjectionMatrix);
                subplot(4,1,4);
                plot(fit','LineWidth',1);
                grid on;
                pause(0.01);
            end

        end
        %% toc time
        t(it) = toc(t0);
        
    end
    
    %% format output
    history.fitness_graph = fit;
    history.t = t;
    history.fitness_change = fitness_change;
    if(~isempty(particle_trajectory))
        history.particle_trajectory = particle_trajectory;
    end
    history.randomProjectionMatrix = randomProjectionMatrix;
    history.minima = M;
    history.labels = M_labels;
    if(nm > 1)
        history.swarm_minimum_centroids = M(:,:,Im);
        history.swarm_minimum_labels = M_labels(Im,:);
        history.swarm_fitness = min(fit,[],1);
    end
    
    labels = M_labels(Im,:);
    centroids = M(:,:,Im);
        
end
    
    

Contact us