Code covered by the BSD License  

Highlights from
DM Utils (data mining utils)

image thumbnail

DM Utils (data mining utils)

by

 

15 Jan 2012 (Updated )

The tools for dealing with distance matrix, improving data mining capabilities

P=pair_dist_seq(X,fun,varargin)
function P=pair_dist_seq(X,fun,varargin)
% sequential version of paiwise distance similar to PDIST function
% calculates out=pair_dist_seq(X,fun,parameters)
% input data:
%
%   X - input data, rows of X correspond to observations,
%       columns correspond to variables; fun - distance function
%       (single row vs row !!!),
%   fun - one of predefined distance functions 'manhattan', 'euclidean',
%       'cosine','minp_manhattan', 'minkowski' with parameter p (order) 
%       or a handler to a distance function of a form d=fun(x,y,params),
%       where:
%           -params is a variable list of parameters,
%           -x and y are single row vectors it is DIFFERENT to the pdist.
%   varargin - a list of parameters passed directly as params of a distance
%       function
%
% The distance functions works faster if given as function handler
% f.e using name2fun(name) function - anonymous handler.
% If no fun is given the 'euclidean' is assumed as default.
% The function is used by pair_dist_par when no parallel toolbox is installed
% or no pool of workers is avaialble
%
% Copyright 2011 - P. Skurowski
% Author : P. Skurowski
% Place: Institute of informatics, Silesian Univ. of  Technology
%   v. 0.2 - Minkowski distance added, modified minperm (faster)
%   v. 1.0 - final public version
% See also PAIR_DIST_PAR, NAME2FUN, PDIST

if nargin < 2
    fun='euclidean';
end
if ~isa(fun, 'function_handle')
    if isstr(fun)
        fun=name2fun(fun);
    else
        error('Provide a handler to distfun(x,y,params)or one of builtin names')
    end
end
% tic;
m=size(X,1); % M=(m.^2-m)/2;

P=cell(1,m-1);
% init=toc;tic;
for i=1:m-1
    tmp=double(nan(1,m-i));
    idx=0;
    xi=X(i,:);
    for j=(i+1):m
        
        idx=idx+1;
        xj=X(j,:);
        tmp(idx)=feval(fun,xi,xj,varargin{:}); %expands
    end
    P{i}=tmp;
end
p=toc;tic;
clear tmp

P=cat(2,P{1,:}); % P=cell2mat(P);
% finisz=toc;
% if nargout>1
%     timers=[init, finisz, p];
% end

end

function handler=name2fun(fun)

switch fun
    case 'manhattan'
        handler = @(x,y)sum( abs(x-y) );  % manhattan (minkowski 1)
    case 'euclidean'
        handler = @(x,y)sqrt( sum( (x-y).^2 )); % euclid (minkowski 2)
    case 'cosine'
        handler = @(x,y)1-x*y'/sqrt(x*x')/sqrt(y*y'); % cosine distance
    case 'minp_manhattan' %minimal permutation manhattan dist
        handler = @(x,y)min(sum(abs(bsxfun(@minus,x(perms(1:5)),y)),2)); 
        %handler = @(x,y)min(sum(abs(y(perms(1:5))- ...
        %    x(ones(factorial(length(x)), 1),:)),2));  
    case 'minkowski'
        handler = @(x,y,p)( sum( abs(x-y).^p )).^(1/p); % Minkowski
    otherwise
        error('Unknown method.');
end

end

Contact us