Discover MakerZone

MATLAB and Simulink resources for Arduino, LEGO, and Raspberry Pi

Learn more

Discover what MATLAB® can do for your career.

Opportunities for recent engineering grads.

Apply Today

Thread Subject:
QT clustering implementation/details?

Subject: QT clustering implementation/details?

From: Misha Koshelev

Date: 19 Jan, 2010 23:23:02

Message: 1 of 6

Dear Sirs or Madams:

I am trying to find and/or implement the Quality Threshold clustering algorithm in MATLAB:
http://en.wikipedia.org/wiki/Cluster_analysis#QT_clustering_algorithm

Actually, any clustering algorithm that results in the same clustering given initial inputs is good for me.

Something easy & fast would be best.

Any tips much appreciated.

Thank you!

Sincerely yours
Misha Koshelev

Subject: QT clustering implementation/details?

From: misha680

Date: 19 Jan, 2010 23:50:01

Message: 2 of 6

Here is what I have found so far:
R has a version
http://rss.acs.unt.edu/Rdoc/library/flexclust/html/qtclust.html

Original paper:
http://genome.cshlp.org/content/9/11/1106.full
with this step-by-step diagram
http://genome.cshlp.org/content/9/11/1106/F5.large.jpg

Perl implementation:
http://stackoverflow.com/questions/1973109/how-can-i-implement-qt-quality-threshold-clustering-in-perl

At the moment I am a little confused on how to compute a cluster
diameter...?

Thank you
Sincerely yours
Misha Koshelev

On Jan 19, 5:23 pm, "Misha Koshelev" <mk144...@bcm.edu> wrote:
> Dear Sirs or Madams:
>
> I am trying to find and/or implement the Quality Threshold clustering algorithm in MATLAB:http://en.wikipedia.org/wiki/Cluster_analysis#QT_clustering_algorithm
>
> Actually, any clustering algorithm that results in the same clustering given initial inputs is good for me.
>
> Something easy & fast would be best.
>
> Any tips much appreciated.
>
> Thank you!
>
> Sincerely yours
> Misha Koshelev

Subject: QT clustering implementation/details?

From: Misha Koshelev

Date: 20 Jan, 2010 22:14:04

Message: 3 of 6

misha680 <misha680@gmail.com> wrote in message <6b2d2873-b332-4160-87a3-af2bc7cbd471@34g2000yqp.googlegroups.com>...
> Here is what I have found so far:
> R has a version
> http://rss.acs.unt.edu/Rdoc/library/flexclust/html/qtclust.html
>
> Original paper:
> http://genome.cshlp.org/content/9/11/1106.full
> with this step-by-step diagram
> http://genome.cshlp.org/content/9/11/1106/F5.large.jpg
>
> Perl implementation:
> http://stackoverflow.com/questions/1973109/how-can-i-implement-qt-quality-threshold-clustering-in-perl
>
> At the moment I am a little confused on how to compute a cluster
> diameter...?
>
> Thank you
> Sincerely yours
> Misha Koshelev
>
> On Jan 19, 5:23 pm, "Misha Koshelev" <mk144...@bcm.edu> wrote:
> > Dear Sirs or Madams:
> >
> > I am trying to find and/or implement the Quality Threshold clustering algorithm in MATLAB:http://en.wikipedia.org/wiki/Cluster_analysis#QT_clustering_algorithm
> >
> > Actually, any clustering algorithm that results in the same clustering given initial inputs is good for me.
> >
> > Something easy & fast would be best.
> >
> > Any tips much appreciated.
> >
> > Thank you!
> >
> > Sincerely yours
> > Misha Koshelev

Ok here's my implementation using Euclidean distance:
function idx=qtclusteuclid(G,d,D)
% QT clustering algorithm as described in:
%
% Heyer, L. J., Kruglyak, S., Yooseph, S. (1999). Exploring expression
% data: Identification and analysis of coexpressed genes. Genome Research
% 9, 1106–1115.
%
% http://genome.cshlp.org/content/9/11/1106.full
% http://genome.cshlp.org/content/9/11/1106/F5.large.jpg
%
% if two sets A{i} have same cardinality, we pick one with smallest diameter
% our distance measure is Euclidean distance
%
% input:
% G-nxp data to cluster
% d-diameter threshold
% D-Euclidean distance for all 0<i<j<=n
%
% output:
% idx-nx1 vector containing cluster indices
%
% Misha Koshelev
% January 20th, 2009
% Montague Laboratory
    
    n=size(G,1);
    if n<=1
        idx=ones(n,1);
        return;
    end
    if nargin<3
        D=Inf*ones(n,n);
        for i=1:n
            D(i,i)=0;
            for j=i+1:n
                D(i,j)=sqrt(sum((G(i,:)-G(j,:)).^2));D(j,i)=D(i,j);
            end
        end
    end

    C=[];Ccard=0;Cdiam=0;
    for i=1:n
        flag=true;
        A=[i];Acard=1;Adiam=0;
        while flag&&length(A)<n
            pts=1:n;pts(A)=[];
            jdiam=zeros(length(pts),1);
            for pidx=1:length(pts)
                % We only need to compute maximal distance from new point j to all
                % existing points in cluster
                jdiam(pidx)=max(D(pts(pidx),A));
            end
            [minjdiam,pidx]=min(jdiam);j=pts(pidx);
            if sum(jdiam==minjdiam)>1
                dbstack;keyboard;
            end
            
            if max(Adiam,minjdiam)>d
                flag=false;
            else
                A=[A,j];
                Acard=Acard+1;
                Adiam=max(Adiam,minjdiam);
            end
        end

        A=sort(A);
        if Acard>Ccard
            C=A;
            Ccard=Acard;
            Cdiam=Adiam;
        end
    end

    idx=ones(n,1);
    GmC=1:n;GmC(C)=[];
    idx(GmC)=qtclusteuclid(G(GmC,:),d,D(GmC,GmC))+1;
    
function d=diam(G,clust)
% http://thesaurus.maths.org/mmkb/entry.html?action=entryByConcept&id=3279
% The largest distance between any two points in a set is called the set's diameter.
%
% input:
% G-nxp data for our cluster
% clust-vector of row indices into G
%
    
    dsqr=0;
    for k=1:length(clust)
        for k2=k:length(clust)
            dsqr=max(dsqr,sum((G(clust(k),:)-G(clust(k2),:)).^2));
        end
    end
    d=sqrt(dsqr);
    

Subject: QT clustering implementation/details?

From: Misha Koshelev

Date: 20 Jan, 2010 22:20:21

Message: 4 of 6

Oops should be this sorry :(

function idx=qtclusteuclid(G,d,D)
% QT clustering algorithm as described in:
%
% Heyer, L. J., Kruglyak, S., Yooseph, S. (1999). Exploring expression
% data: Identification and analysis of coexpressed genes. Genome Research
% 9, 1106–1115.
%
% http://genome.cshlp.org/content/9/11/1106.full
% http://genome.cshlp.org/content/9/11/1106/F5.large.jpg
%
% if two sets A{i} have same cardinality, we pick first one
% our distance measure is Euclidean distance
%
% input:
% G-nxp data to cluster
% d-diameter threshold
% D-Euclidean distance for all 0<i<j<=n
%
% output:
% idx-nx1 vector containing cluster indices
%
% Misha Koshelev
% January 20th, 2009
% Montague Laboratory
    
    n=size(G,1);
    if n<=1
        idx=ones(n,1);
        return;
    end
    if nargin<3
        D=Inf*ones(n,n);
        for i=1:n
            D(i,i)=0;
            for j=i+1:n
                D(i,j)=sqrt(sum((G(i,:)-G(j,:)).^2));D(j,i)=D(i,j);
            end
        end
    end

    C=[];Ccard=0;Cdiam=0;
    for i=1:n
        flag=true;
        A=[i];Acard=1;Adiam=0;
        while flag&&length(A)<n
            pts=1:n;pts(A)=[];
            jdiam=zeros(length(pts),1);
            for pidx=1:length(pts)
                % We only need to compute maximal distance from new point j to all
                % existing points in cluster
                jdiam(pidx)=max(D(pts(pidx),A));
            end
            [minjdiam,pidx]=min(jdiam);j=pts(pidx);
            if sum(jdiam==minjdiam)>1
                dbstack;keyboard;
            end
            
            if max(Adiam,minjdiam)>d
                flag=false;
            else
                A=[A,j];
                Acard=Acard+1;
                Adiam=max(Adiam,minjdiam);
            end
        end

        A=sort(A);
        if Acard>Ccard
            C=A;
            Ccard=Acard;
            Cdiam=Adiam;
        end
    end

    idx=ones(n,1);
    GmC=1:n;GmC(C)=[];
    idx(GmC)=qtclusteuclid(G(GmC,:),d,D(GmC,GmC))+1;
    
function d=diam(G,clust)
% http://thesaurus.maths.org/mmkb/entry.html?action=entryByConcept&id=3279
% The largest distance between any two points in a set is called the set's diameter.
%
% input:
% G-nxp data for our cluster
% clust-vector of row indices into G
%
    
    dsqr=0;
    for k=1:length(clust)
        for k2=k:length(clust)
            dsqr=max(dsqr,sum((G(clust(k),:)-G(clust(k2),:)).^2));
        end
    end
    d=sqrt(dsqr);
    

Subject: QT clustering implementation/details?

From: Misha Koshelev

Date: 20 Jan, 2010 22:59:04

Message: 5 of 6

Actually here is three different versions:
http://people.hnl.bcm.edu/misha/tmp/qtclust.zip

I posted to File Exchange. Will take above link down at some point.

QT clustering algorithm as described in:

Heyer, L. J., Kruglyak, S., Yooseph, S. (1999). Exploring expression data: Identification and analysis of coexpressed genes. Genome Research 9, 1106–1115.

http://genome.cshlp.org/content/9/11/1106.full
http://genome.cshlp.org/content/9/11/1106/F5.large.jpg

qtclusteuclid.m - use Euclidean distance as measure
qtclustjacknife.m - use Jackknife Correlation (see article above)
qtclustmod.m - use Euclidean distance, modified to always return 2 clusters.

Misha

Subject: QT clustering implementation/details?

From: Misha Koshelev

Date: 21 Jan, 2010 16:23:05

Message: 6 of 6

Update: on File Exchange

Latest version:
http://www.mathworks.com/matlabcentral/fileexchange/26432

Only Euclidean distance:
http://www.mathworks.com/matlabcentral/fileexchange/26431

Thank you!
Misha

Tags for this Thread

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Contact us