Thread Subject: Feedback for GENEONTOLOGYDEMO in Bioinformatics Toolbox

Subject: Feedback for GENEONTOLOGYDEMO in Bioinformatics Toolbox

From: kmouts@hotmail.com

Date: 26 Jun, 2008 06:09:33

Message: 1 of 2

Hello,

I have 3 remarks/questions concerning the demo in "Gene Ontology Enrichment in Microarray Data":


1. In the cell "%% Looking at Probability of Gene Ontology Annotation", the hypergeometric probability is calculated as:
pvalues = hygepdf(genesclustercount,max(geneschipcount), max(genesclustercount),geneschipcount);

However, reference [4] (Gentleman, R. 'Using GO for Statistical Analyses'. Bioconductor vignette, May 16, 2005, http://bioconductor.org/docs/vignettes.html) applies the hypergeometric distribution with N as the total number of genes and m as the number of interesting genes. The demo instead uses max(geneschipcount) and max(genesclustercount), which are, respectively, the maximum number of genes annotated to any single ontology node and the maximum number of interesting genes annotated to any single ontology node. How is this difference justified?

2. In the cell "%% Finding Annotated Genes From the Microarray", the corresponding GO terms are found for every gene, and then their relatives are added with the command:
      goid = getrelatives(GO,goid);
Why is this done? It assigns to each gene not only the more general terms (one generation of ancestors) but also the more specific ones (one generation of descendants). What is the logic behind this?

3. Is there any way to limit the retrieval of ancestors to a defined number of levels (e.g. only 3 generations upwards)?

Konstantinos

Subject: Feedback for GENEONTOLOGYDEMO in Bioinformatics Toolbox

From: Lucio Andrade-Cetto

Date: 26 Jun, 2008 16:26:02

Message: 2 of 2

1. You are correct, and we had already noticed this. However,
after running some experiments we determined that the
difference should not influence the final results. With the
demo data, the same branch of the ontology is found by both
methods, but when using max (instead of sum) other branches
of the ontology that may be significant also appear. The
baseline distribution for gene ontology enrichment is still
a research problem; this paper explains the problem a little
further:
http://nar.oxfordjournals.org/cgi/content/abstract/35/suppl_1/D322
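
For comparison, here is a minimal sketch of the parameterization the question describes, with N the total number of genes on the chip and m the number of interesting genes. The counts are made-up toy values, not the demo's data, and the names numGenesOnChip and numGenesInCluster are hypothetical. It assumes hygepdf and hygecdf from the Statistics Toolbox are available.

```matlab
% Toy counts (assumptions for illustration only, not the demo's data)
numGenesOnChip    = 20;          % N: all annotated genes on the chip
numGenesInCluster = 5;           % m: "interesting" genes
geneschipcount    = [8; 4; 12];  % genes on the chip annotated to each GO term
genesclustercount = [3; 1; 4];   % interesting genes annotated to each GO term

% Point probability of exactly genesclustercount hits per term.
% Argument order: hygepdf(x, population size, successes in population, draws)
pPoint = hygepdf(genesclustercount, numGenesOnChip, ...
                 numGenesInCluster, geneschipcount);

% Upper-tail probability P(X >= x), which is often what gets reported
% as the enrichment p-value
pUpper = 1 - hygecdf(genesclustercount - 1, numGenesOnChip, ...
                     numGenesInCluster, geneschipcount);
```

Here each GO term is treated as the "draw" and the cluster as the set of successes. Swapping those two roles gives the same point probability, because the hypergeometric distribution is symmetric in them.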

2. This is just one alternative way of propagating the
evidence; there are many methods to do this. In this case
you may also be interested in detecting more ‘specific’
terms that are statistically significant. Recall that GO is
acyclic but not a tree, so a lower term may show up when two
or more of its ancestors are significant.

3. Yes, the options HEIGHT in GETANCESTORS and DEPTH in
GETDESCENDANTS do this. Note that there are also other input
arguments, such as ‘RELATIONTYPE’ and ‘EXCLUDE’, that give
you further control over how the evidence is propagated
through the ontology.
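
As a rough sketch, limiting the search to a fixed number of generations could look like the following. It assumes network access so that geneont('Live', true) can download the current ontology (a local OBO file should work as well), and the GO id 46680 is only an arbitrary example term.

```matlab
% Load the ontology (assumption: network access is available; alternatively
% use GO = geneont('File', 'gene_ontology.obo') with a local copy)
GO = geneont('Live', true);

goid = 46680;   % arbitrary example term

% All ancestors of the term (by default the result includes goid itself)
allAnc = getancestors(GO, goid);

% Only up to 3 generations upwards
upTo3 = getancestors(GO, goid, 'Height', 3);

% Same limit, but following only is_a relations
upTo3isa = getancestors(GO, goid, 'Height', 3, 'Relationtype', 'is_a');

% Up to 2 generations of more specific terms
downTo2 = getdescendants(GO, goid, 'Depth', 2);
```

As far as I know, GETRELATIVES accepts both 'Height' and 'Depth', each defaulting to 1, which is why the call in the demo returns one generation in each direction.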

Thanks for your feedback.
Lucio Cetto
