From: "vicky " <vivek_mutalik@yahoo.com>
Newsgroups: comp.soft-sys.matlab
Subject: Re: Is this kind of regression possible?
Date: Sat, 1 Dec 2007 16:19:41 +0000 (UTC)
Organization: University of California, San Francisco
Message-ID: <fis1it$ldn$1@fred.mathworks.com>
References: <fio8nm$6k4$1@fred.mathworks.com> <fiqisu$l62$1@fred.mathworks.com> <fiqkd0$a9h$1@fred.mathworks.com> <fiqmvs$c6o$1@fred.mathworks.com>
Reply-To: "vicky " <vivek_mutalik@yahoo.com>

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in
message <fiqmvs$c6o$1@fred.mathworks.com>...
> "vicky " <vivek_mutalik@yahoo.com> wrote in message
> <fiqkd0$a9h$1@fred.mathworks.com>...
> > Hi Roger,
> >
> > Thanks very much for responding to my query and offering to
> > help me out.
> >
> > I must say, you have understood my question correctly. I
> > appreciate your solution to the problem. I've tried that;
> > I've taken just binary 1 or 0 representation for each base
> > instead of 1000, 0100, type. I'm thankful to Walter Roberson
> > who gave me that routine.
> >
> > The most important issue with my result was that my matrix X,
> > which describes these binary representations, is not full
> > rank. I think some columns are "collinear (or
> > multicollinear)". Matrix Y is my activity vector. So I was
> > trying to use PLSR to avoid this collinearity and to include
> > the full DNA sequence (about 50 letters long), which was also
> > not successful due to lack of correlations. I'm still
> > working on that.
> >
> > Meanwhile, I was thinking of why not just take all equations
> > in one residual sum of square equation and solve it for
> > smaller segments of DNA (not sure if the Optimization Toolbox
> > would be helpful in that). I was trying to eliminate 'T' so
> > that I can have fewer variables to handle.
> > Someone suggested that this may be easy in Mathematica. I
> > don't understand how that would be. If it is easy, then MATLAB
> > will be my choice.
> > Thanks again. Do send your comments.
> > Regards,
> > Vivek
> --------
>   First of all, my apologies for the multiplicity of copies of my
> previous reply; I think there were seven in all, much to my disgust.
> I waited for several minutes to elicit a response from the
> MathWorks' newsreader and then clicked the "post message" button
> again.  After many minutes I clicked once more, but I can't imagine
> where seven copies came from, unless possibly my mouse click
> bounced repeatedly.
>
>   Back to the subject at hand, when you said "taken just binary 1 or
> 0 representation for each base instead of 1000, 0100, type" you gave
> me the impression that you interpreted the sequences like 1 0 0 0 or
> 0 1 0 0 as being single binary scalars.  That isn't what I meant.
> These are to be four distinct elements with each element a 0 or 1.
> This makes M have 4*n columns and W have 4*n rows.
>
>   I don't understand what you mean in your final paragraph,
> "Meanwhile, I was thinking of why not just take all equations in one
> residual sum of square equation and solve it for smaller segments of
> DNA".  I see no particular reason for taking smaller segments.  What
> is there to gain by that?
>
> Roger Stafford
>-----------------------------------------

Thanks for your reply.
I know what you meant. I've taken distinct zeros and ones for each
element; that is, I generate a 1 or 0 for each of A, C, G and T. So I
have 4*N columns, where N is the length of the sequence (the number of
elements). My matrix looks like:

%    A C G T A C G T A C G T A C G T
M = [1 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0;  % ACAG
     1 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0;  % ATTC
     0 0 1 0 0 0 1 0 0 0 0 1 1 0 0 0;  % GGTA
     0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1;  % CCGT
     % ... one row per sequence ...
    ];
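For reference, a minimal sketch of turning letter sequences into rows like the ones above. This is not Walter Roberson's routine (it isn't shown in this thread); the example sequences and the names seqs and bases are assumptions for illustration, and the column order is A C G T within each position, as in the matrix.

```matlab
% Hypothetical example sequences; the real data has about 50 letters per row.
seqs  = {'ACAG'; 'ATTC'; 'GGTA'; 'CCGT'};
bases = 'ACGT';                  % column order within each position
nSeq  = numel(seqs);
N     = numel(seqs{1});          % assumes every sequence has the same length
M     = zeros(nSeq, 4*N);
for i = 1:nSeq
    [tf, idx] = ismember(seqs{i}, bases);   % idx(j) = 1..4 for A, C, G, T
    assert(all(tf), 'sequence %d contains a letter other than A, C, G, T', i);
    M(i, 4*((1:N) - 1) + idx) = 1;          % one indicator per position
end
disp(M)   % the same 4-by-16 matrix as in the post
```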
In my actual data I generate binaries for the full length of 50
letters per row (not just 4), i.e. 200 columns. With more columns than
rows that system is underdetermined, so ordinary regression can't be
used and I turned to PLSR or PCA. But if I use only 4 elements (16
columns), as in this matrix, then I can use the backslash operator (or
the regress function) to solve it, since my total number of rows (more
than 40) exceeds the 16 columns. So regress sees roughly a 40-by-16
matrix.
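A likely source of the rank deficiency mentioned earlier in the thread, sketched here with hypothetical random data: within each position the four A/C/G/T columns always add up to a column of ones, so the 4*N columns satisfy at least N-1 exact linear relations and rank(M) can never exceed 3*N+1. Dropping one base per position (here 'T', which you were already trying to eliminate) and adding an intercept column is the usual reference coding. The data, y, and all variable names below are assumptions; with the Statistics Toolbox, regress(y, X) should give the same coefficients as X \ y when X has full column rank.

```matlab
% Hypothetical data: 40 random sequences of length N = 4 (base indices
% 1..4 stand for A, C, G, T) and a made-up activity vector y.
nSeq = 40;
N    = 4;
B    = randi(4, nSeq, N);        % B(i,j) = base at position j of sequence i
y    = randn(nSeq, 1);           % placeholder activities

% Full one-hot design: column 4*(j-1)+b is 1 when position j holds base b.
M = zeros(nSeq, 4*N);
for j = 1:N
    M(sub2ind(size(M), (1:nSeq)', 4*(j-1) + B(:, j))) = 1;
end

% Every position's four columns add up to a column of ones, so the 4*N
% columns obey at least N-1 exact linear relations: rank(M) <= 3*N+1.
fprintf('rank(M) = %d of %d columns (bound 3N+1 = %d)\n', ...
        rank(M), 4*N, 3*N + 1);

% Reference coding: drop the 'T' column of every position and add an
% intercept, leaving 3*N+1 columns without that built-in collinearity.
keep = mod(0:4*N-1, 4) ~= 3;     % false for columns 4, 8, 12, ... ('T')
X    = [ones(nSeq, 1), M(:, keep)];
w    = X \ y;                    % least-squares weights (intercept first)
fprintf('rank(X) = %d of %d columns\n', rank(X), size(X, 2));
```

With the 'T' columns removed, each remaining weight is read as the effect of that base relative to 'T' at the same position.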

Meanwhile, to see whether I can solve this "one equation of residual
sum of squares", I take smaller segments:
1. To see whether 'activity' (the response) can be explained by such
   smaller segments (rather than the full 50 elements).
2. If I take more variables (elements, columns) than responses
   (activities), then I'll have multiple solutions, since in this case
   rows < columns.
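Point 1 could be checked with a sliding window, sketched below under stated assumptions: random sequences, a made-up activity that depends only on positions 11 and 14, a window length of k = 4, and reference coding ('T' dropped plus an intercept). Each window gets an ordinary least-squares fit, its residual sum of squares, and an in-sample R^2. Because many windows are tried on the same 40 rows, the best in-sample R^2 is optimistic; cross-validation would give a fairer comparison.

```matlab
% Hypothetical data: 40 random sequences of length 50 (1..4 = A, C, G, T)
% and a made-up activity that depends only on positions 11 and 14.
nSeq = 40;
L    = 50;
B    = randi(4, nSeq, L);
y    = (B(:, 11) == 1) + 0.5*(B(:, 14) == 3) + 0.1*randn(nSeq, 1);

k    = 4;                        % window length in positions (assumed)
nWin = L - k + 1;
R2   = zeros(nWin, 1);
for s = 1:nWin
    Seg = B(:, s:s+k-1);         % bases in this window
    % Intercept plus A, C, G indicators for each position ('T' is the reference)
    X = [ones(nSeq, 1), double(Seg == 1), double(Seg == 2), double(Seg == 3)];
    w = X \ y;                   % ordinary least squares for this window
    rss = sum((y - X*w).^2);     % residual sum of squares
    R2(s) = 1 - rss / sum((y - mean(y)).^2);
end
[bestR2, bestStart] = max(R2);
fprintf('best window: positions %d-%d, in-sample R^2 = %.2f\n', ...
        bestStart, bestStart + k - 1, bestR2);
```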

The whole idea is to get "weights" per element that describe my
activity. Do you have any alternative suggestions to what I'm trying
to do?
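One possible way to get per-element weights from the full 50-letter design despite rows < columns (point 2) is the minimum-norm least-squares solution from pinv, sketched here on hypothetical random data with assumed variable names. It is only one of the many exact fits, so individual weights should be read with caution. In MATLAB, backslash on the same underdetermined system returns a different "basic" solution, which illustrates the non-uniqueness.

```matlab
% Hypothetical data: 40 random sequences of length 50 -> 200 one-hot columns.
nSeq = 40;
L    = 50;
B    = randi(4, nSeq, L);        % 1..4 = A, C, G, T
y    = randn(nSeq, 1);           % placeholder activities

M = zeros(nSeq, 4*L);
for j = 1:L
    M(sub2ind(size(M), (1:nSeq)', 4*(j-1) + B(:, j))) = 1;
end

% Rows < columns: M*w = y typically has infinitely many exact solutions.
wMin   = pinv(M) * y;            % the solution with the smallest norm(w)
wOther = M \ y;                  % in MATLAB, a "basic" solution (few nonzeros)
fprintf('residual norm: pinv %.1e, backslash %.1e\n', ...
        norm(M*wMin - y), norm(M*wOther - y));
fprintf('weight norm:   pinv %.3f, backslash %.3f\n', norm(wMin), norm(wOther));

% Weights as a 4-by-L table: rows A, C, G, T; column j = position j.
Wtable = reshape(wMin, 4, L);
```

Even apart from rows < columns, the one-hot coding leaves some freedom: adding the same constant to all four weights at one position and subtracting it from all four at another position leaves every fitted value unchanged.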