From: "Vivek " <vivek_mutalik@yahoo.com>
Newsgroups: comp.soft-sys.matlab
Subject: Re: Is this kind of regression possible?
Date: Fri, 30 Nov 2007 15:22:52 +0000 (UTC)
Organization: University of California, San Francisco
Message-ID: <fip9sc$n3e$1@fred.mathworks.com>
References: <fio8nm$6k4$1@fred.mathworks.com> <fiotok$a9s$1@fred.mathworks.com>
Reply-To: "Vivek " <vivek_mutalik@yahoo.com>


"Per Sundqvist" <per.sundqvist@uam.es> wrote in message
<fiotok$a9s$1@fred.mathworks.com>...
> "Vivek " <vivek_mutalik@yahoo.com> wrote in message
> <fio8nm$6k4$1@fred.mathworks.com>...
> > Hi,
> > 
> > I'm having difficulty formulating the following problem. If
> > you have any suggestions, that would be great.
> > 
> > I have a set of aligned DNA sequences with their measured
> > activities. I want to do a regression so that I can get a
> > weight for each base (A, C, G, T) at each position. This may
> > help me understand which bases are important and contribute
> > to the measured activity.
> > Example: my activity vs. sequence table looks like this:
> >
> >        activity   sequence
> >  (1)       8        ACAG
> >  (2)      10        ATTC
> >  (3)       5        GGTA
> >  (4)       4        CCGT
> >  (5)     ...        ....
> >  ...etc.
> > 
> > My solution would be to minimize the residual sum of
> > squares (here W is the weight of a particular base at a
> > particular position, e.g. W1A is the weight of A at position
> > 1; these weights are what I'm trying to estimate):
> > 
> > RSS = [8 - (W1A + W2C + W3A + W4G)]^2
> >     + [10 - (W1A + W2T + W3T + W4C)]^2 + and so on.
> > 
> > To reduce the number of parameters to be determined, I can
> > subtract the weight of T at each position from the other
> > weights at that position and then add back the sum of all
> > the T weights (as if T were at every position).
> > 
> > That is:
> > 
> > RSS = [8 - ((W1A-W1T) + (W2C-W2T) + (W3A-W3T) + (W4G-W4T)
> >            + (W1T+W2T+W3T+W4T))]^2
> >     + [10 - ((W1A-W1T) + (W4C-W4T) + (W1T+W2T+W3T+W4T))]^2
> >     + and so on
> > (in ATTC, positions 2 and 3 are T, so their difference terms
> > vanish).
> > 
> > Does this make sense? This way, I was thinking of getting
> > weights for all bases with some kind of residual-minimizing
> > function. Is that possible?
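The two formulations in the question can be compared with a minimal sketch. It is written in plain Python rather than MATLAB, and the helper names (`one_hot_row`, `ref_row`, `full_to_ref`, `rss`) are hypothetical. It assumes the 16 weights are ordered position by position as A, C, G, T, so column 4*i + k of a 16-column row is base k at position i (0-based). The T-referenced row has an intercept column for W1T+W2T+W3T+W4T plus A, C and G indicators at each position for differences such as W1A-W1T: 12 differences plus 1 constant, i.e. 13 parameters instead of 16. The random weights are only test values, not fitted results; the final check shows both formulations give the same residual sum of squares on the four example rows.

```python
import random

BASES = "ACGT"
NON_REF = "ACG"  # T is the reference base at every position

def one_hot_row(seq):
    """16-weight model: column 4*i + BASES.index(b) is 1 when base b is at position i (0-based)."""
    row = [0.0] * (4 * len(seq))
    for i, base in enumerate(seq):
        row[4 * i + BASES.index(base)] = 1.0
    return row

def ref_row(seq):
    """T-referenced model: column 0 is the intercept; column 1 + 3*i + NON_REF.index(b)
    is 1 when a non-T base b is at position i. A T contributes only via the intercept."""
    row = [1.0] + [0.0] * (3 * len(seq))
    for i, base in enumerate(seq):
        if base != "T":
            row[1 + 3 * i + NON_REF.index(base)] = 1.0
    return row

def full_to_ref(w):
    """Map 16 weights (ordered as in one_hot_row) to 13 parameters:
    c = sum of the T weights, then W(i,b) - W(i,T) for b in A, C, G."""
    n_pos = len(w) // 4
    t = [w[4 * i + 3] for i in range(n_pos)]  # index 3 is T in BASES
    theta = [sum(t)]
    for i in range(n_pos):
        theta += [w[4 * i + j] - t[i] for j in range(3)]  # j = 0, 1, 2 are A, C, G
    return theta

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rss(params, rows, activities):
    """Residual sum of squares for a linear model with the given design rows."""
    return sum((y - dot(params, r)) ** 2 for r, y in zip(rows, activities))

seqs = ["ACAG", "ATTC", "GGTA", "CCGT"]
activities = [8.0, 10.0, 5.0, 4.0]

random.seed(0)
w = [random.uniform(-5.0, 5.0) for _ in range(16)]  # arbitrary test weights, not fitted values
theta = full_to_ref(w)

rss_full = rss(w, [one_hot_row(s) for s in seqs], activities)
rss_ref = rss(theta, [ref_row(s) for s in seqs], activities)
print(len(w), len(theta))              # 16 13
print(abs(rss_full - rss_ref) < 1e-9)  # True
```

Because the two parameterizations always give the same predictions, minimizing either RSS gives the same fitted activities; only the 13-parameter version can have a unique minimizer, provided the data contain enough varied sequences.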
> 
> F = [8 - (W1A + W2C + W3A + W4G)]^2
>   + [10 - (W1A + W2T + W3T + W4C)]^2 + and so on.
> 
> dF/dW1A = 0, etc.
> 
> 8 + 10 = (1+1)*W1A + 1*W2C + 1*W3A + ...
> 
> b = A*w
> 
> Hmm, this minimization looks like it is equivalent to solving a
> linear equation Aw = b. You have 16 unknowns, right? W1A, W2A,
> W3A, W4A, W1C, W2C, ... = w, the unknown vector. So you need at
> least 16 equations to get these 16 weights. If you have more
> equations, you take the least-squares solution with backslash
> (w = A\b). You will have to build the matrix A in some clever
> way, depending on how your data is arranged: (1+1) should be
> replaced by the number of sequences that have A at the first
> position, and the corresponding element of b is the sum of the
> activities of those sequences (8+10+...).
> 
> Maybe it helps you a little,
> Per
> 
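Per's dF/dW1A = 0 step can be checked numerically. Writing X for the 0/1 data matrix (one row per sequence, one column per weight) and y for the activities, setting the gradient of F = sum (y - Xw)^2 to zero gives the normal equations (X'X) w = X'y, so Per's A and b correspond to X'X and X'y; in MATLAB, X\y returns a least-squares solution without forming them explicitly. The plain-Python sketch below (helper names hypothetical, same A, C, G, T column order per position) builds X'X and X'y for the four example rows and prints the W1A equation: coefficient 2 on W1A because two sequences start with A, and right-hand side 8 + 10 = 18.

```python
BASES = "ACGT"

def one_hot_row(seq):
    """Column 4*i + BASES.index(b) is 1 when base b is at position i (0-based)."""
    row = [0.0] * (4 * len(seq))
    for i, base in enumerate(seq):
        row[4 * i + BASES.index(base)] = 1.0
    return row

def normal_equations(seqs, activities):
    """Return (XtX, Xty); minimizing ||y - X w||^2 requires XtX w = Xty."""
    X = [one_hot_row(s) for s in seqs]
    n = len(X[0])
    XtX = [[sum(r[j] * r[k] for r in X) for k in range(n)] for j in range(n)]
    Xty = [sum(r[j] * y for r, y in zip(X, activities)) for j in range(n)]
    return XtX, Xty

labels = [f"W{i + 1}{b}" for i in range(4) for b in BASES]  # same order as the columns
seqs = ["ACAG", "ATTC", "GGTA", "CCGT"]
activities = [8.0, 10.0, 5.0, 4.0]
XtX, Xty = normal_equations(seqs, activities)

# Row 0 of the normal equations is the dF/dW1A = 0 equation.
terms = " + ".join(f"{c:g}*{lab}" for c, lab in zip(XtX[0], labels) if c)
print(f"{Xty[0]:g} = {terms}")
# 18 = 2*W1A + 1*W2C + 1*W2T + 1*W3A + 1*W3T + 1*W4C + 1*W4G
```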
-----------------------------
You are absolutely correct: W is the unknown vector, and I can't
have more unknowns than equations for the backslash least-squares
solution. That's why I was thinking of reducing the number of
variables by taking the T weights as zero (i.e. using T as the
reference base). I was also thinking that just adding up
8+10+... won't give the right answer (do you think it will?). If
I keep the activities in one column vector (b) and have a matrix
A representing the occurrence of letters, I still need to see how
to tell which weight corresponds to which position. Thanks for
replying.
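Both open questions can be illustrated with a small plain-Python sketch, assuming the same hypothetical column layout (column 4*(p-1) + k holds base k of A, C, G, T at position p). The label list makes the column-to-weight mapping explicit: entry j of a solved w is the weight named labels[j]. The rank check shows why the reduction is needed no matter how many sequences are measured: even all 256 possible 4-mers give a 16-column one-hot matrix of rank only 13, because at each position the four indicator columns add up to the same all-ones column. The T-referenced design (intercept plus A, C, G indicators per position) has exactly 13 columns and full rank 13 on that set. Sums such as 8+10+... are the right-hand sides of the normal equations, not the weights themselves.

```python
from fractions import Fraction
from itertools import product

BASES = "ACGT"

def column_labels(length):
    """Name of each one-hot column, in column order: W1A, W1C, W1G, W1T, W2A, ..."""
    return [f"W{i + 1}{b}" for i in range(length) for b in BASES]

def one_hot_row(seq):
    row = [0] * (4 * len(seq))
    for i, base in enumerate(seq):
        row[4 * i + BASES.index(base)] = 1
    return row

def ref_row(seq):
    """T as reference: intercept column, then A, C, G indicators per position."""
    row = [1] + [0] * (3 * len(seq))
    for i, base in enumerate(seq):
        if base != "T":
            row[1 + 3 * i + "ACG".index(base)] = 1
    return row

def rank(rows):
    """Exact matrix rank by Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(x) for x in r] for r in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot here: this column depends on earlier ones
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * p for a, p in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

labels = column_labels(4)
print(labels[:5])  # ['W1A', 'W1C', 'W1G', 'W1T', 'W2A']

all_seqs = ["".join(s) for s in product(BASES, repeat=4)]  # all 256 possible 4-mers
print(rank([one_hot_row(s) for s in all_seqs]))  # 13, although there are 16 columns
print(rank([ref_row(s) for s in all_seqs]))      # 13, equal to its 13 columns
```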