From: "Per Sundqvist" <per.sundqvist@uam.es>
Newsgroups: comp.soft-sys.matlab
Subject: Re: Is this kind of regression possible?
Date: Fri, 30 Nov 2007 11:56:04 +0000 (UTC)
Organization: Chalmers Tekniska Högskola
Lines: 67
Message-ID: <fiotok$a9s$1@fred.mathworks.com>
References: <fio8nm$6k4$1@fred.mathworks.com>
Reply-To: "Per Sundqvist" <per.sundqvist@uam.es>


"Vivek " <vivek_mutalik@yahoo.com> wrote in message
<fio8nm$6k4$1@fred.mathworks.com>...
> Hi,
> 
> I'm having difficulty formulating the following problem. If
> you have any suggestions, that would be great.
> 
> I have a set of "aligned DNA sequences" with their activities. I
> want to do a regression so that I can get weights for each
> base (A,C,G,T). This may help me understand which
> bases are 'important and contribute' towards the measured
> activity.
> Example: My activity VS sequence table looks like
> (1) 08 ACAG
> (2) 10 ATTC
> (3) 05 GGTA
> (4) 04 CCGT
> (5) ... ....
>    ....etc
> 
> My solution would be to minimize the residual sum of squares
> (here W is the weight of that particular base, which is what I'm
> trying to estimate):
> 
> = [8 - (W1A + W2C + W3A + W4G)]^2 + [10 - (W1A + W2T + W3T
> +W4C)]^2 + and so on.
> 
> To reduce the number of parameters to be determined, I can subtract
> the weight of 'T' from each of the weights and finally add the sum of
> all 'T' weights (as if 'T' were in all positions).
> 
> That is:
> 
> = [8 - ((W1A-W1T) + (W2C-W2T) + (W3A-W3T) + (W4G-W4T))
>      - (W1T+W2T+W3T+W4T)]^2
>  +
> [10 - ((W1A-W1T) + (W4C-W4T)) - (W1T+W2T+W3T+W4T)]^2
> 
> + so on;
> 
> Does this make sense? This way, I was thinking of
> getting weights for all bases by using some kind of residual
> minimizing function. Is it possible?

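F=[8 - (W1A + W2C + W3A + W4G)]^2 + [10 - (W1A + W2T + W3T
   +W4C)]^2 + and so on.

Set the derivative with respect to each weight to zero:

dF/dW1A=0, etc...

For W1A, the first two sequences shown give

8+10=(1+1)*W1A+1*W2C+1*W3A+1*W4G+1*W2T+1*W3T+1*W4C+...

and collecting one such equation per weight gives

b=A*w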

Hmm, this minimization looks like it is equivalent to solving a
linear system A*w=b. You have 16 unknowns, right? W1A W2A W3A
W4A, W1C W2C,... = w, the unknown vector. So you need at least
16 equations to get these 16 weights. If you have more
equations, you take the backslash least-squares solution. You
will get a matrix A, which you have to work out in some
clever way, depending on how your data is arranged. (1+1)
should be replaced by the number of sequences that have A at
the first position, and in the corresponding element of b you
sum the activities of those sequences, 8+10+....
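
A rough MATLAB sketch of how this could be set up, using only the
four rows from your table. The names seqs, y, X, Xred and wred are my
own, and the column ordering (position by position, then A,C,G,T) is
just one possible choice. One caveat: the four indicator columns of
each position always sum to one, so the 16 weights are not all
independent (at most 3*4+1 = 13 combinations can be determined). That
is why the sketch uses your 'T' as reference idea. With only four
rows the system is underdetermined and backslash returns just one of
many exact fits, so treat the numbers as a placeholder until the full
data set is loaded.

```matlab
% Toy data from the quoted table (activity, aligned sequence).
seqs = ['ACAG'; 'ATTC'; 'GGTA'; 'CCGT'];   % one row per sequence
y    = [8; 10; 5; 4];                      % measured activities

bases = 'ACGT';
[nSeq, nPos] = size(seqs);

% Indicator matrix X: column 4*(p-1)+k is 1 where a sequence has
% bases(k) at position p, so X*w adds up the weights that sequence uses.
X = zeros(nSeq, 4*nPos);
for p = 1:nPos
    for k = 1:4
        X(:, 4*(p-1)+k) = (seqs(:, p) == bases(k));
    end
end

% Normal equations from dF/dw = 0: (X'*X)*w = X'*y, i.e. A*w = b.
A = X' * X;    % A(1,1) = 2, the number of sequences with A at position 1
b = X' * y;    % b(1) = 8 + 10 = 18, the sum of their activities

% The four columns of each position sum to ones, so X is rank deficient.
% Drop the T columns and add a constant column instead (T as reference).
isT  = repmat([false false false true], 1, nPos);
Xred = [ones(nSeq, 1), X(:, ~isT)];

% Least-squares fit: wred(1) is the all-T baseline, the remaining
% entries are the weights relative to T at each position.
wred = Xred \ y;
```

Applying backslash to Xred directly, rather than to the normal
equations, gives the same least-squares solution when Xred has full
column rank, and is usually better behaved numerically.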

Maybe it helps you a little,
Per