Path: news.mathworks.com!not-for-mail
From: "Vivek " <vivek_mutalik@yahoo.com>
Newsgroups: comp.soft-sys.matlab
Subject: Is this kind of regression possible?
Date: Fri, 30 Nov 2007 05:57:10 +0000 (UTC)
Organization: University of California, San Francisco
Lines: 40
Message-ID: <fio8nm$6k4$1@fred.mathworks.com>
Reply-To: "Vivek " <vivek_mutalik@yahoo.com>
NNTP-Posting-Host: webapp-02-blr.mathworks.com
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 8bit
X-Trace: fred.mathworks.com 1196402230 6788 172.30.248.37 (30 Nov 2007 05:57:10 GMT)
X-Complaints-To: news@mathworks.com
NNTP-Posting-Date: Fri, 30 Nov 2007 05:57:10 +0000 (UTC)
X-Newsreader: MATLAB Central Newsreader 998324
Xref: news.mathworks.com comp.soft-sys.matlab:440151


Hi,

I'm having difficulty formulating the following problem. Any
suggestions would be great.

I have a set of aligned DNA sequences, each with a measured activity.
I want to do a regression that gives me a weight for each base
(A, C, G, T) at each position. This may help me understand which
bases are important and contribute to the measured activity.
Example: my activity vs. sequence table looks like this:

  No.  Activity  Sequence
  (1)   8        ACAG
  (2)  10        ATTC
  (3)   5        GGTA
  (4)   4        CCGT
  (5)  ...       ...
  etc.
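Before any regression, the sequences have to become numbers. One common way to do that (an assumption for illustration, not something stated above) is indicator or "one-hot" coding: one 0/1 column per (position, base) pair, so the coefficient on column (i, X) plays the role of WiX. The helper name `one_hot` is hypothetical, and the sketch assumes all sequences are aligned to the same length and use only A, C, G, T.

```python
# Minimal sketch: one 0/1 feature per (position, base) pair.
BASES = "ACGT"

def one_hot(seq):
    """Return 4*len(seq) indicators ordered pos1 (A,C,G,T), pos2 (A,C,G,T), ..."""
    row = []
    for base in seq.upper():
        if base not in BASES:
            raise ValueError(f"unexpected base {base!r}")
        row.extend(1 if base == b else 0 for b in BASES)
    return row

# The four example rows from the table (activity, sequence).
data = [(8, "ACAG"), (10, "ATTC"), (5, "GGTA"), (4, "CCGT")]
X = [one_hot(seq) for _, seq in data]   # design matrix, 4 rows x 16 columns
y = [activity for activity, _ in data]  # response vector
```

Each row has exactly one 1 per position, which is why the per-position weights are not all separately identifiable later on.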

My solution would be to minimize the residual sum of squares
(here WiX is the weight of base X at position i, which is what
I'm trying to estimate):

RSS = [8 - (W1A + W2C + W3A + W4G)]^2
    + [10 - (W1A + W2T + W3T + W4C)]^2
    + ... and so on.
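The sum of squares can be evaluated directly for any candidate set of weights. This is only a sketch: the weights are stored in a dict keyed by (position, base) with 1-based positions to mirror W1A, W2C, ...; the function names and the all-ones weights are made up for illustration.

```python
data = [(8, "ACAG"), (10, "ATTC"), (5, "GGTA"), (4, "CCGT")]

def predict(weights, seq):
    # Sum of W[i, base] over positions, e.g. W1A + W2C + W3A + W4G for "ACAG".
    return sum(weights[(i, base)] for i, base in enumerate(seq, start=1))

def rss(weights, data):
    # Residual sum of squares: sum of (activity - prediction)^2 over all rows.
    return sum((activity - predict(weights, seq)) ** 2 for activity, seq in data)

# Hypothetical weights (every WiX = 1.0), just to show the arithmetic.
ones = {(i, b): 1.0 for i in range(1, 5) for b in "ACGT"}
print(rss(ones, data))  # (8-4)^2 + (10-4)^2 + (5-4)^2 + (4-4)^2 = 53.0
```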

To reduce the number of parameters to be determined, I can subtract
the weight of T at each position from the other weights at that
position, and then add back the sum of all the T weights (as if T
were at every position).

That is:

RSS = [8 - ((W1A-W1T) + (W2C-W2T) + (W3A-W3T) + (W4G-W4T)
            + (W1T+W2T+W3T+W4T))]^2
    + [10 - ((W1A-W1T) + (W4C-W4T) + (W1T+W2T+W3T+W4T))]^2
    + ... and so on.

(Positions 2 and 3 of ATTC are T, so their difference terms drop out.)
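A quick numerical check of the reparametrization: subtracting each position's T weight and adding back the T-sum (which belongs inside the prediction, so it is subtracted inside each residual) should give exactly the same predicted activity, with 3 differences per position plus one constant, i.e. 13 parameters instead of 16 for length-4 sequences. The helper names and the random weights are hypothetical, used only for the check.

```python
import random

def full_prediction(w, seq):
    # Original model: e.g. W1A + W2C + W3A + W4G for "ACAG".
    return sum(w[(i, b)] for i, b in enumerate(seq, start=1))

def to_reduced(w, length):
    # 4*length weights -> one constant (sum of T weights) + 3*length differences.
    t_sum = sum(w[(i, "T")] for i in range(1, length + 1))
    diffs = {(i, b): w[(i, b)] - w[(i, "T")]
             for i in range(1, length + 1) for b in "ACG"}
    return t_sum, diffs

def reduced_prediction(t_sum, diffs, seq):
    # Reparametrized model: T-sum plus (WiX - WiT) at every non-T position.
    return t_sum + sum(diffs[(i, b)] for i, b in enumerate(seq, start=1) if b != "T")

random.seed(0)
w = {(i, b): random.uniform(-1, 1) for i in range(1, 5) for b in "ACGT"}
t_sum, diffs = to_reduced(w, 4)
for seq in ["ACAG", "ATTC", "GGTA", "CCGT"]:
    assert abs(full_prediction(w, seq) - reduced_prediction(t_sum, diffs, seq)) < 1e-9
print(len(w), "weights ->", 1 + len(diffs), "parameters")  # 16 weights -> 13 parameters
```

The T weights only ever enter through their sum, which is why the individual WiT cannot be estimated on their own, only the differences and the total.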

Does this make sense? This way, I was thinking of getting weights for
all bases by using some kind of residual-minimizing function. Is that
possible?
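Since the predicted activity is linear in the weights, minimizing this sum of squares is ordinary linear least squares. The sketch below assumes the reduced coding described above (a constant column for the T-sum, then A/C/G indicators per position with T as the all-zero baseline) and solves the normal equations with a small hand-written Gaussian elimination. The function names and the synthetic data, generated from made-up weights WiX = i + index of X in "ACGT", are hypothetical. Note that with 13 unknowns you need at least 13 sufficiently varied sequences; the 4 example rows alone cannot determine the fit.

```python
from itertools import product

OTHERS = "ACG"  # T is the reference base at every position

def reduced_row(seq):
    # Column 0: constant term for W1T + ... + WLT; then A, C, G indicators per position.
    row = [1.0]
    for base in seq:
        row.extend(1.0 if base == b else 0.0 for b in OTHERS)
    return row

def solve(A, rhs):
    """Solve the square system A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, A[i])) + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("singular system: too few or too similar sequences")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit(data):
    """Least squares via the normal equations (X^T X) beta = X^T y."""
    X = [reduced_row(seq) for _, seq in data]
    y = [float(activity) for activity, _ in data]
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

# Synthetic data: WiT = i + 3, so the T-sum is 4+5+6+7 = 22 and the
# differences WiA-WiT, WiC-WiT, WiG-WiT are -3, -2, -1 at every position.
def true_activity(seq):
    return sum(i + "ACGT".index(b) for i, b in enumerate(seq, start=1))

data = [(true_activity(s), s) for s in map("".join, product("ACGT", repeat=4))]
beta = fit(data)  # [T-sum, pos1 A, C, G, pos2 A, C, G, ...]
```

In MATLAB itself, once the design matrix X is built with these columns, the same least-squares fit is usually just the backslash operator, beta = X \ y.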