Thread Subject: Alternate to Regression

Subject: Alternate to Regression

From: vicky

Date: 25 Jan, 2008 07:19:03

Message: 1 of 14

Hi,
I have a matrix X of independent variables and a vector Y of
experimental data. Matrix X is not full rank, and hence I
was using the PINV function rather than backslash for multiple
linear regression. The results are not that encouraging.

Can anyone suggest whether I can use maximum likelihood
estimation or any other technique for parameter estimation?

Thanks

Subject: Alternate to Regression

From: John D'Errico

Date: 25 Jan, 2008 11:48:02

Message: 2 of 14

"vicky " <vivek_mutalik@yahoo.com> wrote in message
<fnc2h7$m30$1@fred.mathworks.com>...
> Hi,
> I have a matrix X of independent variables and Vector Y of
> experimental data. Matrix X is not a full rank and hence i
> was using PINV function than backslash for multiple linear
> regression. Results are not that encouraging.
>
> Can anyone suggest if i can use Maximum likelihood
> estimation or any other technique for parameter estimation?

I think it is almost impossible to expect that
ML will improve this sow's ear into a silk purse.

Get better data. Or make your model simpler,
to reflect your inadequate data.

John

Subject: Alternate to Regression

From: Greg Heath

Date: 25 Jan, 2008 15:29:05

Message: 3 of 14

On Jan 25, 2:19 am, "vicky " <vivek_muta...@yahoo.com> wrote:
> Hi,
> I have a matrix X of independent variables and Vector Y of
> experimental data. Matrix X is not a full rank and hence i
> was using PINV function than backslash for multiple linear
> regression. Results are not that encouraging.

How can we help if we don't know the salient details?

size(X),size(Y),rank(X),rank(Y),rank([X Y])(or [X;Y]),
Rsquare,...

BACKSLASH can yield solutions which are more satisfying
than those obtained with PINV.
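
For readers comparing the two, here is a minimal sketch of how they differ on a rank-deficient X. In MATLAB, backslash returns a "basic" least-squares solution with at most rank(X) nonzero coefficients, while PINV returns the minimum-norm least-squares solution; both reach the same residual. The data below are made up for illustration and are not the poster's.

```matlab
% Hypothetical rank-deficient design: the 4th column repeats information.
n = 20;
t = linspace(0, 1, n)';
A = [sin(2*pi*t), cos(3*t), t.^2];
X = [A, A(:,1) + A(:,2)];              % column 4 = column 1 + column 2
y = A * [1; 2; 3] + 0.1 * sin(37*t);   % deterministic stand-in for noise

b_bs   = X \ y;                        % MATLAB warns that X is rank deficient
b_pinv = pinv(X) * y;                  % minimum-norm least-squares solution

res_bs   = norm(y - X * b_bs);
res_pinv = norm(y - X * b_pinv);
fprintf('rank(X) = %d of %d columns\n', rank(X), size(X, 2));
fprintf('residual norm:    backslash %.6f, pinv %.6f\n', res_bs, res_pinv);
fprintf('coefficient norm: backslash %.4f, pinv %.4f\n', norm(b_bs), norm(b_pinv));
```

Because the fit quality is identical, choosing between them is a matter of which coefficient vector is easier to interpret, not of which one predicts Y better.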

However, the best advice (see John's post) is to either get
more measurements or get rid of the redundant and/or
irrelevant variables.

I recently posted a reply that suggested how to do the latter.
Search on

greg-heath STEPWISEFIT

and sort by date.

> Can anyone suggest if i can use Maximum likelihood
> estimation or any other technique for parameter estimation?

You can use it but don't expect to get better results.

Hope this helps.

Greg

Subject: Alternate to Regression

From: vicky

Date: 25 Jan, 2008 18:15:04

Message: 4 of 14

"John D'Errico" <woodchips@rochester.rr.com> wrote in
message <fnci9i$2bg$1@fred.mathworks.com>...
> "vicky " <vivek_mutalik@yahoo.com> wrote in message
> <fnc2h7$m30$1@fred.mathworks.com>...
> > Hi,
> > I have a matrix X of independent variables and Vector Y of
> > experimental data. Matrix X is not a full rank and hence i
> > was using PINV function than backslash for multiple linear
> > regression. Results are not that encouraging.
> >
> > Can anyone suggest if i can use Maximum likelihood
> > estimation or any other technique for parameter estimation?
>
> I think it is almost impossible to expect that
> ML will improve this sow's ear into a silk purse.
>
> Get better data. Or make your model simpler,
> to reflect your inadequate data.
>
> John

Thanks John for your suggestion. Point noted. I can't
change my data, but I can think about a simpler model. As I'm
handling one of the troublesome problems in bioinformatics
(promoter analysis), I don't think there will be a very simple
solution.


Subject: Alternate to Regression

From: vicky

Date: 25 Jan, 2008 18:18:01

Message: 5 of 14

Greg Heath <heath@alumni.brown.edu> wrote in message
<fa844f0e-b212-4e51-8933-1476708ddec5@v4g2000hsf.googlegroups.com>...
> On Jan 25, 2:19 am, "vicky " <vivek_muta...@yahoo.com> wrote:
> > Hi,
> > I have a matrix X of independent variables and Vector Y of
> > experimental data. Matrix X is not a full rank and hence i
> > was using PINV function than backslash for multiple linear
> > regression. Results are not that encouraging.
>
> How can we help if we don't know the salient details?
>
> size(X),size(Y),rank(X),rank(Y),rank([X Y])(or [X;Y]),
> Rsquare,...
>
> BACKSLASH can yield solutions which are more satisfying
> than those obtained with PINV.
>
> However, the best advice (see John's post) is to either get
> more measurements or get rid of the redundant and/or
> irrelevant variables.
>
> I recently posted a reply that suggested how to do the latter.
> Search on
>
> greg-heath STEPWISEFIT
>
> and sort by date.
>
> > Can anyone suggest if i can use Maximum likelihood
> > estimation or any other technique for parameter estimation?
>
> You can use it but don't expect to get better results.
>
> Hope this helps.
>
> Greg
>
Thanks Greg for your suggestion. I'll try STEPWISEFIT.
The other details you asked for are as follows.

size(X) = 96 51
size(Y) = 96
rank(X) = 1
rank(Y) = 39
rank([X Y]) = 40
R-square = 0.6
Adjusted R-square = 0.08
Correlation coefficient = 0.75

Please suggest if you have comments!
Thanks again for discussing.

Subject: Alternate to Regression

From: Greg Heath

Date: 26 Jan, 2008 01:37:33

Message: 6 of 14

On Jan 25, 1:18 pm, "vicky " <vivek_muta...@yahoo.com> wrote:
> Greg Heath <he...@alumni.brown.edu> wrote in message
>
> <fa844f0e-b212-4e51-8933-1476708dd...@v4g2000hsf.googlegroups.com>...
>
> > On Jan 25, 2:19 am, "vicky " <vivek_muta...@yahoo.com> wrote:
> > > Hi,
> > > I have a matrix X of independent variables and Vector Y of
> > > experimental data. Matrix X is not a full rank and hence i
> > > was using PINV function than backslash for multiple linear
> > > regression. Results are not that encouraging.
>
> > How can we help if we don't know the salient details?
>
> > size(X),size(Y),rank(X),rank(Y),rank([X Y])(or [X;Y]),
> > Rsquare,...
>
> > BACKSLASH can yield solutions which are more satisfying
> > than those obtained with PINV.
>
> > However, the best advice (see John's post) is to either get
> > more measurements or get rid of the redundant and/or
> > irrelevant variables.
>
> > I recently posted a reply that suggested how to do the latter.
> > Search on
>
> > greg-heath STEPWISEFIT
>
> > and sort by date.
>
> > > Can anyone suggest if i can use Maximum likelihood
> > > estimation or any other technique for parameter estimation?
>
> > You can use it but don't expect to get better results.
>
> Thanks Greg for your suggestion. I'll try Stepwisefit.
> The other details u asked are as follows.
>
> size(X) = 96 51
> size(Y) = 96
> rank(X) = 1
> rank(Y) = 39
> rank([X Y]) = 40

Correction

size(Y) = [96 1]
rank(Y) = 1 % Obviously (you only have 1 output)

size(X) = [96 51]
rank(X) = 39 % At least 12 variables are irrelevant or redundant
rank([X Y]) = 40

Uh-oh.

rank([X Y]) > rank(X) means that X and Y are not linearly dependent!

Therefore, a linear model (in coefficients) that is also linear in
the variables is questionable.

> R-square = 0.6

NMSEb = 0.4 % Biased normalized mean-square error.

> Adjusted R-square = 0.08

NMSEu = 0.92 % Unbiased normalized mean-square error.

Almost as bad as using the constant model Y = mean(Y) where
NMSEu = 1!
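
A small sketch of the arithmetic behind these four numbers, assuming NMSEb and NMSEu denote the error ratios without and with the degrees-of-freedom correction (that reading is inferred from NMSEb = 1 - Rsquare and NMSEu = 1 - adjusted Rsquare above). The data and variable names are hypothetical.

```matlab
% Hypothetical data; only the formulas matter here.
n = 30;
t = linspace(0, 1, n)';
X = [t, t.^2, sin(5*t)];               % p = 3 predictors
y = 2*t - t.^2 + 0.3*cos(40*t);        % deterministic stand-in for noise
p = size(X, 2);

Xd  = [ones(n,1), X];                  % add an intercept column
b   = Xd \ y;
sse = sum((y - Xd*b).^2);              % residual sum of squares
sst = sum((y - mean(y)).^2);           % total sum of squares about the mean

R2    = 1 - sse/sst;
adjR2 = 1 - (sse/(n-p-1)) / (sst/(n-1));
NMSEb = sse/sst;                       % no degrees-of-freedom correction
NMSEu = (sse/(n-p-1)) / (sst/(n-1));   % degrees-of-freedom corrected
fprintf('R2 = %.3f, adjusted R2 = %.3f\n', R2, adjR2);
fprintf('NMSEb = %.3f, NMSEu = %.3f\n', NMSEb, NMSEu);
```

The correction factor (n-1)/(n-p-1) grows quickly as p approaches n, which is why a model with many predictors and few observations can show a respectable Rsquare and a poor adjusted Rsquare at the same time.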

> Correlation coefficient = 0.75

I thought CC = sqrt(Rsquare) = 0.77

 ... or is my equation just a rule of thumb?
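
For an ordinary least-squares fit that includes an intercept, the identity is exact: the correlation between Y and the fitted values equals sqrt(Rsquare). The sketch below checks this on made-up data. A small gap such as 0.75 versus 0.77 could come from rounding Rsquare to one digit or from a fit without an intercept, though the thread does not say which.

```matlab
n = 25;
t = linspace(0, 1, n)';
X = [ones(n,1), t, cos(4*t)];          % intercept plus two predictors
y = 1 + 3*t + 0.5*sin(30*t);           % hypothetical response
yhat = X * (X \ y);                    % least-squares fitted values

R2 = 1 - sum((y - yhat).^2) / sum((y - mean(y)).^2);
C  = corrcoef(y, yhat);                % 2-by-2 correlation matrix
cc = C(1,2);
fprintf('sqrt(R2) = %.6f, corr(y, yhat) = %.6f\n', sqrt(R2), cc);
```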

> Please suggest if you have comments!

Try a linear model that is second order in the variables by creating
Xnew with squares and cross-products. If

rank([Xnew Y]) > rank(X)

then try nonlinear modelling (e.g., neural networks, projection
pursuit,...). However, run STEPWISEFIT as described in my
previous post on X and Xnew to reduce the input dimensionality
before trying to design a neural net.
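
A minimal sketch of the second-order expansion, built by hand with base MATLAB on a made-up two-variable example. Comparing rank([Xnew y]) against rank(Xnew) is an assumption about the intended check. Note also that with 51 variables a full quadratic expansion has 1 + 51 + 1275 + 51 = 1378 columns, far more than the 96 rows available, so the rank of the augmented matrix cannot exceed 96 and the comparison says little until the inputs have been reduced.

```matlab
n = 15;
t = linspace(0, 1, n)';
X = [t, sin(3*t)];                     % hypothetical 2-variable design
y = X(:,1).*X(:,2) + X(:,2).^2;        % exactly quadratic in the variables
k = size(X, 2);

cross = zeros(n, k*(k-1)/2);           % pairwise products x_i .* x_j, i < j
c = 0;
for i = 1:k-1
    for j = i+1:k
        c = c + 1;
        cross(:, c) = X(:,i) .* X(:,j);
    end
end
Xnew = [ones(n,1), X, cross, X.^2];    % constant, linear, interaction, squares

fprintf('rank([X y])    = %d, rank(X)    = %d\n', rank([X y]), rank(X));
fprintf('rank([Xnew y]) = %d, rank(Xnew) = %d\n', rank([Xnew y]), rank(Xnew));
```

Here y lies outside the column space of X but inside that of Xnew, so appending y raises the rank in the first comparison and not in the second.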

Hope this helps.

Greg

Subject: Alternate to Regression

From: Greg Heath

Date: 26 Jan, 2008 15:34:45

Message: 7 of 14

On Jan 25, 8:37 pm, Greg Heath <he...@alumni.brown.edu> wrote:
> On Jan 25, 1:18 pm, "vicky " <vivek_muta...@yahoo.com> wrote:
>
> > Greg Heath <he...@alumni.brown.edu> wrote in message
>
> > <fa844f0e-b212-4e51-8933-1476708dd...@v4g2000hsf.googlegroups.com>...
>
> > > On Jan 25, 2:19 am, "vicky " <vivek_muta...@yahoo.com> wrote:
> > > > Hi,
> > > > I have a matrix X of independent variables and Vector Y of
> > > > experimental data. Matrix X is not a full rank and hence i
> > > > was using PINV function than backslash for multiple linear
> > > > regression. Results are not that encouraging.
>
> > > How can we help if we don't know the salient details?
>
> > > size(X),size(Y),rank(X),rank(Y),rank([X Y])(or [X;Y]),
> > > Rsquare,...
>
> > > BACKSLASH can yield solutions which are more satisfying
> > > than those obtained with PINV.
>
> > > However, the best advice (see John's post) is to either get
> > > more measurements or get rid of the redundant and/or
> > > irrelevant variables.
>
> > > I recently posted a reply that suggested how to do the latter.
> > > Search on
>
> > > greg-heath STEPWISEFIT
>
> > > and sort by date.
>
> > > > Can anyone suggest if i can use Maximum likelihood
> > > > estimation or any other technique for parameter estimation?
>
> > > You can use it but don't expect to get better results.
>
> > Thanks Greg for your suggestion. I'll try Stepwisefit.
> > The other details u asked are as follows.
>
> > size(X) = 96 51
> > size(Y) = 96
> > rank(X) = 1
> > rank(Y) = 39
> > rank([X Y]) = 40
>
> Correction
>
> size(Y) = [96 1]
> rank(Y) = 1 % Obviously (you only have 1 output)
>
> size(X) = [96 51]
> rank(X) = 39 % At least 12 variables are irrelevant or redundant

Correction.

Only shows that at least 12 variables are mathematically redundant.

For regression, practical redundancy can be awkwardly quantified
by rank(X,tol) where tol is a user supplied threshold on the size of
the eigenvalues of cov(X). For example, tol = 0.01*trace(cov(X)).

However, it would be more convenient if the function RANK were
modified to have an input "pct" that would only include the minimum
number of eigenvalues for sum(eigenvalues) > (pct/100)*trace(cov(X)).
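
A rough sketch of both ideas on made-up data. One detail worth keeping in mind: MATLAB's rank(A,tol) counts the singular values of A itself that exceed tol, not the eigenvalues of cov(A). For column-centered data the two are tied together, since the eigenvalues of cov(X) equal the squared singular values of the centered X divided by n-1. The "pct" rule is written out by hand here because RANK has no such input; the threshold values are arbitrary.

```matlab
n = 40;
t = linspace(0, 1, n)';
% Hypothetical data: two strong directions plus one nearly redundant column.
X = [t, sin(6*t), t + 1e-4*cos(9*t)];

Xc  = X - repmat(mean(X,1), n, 1);     % center each column
s   = svd(Xc);                         % singular values, descending
C   = cov(X);
C   = (C + C')/2;                      % guard against rounding asymmetry
lam = sort(eig(C), 'descend');         % eigenvalues of the covariance

fprintf('max |lam - s.^2/(n-1)| = %.2e\n', max(abs(lam - s.^2/(n-1))));
fprintf('rank(Xc) = %d, rank(Xc, 1e-2) = %d\n', rank(Xc), rank(Xc, 1e-2));

% "pct" rule: fewest eigenvalues whose sum reaches pct percent of the trace
pct = 99;
kpct = find(cumsum(lam) >= (pct/100)*sum(lam), 1);
fprintf('eigenvalues needed for %d%% of trace(cov(X)): %d\n', pct, kpct);
```

The default rank sees three independent columns, while either practical criterion treats the third column as redundant.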

After redundant variables are removed to obtain Xnew, tests for
irrelevancy of other variables have to be based on [X Y]
correlation information. A peek at the elements of corrcoef([X Y])
can be informative.
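
One way to take that peek, sketched on hypothetical data: the last column of corrcoef([X y]) holds the correlation of each predictor with y, and sorting by magnitude gives a rough ranking. These are marginal correlations only, so a variable that matters only in combination with others can still look weak here.

```matlab
n = 30;
t = linspace(0, 1, n)';
X = [t, cos(7*t), sin(11*t)];          % hypothetical predictors
y = 2*X(:,1) + 0.05*X(:,3);            % driven almost entirely by column 1

R   = corrcoef([X y]);                 % (k+1)-by-(k+1) correlation matrix
rxy = R(1:end-1, end);                 % each predictor against y
[~, order] = sort(abs(rxy), 'descend');
for j = order'
    fprintf('column %d: corr with y = %+.3f\n', j, rxy(j));
end
```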

Since cov(X) quantifies the spread of all of the data, care
should be taken when making inferences about the separability of
classes. Typically, cov(X) should be partitioned into "between-class"
and "within-class" components before making any rash assumptions
on the relevance of variables w.r.t. the separability of classes.

> rank([X Y]) = 40
>
> Uh-oh.
>
> rank([X Y]) > rank(X) means that X and Y are not linearly dependent!

... with respect to the default value of tol in RANK.

-----SNIP

Hope this helps.

Greg

Subject: Alternate to Regression

From: vicky

Date: 27 Jan, 2008 00:48:01

Message: 8 of 14

Greg Heath <heath@alumni.brown.edu> wrote in message
<3547420d-4e16-41d5-9401-610bb080f55c@h11g2000prf.googlegroups.com>...

Hi Greg,

Thanks a ton! Both of your suggestions go so far beyond the
regular textbook advice that I can't thank you enough
for discussing this. I'll have to really absorb your
comments before I reply.

Thanks again

Vivek

Subject: Alternate to Regression

From: vicky

Date: 27 Jan, 2008 00:49:01

Message: 9 of 14

Greg Heath <heath@alumni.brown.edu> wrote in message
<54a23e25-c7fe-45a4-b650-7ec3f8d4e7cb@v4g2000hsf.googlegroups.com>...

Sorry for my typos. Thanks!

Subject: Alternate to Regression

From: vicky

Date: 29 Jan, 2008 03:19:02

Message: 10 of 14

> rank([X Y]) > rank(X) means that X and Y are not linearly
> dependent!
> Therefore, a linear model (in coefficients) that is also
> linear in the variables is questionable.
> Try a linear model that is second order in the variables
> by creating Xnew with squares and cross-products. If
> rank([Xnew Y]) > rank(X)
> then try nonlinear modelling (e.g., neural networks,
> projection pursuit,...). However, run STEPWISEFIT as
> described in my previous post on X and Xnew to reduce the
> input dimensionality before trying to design a neural net.

%**********************
Hi Greg,

To generate Xnew, I used the x2fx function, which gives the design
matrix for a full quadratic model with constant, linear,
interaction, and squared terms.
For the design matrix thus calculated:
rank([Xnew Y]) > rank(X)
So maybe this is nonlinear in the coefficients.

I couldn't understand your second post, where you mentioned
rank(X,tol). Can you please explain? Or if you have any
reference for this, that'd be great. Thanks again.


Subject: Alternate to Regression

From: Greg Heath

Date: 30 Jan, 2008 07:32:01

Message: 11 of 14

On Jan 28, 10:19 pm, "vicky " <vivek_muta...@yahoo.com> wrote:
> > rank([X Y]) > rank(X) means that X and Y are not linearly
> > dependent!
> > Therefore, a linear model (in coefficients) that is also
> > linear in the variables is questionable.
> > Try a linear model that is second order in the variables
> > by creating Xnew with squares and cross-products. If
> > rank([Xnew Y]) > rank(X)
> > then try nonlinear modelling (e.g., neural networks,
> > projection pursuit,...). However, run STEPWISEFIT as
> > described in my previous post on X and Xnew to reduce the
> > input dimensionality before trying to design a neural net.
>
> %**********************
> Hi Greg,
>
> To generate Xnew, I used the x2fx function, which gives the design
> matrix for a full quadratic model with constant, linear,
> interaction, and squared terms.
> For the design matrix thus calculated:
> rank([Xnew Y]) > rank(X)
> So maybe this is nonlinear in the coefficients.
>
> I couldn't understand your second post, where you mentioned
> rank(X,tol). Can you please explain? Or if you have any
> reference for this, that'd be great. Thanks again.


doc rank
help rank

Hope this helps.

Greg

Subject: Alternate to Regression

From: Vassili Pastushenko

Date: 30 Jan, 2008 10:20:04

Message: 12 of 14

"vicky " <vivek_mutalik@yahoo.com> wrote in message
<fnc2h7$m30$1@fred.mathworks.com>...
> Hi,
> I have a matrix X of independent variables and Vector Y of
> experimental data. Matrix X is not a full rank and hence i
> was using PINV function than backslash for multiple linear
> regression. Results are not that encouraging.
>
> Can anyone suggest if i can use Maximum likelihood
> estimation or any other technique for parameter estimation?
>
> Thanks

Hi Vicki,
as I understand it, REGRESS does not center the X-Y
data. Did you check whether your data are centered? If not,
you will surely be much more pleased with the results after
centering the data first.

How to do the centering is also a question, but the
simplest way is just to subtract the corresponding mean values.
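
A minimal sketch of the mean-subtraction approach on made-up data, using backslash in place of REGRESS so that it runs without the Statistics Toolbox. REGRESS does not add a constant term on its own, so an offset in Y has to be handled either by centering or by appending a column of ones; the two give the same slopes.

```matlab
n = 20;
t = linspace(0, 1, n)';
X = [t, sin(4*t)];                     % hypothetical predictors
y = 5 + 2*X(:,1) - X(:,2) + 0.1*cos(25*t);

% (a) raw data, no intercept: the offset of 5 has nowhere to go
b_raw = X \ y;

% (b) subtract the column means from X and the mean from y, then fit
Xc = X - repmat(mean(X,1), n, 1);
yc = y - mean(y);
b_cen = Xc \ yc;

% (c) keep the raw data but add a column of ones for the intercept
b_int = [ones(n,1), X] \ y;

fprintf('slopes, centered:       %s\n', mat2str(b_cen', 4));
fprintf('slopes, with intercept: %s\n', mat2str(b_int(2:end)', 4));
fprintf('slopes, raw, no offset: %s\n', mat2str(b_raw', 4));
```

The intercept from (c) can be recovered from (b) as mean(y) - mean(X,1)*b_cen. Centering fixes a missing offset, but it does not repair rank deficiency in X.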
 
