Linear Regression with Y as your Dependent Variable

Howdy! I need to fit a non-linear curve with y as the independent variable, i.e. x as a function of y, so that I can use the regression in later steps of a problem. I used polyfit and polyval and it worked the first time (I'm not sure how, as the graph looked strange, but it gave the right values), but on a more strongly non-linear example it breaks. How should I go about doing this?
I thought about passing the y values as x and vice versa, but this introduces problems later on. I've attached the portion of the code that finds the regressions. Please excuse the comments; it's how I take notes as I think.
xD=0.85; xF=0.5; xB=0.03; degree=9; R=1.5; % Enriching (rectifying) operating line: y = a*x + b, where
a=(R/(R+1));b=xD/(R+1);
yF=a*xF+b; %This appears to be the intersection of the line with the q line
xe=[0 .02 .05 .1 .2 .3 .4 .5 .6 .7 .8 .9 .94 .96 .98 1];
ye=[0 .192 .377 .527 .656 .713 .746 .771 .794 .822 .858 .912 .942 .959 .978 1];
%Data points as described in the question
pp1=polyfit(ye,xe,degree); %This is where the problem starts
y=0:0.01:1;
pp=polyfit(xe,ye,degree);
range=0:0.01:1; %This defines the range that we want the function to go over
y=polyval(pp,range);
plot(range,y,[0 1],[0 1],[xB xF],[xB yF],[xD xF],[xD yF],[xF xF],[xF yF],'--')
pE=polyfit([xD xF],[xD yF],1);
EDIT: I found that interp1 seems to work, but I'll leave this open for a bit if anyone has a more elegant solution.
  2 Comments
Are Mjaavatten on 22 Feb 2018
plot(xe,ye,'o',range,y)
shows that pp gives a decent fit to the data, so it seems to me that you have solved the task.
The high polynomial degree means that extrapolating outside the given interval will not be meaningful, but that may not be relevant.
But I am obviously missing something, since I fail to see the idea behind the first three lines of your code and the resulting straight lines in the plot. Please explain!
Tommaso Costantini on 26 Feb 2018
The first three lines are used later, I apologize about the confusion. They function at a later point to create additional lines that the data is bound by (the stripping and enriching lines) using the McCabe-Thiele method. I used the higher degree polynomial as it fit the steps created later in the program, although you are correct that it does work to plot the line.


Accepted Answer

Star Strider on 22 Feb 2018
Do you actually need to fit a curve to your data? Consider using the interp1 function if you only need to read values off the curve.
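A minimal sketch of that suggestion, using the data from the question. The query values 0.45 and 0.75 and the variable names yq and xq are only illustrative. Because ye is strictly increasing, swapping the first two arguments of interp1 gives x as a function of y directly.

```matlab
% Equilibrium data from the question
xe = [0 .02 .05 .1 .2 .3 .4 .5 .6 .7 .8 .9 .94 .96 .98 1];
ye = [0 .192 .377 .527 .656 .713 .746 .771 .794 .822 .858 .912 .942 .959 .978 1];

% Forward lookup y(x): interp1 uses linear interpolation by default
yq = interp1(xe, ye, 0.45);

% Inverse lookup x(y): valid because ye is strictly increasing.
% 'pchip' is chosen here (an assumption) because it preserves monotonicity.
xq = interp1(ye, xe, 0.75, 'pchip');
```

No coefficients are produced this way; the "model" is simply the table plus the interpolation rule.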
  2 Comments
Tommaso Costantini on 22 Feb 2018
Ideally yes, but I found a way around it. interp1 is working quite well, thank you so much!
Star Strider on 22 Feb 2018
Edited: Star Strider on 26 Feb 2018
As always, my pleasure!
ADDENDUM
With respect to the McCabe-Thiele method, see the File Exchange contribution "McCabe-Thiele Method for an Ideal Binary Mixture".


More Answers (1)

John D'Errico on 22 Feb 2018
Edited: John D'Errico on 22 Feb 2018
xe=[0 .02 .05 .1 .2 .3 .4 .5 .6 .7 .8 .9 .94 .96 .98 1];
ye=[0 .192 .377 .527 .656 .713 .746 .771 .794 .822 .858 .912 .942 .959 .978 1];
Let's look at your data.
plot(xe,ye,'o')
Now, one thing you need to understand about polynomials (thus polyfit) is they abhor a singularity. What do you have at x==0? It's pretty much a singularity (here, a point of nearly infinite slope). Polynomial models simply don't have points with an essentially infinite slope. To get any kind of fit there you would need a very high degree polynomial model, and high order polynomial models are a BAD idea.
If you think about it, even things like Taylor series (POLYNOMIALS!) represent functions with singularities very poorly. What you see are always massive convergence problems. As I said it before, polynomials abhor a singularity.
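One way to check this on the data above is to evaluate the degree-9 fit on a fine grid and test whether it stays monotone between the data points. This is only a diagnostic sketch: the names xfine, yfit, maxResidual and isMonotone are illustrative, and the centering-and-scaling form of polyfit (the third output mu) is used here to reduce conditioning problems, which the original code did not do.

```matlab
xe = [0 .02 .05 .1 .2 .3 .4 .5 .6 .7 .8 .9 .94 .96 .98 1];
ye = [0 .192 .377 .527 .656 .713 .746 .771 .794 .822 .858 .912 .942 .959 .978 1];
degree = 9;

% Fit with centering and scaling: the polynomial is in (x - mu(1))/mu(2)
[p, S, mu] = polyfit(xe, ye, degree);

% Evaluate on a fine grid, passing mu so the same scaling is applied
xfine = linspace(0, 1, 501);
yfit  = polyval(p, xfine, [], mu);

% Largest miss at the data points, and a check for wiggles in between
maxResidual = max(abs(polyval(p, xe, [], mu) - ye));
isMonotone  = all(diff(yfit) >= 0);

plot(xe, ye, 'o', xfine, yfit, '-')
```

If isMonotone comes back false, the fit has bumps that the data do not, and inverting it to get x(y) is not well defined.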
What can you do here? There are several entirely valid approaches. It would depend on what you will do with the model, and what other datasets that you will encounter look like.
You apparently have very little noise in the data. It seems quite smooth and well-behaved. If that point at x==0 is NOT expected to be a point of infinite slope, then a simple spline will usually be adequate. It is sometimes dangerous to use the spline option in interp1 for that. A traditional spline will often be poor on functions like this, because a spline is itself made of polynomials, and polynomials don't like points of infinite slope. You get lucky here, because at x==0 things seem not quite that bad.
xeint = linspace(0,1,500);
yeint = interp1(xe,ye,xeint,'spline');
plot(xe,ye,'ro',xeint,yeint,'b-')
So the spline interpolant actually did ok there. In some cases, a pchip interpolant (essentially just a different kind of spline that can be found as an option in interp1) would have been necessary.
As I said, we got lucky here. I entirely expected to see oscillations (extraneous bumps and wiggles in the curve between the data points) in the spline fit. That would have been indicative that a spline model was inadequate, a poor choice. But it seems to work entirely well on the data I see here.
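A rough way to test for those oscillations numerically rather than by eye is to compare the 'spline' and 'pchip' options of interp1 on the same grid. This is a sketch only: the variable names are illustrative, and the monotonicity check is just one possible symptom of spline overshoot.

```matlab
xe = [0 .02 .05 .1 .2 .3 .4 .5 .6 .7 .8 .9 .94 .96 .98 1];
ye = [0 .192 .377 .527 .656 .713 .746 .771 .794 .822 .858 .912 .942 .959 .978 1];
xeint = linspace(0, 1, 500);

ySpline = interp1(xe, ye, xeint, 'spline');  % smooth, but may overshoot
yPchip  = interp1(xe, ye, xeint, 'pchip');   % shape preserving

% The data are increasing, so a decreasing stretch indicates a wiggle
splineMonotone = all(diff(ySpline) >= 0);
pchipMonotone  = all(diff(yPchip)  >= 0);

% Largest disagreement between the two interpolants
maxGap = max(abs(ySpline - yPchip));

plot(xe, ye, 'ro', xeint, ySpline, 'b-', xeint, yPchip, 'k--')
```

pchip preserves the monotonicity of the data by construction, so it is the safer default when the curve must later be inverted; the spline is smoother where it behaves.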
  2 Comments
Tommaso Costantini on 26 Feb 2018
Thank you so much for your input, I really enjoyed the insight gained from it! Basically, I can easily fit the data but I wanted to have the code give me the coefficients of the line as a function of y (x(y)=equation). In doing so, I could solve for the intersection of a horizontal line at a fixed y to make steps using the McCabe-Thiele method for calculating theoretical plates.
The idea is to go from the 45 degree line (y=x) to the data points, then down to the stripping/enriching line depending on where I'm at. As is, I'm experimenting more with the interpolation since it doesn't quite give perfect intersections and has a couple other errors in special cases. Thanks again for your time, and I apologize about the slow reply.
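The stepping described above could be sketched roughly as follows. This is not the code from the question: it assumes a saturated-liquid feed (so the two operating lines meet at x = xF, as yF = a*xF + b in the original code implies), uses an inverse pchip interpolant for x(y), and the names xOfY, sStrip, xs, ys and nStages are made up for illustration.

```matlab
xe = [0 .02 .05 .1 .2 .3 .4 .5 .6 .7 .8 .9 .94 .96 .98 1];
ye = [0 .192 .377 .527 .656 .713 .746 .771 .794 .822 .858 .912 .942 .959 .978 1];
xD = 0.85; xF = 0.5; xB = 0.03; R = 1.5;

a  = R/(R+1);  b = xD/(R+1);       % enriching line: y = a*x + b
yF = a*xF + b;                     % where it meets the feed line x = xF
sStrip = (yF - xB)/(xF - xB);      % slope of stripping line through (xB,xB)

% Inverse equilibrium curve x(y); valid because ye is strictly increasing
xOfY = @(y) interp1(ye, xe, y, 'pchip');

x = xD;  y = xD;                   % start on the 45 degree line at xD
xs = x;  ys = y;                   % corner points of the staircase
nStages = 0;
while x > xB && nStages < 50       % cap guards against a pinch point
    x = xOfY(y);                   % horizontal step to equilibrium curve
    nStages = nStages + 1;
    xs(end+1) = x;  ys(end+1) = y;
    if x >= xF
        y = a*x + b;               % vertical step down to enriching line
    else
        y = xB + sStrip*(x - xB);  % vertical step down to stripping line
    end
    xs(end+1) = x;  ys(end+1) = y;
end

plot(xe, ye, 'o-', [0 1], [0 1], xs, ys, 'r-')
```

Because each horizontal step is a direct evaluation of x(y), no intersection has to be solved for, which avoids the imperfect intersections mentioned above.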
John D'Errico on 26 Feb 2018
Edited: John D'Errico on 26 Feb 2018
I'd need to see the special cases where you found problems with a spline to know how to fix things.
But solving the problem where you flip the relationship between x and y is simple enough. There are two good solutions available. One is trivial: just flip x and y. Since your curve is smooth and monotonic, that is safe here:
xe=[0 .02 .05 .1 .2 .3 .4 .5 .6 .7 .8 .9 .94 .96 .98 1];
ye=[0 .192 .377 .527 .656 .713 .746 .771 .794 .822 .858 .912 .942 .959 .978 1];
x_y = spline(ye,xe);
ytarget = 0.75;
fnval(x_y,ytarget)
ans =
0.4147
So that is the value of xe that yields a ye of 0.75.
Or, you could have left the relationship in the form ye(xe), and then just used a solver to find the location of interest.
y_x = spline(xe,ye);
xtarget = fzero(@(x) ppval(y_x,x) - ytarget,[0 1])
xtarget =
0.41473
Were you using my SLM toolbox, I provide a solver in there that would work.
slmsolve(y_x,ytarget)
ans =
0.41473
But there is absolutely no reason to need it here. fzero is entirely sufficient.
Now, it is possible that one reason you indicated some problems is the data may not always be so well-behaved. That is something I cannot know, because all I have seen is one relationship that is well-behaved.
In some cases, I might recommend use of my SLM toolbox to build a model, in one of the directions I showed above. But I really cannot say, since a simple spline (or even interp1) is entirely adequate here, in either direction.
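If several target values are needed, the fzero approach can be wrapped in a function handle and applied element by element. A small sketch, in which the handle name xOfY and the target values are illustrative only:

```matlab
xe = [0 .02 .05 .1 .2 .3 .4 .5 .6 .7 .8 .9 .94 .96 .98 1];
ye = [0 .192 .377 .527 .656 .713 .746 .771 .794 .822 .858 .912 .942 .959 .978 1];

y_x = spline(xe, ye);   % forward model ye(xe) in pp form

% Invert for one target: the bracket [0 1] works for any target strictly
% between 0 and 1, since the spline passes through (0,0) and (1,1)
xOfY = @(yt) fzero(@(x) ppval(y_x, x) - yt, [0 1]);

% Apply to a vector of targets, one fzero call per element
ytargets = [0.3 0.5 0.75 0.9];
xtargets = arrayfun(xOfY, ytargets);

% Round-trip check: mapping back through the spline recovers the targets
roundTripErr = max(abs(ppval(y_x, xtargets) - ytargets));
```

Unlike the flipped spline, this keeps a single model of the curve, so values read in both directions are consistent with each other.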

