Inaccurate roots found using the Newton-Raphson method for 2 variables

I'm currently studying MIT OCW and my code for Exercise 11 (as in the link) is
X = [10; 10];
for j = 1:1000000
    g = [sin(X(1)+2*X(2)^2) - X(2), X(1)^2*X(2) - X(1)]';
    Jg = [cos(X(1)+2*X(2)^2), 4*X(2)*cos(X(1)+2*X(2)^2)-1; 2*X(1)*X(2)-1, X(1)^2];
    X = X - Jg\g;
end
format long
X
Using X = [0;0] as the starting point, I found the root (0,0), which is accurate.
Using X = [1;1], I found the root (1.47235, 0.67919). For this outcome, g = (-1.1102e-016, 0), which is still close to the expected value (0,0).
However, I can't find other roots accurately. For example, using X = [1;3] as the starting point, the result is (0.32594, 3.04793), and g = (-2.9919, -0.0021411), which is far from the expected value (0,0). The problem persists with more iterations: the result remains (0.32594, 3.04793), which suggests that X converges to that point.
I checked my Jacobian matrix (Jg) using Wolfram Alpha and it is correct.
May I have an explanation for this? Pardon my ignorance as I just completed my A-levels and have not studied university linear algebra.

Accepted Answer

John D'Errico on 12 Mar 2018
Edited: John D'Errico on 12 Mar 2018
One of the first things I ALWAYS tell people to do, especially when they see something confusing, is to plot EVERYTHING. So let's start out with a plot.
Let me split your two expressions apart. While having them as one vector-valued function is convenient when you are using Newton's method, for plotting purposes I need them separate.
g1 = @(X1,X2) sin(X1+2*X2.^2)-X2;
g2 = @(X1,X2) X1.^2.*X2-X1;
So first, without even looking at any plot, I see that this system is homogeneous: there is no constant term. Clearly X1 = X2 = 0 is a solution, so I expect to see at least one root.
How can we plot two equations like this? The best trick is not to think of the problem as surfaces, but to view it as a contour plotting problem. A contour plot identifies the locus of points in the (X1,X2) plane where the function takes a constant value; here that value is zero. Bear with me.
[XX1,XX2] = meshgrid(-10:.01:10,-3.5:.01:3.5);
contour(XX1,XX2,g1(XX1,XX2),[0 0],'r')
hold on
contour(XX1,XX2,g2(XX1,XX2),[0 0],'g')
grid on
xlabel X(1)
ylabel X(2)
Now, wherever we see an intersection between the red and green curves, we expect to see a solution of the two simultaneous equations.
So clearly there is a root at the point [0,0]. I'd expect to see a second for X(1)=0, with X(2) just a wee bit larger than 0.5.
By zooming the plot a bit, I can identify a root around [1.47,.68], as well as a solution near [2.33,0.43], etc.
plot(0,0,'bo')
plot(1.47235, 0.67919,'bo')
plot(0.32594,3.04793,'bo')
plot(1,3,'ms')
I've added the points you have located, as well as the start point of [1,3].
It looks like things have converged nicely for two of the solutions. You have missed a few though, and the contour plot should suggest starting values that would converge to any of the solutions I see.
One thing to remember about optimization problems is the concept of a basin of attraction: the locus of all points that, when used as starting values, will converge to a given solution. So each solution we see on the overlaid contour plot will have its own basin of attraction. Basins need not be contiguous, and they will generally not be nice simple convex sets, since you can think of the union of all such basins as tiling the plane.
In fact, some basins of attraction will converge to points that are not a solution. You may find points of divergence, usually out at infinity, but also stable points where the gradient is zero yet which are not solutions. And you may find other strange occurrences, like cycles, where the iteration oscillates between two points that are not solutions themselves. (There are probably other degeneracies I've missed in this list too.)
Finally, different optimizers will have different basins of attraction.
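To make the basin idea concrete, here is a rough sketch that colors each starting point by the root Newton's method converges to. The grid, iteration cap, tolerances, and the seed list of roots (`rts`) are all illustrative choices, not part of the original problem:

```matlab
% Illustrative sketch: map basins of attraction for this system.
% Requires implicit expansion (R2016b or later).
g  = @(X) [sin(X(1)+2*X(2)^2) - X(2); X(1)^2*X(2) - X(1)];
Jg = @(X) [cos(X(1)+2*X(2)^2), 4*X(2)*cos(X(1)+2*X(2)^2) - 1; ...
           2*X(1)*X(2) - 1,    X(1)^2];
rts = [0 0; 1.47235 0.67919];      % roots found so far; extend as needed
[S1,S2] = meshgrid(-3:0.05:3, -2:0.05:2);
basin = zeros(size(S1));           % 0 = did not converge to a listed root
for k = 1:numel(S1)
    X = [S1(k); S2(k)];
    for it = 1:50                  % arbitrary iteration cap
        X = X - Jg(X)\g(X);
        if norm(g(X)) < 1e-10, break, end
    end
    [d,idx] = min(sum((rts - X.').^2, 2));
    if norm(g(X)) < 1e-10 && d < 1e-6
        basin(k) = idx;            % color by the nearest listed root
    end
end
figure, imagesc(-3:0.05:3, -2:0.05:2, basin), axis xy
xlabel X(1), ylabel X(2), title('Basins of attraction (sketch)')
```

Points colored 0 either diverged, cycled, or converged to a root not yet in `rts`; the ragged, non-convex regions you typically see in such a map illustrate why basins are not simple sets.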
A quick surface plot of the functions might help too, to visualize why g1 has those nicely scalloped curves.
figure,surf(XX1,XX2,g1(XX1,XX2))
shading interp
So g1 is a sinusoidal thing, with waves that look like ripples on a pond. That I would expect. I also expect g2 to look hyperbolic.
figure,surf(XX1,XX2,g2(XX1,XX2))
shading interp
As it does.
So when you say you "expected" this to converge to one root or another given a specific starting value, you might be overly optimistic. Good starting values can be imperative for any numerical optimization problem, especially when you expect multiple solutions. And any time you see a trig function in a problem, expect multiple solutions.
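One practical response to multiple solutions is a multi-start sweep: run Newton's method from a grid of starting values and keep the distinct roots found. A minimal sketch, with arbitrary grid spacing and tolerances (and assuming R2016b+ for the implicit expansion):

```matlab
g  = @(X) [sin(X(1)+2*X(2)^2) - X(2); X(1)^2*X(2) - X(1)];
Jg = @(X) [cos(X(1)+2*X(2)^2), 4*X(2)*cos(X(1)+2*X(2)^2) - 1; ...
           2*X(1)*X(2) - 1,    X(1)^2];
found = zeros(0,2);                    % distinct roots, one per row
for x1 = -3:0.5:3
    for x2 = -2:0.5:2
        X = [x1; x2];
        for it = 1:50                  % arbitrary iteration cap
            X = X - Jg(X)\g(X);
        end
        if norm(g(X)) < 1e-10          % converged to a genuine root
            % keep only roots not already recorded (tolerance is arbitrary)
            if isempty(found) || all(sum((found - X.').^2, 2) > 1e-6)
                found(end+1,:) = X.';  %#ok<AGROW>
            end
        end
    end
end
format long
found                                  % each row is one distinct root
```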
The only question worth investigating is why Newton's method was not convergent when you started at [1,3]. That is probably a case of a non-convergent "solution": a stable point for Newton's method that is not actually a solution. (I'm assuming you did code the Jacobian properly.)
Since I'm too lazy to do honest algebra, MATLAB tells me:
syms x y
gradient(sin(2*y^2 + x) - y)
ans =
cos(2*y^2 + x)
4*y*cos(2*y^2 + x) - 1
gradient(y*x^2 - x)
ans =
2*x*y - 1
x^2
You wrote this (after I converted it to a function handle):
Jg = @(X) [cos(X(1)+2*((X(2))^2)), 4*X(2)*cos(X(1)+2*((X(2))^2))-1; 2*X(1)*X(2)-1, X(1)^2];
So it looks like you got the Jacobian coded properly. TEST EVERYTHING SOMEBODY CLAIMS.
I'll also write g the same way, as a function handle.
g=@(X) [sin(X(1)+2*((X(2))^2))-X(2), (X(1)^2)*X(2)-X(1)]';
Now, what happens when we start at the non-solution that you found?
X = [0.32594;3.04793];
X = X- Jg(X)\g(X)
X =
0.299020239751565
3.31811970669477
X = X- Jg(X)\g(X)
X =
0.325936592753727
3.04792676355187
X = X- Jg(X)\g(X)
X =
0.299020266071587
3.31811931386382
X = X- Jg(X)\g(X)
X =
0.325936561424046
3.04792717417395
X = X- Jg(X)\g(X)
X =
0.299020269274169
3.31811935548774
X = X- Jg(X)\g(X)
X =
0.325936564685558
3.04792713016343
X = X- Jg(X)\g(X)
X =
0.299020268932314
3.31811935102443
It seems that rather than converging, Newton has fallen into a cycle that it cannot escape. Remember that Newton's method is not always convergent for arbitrary starting values. The nice thing is that it is usually rapidly convergent (quadratic convergence) once you get near a solution. If I had to guess, I'd bet that Newton is getting confused by those ripples; remember the shape of the surface plot of g1. Start out too far away, and don't be surprised if Newton-Raphson gets trapped.
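The two-point cycle can be confirmed programmatically by comparing each iterate with the one from two steps earlier. The tolerances and iteration cap below are illustrative choices:

```matlab
g  = @(X) [sin(X(1)+2*X(2)^2) - X(2); X(1)^2*X(2) - X(1)];
Jg = @(X) [cos(X(1)+2*X(2)^2), 4*X(2)*cos(X(1)+2*X(2)^2) - 1; ...
           2*X(1)*X(2) - 1,    X(1)^2];
X = [1; 3];
prev = [Inf; Inf];                 % iterate from one step back
for it = 1:100
    older = prev;                  % iterate from two steps back
    prev  = X;
    X = X - Jg(X)\g(X);
    % X returned to where it was two steps ago, yet g(X) is not small:
    % that is a 2-cycle, not a root
    if norm(X - older) < 1e-6 && norm(g(X)) > 1e-6
        fprintf('2-cycle detected at iteration %d\n', it)
        break
    end
end
```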
Enough of beating this to death. I hope.
  1 Comment
Siao Chi Mok on 12 Mar 2018
Thanks for your explanation. I'm still very new to the world of MATLAB and university mathematics (I just completed my A-levels), so I'll try to understand. I did get the gist of what you said. You reminded me that I sometimes got into a cycle while carrying out iterations in A-level mathematics (not using the Newton-Raphson method). Such is the case with the starting point [1;3]. I get it now.
