Newton's method for minimisation returns a critical point
I am trying to implement Newton's method to find the minima of the Himmelblau function.
The code works most of the time, but in cases like this one, where my initial guess is (0.5, 1), it returns a critical point of the function. I understand this is because the gradient becomes 0 and no new points are generated.
My question is: is this normal with this method? Is there a way of getting around this problem?
Thanks for any help
close all; clear; clc
% Initialisation of variables to use
x0 = [0.5;1];
tol = 1e-4;
maxits = 50;
% Himmelblau function
him = @(x,y) (x.^2 + y - 11).^2 + (x + y.^2 - 7).^2;
% Gradient of the Himmelblau
grad_him = @(x,y) [[4*x.^3 + 4*x.*y - 42*x + 2*y.^2 - 14];[4*y.^3 + 4*x.*y - 26*y + 2*x.^2 - 22]];
% Hessian matrix of the Himmelblau
hessian_him = @(x,y) [[ 12*x.^2 + 4*y - 42 , 4*x + 4*y ];[ 4*x + 4*y , 12*y.^2 + 4*x - 26 ]];
% Call to newton's function and displaying our results accordingly
[r, iters, flag] = newton_min(grad_him,hessian_him,x0,tol,maxits);
fprintf ("<strong>Newton's method</strong>\n\n");
switch (flag)
    case 0
        fprintf("There was a convergence on f\n\n");
        fprintf("The minima found is: \n");
        disp(r);
        fprintf("It took %d iterations.\n\n", iters);
    case 1
        fprintf("There was a convergence on x\n\n");
        fprintf("The minima found is: \n");
        disp(r);
        fprintf("It took %d iterations.\n\n", iters);
    otherwise
        fprintf("There was no convergence\n\n");
end
function [r, iters, flag] = newton_min(dg,ddg,x0,tol,maxits)
    x = x0(1); y = x0(2);
    r = NaN;
    flag = -1;
    for iters = 1 : maxits
        x_old = [x;y];
        x_new = x_old - (ddg(x,y)\dg(x,y));
        if norm(dg(x,y)) < tol
            flag = 0;
            r = x_new;
            return;
        end
        if norm(x_new - x_old) <= (tol + eps*norm(x_new))
            flag = 1;
            r = x_new;
            return;
        end
        x = x_new(1);
        y = x_new(2);
    end
end
Accepted Answer
Matt J
on 10 Nov 2020
Yes, it's normal.
30 Comments
Matt J
on 10 Nov 2020
Edited: Matt J
on 10 Nov 2020
Is there a way of getting around this problem?
You can make the following change to your update step, but some would argue that it is not strictly Newton's method anymore.
H=ddg(x,y);
if any(eig(H)<=0), H=eye(2); end
x_new = x_old - (H\dg(x,y));
Also, you really need to add a line search to ensure global convergence, though I suspect that won't be necessary for this specific function.
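One way these pieces might fit together inside the iteration loop (a sketch only: it assumes the objective him is available to the routine, and the Armijo constant c and shrink factor beta are illustrative choices, not values from this thread):

```matlab
% Sketch: one damped-Newton iteration with a positive-definite fallback
% and a simple backtracking (Armijo) line search.
g = dg(x,y);
H = ddg(x,y);
if any(eig(H) <= 0), H = eye(2); end   % fall back to steepest descent
d = -(H\g);                            % search direction, downhill by construction
lam = 1; beta = 0.5; c = 1e-4;         % illustrative line-search constants
f0 = him(x,y);
while him(x + lam*d(1), y + lam*d(2)) > f0 + c*lam*(g.'*d)
    lam = beta*lam;                    % shrink the step until f decreases enough
end
x_new = [x; y] + lam*d;
```

Since g.'*d is negative whenever H is positive definite, the backtracking loop is guaranteed to terminate for a smooth objective.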
Dussan Radonich
on 10 Nov 2020
Thank you, that's perfect. Would that line search be fminbnd with the function on one variable between two arbitrary points?
J. Alex Lee
on 10 Nov 2020
The line search idea would be to damp the step H\dg down by some value less than 1:
x_new = x_old - lam*(H\dg(x,y))
If that's what Matt J has in mind, it doesn't seem too meaningful to me, since [the modified] H was randomly chosen to not follow the gradient at all; in fact it could have been anything that's positive definite, right? So I don't think it's necessarily guaranteed that doing a line search in a random direction will get you a residual value lower than the current iterate.
Dussan Radonich
on 10 Nov 2020
I tried this with multiple values for lam, and it would never converge.
Bruno Luong
on 10 Nov 2020
Sorry, I don't understand what the heck you guys propose.
H=ddg(x,y);
if any(eig(H)<=0), H=eye(2); end
x_new = x_old - (H\dg(x,y));
It gets stuck at (0.5,1) because dg at this point is zero; when you change H to eye, x_new is still == x_old, so it's still stuck.
Matt J
on 10 Nov 2020
If you are going to use a constant lam, it must be smaller than one over the largest possible Hessian eigenvalue. An adaptive way to choose lam would be to minimize
L = @(lam) him(x_old - lam*(H\dg(x,y)))
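One way to spell this out in runnable form (a sketch: since him takes two scalar arguments, the trial point has to be unpacked componentwise, and the search interval [0,1] is just one possible choice):

```matlab
% Sketch of the adaptive choice of lam via an exact 1D minimization.
step = H\dg(x,y);
L = @(lam) him(x_old(1) - lam*step(1), x_old(2) - lam*step(2));
lam = fminbnd(L, 0, 1);        % minimize along the damped-Newton direction
x_new = x_old - lam*step;
```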
Matt J
on 10 Nov 2020
Edited: Matt J
on 10 Nov 2020
@Bruno,
The initial point (0.5,1) is not a point of zero gradient. The problem, though, is that unmodified Newton's method will take the algorithm uphill to a nearby inflection point, since the Hessian at (0.5,1) is not positive definite. When I add the modification (with lam=1), I successfully reach one of the local minima.
Dussan Radonich
on 10 Nov 2020
Bruno, it actually converges after that. But thinking about it, it's true: the gradient is supposed to be zero there, so it shouldn't converge, but it does.
Matt J
on 10 Nov 2020
Edited: Matt J
on 10 Nov 2020
@J Alex Lee
H was randomly chosen to not follow the gradient at all; in fact it could have been anything that's positive definite, right?
It's not exactly random. When H is set to eye(2), it is equivalent to doing a step of steepest descent for that one iteration. But H could indeed be anything positive definite, as long as its eigenvalues are uniformly bounded within some strictly positive finite interval.
Dussan Radonich
on 10 Nov 2020
Matt, I tried with everything below 1. The only time it worked was with something around 1e-5.
Well, I have to create Newton's method with specifically those parameters, so him would never be sent to it. Is there another way?
But just to have more knowledge: after setting the function L, would I have to do an fminbnd of it between 0 and 1?
Matt J
on 10 Nov 2020
Edited: Matt J
on 11 Nov 2020
Matt, I tried with everything below 1. The only time it worked was with something around 1e-5.
It worked fine for me with lam=1, starting from (0.5,1). What failure cases did you observe?
Well I have to create newton's method with specifically those parameters,
Specifically what parameters? Like I said, the modifications we've already made make it such that it's not purely Newton's method anymore.
I would have to do a fminbnd of it between 0 and 1?
In general, it can be any constant interval [0,s] where 0<s<=inf. Obviously if you choose s very small, fminbnd will complete faster, but "Newton's method" convergence can be very slow. Also, note that for this specific objective function, L(lam) is a 1D polynomial, so in theory you can find its exact minimum analytically.
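A sketch of the analytic idea: along a fixed direction, the Himmelblau function restricted to the line is a degree-4 polynomial in lam, so five samples determine it exactly and its stationary points can be found from the roots of the derivative. The sampling interval and the fallback candidate lam=1 below are illustrative choices, not prescribed by the thread.

```matlab
% Sketch: exact line search exploiting that L(lam) is a quartic in lam.
step = H\dg(x,y);
L = @(lam) him(x_old(1) - lam*step(1), x_old(2) - lam*step(2));
lams = linspace(0, 1, 5);
p  = polyfit(lams, arrayfun(L, lams), 4);  % 5 samples fit a quartic exactly
dp = polyder(p);
r  = roots(dp);                            % stationary points of the fit
cand = [r(imag(r) == 0 & r > 0); 1];       % real positive roots, plus a fallback
[~, k] = min(arrayfun(L, cand));
lam = cand(k);                             % best candidate along the line
```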
Dussan Radonich
on 10 Nov 2020
It converges to the critical point or to the maxima. If I choose lam to be anything from 0.1 to 0.9, in intervals of 0.1, all the points that converge (out of the 1000 random ones) are maxima. Maybe this is how it's supposed to be?
My function should only receive the gradient, Hessian, initial guess, tolerance, and max iterations.
Shouldn't lam be the result of a line search of the function in only one variable?
Something like:
him_x = @(x) him(x,y);
t = fminbnd(him_x,0,s);
Dussan Radonich
on 11 Nov 2020
But I'm still wondering: how does changing the Hessian to an identity work if it's the gradient that results in zero?
Matt J
on 11 Nov 2020
Edited: Matt J
on 11 Nov 2020
Should't lam be the result of a line search of the function on only one variable?
No, it should be a search in the direction you plan to step H\dg(x,y). If you plan to step in the direction of x and y alternately (this is called coordinate descent), you would minimize alternatingly along x and y, but then you wouldn't be exploiting the Hessian at all.
Matt J
on 11 Nov 2020
Edited: Matt J
on 11 Nov 2020
If I choose lam to be anything from 0.1 to 0.9, in intervals of 0.1, all the points that converge (out of the 1000 random ones) are maxima. Maybe this is how it's supposed to be?
You may have to increase maxits, depending on your choice of lam. For me, lam=0.4 gives convergence in 61 iterations.
But I'm still wondering: how does changing the Hessian to an identity work if it's the gradient that results in zero?
Making the Hessian positive definite ensures that the direction of the step will be downhill. Otherwise, this is not guaranteed.
Dussan Radonich
on 11 Nov 2020
Ok, but the problem I've been having when trying to do the line search is that, with the function you sent,
L = @(lam) him(x_old - lam*(H\dg(x,y)))
him should be receiving x and y. And the inside is x_new, so it would be the same to do
L = @(lam) him(x_new(1), x_new(2))
Would this make sense?
Dussan Radonich
on 11 Nov 2020
You get the convergence with the Hessian being set to an identity? Or by doing a line search? Just adding a step of 0.4 doesn't change anything for me.
Matt J
on 11 Nov 2020
The following converged:
% Initialisation of variables to use
x0 = [0.5;1];
tol = 1e-10;
maxits = 50000;
% Himmelblau function
him = @(x,y) (x.^2 + y - 11).^2 + (x + y.^2 - 7).^2;
% Gradient of the Himmelblau
grad_him = @(x,y) [[4*x.^3 + 4*x.*y - 42*x + 2*y.^2 - 14];[4*y.^3 + 4*x.*y - 26*y + 2*x.^2 - 22]];
% Hessian matrix of the Himmelblau
hessian_him = @(x,y) [[ 12*x.^2 + 4*y - 42 , 4*x + 4*y ];[ 4*x + 4*y , 12*y.^2 + 4*x - 26 ]];
% Call to newton's function and displaying our results accordingly
[r, iters, flag] = newton_min(grad_him,hessian_him,x0,tol,maxits);
fprintf ("<strong>Newton's method</strong>\n\n");
Newton's method
switch (flag)
    case 0
        fprintf("There was a convergence on f\n\n");
        fprintf("The minima found is: \n");
        disp(r);
        fprintf("It took %d iterations.\n\n", iters);
    case 1
        fprintf("There was a convergence on x\n\n");
        fprintf("The minima found is: \n");
        disp(r);
        fprintf("It took %d iterations.\n\n", iters);
    otherwise
        fprintf("There was no convergence\n\n");
end
There was a convergence on x
The minima found is:
3.0000
2.0000
It took 61 iterations.
function [r, iters, flag] = newton_min(dg,ddg,x0,tol,maxits)
    x = x0(1); y = x0(2);
    r = NaN;
    flag = -1;
    for iters = 1 : maxits
        x_old = [x;y];
        H = ddg(x,y);
        if any(eig(H) <= 0), H = eye(2); end
        x_new = x_old - 0.4*(H\dg(x,y));
        if norm(dg(x,y)) < tol
            flag = 0;
            r = x_new;
            return;
        end
        if norm(x_new - x_old) <= (tol + eps*norm(x_new))
            flag = 1;
            r = x_new;
            return;
        end
        x = x_new(1);
        y = x_new(2);
    end
end
Dussan Radonich
on 11 Nov 2020
Oh ok, yes, I get convergence with that. Even just taking the step works fine. Thank you.
If I were to implement the line search, would that eliminate the need for the identity matrix?
Dussan Radonich
on 11 Nov 2020
Ok, thank you, one last question. How is it that, when solving H against the gradient (which is zero), the result isn't zero and we keep going down to the minimum?
Dussan Radonich
on 11 Nov 2020
The point where it would get stuck at is a critical point of the Himmelblau, and critical points occur when the gradient is zero. That's why I'm a bit confused.
Matt J
on 11 Nov 2020
Edited: Matt J
on 11 Nov 2020
Yes, if you were to initialize the iterations at any zero-gradient point, including an inflection point, the algorithm would get stuck there. I'm not claiming that wouldn't happen.
That hasn't happened in what we've looked at so far, however, because we have only tested the initial point (0.5,1) which is not a point of zero-gradient:
>> grad_him(0.5,1)
ans =
-30.5000
-41.5000
Bruno Luong
on 11 Nov 2020
It's me who is confused; I thought the gradient was 0 at the starting point.
Bruno Luong
on 11 Nov 2020
Edited: Bruno Luong
on 11 Nov 2020
In practice, a gradient-based minimizer first computes the descent direction; it can be a Newton-like direction
dk = -H(xk)*g(xk)
where H is estimated from the BFGS formula or by other means, such as a conjugate gradient direction.
Then it does the line search
x(k+1) = x(k) + rho*(dk)
When computing dk, the minimizer always ensures
dot(dk,g) <= 0
so that the line search always goes downhill for an infinitesimal step. This is automatically satisfied when Hk is positive definite, such as with the BFGS approximation. In some other methods that can deal with a non-positive-definite Hessian, the direction is given by, for example, minimizing the quadratic form on a sphere around the current point xk (trust region); this also ensures the descent condition is met. The line search then never goes uphill, as the Newton direction does in your case.
An approximate line search per iteration is enough; something as costly as fminbnd is too much. Usually there are a couple of evaluations of the cost function along the descent direction, then the minimizer performs a polynomial fit to find the minimum. The step must also satisfy other criteria, such as the Wolfe conditions; if not, the step is reduced or increased geometrically until the criteria are met.
Even though there are only a few evaluations per step, it will eventually converge to a small-enough step along the iterations (one iteration == one change of dk), since the gradient converges to 0 (first-order optimality). Thus the most important thing for an optimizer is to find the right descent direction; there is no need to do an exact line search.
Anyway, all of this is well known and one can read through many books on optimization. A classical book is Nocedal & Wright, which gives an overview. In practice, solvers differ in technical details, and one has to find the corresponding paper to find out what's going on.
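The descent safeguard described here can be sketched with the handles from the question (falling back to steepest descent is just one possible fix; production solvers use more refined treatments):

```matlab
% Sketch: guarantee dot(dk, g) <= 0 before any line search.
g = dg(x,y);
d = -(ddg(x,y)\g);       % Newton-like direction
if dot(d, g) > 0         % uphill for an infinitesimal step?
    d = -g;              % fall back to the steepest-descent direction
end
```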
Dussan Radonich
on 11 Nov 2020
But that is not the point where the gradient would be zero, it is the critical point (-0.1280, -1.9537).
>> grad_him(-0.1280,-1.9537)
ans =
0.0018
0.0006
That is where the method gets stuck.
Bruno Luong
on 11 Nov 2020
Edited: Bruno Luong
on 11 Nov 2020
When you think about it, your original implementation of Newton's method solves the first-order Euler equation
g(x)=0
It doesn't care whether the point x is a (local) minimum, a maximum, or even a saddle point. As written, the code is totally symmetric with respect to maximizing or minimizing.
So it converges to the local maximum at about (-0.1280, -1.9537) if you start at (0.5,1). It actually does well; it's just not what you would expect, since it doesn't return the local minimum, but the code does not incorporate any specific statement about finding a minimum.
Suppose now that you reverse the sign of your objective function him (and therefore also H and g accordingly): it will follow exactly the same path and converge to the very same solution, but now it's a local minimum!
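This sign symmetry is easy to check numerically with the handles from the question: negating both the gradient and the Hessian leaves the Newton step unchanged.

```matlab
% Sketch: pure Newton on g(x)=0 is blind to the sign of the objective.
g = grad_him(0.5, 1);
H = hessian_him(0.5, 1);
step_plus  = H \ g;
step_minus = (-H) \ (-g);   % the signs cancel: same step, same iterates
% norm(step_plus - step_minus) is zero up to round-off
```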
Matt J
on 11 Nov 2020
But that is not the point where the gradient would be zero, it is the critical point (-0.1280, -1.9537).
Yes, but as long as the algorithm goes downhill from (0.5,1) at every iteration, it can never approach the inflection point (-0.1280, -1.9537). The inflection point lies uphill from your initial point:
>> him(0.5,1)
ans =
125.3125
>> him(-0.1280, -1.9537)
ans =
178.3372
More Answers (2)
J. Alex Lee
on 10 Nov 2020
Yes, this looks normal. You are only asking to zero the gradient of the function, so naturally that includes non-optimal points where the gradient is the [vector] zero.
You can use a non-gradient minimizer, like fminsearch, to seek local minima.
1 Comment
Dussan Radonich
on 10 Nov 2020
Thank you, but the idea is not to use fminsearch, as I am trying to compare Newton's method against fminsearch.
Bruno Luong
on 10 Nov 2020
Edited: Bruno Luong
on 10 Nov 2020
"Now my question would be, is this normal with this method?"
Your code juts shows it: yes it is normal.
Now in practice it is very rare that one falls on a stationary point that is not a local minimum. As soon as you work with a non-academic objective function. You won't ever get the gradient == 0 exactly.
"Is there a way of getting around this problem?"
All the book I read about optmization, no one care about this specific problem,since as I said it only happens in academic example. However, many methods will compute for each iteration an approximation of the Hessian, and the positiveness of the Hessian is either enforced or monitored. The Hessian that has negative eigenvalues like yours at (0.5,1) will has automatically a special treatment to escape from non-mimum.
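One common form of that special treatment can be sketched as follows (an illustrative recipe, not taken from any specific solver): instead of replacing the Hessian wholesale by the identity, flip its negative eigenvalues and floor the tiny ones.

```matlab
% Sketch: eigenvalue modification of an indefinite Hessian.
H = hessian_him(0.5, 1);              % indefinite at this point
[V, D] = eig(H);                      % H is symmetric, so V is orthogonal
delta = 1e-6;                         % illustrative eigenvalue floor
D = diag(max(abs(diag(D)), delta));   % force all eigenvalues positive
Hmod = V * D * V.';                   % modified, positive-definite Hessian
% -Hmod\g is now guaranteed to be a descent direction
```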