Problems with fminsearch returning start values as result

Hey,
I am trying to minimize the Gibbs enthalpy as a function of phase fraction and phase compositions. So I set up an equation which has all the dependencies in it and depends on 2 variables.
The problem is that fminsearch is doing nothing; it always gives me my start values back as the result. From the output I can see that it did 39 iterations and tells me that the result lies within TolX and TolFun, but that's not the case. With a simple parameter sweep I get better results than fminsearch. I also changed TolX and TolFun to very small values, but that didn't help either. No matter how poor my starting values are, they come back as the result, no matter how bad they are.
I also had this phenomenon when doing fits with custom functions; sometimes the start values were returned as fit parameters there too, without any improvement.
Does anybody know what I am doing wrong?
Many Thanks in advance.
Best regards.

4 Comments

Is your function continuous with respect to the optimization parameters? Or do you have round()'s, int's, or integer values in there?
It's a concatenation of polynomials, and every input is calculated with double precision.
Maybe it's the way I defined the function? I defined it the following way:
input_1 = double_precision_value_1;
input_2 = double_precision_value_2;
..
input_n = double_precision_value_n;
polynom_1 = polynom(variable_1,variable_2,input_1,input_2,..,input_n);
polynom_2 = different_polynom(variable_1,variable_2,input_1,input_2,..,input_n);
fun = @(variable_1,variable_2) polynom_1 - polynom_2;
x0 = [startvalue_1, startvalue_2];
x = fminsearch(fun,x0,options)
Very often this happens because people don't understand optimization tools. For example, is your function discrete in some way, quantized? fminsearch CANNOT solve such a problem, because it assumes the objective is a well-behaved function of the parameters (essentially, smooth). This will cause it to terminate, despite there being better solutions elsewhere, since in the vicinity of your start point the function is essentially constant.
Similarly, even if the function is indeed well defined and everywhere differentiable, your function might just be so flat that it cannot see a way to move that is any improvement, to within the tolerance. So it gives up, returning your start point.
Another common failure is when people use random numbers in the objective. Doing so makes the function not smooth in any respect. And again, fminsearch will almost certainly fail to converge to a good solution. (It might work for a bit if there is a sufficiently large signal beyond the random component in the function, but it will eventually get hung up.)
All of these cases will cause fminsearch, or indeed, most optimization tools to fail to iterate. Is your problem among the general classes I mentioned? Who knows? You may just have a bug in your code, and are calling the optimization tool incorrectly.
Are you truly working with polynomials? Or are you working with multinomials? Do you have any terms which end up using variable_1 * variable_2, or could it be separated out into the sum of two polynomials each in a single variable?
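To make the quantized-objective failure mode concrete, here is a minimal, hypothetical illustration (the two objective functions are made up, not taken from the question):

```matlab
% Hypothetical illustration of fminsearch stalling on a quantized objective.
% Near any non-integer start point the rounded objective is locally constant,
% so the simplex sees no improvement and terminates close to x0.
quantized = @(x) round(x(1))^2 + round(x(2))^2;   % step-like, not smooth
smooth    = @(x) x(1)^2 + x(2)^2;                 % well-behaved counterpart

x0 = [3.2, -1.7];
x_quant  = fminsearch(quantized, x0);   % stays near x0
x_smooth = fminsearch(smooth,    x0);   % converges toward [0 0]
```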


Accepted Answer

With a simple parameter sweep I get better results than fminsearch.
I don't know how you've implemented the sweep, but I don't see why you don't use that as your solution, or at least use it to initialize fminsearch. Since you know a local region where the minimum is located, I picture the sweep done in a vectorized fashion like below. It should be easy to vectorize the operations in fun() if they are just polynomial operations.
[var1,var2] = ndgrid(linspace(__), linspace(___));
Fgrid = fun(var1,var2);   % vectorize fun() to accept array-valued input
[~,iopt] = min(Fgrid(:));
var1_optimal = var1(iopt);
var2_optimal = var2(iopt);
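For reference, here is a self-contained version of the same sweep with a made-up objective and made-up ranges; substitute your own fun() and bounds:

```matlab
% Runnable sketch of the grid sweep; fun() and the ranges are placeholders.
fun = @(v1,v2) (v1-0.3).^2 + (v2-0.7).^2;        % element-wise, accepts grids

[var1,var2] = ndgrid(linspace(0,1,200), linspace(0,1,200));
Fgrid = fun(var1,var2);

[~,iopt] = min(Fgrid(:));                        % linear index of best grid point
var1_optimal = var1(iopt);
var2_optimal = var2(iopt);

% Optional refinement, starting fminsearch from the sweep's winner:
x = fminsearch(@(x) fun(x(1),x(2)), [var1_optimal, var2_optimal]);
```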

18 Comments

The sweep is relatively time consuming. Since finding this minimum should be done thousands of times in a row, I hoped to speed it up with fminsearch, because I thought it is already optimized for such problems.
Vectorizing is possible, but takes time; the polynomials cover multiple pages...
But even setting aside your current problems with convergence, how did you plan to avoid local minima? You know that polynomial surfaces have them, so you have to do at least a coarse parameter sweep anyway to find a good initial point.
Polynomials can potentially be minimized using calculus techniques: finding the zeros of the derivative, which can involve polyder() and roots().
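For a single unknown, that calculus route might look like this (the coefficients below are invented purely for illustration):

```matlab
% Sketch: minimize a 1-D polynomial via its derivative (invented coefficients).
p  = [1 -4 1 6];                 % p(x) = x^3 - 4x^2 + x + 6
dp = polyder(p);                 % coefficients of p'(x)
r  = roots(dp);                  % stationary points of p
r  = r(abs(imag(r)) < 1e-12);    % keep the real ones
[pmin, k] = min(polyval(p, real(r)));
x_best = real(r(k));             % stationary point with the lowest p(x)
```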
I vectorized the polynomials and they do accept vector input, but somehow at "Fgrid=fun(var1,var2)" it tells me "too many input arguments".
fun = @(variable_1,variable_2) polynom_1(variable_1,variable_2)-polynom_2(variable_1,variable_2);
fun = @(x)fun(x(1),x(2));
Those 2 above don't need to be vectorized; the sum should automatically accept vector input, correct?
Fgrid = fun([var1,var2])
instead of
Fgrid = fun(var1,var2)
if you defined
fun = @(x)fun(x(1),x(2))
Well, that works, but somehow Fgrid is a 1-by-2 row vector even though var1 and var2 are both 100x100.
For fminsearch to work, you had to pack the input variables var1 and var2 into one single vector [var1,var2]. This is accomplished by
fun = @(x)fun(x(1),x(2))
assuming that var1 and var2 are both scalars.
If "fun" can handle matrix inputs for var1 and var2 and you want a parameter sweep rather than fminsearch, don't use the above update of "fun".
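In other words, it can help to keep both forms around: one grid-friendly for the sweep and one vector-argument wrapper for fminsearch (the objective here is just a placeholder):

```matlab
% Placeholder objective in two forms.
fun2 = @(v1,v2) v1.^2 + v2.^2;     % two arguments, element-wise: grid-friendly
fun1 = @(x) fun2(x(1), x(2));      % single vector argument: fminsearch-friendly

% The parameter sweep uses the two-argument form:
[V1,V2] = ndgrid(linspace(-1,1,50), linspace(-1,1,50));
F = fun2(V1,V2);                   % 50x50 array, one value per grid point

% fminsearch uses the wrapped form:
x = fminsearch(fun1, [0.3, -0.4]);
```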
Vectorizing is possible, but takes time; the polynomials cover multiple pages...
If they cover multiple pages, I assume the polynomials are of uncommonly high order. That would make the objective function rife with local minima as well as highly unstable numerically, in all likelihood. It may be time to reconsider your model.
The basic polynomials always have the structure:
p(T)=a+b*T+c*T*log(T)+d*T^2+e*T^3+f*T^(-1)+g*T^(-2)+h*T^(-3)+i*T^(-9)+j*T^4;
Those polynomials are weighted with constants (of which some are interdependent because of constraints) in a way like
P=c1*c2*(p1(T)+p2(T)*(c1-c2)+p3(T)*(c1-c2)^2)
and added. For higher order interactions, often only the values a and b are needed and the rest are 0. But still, the number of polynomials P can go up to 20, 50 or even higher, depending on the complexity of the system.
And this is the way it's done; I can't change it. It's done this way in commercial software and described in multiple papers.
Multiple pages for a polynomial can also be caused by having floating point coefficients converted to symbolic by the default methods, which can lead to fractions with 2^53 or so in the denominator per term... and then you multiply those together. If you are doing symbolic work it can help a fair bit to convert floating point constants manually. For example write 1.609 as sym(1609)/1000.
If you are running into those kinds of problems then it can help to use collect() followed by vpa(). But you have to be careful about vpa(), take into consideration the range of values. For example x^10 + 1e10, which term is the most important depends on whether abs(x) < 10
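A small illustration of the rational-constant idea (requires the Symbolic Math Toolbox; the numbers are just examples). Writing 1.609 as sym(1609)/1000 guarantees a compact exact rational, independent of sym()'s conversion heuristics:

```matlab
% Manual rational constants keep symbolic expressions compact.
syms x
c_manual = sym(1609)/1000;                % exact 1609/1000

expr = (c_manual*x + 1)^3;
expr_collected = collect(expr, x);        % gather powers of x
expr_numeric   = vpa(expr_collected, 6);  % 6-digit floating coefficients
```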
The whole polynomial is multiple pages long without any floating point coefficients, only variable names and placeholders for floating point constants that are all 3 digits long.
I set up the polynomial as a symbolic function, since I also had to partially differentiate those polynomials, without any floating point numbers, only variables.
But it takes 2 hours to generate the symbolic function and to differentiate it. So that's not very efficient.
After that's done, I save the symbolic functions and their derivatives in a .txt file, and this function is then used in MATLAB, where MATLAB takes the appropriate floating point coefficients from a matrix to put them into the polynomial.
It's not clear in what you've laid out which of the variables are the two unknowns that you are solving for. If I assume that the unknowns are c1 and c2, then the expression
P=c1*c2*(p1(T)+p2(T)*(c1-c2)+p3(T)*(c1-c2)^2)
are all second degree polynomials in c1 and c2. If that is always the case in your actual problem, you should be able to rewrite in the form
P=A + B*c1 + C*c2 + D*c1*c2 + E*c1^2 +F*c2^2
where A,B,C,D,E,F are known values computed prior to the optimization. All second order 2D polynomials in c1,c2 can be written this way, so there is no reason why your objective function should be any more complicated than this.
Obviously, if your actual polynomial is higher order in c1,c2 there will be more terms, but any order which you can realistically process in double floating point should definitely not have pages and pages of terms.
c1 or c2 could be one of those to solve for. The second one is a bit more tricky...
Let's say that the value of the page-long summation of polynomials P is V1, and for a second polynomial it is V2. Let's name the variables I could solve for d in polynomial 1 and c in polynomial 2.
I am now looking for the minimum of G=(1-f)*V1(d1(c1),d2(c1),d3(c1),...,d15(c1),f)+f*V2(c1)
which depends on c1 and f. V2 could also depend on more than c1, maybe on an additional c2, and then d1...d15 would also depend not only on c1 but on c1 and c2.
Let's name the variables I could solve for d in polynomial 1 and c in polynomial 2... and then d1...d15 would also depend not only on c1 but on c1 and c2
Not clear at all, I'm afraid. You said in your original post that you are optimizing over two unknown independent variables. What names are you giving those two variables? Is it c1 and c2 or something else? Is the objective function a polynomial function of those variables?
I have the vague impression that you are using the term "polynomial" loosely to mean any sum of terms. Please don't do that. A polynomial has a precise definition, which matters in this discussion. It is a weighted sum of integer powers of the independent, unknown variables. Does your objective function satisfy that definition, or not? If it meets this definition, then what is the degree of the polynomial?
I've got 2 systems which share some constituents. So I have as many variables (the relative contents of my systems) as they have constituents. System 1 is composed of like 4 constituents with the relative composition values d1,d2,d3,d4,d5(=1-d1-d2-d3-d4), and system 2 is composed of only 2 constituents, c1 and c2. Since c1=1-c2, I can reduce system 2 to 1 variable. Now the composition of system 1 depends on the composition of system 2, so on c1, and on how much of system 2 I am extracting from system 1, called f, because of the mass balance.
I now have to describe the energy of my systems. These are described by taking the energies of the pure constituents, each weighted by its content, described by p1(T)=d1*(a+b*T+c*T*log(T)+d*T^2+e*T^3+f*T^(-1)+g*T^(-2)+h*T^(-3)+i*T^(-9)+j*T^4); for each constituent. Then additional excess terms, which compensate for interactions missing in the equation above, are also weighted and added. They follow the same structure as the equation above, but often with variables c,d,e,...,j=0. All those are added up and give me the energy of system 1. The same is done for system 2.
To account for mass balance, the values in system 1 for the constituents both systems share depend on the composition of system 2 and on how much of system 2 I have...
So the energy of system 1 is kind of
E1=sum(p1_1,p2_1,p3_1,p4_1,p5_1,...), all p dependent on (d1(c1),d2(c1),d3(c1),...,d15(c1),f) (index _1 for system 1)
and system 2 is
E2=sum(p1_2,p2_2,p3_2,p4_2,p5_2,...), only dependent on c1.
The energy of the whole system is then the weighted sum of both systems:
E_total=(1-f)*E1+f*E2; and this has to be minimized.
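Structurally, that objective can be sketched like this; d1, E1 and E2 below are toy placeholders standing in for the page-long polynomial sums, and the point is only that everything collapses into one function of the two unknowns c1 and f:

```matlab
% Toy sketch of the two-variable objective described above. The placeholders
% are invented; the sweep runs over the physical box 0 <= c1 <= 1, 0 <= f <= 1.
d1 = @(c1) 0.4 - 0.1*c1;                          % placeholder mass balance
E1 = @(c1,f) 1 + d1(c1).^2 + 0.3*(f - 0.2).^2;    % placeholder energy, system 1
E2 = @(c1) 2 - c1 + 0.5*c1.^2;                    % placeholder energy, system 2

E_total = @(c1,f) (1 - f).*E1(c1,f) + f.*E2(c1);  % weighted total energy

[C1,F] = ndgrid(linspace(0,1,101), linspace(0,1,101));
G = E_total(C1,F);
[Gmin,k] = min(G(:));
c1_opt = C1(k);
f_opt  = F(k);
```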
Could you confirm that you end up with log() of the unknowns? If so then that is not a polynomial. Also it looks like you have divisions by the unknowns -- T^(-1) and T^(-3) which gives additional challenges.
Yeah, I end up with log() of the unknowns.
That's also why I have to go with fminsearchbnd, since fminsearch finds minima for negative values, because log() of some negative values easily gives highly negative and therefore probably minimal values... but negative values of the unknowns don't make sense physically, therefore the necessity of a bound.
The ...T^(-1) and ...T^(-3) terms depend only on T and are therefore constant for each individual case. They are pre-calculated and then just inserted. Their results are only weighted by the unknowns: either by multiplication with one unknown or more, by multiplication with the difference of unknowns, or by the difference of unknowns to the power of 2.
Just make the change of variables var1-->var1^2, var2-->var2^2 to ensure they only present positive values to the objective function. That's what fminsearchbnd does anyway.
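A minimal sketch of that substitution (the objective here is a placeholder; the real one would be the energy expression):

```matlab
% Squaring trick: optimize over u, feed u.^2 to the objective, so fminsearch
% stays unconstrained while the objective only ever sees nonnegative values.
fun = @(v1,v2) (sqrt(v1) - 0.5).^2 + (v2 - 0.8).^2;   % placeholder, needs v1 >= 0

obj = @(u) fun(u(1)^2, u(2)^2);    % u is unconstrained

u0 = sqrt([0.5, 0.5]);             % transform the start point the same way
u  = fminsearch(obj, u0);
v  = u.^2;                         % map back: guaranteed nonnegative
```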


More Answers (1)

polynom_1 = @(variable_1,variable_2) polynom(variable_1,variable_2,input_1,input_2,..,input_n);
polynom_2 = @(variable_1,variable_2) different_polynom(variable_1,variable_2,input_1,input_2,..,input_n);
fun = @(variable_1,variable_2) polynom_1(variable_1,variable_2)-polynom_2(variable_1,variable_2);
fun = @(x)fun(x(1),x(2));
x0 = [startvalue_1, startvalue_2];
x = fminsearch(fun,x0,options)

3 Comments

This works, but there seems to be a minimum beyond physical reason. I could give it the exact intervals where the solution has to lie for both variables; unfortunately, fminsearch does not support intervals, and fminbnd is only for single-variable functions...
File Exchange has fminsearchbnd. By John D'Errico if I recall correctly.
Unfortunately it did not get the correct answer. The difference from its found f(x,y) solution to the known value was more than 10^5, whereas the start value had a difference of only 90.



Release

R2020b

Asked:

on 25 Apr 2022

Edited:

on 28 Apr 2022
