Copyright (c) 2009, Yi Cao
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the distribution
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
Chih-Ying Hsiao (view profile)
good thank you
V. Poor (view profile)
And on the URL
http://mdolab.utias.utoronto.ca/resources/complex-step/matlab
the following comment is found:
As in the case of Fortran, one must redefine functions such as abs(), max() and min. All differentiable functions are defined for complex variables. The correct definitions for these functions can be found in the complexify.f90 module.
The standard transpose operation represented by an apostrophe (') poses a problem as it takes the complex conjugate of the elements of the matrix, so one should use the non-conjugate transpose represented by ``dot apostrophe'' (.') instead.
I assume with these changes (which may be difficult to implement as pointed out by Arvind Keerthi), complex step will work for almost any Hessian.
Scott
Pardon me, my mistake: "if" statements are not implemented in Martins, but ABS, <, and similar functions with discontinuous derivatives are.
Scott
The author I was referring to is J.R.R.A.Martins, who in 1999 implemented a Fortran90 module which implements, among other things, complex step differentiation with "if" statements and a variety of other functions.
A link to this module is currently found at:
http://mdolab.utias.utoronto.ca/resources/complex-step/fortran
Of course this would need some translation, but it would be useful to see this implemented in MATLAB.
Scott
The idea of complex step derivative is quite elegant, and Yi Cao has done a very nice job of encoding it.
But buyer beware: This code may NOT be applicable to your particular cost function. For example: If the cost function is defined thus:
function outVal = testFun(x)
b = [3 7 5]';
A = [ 1 2 3;
4 5 6;
17 8 9];
outVal = b'*x + 0.5*x'*A*x;
then it is obvious that the Hessian should always equal the matrix A. Yet, the hessiancsd will not compute the Hessian properly. If instead you made a tiny change to the cost function, and re-wrote the last line as follows:
outVal = b.'*x + 0.5*x.'*A*x;
then hessiancsd will start working correctly.
If you think about why hessiancsd did not work in the first instance, but does in the second, the conclusion is this: Your cost function should be such that a change in the imaginary component of one variable must not be equivalent to a change in another variable.
In practise this is difficult to achieve. For example, one may already have written a difficult-to-evaluate cost function that treats complex input variables in a certain way that makes it totally incompatible for application by hessiancsd.
Well commented, flexible and clear.
Having struggled with the same issues as the author, my concerns are mathematical, rather than programmatical. The complex step derivative is indeed often much more accurate, but takes more time to evaluate both because the functions are complex and because of the difficulty of vectorizing this type of algorithm; the author has used symmetry to gain some speed. As a result, the user may still want to consider a conventional, less accurate Hessian to improve the speed, at least during the earlier stages of optimization.
The second point is that it is not possible to perform 2 complex step derivatives to get Hessians, as the 'i' will interact between the two steps. The author has accounted for that, and only the gradient is computed with a complex step. The Hessian is computed from the gradient using a conventional real finite step, and so suffers from an unavoidable loss in accuracy.
It would, I think, be possible to make a Hessian using two "complex" steps using a quaternion class, which would be highly accurate in both derivative terms, but would probably be even slower. It would be interesting to see such an implementation, and if possible more vectorizing.
Finally, although the author warns of using conditional functions such as 'if', etc., there is an implementation of complex derivatives which includes this class, the author of which escapes me. It would require some work to implement, however.
All in all, excellent job.