Documentation |
1-D interval location using Brent's method
[a,gX,perf,retcode,delta,tol] = srchbre(net,X,Pd,Tl,Ai,Q,TS,dX,gX,perf,dperf,delta,tol,ch_perf)
srchbre is a linear search routine. It searches in a given direction to locate the minimum of the performance function in that direction. It uses a technique called Brent's technique.
[a,gX,perf,retcode,delta,tol] = srchbre(net,X,Pd,Tl,Ai,Q,TS,dX,gX,perf,dperf,delta,tol,ch_perf) takes these inputs,
net | Neural network |
X | Vector containing current values of weights and biases |
Pd | Delayed input vectors |
Tl | Layer target vectors |
Ai | Initial input delay conditions |
Q | Batch size |
TS | Time steps |
dX | Search direction vector |
gX | Gradient vector |
perf | Performance value at current X |
dperf | Slope of performance value at current X in direction of dX |
delta | Initial step size |
tol | Tolerance on search |
ch_perf | Change in performance on previous step |
and returns
a | Step size that minimizes performance |
gX | Gradient at new minimum point |
perf | Performance value at new minimum point |
retcode | Return code that has three elements. The first two elements correspond to the number of function evaluations in the two stages of the search. The third element is a return code. These have different meanings for different search algorithms. Some might not be used in this function. |
0 Normal | |
1 Minimum step taken | |
2 Maximum step taken | |
3 Beta condition not met | |
delta | New initial step size, based on the current step size |
tol | New tolerance on search |
Parameters used for the Brent algorithm are
alpha | Scale factor that determines sufficient reduction in perf |
beta | Scale factor that determines sufficiently large step size |
bmax | Largest step size |
scale_tol | Parameter that relates the tolerance tol to the initial step size delta, usually set to 20 |
The defaults for these parameters are set in the training function that calls them. See traincgf, traincgb, traincgp, trainbfg, and trainoss.
Dimensions for these variables are
Pd | No-by-Ni-by-TS cell array | Each element P{i,j,ts} is a Dij-by-Q matrix. |
Tl | Nl-by-TS cell array | Each element P{i,ts} is a Vi-by-Q matrix. |
Ai | Nl-by-LD cell array | Each element Ai{i,k} is an Si-by-Q matrix. |
where
Ni | = | net.numInputs |
Nl | = | net.numLayers |
LD | = | net.numLayerDelays |
Ri | = | net.inputs{i}.size |
Si | = | net.layers{i}.size |
Vi | = | net.targets{i}.size |
Dij | = | Ri * length(net.inputWeights{i,j}.delays) |
Here is a problem consisting of inputs p and targets t to be solved with a network.
p = [0 1 2 3 4 5]; t = [0 0 0 1 1 1];
A two-layer feed-forward network is created. The network's input ranges from [0 to 10]. The first layer has two tansig neurons, and the second layer has one logsig neuron. The traincgf network training function and the srchbac search function are to be used.
net = newff([0 5],[2 1],{'tansig','logsig'},'traincgf'); a = sim(net,p)
net.trainParam.searchFcn = 'srchbre'; net.trainParam.epochs = 50; net.trainParam.show = 10; net.trainParam.goal = 0.1; net = train(net,p,t); a = sim(net,p)
You can create a standard network that uses srchbre with newff, newcf, or newelm. To prepare a custom network to be trained with traincgf, using the line search function srchbre,
The srchbre function can be used with any of the following training functions: traincgf, traincgb, traincgp, trainbfg, trainoss.
Brent's search is a linear search that is a hybrid of the golden section search and a quadratic interpolation. Function comparison methods, like the golden section search, have a first-order rate of convergence, while polynomial interpolation methods have an asymptotic rate that is faster than superlinear. On the other hand, the rate of convergence for the golden section search starts when the algorithm is initialized, whereas the asymptotic behavior for the polynomial interpolation methods can take many iterations to become apparent. Brent's search attempts to combine the best features of both approaches.
For Brent's search, you begin with the same interval of uncertainty used with the golden section search, but some additional points are computed. A quadratic function is then fitted to these points and the minimum of the quadratic function is computed. If this minimum is within the appropriate interval of uncertainty, it is used in the next stage of the search and a new quadratic approximation is performed. If the minimum falls outside the known interval of uncertainty, then a step of the golden section search is performed.
See [Bren73] for a complete description of this algorithm. This algorithm has the advantage that it does not require computation of the derivative. The derivative computation requires a backpropagation through the network, which involves more computation than a forward pass. However, the algorithm can require more performance evaluations than algorithms that use derivative information.