**[Batt92]** Battiti, R., "First-
and second-order methods for learning: Between steepest descent and
Newton's method," *Neural Computation*,
Vol. 4, No. 2, 1992, pp. 141–166.

**[Beal72]** Beale, E.M.L., "A
derivation of conjugate gradients," in F.A. Lootsma, Ed., *Numerical
methods for nonlinear optimization*, London: Academic Press,
1972.

**[Bren73]** Brent, R.P., *Algorithms
for Minimization Without Derivatives*, Englewood Cliffs,
NJ: Prentice-Hall, 1973.

**[Caud89]** Caudill, M., *Neural
Networks Primer*, San Francisco, CA: Miller Freeman Publications,
1989.

This collection of papers from the *AI Expert Magazine* gives
an excellent introduction to the field of neural networks. The papers
use a minimum of mathematics to explain the main results clearly.
Several good suggestions for further reading are included.

**[CaBu92]** Caudill, M., and C.
Butler, *Understanding Neural Networks: Computer Explorations*, Vols.
1 and 2, Cambridge, MA: The MIT Press, 1992.

This is a two-volume workbook designed to give students "hands-on" experience with neural networks. It is written for a laboratory course at the senior or first-year graduate level. Software for IBM PC and Apple Macintosh computers is included. The material is well written, clear, and helpful in understanding a field that traditionally has been buried in mathematics.

**[Char92]** Charalambous, C., "Conjugate
gradient algorithm for efficient training of artificial neural networks," *IEE
Proceedings G*, Vol. 139, No. 3, 1992, pp. 301–310.

**[ChCo91]** Chen, S., C.F.N. Cowan,
and P.M. Grant, "Orthogonal least squares learning algorithm
for radial basis function networks," *IEEE Transactions
on Neural Networks*, Vol. 2, No. 2, 1991, pp. 302–309.

This paper gives an excellent introduction to the field of radial basis functions. It uses a minimum of mathematics to explain the main results clearly. Several good suggestions for further reading are included.

**[ChDa99]** Gan, C., and K.
Danai, "Fault diagnosis of the IFAC Benchmark Problem with
a model-based recurrent neural network," *Proceedings
of the 1999 IEEE International Conference on Control Applications*,
Vol. 2, 1999, pp. 1755–1760.

**[DARP88]** *DARPA Neural
Network Study*, Lexington, MA: M.I.T. Lincoln Laboratory,
1988.

This book is a compendium of knowledge of neural networks as it stood in 1988. It presents the theoretical foundations of neural networks and discusses their applications at that time. It contains sections on associative memories, recurrent networks, vision, speech recognition, and robotics. Finally, it discusses simulation tools and implementation technology.

**[DeHa01a]** De Jesús, O.,
and M.T. Hagan, "Backpropagation Through Time for a General
Class of Recurrent Network," *Proceedings of the International
Joint Conference on Neural Networks*, Washington, DC, July
15–19, 2001, pp. 2638–2642.

**[DeHa01b]** De Jesús, O.,
and M.T. Hagan, "Forward Perturbation Algorithm for a General
Class of Recurrent Network," *Proceedings of the International
Joint Conference on Neural Networks*, Washington, DC, July
15–19, 2001, pp. 2626–2631.

**[DeHa07]** De Jesús, O.,
and M.T. Hagan, "Backpropagation Algorithms for a Broad Class
of Dynamic Networks," *IEEE Transactions on Neural Networks*,
Vol. 18, No. 1, January 2007, pp. 14–27.

This paper provides detailed algorithms for calculating gradients and Jacobians for arbitrarily connected neural networks. Both the backpropagation-through-time and real-time recurrent learning algorithms are covered.

**[DeSc83]** Dennis, J.E., and
R.B. Schnabel, *Numerical Methods for Unconstrained Optimization
and Nonlinear Equations*, Englewood Cliffs, NJ: Prentice-Hall,
1983.

**[DHH01]** De Jesús, O.,
J.M. Horn, and M.T. Hagan, "Analysis of Recurrent Network Training
and Suggestions for Improvements," *Proceedings of
the International Joint Conference on Neural Networks*,
Washington, DC, July 15–19, 2001, pp. 2632–2637.

**[Elma90]** Elman, J.L., "Finding
structure in time," *Cognitive Science*,
Vol. 14, 1990, pp. 179–211.

This paper is a superb introduction to the Elman networks described
in Chapter 10, "Recurrent Networks."

**[FeTs03]** Feng, J., C.K. Tse,
and F.C.M. Lau, "A neural-network-based channel-equalization
strategy for chaos-based communication systems," *IEEE
Transactions on Circuits and Systems I: Fundamental Theory and Applications*,
Vol. 50, No. 7, 2003, pp. 954–957.

**[FlRe64]** Fletcher,
R., and C.M. Reeves, "Function minimization by conjugate gradients," *Computer
Journal*, Vol. 7, 1964, pp. 149–154.

**[FoHa97]** Foresee, F.D., and
M.T. Hagan, "Gauss-Newton approximation to Bayesian learning," *Proceedings
of the 1997 International Joint Conference on Neural Networks*,
1997, pp. 1930–1935.

**[GiMu81]** Gill, P.E., W. Murray,
and M.H. Wright, *Practical Optimization*, New
York: Academic Press, 1981.

**[GiPr02]** Pollastri, G., D. Przybylski,
B. Rost, and P. Baldi, "Improving the prediction of protein secondary
structure in three and eight classes using recurrent neural networks
and profiles," *Proteins: Structure, Function, and
Genetics*, Vol. 47, No. 2, 2002, pp. 228–235.

**[Gros82]** Grossberg, S., *Studies
of Mind and Brain*, Dordrecht, Holland: Reidel Press,
1982.

This book contains articles summarizing Grossberg's theoretical psychophysiology work up to 1980. Each article contains a preface explaining the main points.

**[HaDe99]** Hagan, M.T., and H.B.
Demuth, "Neural Networks for Control," *Proceedings
of the 1999 American Control Conference*, San Diego, CA,
1999, pp. 1642–1656.

**[HaJe99]** Hagan, M.T., O. De
Jesús, and R. Schultz, "Training Recurrent Networks for Filtering
and Control," Chapter 12 in *Recurrent Neural Networks:
Design and Applications*, L. Medsker and L.C. Jain, Eds.,
Boca Raton, FL: CRC Press, 2000, pp. 311–340.

**[HaMe94]** Hagan, M.T., and M.
Menhaj, "Training feed-forward networks with the Marquardt
algorithm," *IEEE Transactions on Neural Networks*,
Vol. 5, No. 6, 1994, pp. 989–993.

This paper reports the first development of the Levenberg-Marquardt algorithm for neural networks. It describes the theory and application of the algorithm, which trains neural networks at a rate 10 to 100 times faster than the usual gradient descent backpropagation method.

**[HaRu78]** Harrison, D., and
D.L. Rubinfeld, "Hedonic prices and the demand for clean air," *J.
Environ. Economics & Management*, Vol. 5, 1978, pp. 81–102.

The Boston housing data set from this paper was taken from the StatLib library, which is maintained at Carnegie Mellon University.

**[HDB96]** Hagan, M.T., H.B. Demuth,
and M.H. Beale, *Neural Network Design*, Boston,
MA: PWS Publishing, 1996.

This book provides a clear and detailed survey of basic neural network architectures and learning rules. It emphasizes mathematical analysis of networks, methods of training networks, and application of networks to practical engineering problems. It has example programs, an instructor's guide, and transparency overheads for teaching.

**[HDH09]** Horn, J.M., O. De Jesús,
and M.T. Hagan, "Spurious Valleys in the Error Surface of Recurrent
Networks – Analysis and Avoidance," *IEEE Transactions on Neural
Networks*, Vol. 20, No. 4, April 2009, pp. 686–700.

This paper describes spurious valleys that appear in the error surfaces of recurrent networks. It also explains how training algorithms can be modified to avoid becoming stuck in these valleys.

**[Hebb49]** Hebb, D.O., *The
Organization of Behavior*, New York: Wiley, 1949.

This book proposed neural network architectures and the first learning rule. The learning rule is used to form a theory of how collections of cells might form a concept.

**[Himm72]** Himmelblau, D.M., *Applied
Nonlinear Programming*, New York: McGraw-Hill, 1972.

**[HuSb92]** Hunt, K.J., D. Sbarbaro,
R. Zbikowski, and P.J. Gawthrop, "Neural Networks for Control Systems
– A Survey," *Automatica*, Vol. 28,
1992, pp. 1083–1112.

**[JaRa04]** Jayadeva and S.A. Rahman,
"A neural network with O(N) neurons for ranking N numbers in
O(1/N) time," *IEEE Transactions on Circuits and Systems
I: Regular Papers*, Vol. 51, No. 10, 2004, pp. 2044–2051.

**[Joll86]** Jolliffe, I.T., *Principal
Component Analysis*, New York: Springer-Verlag, 1986.

**[KaGr96]** Kamwa, I., R. Grondin,
V.K. Sood, C. Gagnon, Van Thich Nguyen, and J. Mereb, "Recurrent
neural networks for phasor detection and adaptive identification in
power system control and protection," *IEEE Transactions
on Instrumentation and Measurement*, Vol. 45, No. 2, 1996,
pp. 657–664.

**[Koho87]** Kohonen, T., *Self-Organization
and Associative Memory*, 2nd Edition,
Berlin: Springer-Verlag, 1987.

This book analyzes several learning rules. The Kohonen learning rule is then introduced and embedded in self-organizing feature maps. Associative networks are also studied.

**[Koho97]** Kohonen, T., *Self-Organizing
Maps*, Second Edition, Berlin: Springer-Verlag, 1997.

This book discusses the history, fundamentals, theory, applications, and hardware of self-organizing maps. It also includes a comprehensive literature survey.

**[LiMi89]** Li, J., A.N. Michel,
and W. Porod, "Analysis and synthesis of a class of neural
networks: linear systems operating on a closed hypercube," *IEEE
Transactions on Circuits and Systems*, Vol. 36, No. 11,
1989, pp. 1405–1422.

This paper discusses a class of neural networks described by
first-order linear differential equations that are defined on a closed
hypercube. The systems considered retain the basic structure of the
Hopfield model but are easier to analyze and implement. The paper
presents an efficient method for determining the set of asymptotically
stable equilibrium points and the set of unstable equilibrium points.
Examples are presented. The method of Li et al. is implemented
in Advanced Topics in the *User's Guide*.

**[Lipp87]** Lippmann, R.P., "An
introduction to computing with neural nets," *IEEE
ASSP Magazine*, 1987, pp. 4–22.

This paper gives an introduction to the field of neural nets by reviewing six neural net models that can be used for pattern classification. The paper shows how existing classification and clustering algorithms can be performed using simple neuron-like components. This is a highly readable paper.

**[MacK92]** MacKay, D.J.C., "Bayesian
interpolation," *Neural Computation*, Vol. 4, No. 3, 1992, pp. 415–447.

**[Marq63]** Marquardt, D., "An
Algorithm for Least-Squares Estimation of Nonlinear Parameters," *SIAM
Journal on Applied Mathematics*, Vol. 11, No.
2, June 1963, pp. 431–441.

**[McPi43]** McCulloch, W.S., and
W.H. Pitts, "A logical calculus of the ideas immanent in nervous
activity," *Bulletin of Mathematical Biophysics*,
Vol. 5, 1943, pp. 115–133.

A classic paper that describes a model of a neuron that is binary and has a fixed threshold. A network of such neurons can perform logical operations.

**[MeJa00]** Medsker, L.R., and
L.C. Jain, *Recurrent Neural Networks: Design and Applications*,
Boca Raton, FL: CRC Press, 2000.

**[Moll93]** Møller, M.F., "A
scaled conjugate gradient algorithm for fast supervised learning," *Neural
Networks*, Vol. 6, 1993, pp. 525–533.

**[MuNe92]** Murray-Smith, R., D. Neumerkel,
and D. Sbarbaro, "Neural Networks for Modeling and Control
of a Non-linear Dynamic System," *Proceedings of the
1992 IEEE International Symposium on Intelligent Control*,
1992, pp. 404–409.

**[NaMu97]** Narendra, K.S., and
S. Mukhopadhyay, "Adaptive Control Using Neural Networks and
Approximate Models," *IEEE Transactions on Neural
Networks*, Vol. 8, 1997, pp. 475–485.

**[NaPa91]** Narendra, K.S., and
K. Parthasarathy, "Learning Automata Approach to
Hierarchical Multiobjective Analysis," *IEEE Transactions
on Systems, Man and Cybernetics*, Vol. 20, No. 1, January/February
1991, pp. 263–272.

**[NgWi89]** Nguyen, D., and B.
Widrow, "The truck backer-upper: An example of self-learning
in neural networks," *Proceedings of the International
Joint Conference on Neural Networks*, Vol. 2, 1989, pp.
357–363.

This paper describes a two-layer network that first learned the truck dynamics and then learned how to back the truck to a specified position at a loading dock. To do this, the neural network had to solve a highly nonlinear control systems problem.

**[NgWi90]** Nguyen, D., and B.
Widrow, "Improving the learning speed of 2-layer neural networks
by choosing initial values of the adaptive weights," *Proceedings
of the International Joint Conference on Neural Networks*,
Vol. 3, 1990, pp. 21–26.

Nguyen and Widrow show that a two-layer sigmoid/linear network can be viewed as performing a piecewise linear approximation of any learned function. It is shown that weights and biases generated with certain constraints result in an initial network better able to form a function approximation of an arbitrary function. Use of the Nguyen-Widrow (instead of purely random) initial conditions often shortens training time by more than an order of magnitude.

**[Powe77]** Powell, M.J.D., "Restart
procedures for the conjugate gradient method," *Mathematical
Programming*, Vol. 12, 1977, pp. 241–254.

**[Pulu92]** Purdie, N., E.A. Lucas,
and M.B. Talley, "Direct measure of total cholesterol and its
distribution among major serum lipoproteins," *Clinical
Chemistry*, Vol. 38, No. 9, 1992, pp. 1645–1647.

**[RiBr93]** Riedmiller, M., and
H. Braun, "A direct adaptive method for faster backpropagation
learning: The RPROP algorithm," *Proceedings of the
IEEE International Conference on Neural Networks*, 1993.

**[Robin94]** Robinson, A.J., "An
application of recurrent nets to phone probability estimation," *IEEE
Transactions on Neural Networks*, Vol. 5, No. 2, 1994.

**[RoJa96]** Roman, J., and A.
Jameel, "Backpropagation and recurrent neural networks in financial
analysis of multiple stock market returns," *Proceedings
of the Twenty-Ninth Hawaii International Conference on System Sciences*,
Vol. 2, 1996, pp. 454–460.

**[Rose61]** Rosenblatt, F., *Principles
of Neurodynamics*, Washington, D.C.: Spartan Press, 1961.

This book presents all of Rosenblatt's results on perceptrons.
In particular, it presents his most important result, the *perceptron
learning theorem*.

**[RuHi86a]** Rumelhart, D.E.,
G.E. Hinton, and R.J. Williams, "Learning internal representations
by error propagation," in D.E. Rumelhart and J.L. McClelland,
Eds., *Parallel Distributed Processing*, Vol.
1, Cambridge, MA: The M.I.T. Press, 1986, pp. 318–362.

This is a basic reference on backpropagation.

**[RuHi86b]** Rumelhart, D.E.,
G.E. Hinton, and R.J. Williams, "Learning representations by
back-propagating errors," *Nature*, Vol.
323, 1986, pp. 533–536.

**[RuMc86]** Rumelhart, D.E., J.L.
McClelland, and the PDP Research Group, Eds., *Parallel Distributed
Processing*, Vols. 1 and 2, Cambridge,
MA: The M.I.T. Press, 1986.

These two volumes contain a set of monographs that present a technical introduction to the field of neural networks. The sections are written by different authors. Together, these works summarize most of the research in neural networks up to the date of publication.

**[Scal85]** Scales, L.E., *Introduction
to Non-Linear Optimization*, New York: Springer-Verlag,
1985.

**[SoHa96]** Soloway, D., and P.J.
Haley, "Neural Generalized Predictive Control," *Proceedings
of the 1996 IEEE International Symposium on Intelligent Control*,
1996, pp. 277–281.

**[VoMa88]** Vogl, T.P., J.K. Mangis,
A.K. Rigler, W.T. Zink, and D.L. Alkon, "Accelerating the convergence
of the backpropagation method," *Biological Cybernetics*,
Vol. 59, 1988, pp. 256–264.

Backpropagation learning can be sped up, and made less sensitive to small features in the error surface such as shallow local minima, by combining techniques such as batching, adaptive learning rate, and momentum.

**[WaHa89]** Waibel, A., T. Hanazawa,
G. Hinton, K. Shikano, and K.J. Lang, "Phoneme recognition
using time-delay neural networks," *IEEE Transactions
on Acoustics, Speech, and Signal Processing*, Vol. 37, 1989,
pp. 328–339.

**[Wass93]** Wasserman, P.D., *Advanced
Methods in Neural Computing*, New York: Van Nostrand Reinhold,
1993.

**[WeGe94]** Weigend, A.S., and
N.A. Gershenfeld, Eds., *Time Series Prediction: Forecasting
the Future and Understanding the Past*, Reading, MA: Addison-Wesley,
1994.

**[WiHo60]** Widrow, B., and M.E.
Hoff, "Adaptive switching circuits," *1960
IRE WESCON Convention Record*, New York: IRE, 1960, pp. 96–104.

**[WiSt85]** Widrow, B., and S.D.
Stearns, *Adaptive Signal Processing*, New York:
Prentice-Hall, 1985.

This is a basic book on adaptive signal processing.
