**[Batt92]** Battiti, R., “First and second order
methods for learning: Between steepest descent and Newton's method,”
*Neural Computation*, Vol. 4, No. 2, 1992,
pp. 141–166.

**[Beal72]** Beale, E.M.L., “A derivation of
conjugate gradients,” in F.A. Lootsma, Ed., *Numerical methods for
nonlinear optimization*, London: Academic Press, 1972.

**[Bren73]** Brent, R.P., *Algorithms for
Minimization Without Derivatives*, Englewood Cliffs, NJ: Prentice-Hall,
1973.

**[Caud89]** Caudill, M., *Neural Networks
Primer*, San Francisco, CA: Miller Freeman Publications, 1989.

This collection of papers from the *AI Expert Magazine* gives an
excellent introduction to the field of neural networks. The papers use a minimum of
mathematics to explain the main results clearly. Several good suggestions for further
reading are included.

**[CaBu92]** Caudill, M., and C. Butler,
*Understanding Neural Networks: Computer Explorations*,
*Vols. 1 and 2*, Cambridge, MA: The MIT Press, 1992.

This is a two-volume workbook designed to give students “hands on” experience with neural networks. It is written for a laboratory course at the senior or first-year graduate level. Software for IBM PC and Apple Macintosh computers is included. The material is well written, clear, and helpful in understanding a field that traditionally has been buried in mathematics.

**[Char92]** Charalambous, C., “Conjugate gradient
algorithm for efficient training of artificial neural networks,” *IEEE
Proceedings*, Vol. 139, No. 3, 1992, pp. 301–310.

**[ChCo91]** Chen, S., C.F.N. Cowan, and P.M. Grant,
“Orthogonal least squares learning algorithm for radial basis function
networks,” *IEEE Transactions on Neural Networks*, Vol. 2, No.
2, 1991, pp. 302–309.

This paper gives an excellent introduction to the field of radial basis functions. The paper uses a minimum of mathematics to explain the main results clearly. Several good suggestions for further reading are included.

**[ChDa99]** Chengyu, G., and K. Danai, “Fault
diagnosis of the IFAC Benchmark Problem with a model-based recurrent neural
network,” *Proceedings of the 1999 IEEE International Conference on
Control Applications*, Vol. 2, 1999, pp. 1755–1760.

**[DARP88]**
*DARPA Neural Network Study*, Lexington, MA: M.I.T. Lincoln
Laboratory, 1988.

This book is a compendium of what was known about neural networks as of 1988. It presents the theoretical foundations of neural networks and discusses their applications at the time. It contains sections on associative memories, recurrent networks, vision, speech recognition, and robotics. Finally, it discusses simulation tools and implementation technology.

**[DeHa01a]** De Jesús, O., and M.T. Hagan,
“Backpropagation Through Time for a General Class of Recurrent Network,”
*Proceedings of the International Joint Conference on Neural
Networks*, Washington, DC, July 15–19, 2001, pp. 2638–2642.

**[DeHa01b]** De Jesús, O., and M.T. Hagan,
“Forward Perturbation Algorithm for a General Class of Recurrent Network,”
*Proceedings of the International Joint Conference on Neural
Networks*, Washington, DC, July 15–19, 2001, pp. 2626–2631.

**[DeHa07]** De Jesús, O., and M.T. Hagan,
“Backpropagation Algorithms for a Broad Class of Dynamic Networks,” *IEEE
Transactions on Neural Networks*, Vol. 18, No. 1, January 2007, pp. 14–27.

This paper provides detailed algorithms for the calculation of gradients and Jacobians for arbitrarily-connected neural networks. Both the backpropagation-through-time and real-time recurrent learning algorithms are covered.

**[DeSc83]** Dennis, J.E., and R.B. Schnabel,
*Numerical Methods for Unconstrained Optimization and Nonlinear
Equations*, Englewood Cliffs, NJ: Prentice-Hall, 1983.

**[DHH01]** De Jesús, O., J.M. Horn, and M.T. Hagan,
“Analysis of Recurrent Network Training and Suggestions for Improvements,”
*Proceedings of the International Joint Conference on Neural
Networks*, Washington, DC, July 15–19, 2001, pp. 2632–2637.

**[Elma90]** Elman, J.L., “Finding structure in
time,” *Cognitive Science*, Vol. 14, 1990, pp. 179–211.

This paper is a superb introduction to the Elman networks described in Chapter 10,
“Recurrent Networks.”

**[FeTs03]** Feng, J., C.K. Tse, and F.C.M. Lau,
“A neural-network-based channel-equalization strategy for chaos-based
communication systems,” *IEEE Transactions on Circuits and Systems I:
Fundamental Theory and Applications*, Vol. 50, No. 7, 2003,
pp. 954–957.

**[FlRe64]** Fletcher,
R., and C.M. Reeves, “Function minimization by conjugate gradients,”
*Computer Journal*, Vol. 7, 1964, pp. 149–154.

**[FoHa97]** Foresee, F.D., and M.T. Hagan,
“Gauss-Newton approximation to Bayesian regularization,”
*Proceedings of the 1997 International Joint Conference on Neural
Networks*, 1997, pp. 1930–1935.

**[GiMu81]** Gill, P.E., W. Murray, and M.H. Wright,
*Practical Optimization*, New York: Academic Press, 1981.

**[GiPr02]** Gianluca, P., D. Przybylski, B. Rost, and P.
Baldi, “Improving the prediction of protein secondary structure in three and
eight classes using recurrent neural networks and profiles,” *Proteins:
Structure, Function, and Genetics*, Vol. 47, No. 2, 2002, pp.
228–235.

**[Gros82]** Grossberg, S., *Studies of the Mind
and Brain*, Dordrecht, Holland: Reidel Press, 1982.

This book contains articles summarizing Grossberg's theoretical psychophysiology work up to 1980. Each article contains a preface explaining the main points.

**[HaDe99]** Hagan, M.T., and H.B. Demuth, “Neural
Networks for Control,” *Proceedings of the 1999 American Control
Conference*, San Diego, CA, 1999, pp. 1642–1656.

**[HaJe99]** Hagan, M.T., O. De Jesús, and R. Schultz,
“Training Recurrent Networks for Filtering and Control,” Chapter 12 in
*Recurrent Neural Networks: Design and Applications*, L. Medsker
and L.C. Jain, Eds., CRC Press, pp. 311–340.

**[HaMe94]** Hagan, M.T., and M. Menhaj, “Training
feed-forward networks with the Marquardt algorithm,” *IEEE Transactions
on Neural Networks*, Vol. 5, No. 6, 1994, pp. 989–993.

This paper reports the first development of the Levenberg-Marquardt algorithm for neural networks. It describes the theory and application of the algorithm, which trains neural networks at a rate 10 to 100 times faster than the usual gradient descent backpropagation method.

**[HaRu78]** Harrison, D., and D.L. Rubinfeld,
“Hedonic prices and the demand for clean air,” *J. Environ.
Economics & Management*, Vol. 5, 1978, pp. 81–102.

This data set was taken from the StatLib library, which is maintained at Carnegie Mellon University.

**[HDB96]** Hagan, M.T., H.B. Demuth, and M.H. Beale,
*Neural Network Design*, Boston, MA: PWS Publishing, 1996.

This book provides a clear and detailed survey of basic neural network architectures and learning rules. It emphasizes mathematical analysis of networks, methods of training networks, and application of networks to practical engineering problems. It has example programs, an instructor’s guide, and transparency overheads for teaching.

**[HDH09]** Horn, J.M., O. De Jesús, and M.T. Hagan,
“Spurious Valleys in the Error Surface of Recurrent Networks - Analysis and
Avoidance,” *IEEE Transactions on Neural Networks*, Vol. 20, No. 4,
April 2009, pp. 686–700.

This paper describes spurious valleys that appear in the error surfaces of recurrent networks. It also explains how training algorithms can be modified to avoid becoming stuck in these valleys.

**[Hebb49]** Hebb, D.O.,
*The Organization of Behavior*, New York: Wiley, 1949.

This book proposed neural network architectures and the first learning rule. The learning rule is used to form a theory of how collections of cells might form a concept.

**[Himm72]** Himmelblau, D.M., *Applied
Nonlinear Programming*, New York: McGraw-Hill, 1972.

**[HuSb92]** Hunt, K.J., D. Sbarbaro, R. Zbikowski, and
P.J. Gawthrop, “Neural Networks for Control Systems — A Survey,”
*Automatica*, Vol. 28, 1992, pp. 1083–1112.

**[JaRa04]** Jayadeva and S.A. Rahman, “A neural
network with O(N) neurons for ranking N numbers in O(1/N) time,” *IEEE
Transactions on Circuits and Systems I: Regular Papers*, Vol. 51, No. 10,
2004, pp. 2044–2051.

**[Joll86]** Jolliffe, I.T., *Principal
Component Analysis*, New York: Springer-Verlag, 1986.

**[KaGr96]** Kamwa, I., R. Grondin, V.K. Sood, C. Gagnon,
Van Thich Nguyen, and J. Mereb, “Recurrent neural networks for phasor detection
and adaptive identification in power system control and protection,”
*IEEE Transactions on Instrumentation and Measurement*, Vol. 45,
No. 2, 1996, pp. 657–664.

**[Koho87]** Kohonen, T., *Self-Organization and
Associative Memory*, 2nd Edition, Berlin:
Springer-Verlag, 1987.

This book analyzes several learning rules. The Kohonen learning rule is then introduced and embedded in self-organizing feature maps. Associative networks are also studied.

**[Koho97]** Kohonen, T., *Self-Organizing
Maps*, Second Edition, Berlin: Springer-Verlag, 1997.

This book discusses the history, fundamentals, theory, applications, and hardware of self-organizing maps. It also includes a comprehensive literature survey.

**[LiMi89]** Li, J., A.N. Michel, and W. Porod,
“Analysis and synthesis of a class of neural networks: linear systems operating
on a closed hypercube,” *IEEE Transactions on Circuits and
Systems*, Vol. 36, No. 11, 1989, pp. 1405–1422.

This paper discusses a class of neural networks described by first-order linear
differential equations that are defined on a closed hypercube. The systems considered
retain the basic structure of the Hopfield model but are easier to analyze and
implement. The paper presents an efficient method for determining the set of
asymptotically stable equilibrium points and the set of unstable equilibrium points.
Examples are presented. The method of Li et al. is implemented in Advanced Topics in
the *User’s Guide*.

**[Lipp87]** Lippmann, R.P., “An introduction to
computing with neural nets,” *IEEE ASSP Magazine*, 1987, pp.
4–22.

This paper gives an introduction to the field of neural nets by reviewing six neural net models that can be used for pattern classification. The paper shows how existing classification and clustering algorithms can be performed using simple components that are like neurons. This is a highly readable paper.

**[MacK92]** MacKay, D.J.C., “Bayesian
interpolation,” *Neural Computation*, Vol. 4, No. 3,
1992, pp. 415–447.

**[Marq63]** Marquardt, D., “An Algorithm for
Least-Squares Estimation of Nonlinear Parameters,” *SIAM Journal on
Applied Mathematics*, Vol. 11, No. 2, June 1963,
pp. 431–441.

**[McPi43]** McCulloch, W.S., and W.H. Pitts, “A
logical calculus of ideas immanent in nervous activity,” *Bulletin of
Mathematical Biophysics*, Vol. 5, 1943, pp. 115–133.

A classic paper that describes a model of a neuron that is binary and has a fixed threshold. A network of such neurons can perform logical operations.

**[MeJa00]** Medsker, L.R., and L.C. Jain,
*Recurrent neural networks: design and applications*, Boca Raton,
FL: CRC Press, 2000.

**[Moll93]** Moller, M.F., “A scaled conjugate
gradient algorithm for fast supervised learning,” *Neural
Networks*, Vol. 6, 1993, pp. 525–533.

**[MuNe92]** Murray, R., D. Neumerkel, and D. Sbarbaro,
“Neural Networks for Modeling and Control of a Non-linear Dynamic System,”
*Proceedings of the 1992 IEEE International Symposium on Intelligent
Control*, 1992, pp. 404–409.

**[NaMu97]** Narendra, K.S., and S. Mukhopadhyay,
“Adaptive Control Using Neural Networks and Approximate Models,”
*IEEE Transactions on Neural Networks*, Vol. 8, 1997, pp.
475–485.

**[NaPa91]** Narendra, K.S., and K.
Parthasarathy, “Learning Automata Approach to Hierarchical Multiobjective
Analysis,” *IEEE Transactions on Systems, Man and
Cybernetics*, Vol. 20, No. 1, January/February 1991, pp. 263–272.

**[NgWi89]** Nguyen, D., and B. Widrow, “The truck
backer-upper: An example of self-learning in neural networks,”
*Proceedings of the International Joint Conference on Neural
Networks*, Vol. 2, 1989, pp. 357–363.

This paper describes a two-layer network that first learned the truck dynamics and then learned how to back the truck to a specified position at a loading dock. To do this, the neural network had to solve a highly nonlinear control systems problem.

**[NgWi90]** Nguyen, D., and B. Widrow, “Improving
the learning speed of 2-layer neural networks by choosing initial values of the adaptive
weights,” *Proceedings of the International Joint Conference on Neural
Networks*, Vol. 3, 1990, pp. 21–26.

Nguyen and Widrow show that a two-layer sigmoid/linear network can be viewed as performing a piecewise linear approximation of any learned function. It is shown that weights and biases generated with certain constraints result in an initial network better able to form a function approximation of an arbitrary function. Use of the Nguyen-Widrow (instead of purely random) initial conditions often shortens training time by more than an order of magnitude.

**[Powe77]** Powell, M.J.D., “Restart procedures
for the conjugate gradient method,” *Mathematical
Programming*, Vol. 12, 1977, pp. 241–254.

**[Pulu92]** Purdie, N., E.A. Lucas, and M.B. Talley,
“Direct measure of total cholesterol and its distribution among major serum
lipoproteins,” *Clinical Chemistry*, Vol. 38, No. 9, 1992, pp.
1645–1647.

**[RiBr93]** Riedmiller, M., and H. Braun, “A
direct adaptive method for faster backpropagation learning: The RPROP algorithm,”
*Proceedings of the IEEE International Conference on Neural
Networks*, 1993.

**[Robin94]** Robinson, A.J., “An application of
recurrent nets to phone probability estimation,” *IEEE Transactions on
Neural Networks*, Vol. 5, No. 2, 1994.

**[RoJa96]** Roman, J., and A. Jameel,
“Backpropagation and recurrent neural networks in financial analysis of multiple
stock market returns,” *Proceedings of the Twenty-Ninth Hawaii
International Conference on System Sciences*, Vol. 2, 1996, pp. 454–460.

**[Rose61]** Rosenblatt, F., *Principles of
Neurodynamics*, Washington, D.C.: Spartan Press, 1961.

This book presents all of Rosenblatt's results on perceptrons. In particular, it
presents his most important result, the *perceptron learning
theorem*.

**[RuHi86a]** Rumelhart, D.E., G.E. Hinton, and R.J.
Williams, “Learning internal representations by error propagation,” in
D.E. Rumelhart and J.L. McClelland, Eds., *Parallel Distributed Processing*,
*Vol. 1*, Cambridge, MA: The M.I.T. Press, 1986, pp.
318–362.

This is a basic reference on backpropagation.

**[RuHi86b]** Rumelhart, D.E., G.E. Hinton, and R.J.
Williams, “Learning representations by back-propagating errors,”
*Nature*, Vol. 323, 1986, pp. 533–536.

**[RuMc86]** Rumelhart, D.E., J.L. McClelland, and the
PDP Research Group, Eds., *Parallel Distributed
Processing*, *Vols. 1 and 2*, Cambridge, MA: The M.I.T.
Press, 1986.

These two volumes contain a set of monographs that present a technical introduction to the field of neural networks. Each section is written by different authors. These works present a summary of most of the research in neural networks to the date of publication.

**[Scal85]** Scales, L.E., *Introduction to
Non-Linear Optimization*, New York: Springer-Verlag, 1985.

**[SoHa96]** Soloway, D., and P.J. Haley, “Neural
Generalized Predictive Control,” *Proceedings of the 1996 IEEE
International Symposium on Intelligent Control*, 1996, pp. 277–281.

**[VoMa88]** Vogl, T.P., J.K. Mangis, A.K. Rigler, W.T.
Zink, and D.L. Alkon, “Accelerating the convergence of the backpropagation
method,” *Biological Cybernetics*, Vol. 59, 1988, pp.
256–264.

Backpropagation learning can be sped up and made less sensitive to small features in the error surface, such as shallow local minima, by combining techniques such as batching, adaptive learning rate, and momentum.

**[WaHa89]** Waibel, A., T. Hanazawa, G. Hinton, K.
Shikano, and K. J. Lang, “Phoneme recognition using time-delay neural
networks,” *IEEE Transactions on Acoustics, Speech, and Signal
Processing*, Vol. 37, 1989, pp. 328–339.

**[Wass93]** Wasserman, P.D., *Advanced Methods
in Neural Computing*, New York: Van Nostrand Reinhold, 1993.

**[WeGe94]** Weigend, A.S., and N.A. Gershenfeld, Eds.,
*Time Series Prediction: Forecasting the Future and Understanding the
Past*, Reading, MA: Addison-Wesley, 1994.

**[WiHo60]** Widrow, B., and M.E. Hoff, “Adaptive
switching circuits,” *1960 IRE WESCON Convention Record, New York
IRE*, 1960, pp. 96–104.

**[WiSt85]** Widrow, B., and S.D. Stearns,
*Adaptive Signal Processing*, New York: Prentice-Hall, 1985.

This is a basic book on adaptive signal processing.