Backpropagation: Theory, Architectures, and Applications PDF

Preview Backpropagation: Theory, Architectures, and Applications

DEVELOPMENTS IN CONNECTIONIST THEORY
David E. Rumelhart, Editor

Gluck/Rumelhart • Neuroscience and Connectionist Theory
Ramsey/Stich/Rumelhart • Philosophy and Connectionist Theory
Chauvin/Rumelhart • Backpropagation: Theory, Architectures, and Applications

BACKPROPAGATION: Theory, Architectures, and Applications

Edited by
Yves Chauvin, Stanford University and Net-ID, Inc.
David E. Rumelhart, Department of Psychology, Stanford University

LAWRENCE ERLBAUM ASSOCIATES, PUBLISHERS
1995, Hillsdale, New Jersey; Hove, UK

Copyright © 1995 by Lawrence Erlbaum Associates, Inc. All rights reserved. No part of this book may be reproduced in any form, by photostat, microform, retrieval system, or any other means, without the prior written permission of the publisher.

Lawrence Erlbaum Associates, Inc., Publishers
365 Broadway, Hillsdale, New Jersey 07642

Library of Congress Cataloging-in-Publication Data
Backpropagation : theory, architectures, and applications / edited by Yves Chauvin and David E. Rumelhart. p. cm.
Includes bibliographical references and index.
ISBN 0-8058-1258-X (alk. paper). — ISBN 0-8058-1259-8 (pbk. : alk. paper)
1. Backpropagation (Artificial intelligence) I. Chauvin, Yves, Ph.D. II. Rumelhart, David E.
Q327.78.B33 1994  006.3—dc20  94-24248 CIP

Books published by Lawrence Erlbaum Associates are printed on acid-free paper, and their bindings are chosen for strength and durability.

Printed in the United States of America
10 9 8 7 6 5 4 3 2 1

Contents

Preface  vii

1. Backpropagation: The Basic Theory  1
   David E. Rumelhart, Richard Durbin, Richard Golden, and Yves Chauvin
2. Phoneme Recognition Using Time-Delay Neural Networks  35
   Alexander Waibel, Toshiyuki Hanazawa, Geoffrey Hinton, Kiyohiro Shikano, and Kevin J. Lang
3. Automated Aircraft Flare and Touchdown Control Using Neural Networks  63
   Charles Schley, Yves Chauvin, and Van Henkle
4. Recurrent Backpropagation Networks  99
   Fernando J. Pineda
5. A Focused Backpropagation Algorithm for Temporal Pattern Recognition  137
   Michael C. Mozer
6. Nonlinear Control with Neural Networks  171
   Derrick H. Nguyen and Bernard Widrow
7. Forward Models: Supervised Learning with a Distal Teacher  189
   Michael I. Jordan and David E. Rumelhart
8. Backpropagation: Some Comments and Variations  237
   Stephen Jose Hanson
9. Graded State Machines: The Representation of Temporal Contingencies in Feedback Networks  273
   Axel Cleeremans, David Servan-Schreiber, and James L. McClelland
10. Spatial Coherence as an Internal Teacher for a Neural Network  313
    Suzanna Becker and Geoffrey E. Hinton
11. Connectionist Modeling and Control of Finite State Systems Given Partial State Information  351
    Jonathan R. Bachrach and Michael C. Mozer
12. Backpropagation and Unsupervised Learning in Linear Networks  389
    Pierre Baldi, Yves Chauvin, and Kurt Hornik
13. Gradient-Based Learning Algorithms for Recurrent Networks and Their Computational Complexity  433
    Ronald J. Williams and David Zipser
14. When Neural Networks Play Sherlock Holmes  487
    Pierre Baldi and Yves Chauvin
15. Gradient Descent Learning Algorithms: A Unified Perspective  509
    Pierre Baldi

Author Index  543
Subject Index  549

Preface

Almost ten years have passed since the publication of the now classic volumes Parallel Distributed Processing: Explorations in the Microstructure of Cognition.
These volumes marked a renewal in the study of brain-inspired computations as models of human cognition. Since the publication of these two volumes, thousands of scientists and engineers have joined the study of Artificial Neural Networks (or Parallel Distributed Processing) to attempt to answer three fundamental questions: (1) How does the brain work? (2) How does the mind work? (3) How could we design machines with capabilities equivalent to or greater than those of biological (including human) brains? Progress in the last 10 years has given us a better grasp of the complexity of these three problems.

Although connectionist neural networks have shed a feeble light on the first question, it has become clear that biological neurons and computations are more complex than their metaphorical connectionist equivalents by several orders of magnitude. Connectionist models of various brain areas, such as the hippocampus, the cerebellum, the olfactory bulb, or the visual and auditory cortices, have certainly helped our understanding of their functions and internal mechanisms. But by and large, the biological metaphor has remained a metaphor, and neurons and synapses still remain much more mysterious than hidden units and weights.

Artificial neural networks have inspired not only biologists but also psychologists, who are perhaps more directly interested in the second question. Although the need for brain-inspired computations as models of the workings of the mind is still controversial, PDP models have been used successfully to model a number of behavioral observations in cognitive, and more rarely, clinical or social psychology. Most of the results are based on models of perception, language, memory, learning, categorization, and control. These results, however, cannot pretend to represent the beginning of a general understanding of the human psyche. First, only a small fraction of the large quantity of data amassed by experimental psychologists has been examined by neural network researchers. Second, some higher levels of human cognition, such as problem solving, judgment, reasoning, or decision making, have rarely been addressed by the connectionist community. Third, most models of experimental data remain qualitative and limited in scope: no general connectionist theory has been proposed to link the various aspects of cognitive processes into a general computational framework. Overall, the possibility of an artificial machine that could learn how to function in the world with a reasonable amount of intelligence, communication, or "common sense" remains far beyond our current state of knowledge.

It is perhaps on the third problem, the design of artificial learning systems expert in specific tasks, that connectionist approaches have made their best contribution. Such models have had an impact in many different disciplines, most of them represented in this volume. This trend is in part the result of advances in computer, communication, and data acquisition technologies. As databases of information become ubiquitous in many fields, accurate models of the corresponding data-generating processes are often unavailable. It is in these areas that machine learning approaches are making their greatest impact, and it is here that connectionist approaches are beneficially interbreeding with several other related disciplines such as statistical mechanics, statistical pattern recognition, signal processing, statistical inference, and information and decision theory.
It may be seen as something of a disappointment, after the great excitement of the late 1980s, that the idea of "intelligent general learning systems" has had to yield to local, specialized, often handcrafted neural networks with limited generalization capabilities. But it is also interesting to realize that prior domain knowledge needs to be introduced to constrain network architectures and statistical performance measures if these networks are to learn and generalize. With hindsight, this realization certainly appears to be a sign of maturity in the field.

The most influential piece of work in the PDP volumes was certainly Chapter 8, "Learning Internal Representations by Error Propagation." Since the original publication of the PDP volumes, the back-propagation algorithm has been implemented in many different forms by many different researchers in different fields. The algorithm showed that complex mappings between input and target patterns could be learned in an elegant and practical way by non-linear connectionist networks. It also overcame many limitations associated with neural network learning algorithms of the previous generation, such as the perceptron algorithm. At the same time, the back-propagation algorithm includes the basic ingredients of the general connectionist recipe: local computations, global optimization, and parallel operation. Most interestingly, the algorithm showed that input-output mappings could be created during learning through the discovery of internal representations of the training data. These representations were sometimes clever, nontrivial, and not originally intended or even imagined by the human designers of the back-propagation network architectures. In the 1960s and 1970s, the cognitive psychology revolution was partially triggered by the realization that such internal representations were necessary to explain intelligent behavior beyond the scope of stimulus-response theory. The internal representations learned by the back-propagation algorithm had an "intelligent flavor" that was difficult for artificial intelligence researchers to ignore. Altogether, these features contributed to the success of back propagation as a versatile tool for computer modelers, engineers, and cognitive scientists in general.

The present volume can be seen as a progress report on the third problem, achieved through a deeper exploration of the back-propagation algorithm. The volume contains a variety of new articles that represent a global perspective on the algorithm and show new practical applications. We have also included a small number of articles that appeared over the last few years and had an impact on our understanding of the back-propagation mechanism. The chapters distinguish the theory of back propagation from architectures and applications. The theoretical chapters relate back-propagation principles to statistics, pattern recognition, and dynamical systems theory. They show that back-propagation networks can be viewed as non-parametric, non-linear, structured statistical models. The architectures and applications chapters then show successful implementations of the algorithm for speech processing, fingerprint recognition, process control, and other tasks. We intend this volume to be useful not only to students in the field of artificial neural networks, but also to professionals who are looking for concrete applications of machine learning systems in general and of the back-propagation algorithm in particular.
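As a concrete illustration of the mechanism described above, and not an excerpt from the book, the following minimal sketch trains a tiny two-layer sigmoid network on the XOR mapping by gradient descent in plain Python with NumPy. The layer sizes, learning rate, iteration count, and variable names are illustrative assumptions; the hidden activations printed at the end are the kind of learned "internal representations" the preface refers to.

```python
# Minimal back-propagation sketch (illustrative, not from the book):
# a 2-4-1 sigmoid network learns the XOR input-output mapping by
# gradient descent on a squared-error loss, using only local
# computations per connection.
import numpy as np

rng = np.random.default_rng(1)

# XOR training patterns: inputs X and target outputs T
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Weights and biases: 2 inputs -> 4 hidden units -> 1 output (sizes are arbitrary choices)
W1, b1 = rng.normal(scale=1.0, size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(scale=1.0, size=(4, 1)), np.zeros(1)

lr = 0.5  # learning rate (illustrative)
for _ in range(10000):
    # Forward pass
    H = sigmoid(X @ W1 + b1)        # hidden activations: the internal representation
    Y = sigmoid(H @ W2 + b2)        # network output

    # Backward pass: propagate error derivatives layer by layer
    dY = (Y - T) * Y * (1 - Y)      # through the output sigmoid
    dH = (dY @ W2.T) * H * (1 - H)  # through the hidden sigmoids

    # Gradient-descent weight updates
    W2 -= lr * H.T @ dY
    b2 -= lr * dY.sum(axis=0)
    W1 -= lr * X.T @ dH
    b1 -= lr * dH.sum(axis=0)

# Inspect what was learned
H = sigmoid(X @ W1 + b1)
Y = sigmoid(H @ W2 + b2)
print("outputs:", Y.ravel().round(2))  # should approach [0, 1, 1, 0]
print("hidden codes:\n", H.round(2))   # one learned internal code per input pattern
```

On occasional runs such a small network settles into a poor local minimum; a different random seed or an extra hidden unit is the usual remedy.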
From the theory section, readers should be able to relate neural networks to their own background in physics, statistics, information or control theory. From the examples, they should be able to generalize the design principles and construct their own architectures, optimally adapted to their problems, from medical diagnosis to financial prediction to protein analysis.

Considering our current stage of knowledge, there is still a lot of terrain to be explored in the back-propagation landscape. The future of back propagation and of related machine learning techniques resides in their effectiveness as practical solutions to real-world problems. The recent creation of start-up companies with core technologies based on these mechanisms shows that the engineering world is paying attention to the computational advantages of these algorithms. Success in the competition for cost-effective solutions to real-world problems will probably determine if back-propagation learning techniques are mature enough to survive. We believe it will be the case.

Yves Chauvin and David E. Rumelhart

Acknowledgments

It would probably take another volume just to thank all the people who contributed to the existence of this volume. The first editor would like to mention two of them: Marie-Thérèse and René Chauvin.