This book provides an introduction to statistical pattern recognition theory and techniques. Most of the material presented in this book is concerned with discrimination and classification and has been drawn from a wide range of literature, including that of engineering, statistics, computer science and the social sciences. The book is an attempt to provide a concise volume containing descriptions of many of the most useful of today's pattern processing techniques, including many of the recent advances in nonparametric approaches to discrimination developed in the statistics literature and elsewhere. The techniques are illustrated with examples of real-world application studies. Pointers are also provided to the diverse literature base where further details on applications, comparative studies and theoretical developments may be obtained. Statistical pattern recognition is a very active area of research. Many advances in recent years have been due to the increase in available computational power, which has given some techniques much wider applicability. Most of the chapters in this book have concluding sections that describe, albeit briefly, the wide range of practical applications that have been addressed and further developments of theoretical techniques.
The book is aimed at practitioners in the 'field' of pattern recognition (if such a multidisciplinary collection of techniques can be termed a field) as well as researchers in the area. Some of this material has also been presented as part of a graduate course on information technology. A prerequisite is knowledge of basic probability theory and linear algebra, together with a basic knowledge of mathematical methods (for example, the use of Lagrange multipliers to solve problems with equality and inequality constraints). Some basic material is presented in the appendices. The exercises at the ends of the chapters vary from 'open book' questions to lengthier computer projects.
Chapter 1 provides an introduction to statistical pattern recognition, defining some terminology and introducing supervised and unsupervised classification. Two related approaches to supervised classification are presented: one based on the estimation of probability density functions and a second based on the construction of discriminant functions. The chapter concludes with an outline of the pattern recognition cycle, putting the remaining chapters of the book into context. Chapters 2 and 3 pursue the density function approach to discrimination, with Chapter 2 addressing parametric approaches to density estimation and Chapter 3 developing classifiers based on nonparametric schemes.
Chapters 4-7 develop discriminant function approaches to supervised classification. Chapter 4 focuses on linear discriminant functions; much of the methodology of this chapter (including optimisation, regularisation and support vector machines) is also used in some of the nonlinear methods of later chapters. Chapter 5 explores kernel-based methods, in particular the radial basis function network and the support vector machine, techniques for discrimination and regression that have received widespread study in recent years. Related nonlinear models (projection-based methods) are described in Chapter 6. Chapter 7 considers a decision tree approach to discrimination, describing the CART (classification and regression tree) methodology and MARS (multivariate adaptive regression splines).
Chapter 8 considers performance: measuring the performance of a classifier and improving the performance by classifier combination. The techniques of Chapters 9 and 10 may be described as methods of exploratory data analysis or preprocessing (and as such would usually be carried out prior to the supervised classification techniques of Chapters 2-7, although they could, on occasion, be post-processors of supervised techniques). Chapter 9 addresses feature selection and feature extraction: the procedures for obtaining a reduced set of variables characterising the original data. Such procedures are often an integral part of classifier design, and it is somewhat artificial to partition the pattern recognition problem into separate processes of feature extraction and classification. However, feature extraction may provide insights into the data structure and the type of classifier to employ; thus, it is of interest in its own right. Chapter 10 considers unsupervised classification, or clustering: the process of grouping individuals in a population to discover the presence of structure. Its engineering application is to vector quantisation for image and speech coding.
Finally, Chapter 11 addresses a range of important topics, including model selection. The appendices largely cover background material, and material appropriate if this book is used as a text for a 'conversion course': measures of dissimilarity, estimation, linear algebra, data analysis and basic probability.
The website www.statistical-pattern-recognition.net contains references and links to further information on techniques and applications.

In preparing the second edition of this book I have been helped by many people. I am grateful to colleagues and friends who have made comments on various parts of the manuscript. In particular, I would like to thank Mark Briers, Keith Copsey, Stephen Luttrell, John O'Loghlen and Kevin Weekes (with particular thanks to Keith for examples in Chapter 2); Wiley for help in the final production of the manuscript; and especially Rosemary for her support and patience.