This chapter focuses on audio analysis methods that take into account the temporal evolution of audio phenomena. To this end, the short-term nature of the feature sequences is preserved, so that we can either align two feature sequences or build temporal audio representations using Hidden Markov Models.
Keywords
Sequence alignment
Template matching
Dynamic time warping
DTW
Cost grid
Sakoe-Chiba
Itakura
Smith-Waterman
Hidden Markov Model
HMM
Baum-Welch
Viterbi algorithm
Trellis diagram
Mixture of Gaussians
This chapter presents several methods that take into account the temporal evolution of audio phenomena. In other words, we are no longer interested in computing mid-term feature statistics or long-term averages from the audio signals. Instead, our main concern is to preserve the short-term nature of the feature sequences in order to (a) devise techniques that are capable of aligning two feature sequences, and (b) build parametric, temporal representations of the audio signals by means of Hidden Markov modeling. The two goals are complementary in the sense that the former involves non-parametric approaches, whereas the latter aims at deriving parametric stochastic representations of the time-varying nature of the signals.

The choice of method depends on the application and the availability of training data. Sometimes non-parametric techniques are the only option, as is the case with query-by-example scenarios, where the user provides an audio example that needs to be detected in a corpus of audio recordings. On the other hand, if sufficient training data are available, it might be preferable to build parametric models, as has traditionally been the case in the speech recognition literature.

The chapter starts with sequence alignment techniques and proceeds with a description of Hidden Markov Models (HMMs) and related concepts. We have found it necessary to provide lengthier theoretical descriptions in this chapter than elsewhere. However, care has been taken to present the material from an engineering perspective, in order to facilitate the realization of practical solutions.