When dealing with time series, the first step consists of isolating trends and periodicities. Once this is done, we are left with a normalized time series, and studying the auto-correlation structure is the next step, known as model fitting. The purpose is to check whether the underlying data follows a well-known stochastic process with a similar auto-correlation structure, such as an ARMA process, using tools such as the Box-Jenkins method. Once a fit with a specific model is found, the model parameters can be estimated and used to make predictions.

A deeper investigation consists of removing the auto-correlations to see whether the decorrelated values behave like white noise, or not. If a departure from white noise is found, it means that the time series in question exhibits unusual patterns not explained by trends, seasonality, or auto-correlations. This can be useful knowledge in contexts such as high-frequency trading, random number generation, cryptography, or cyber-security. The analysis of decorrelated residuals can also help identify change points and slope changes in time series.

So, how does one remove auto-correlations in a time series? One of the easiest solutions consists of looking at the deltas between successive values, after normalization. Chances are that the auto-correlations in the time series of differences X(t) - X(t-1) are much smaller (in absolute value) than the auto-correlations in the original time series X(t). In the particular case of true random walks (see Figure 1), auto-correlations are extremely high, while auto-correlations measured on the differences are very close to zero. So if you compute the first-order auto-correlation on the differences and find it to be statistically different from zero, then you know that you are not dealing with a random walk, and thus your assumption that the data behaves like a random walk is wrong.

Auto-correlations are computed as follows. Let X = X(t), X(t-1), ...
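As a minimal sketch of the differencing idea, the snippet below simulates a true random walk, computes the lag-1 auto-correlation of the walk and of its differences, and illustrates the contrast described above. The helper name `autocorr` is my own, not from the original text:

```python
import numpy as np

def autocorr(x, lag=1):
    """Lag-k auto-correlation: correlation between x(t) and x(t-k)."""
    return np.corrcoef(x[lag:], x[:-lag])[0, 1]

rng = np.random.default_rng(42)

# A true random walk: cumulative sum of i.i.d. Gaussian steps.
walk = np.cumsum(rng.standard_normal(1000))

# Differencing recovers the i.i.d. steps, which behave like white noise.
diffs = np.diff(walk)

print(autocorr(walk))   # very close to +1 for a random walk
print(autocorr(diffs))  # close to 0 for the differences
```

If the lag-1 auto-correlation of the differences were statistically different from zero (roughly, larger in absolute value than 2/sqrt(n)), the random-walk assumption would be rejected.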
be the original time series, Y = X(t-1), X(t-2), ... be the lag-1 time series, and Z = X(t-2), X(t-3), ... be the lag-2 time series; this easily generalizes to lag-3, lag-4, and so on. The first-order auto-correlation is defined as correl(X, Y), and the second-order auto-correlation is defined as correl(X, Z). Auto-correlations typically decrease to zero in absolute value as the order increases.

While there is little literature on decorrelating time series, the problem is identical to finding principal components among X, Y, Z, and so on, and the linear algebra framework used in PCA can also be used to decorrelate a time series, just as PCA is used to decorrelate variables in a traditional regression problem. However, we favor simpler but more robust methods -- for instance looking at the deltas X(t) - X(t-1) -- as these methods are not subject to over-fitting, yet provide nearly as accurate results as exact methods.

Figure 1: Auto-correlations in random walks are always close to +1
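The PCA analogy can be sketched as follows: stack the series with its lagged copies as the columns X, Y, Z of a matrix, then project onto the eigenvectors of the covariance matrix so the resulting columns are mutually uncorrelated. This is an illustrative sketch (using an AR(1) process as a stand-in for an auto-correlated series), not the exact procedure from the original text:

```python
import numpy as np

rng = np.random.default_rng(0)

# An auto-correlated series: AR(1) with coefficient 0.8.
n, phi = 2000, 0.8
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.standard_normal()

# Columns are X(t), X(t-1), X(t-2): the series and its lagged copies.
lags = 2
M = np.column_stack([x[lags - k : n - k] for k in range(lags + 1)])

# PCA-style decorrelation: project the centered columns onto the
# eigenvectors of their covariance matrix.
cov = np.cov(M, rowvar=False)
_, vecs = np.linalg.eigh(cov)
decorrelated = (M - M.mean(axis=0)) @ vecs

# Cross-correlations between the projected columns are now ~0,
# while the original lagged columns were strongly correlated.
corr = np.corrcoef(decorrelated, rowvar=False)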