One-Sample Kolmogorov-Smirnov
The Kolmogorov-Smirnov D test is a goodness-of-fit test of whether an observed distribution differs significantly from a hypothesized one (e.g., a normal distribution). It is a more powerful alternative to the chi-square goodness-of-fit test when its assumptions are met. Whereas the chi-square goodness-of-fit test assesses whether the observed distribution in general is not significantly different from the hypothesized one, the K-S test assesses whether this is so even for the most deviant values of the criterion variable. Thus it is a more stringent test.
Key Concepts and Terms
Observed vs. hypothetical distribution.
The observed distribution is the distribution of the variable in the sample. The hypothetical distribution is the expected distribution of a variable with the same parameters if it conformed to a particular type of distribution. The types supported by SPSS are the normal, Poisson, exponential, and uniform distributions. Other distributions are supported through graphical methods. The expected hypothetical distribution is calculated by consulting the appropriate distribution table. In the simplest case, for a uniform distribution over the values 1 to 8, the cumulative percentages would be 12.5%, 25%, 37.5%, 50%, 62.5%, 75%, 87.5%, and 100%. D would then be the largest difference between one of these theoretical cumulative percentages and the corresponding empirical cumulative percentage, as discussed below.
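The uniform example above can be sketched in a few lines of Python. This is only an illustration: the function names and the 16-case sample are hypothetical, not drawn from SPSS or from any real data set.

```python
from collections import Counter

def uniform_cumulative(low, high):
    """Theoretical cumulative proportions for a uniform distribution on low..high."""
    k = high - low + 1
    return [(i + 1) / k for i in range(k)]

def empirical_cumulative(sample, low, high):
    """Observed cumulative proportions at each value from low to high."""
    counts = Counter(sample)
    n = len(sample)
    running = 0
    proportions = []
    for value in range(low, high + 1):
        running += counts.get(value, 0)
        proportions.append(running / n)
    return proportions

# Theoretical cumulative proportions: 0.125, 0.25, ..., 1.0
expected = uniform_cumulative(1, 8)

# Hypothetical sample of 16 cases (an assumption made for illustration)
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8]
observed = empirical_cumulative(sample, 1, 8)

# D is the largest absolute difference between the two cumulative series
d = max(abs(o - e) for o, e in zip(observed, expected))
print(d)  # 0.0625
```

For this made-up sample the largest gap occurs at the values 1 and 2, where the observed cumulative proportions (6.25% and 18.75%) each fall 6.25 points short of the theoretical 12.5% and 25%.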
Kolmogorov-Smirnov D.
The D value is the largest absolute difference between the cumulative observed proportion and the cumulative proportion expected on the basis of the hypothesized distribution. The computed D is compared to a table of critical values of D in the Kolmogorov-Smirnov One-Sample Test, for a given sample size (cf. Massey, 1951). For samples larger than 35, the critical value at the .05 level is approximately 1.36/SQRT(n), where n = sample size. If the computed D is less than the critical value, the researcher fails to reject the null hypothesis that the distribution of the criterion variable is not different from the hypothesized (e.g., normal) distribution. In practice, computer programs like SPSS compute the probability of D directly without need to refer to such a table. SPSS prints the two-tailed significance level, that is, the probability of obtaining a D at least this large, in either direction, if the observed distribution in fact follows the expected distribution.
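As a minimal sketch of this computation for continuous data, the Python below computes D against a normal distribution whose mean and standard deviation are fixed in advance, then compares it with the large-sample critical value 1.36/SQRT(n). The function names, the parameters (mean 50, standard deviation 10), and the sample are all assumptions made for illustration; the sample is deliberately constructed from normal quantiles so that the outcome is predictable, and it does not show how SPSS computes its significance level.

```python
import math
from statistics import NormalDist

def ks_d(sample, cdf):
    """Largest absolute gap between the empirical and hypothesized CDFs."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x, so check both sides
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def critical_d_05(n):
    """Approximate .05 critical value of D, intended for samples larger than 35."""
    return 1.36 / math.sqrt(n)

# Hypothesized distribution, specified in advance (illustrative parameters)
hypothesized = NormalDist(mu=50, sigma=10)

# Hypothetical sample of 40 cases placed at evenly spaced normal quantiles
n = 40
sample = [hypothesized.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

d = ks_d(sample, hypothesized.cdf)
critical = critical_d_05(n)
reject_null = d >= critical

print(round(d, 4), round(critical, 4), reject_null)
```

Because the sample was built to follow the hypothesized curve, D is about 0.0125, well under the critical value of roughly 0.215, so the null hypothesis is not rejected. For samples of 35 or fewer, the tabled critical values in Massey (1951) apply rather than this approximation.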
Assumptions
Random sampling is assumed, as with all significance tests.
Level of data. Continuous interval or ratio data are required for the Kolmogorov-Smirnov goodness-of-fit test for exact results. If approximate results are sufficient, ordinal data or grouped interval data may be (and commonly are) used. The K-S test is also used for ordinal data when the large-sample assumptions of the chi-square goodness-of-fit test are not met.
Hypothetical distribution specified in advance. For the normal distribution, the expected sample mean and sample standard deviation must be specified in advance. For the Poisson distribution and the exponential distribution, the expected sample mean must be specified in advance. For the uniform distribution, the expected range (minimum, maximum values) must be specified in advance. The menu mode of SPSS calculates these expected parameters from the data, but the researcher may specify them manually in the command language in the syntax window, as per the SPSS Base Syntax Reference Guide.
Could I use the Kolmogorov-Smirnov goodness-of-fit test to test my data not only against the normal distribution, but also against a series of other distributions?
Multiple comparisons would raise the effective probability of a Type I error above the nominal alpha significance level. That is, a series of K-S tests for the same criterion variable might capitalize on chance patterns in the data. This is why it is assumed that the hypothetical distribution is specified in advance.
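The size of this problem can be roughed out with a short calculation. The sketch below assumes the tests are independent, which repeated K-S tests on the same variable are not, so the result is only an indication of how quickly the chance of at least one false rejection can grow. The function name is hypothetical, and the choice of four tests simply mirrors the four distributions SPSS supports.

```python
def familywise_error_rate(alpha, k):
    """Chance of at least one false rejection across k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k

# Testing one variable against four distributions, each at the .05 level
rate = familywise_error_rate(0.05, 4)
print(round(rate, 4))  # 0.1855
```

Under that independence assumption, four tests at the .05 level carry roughly an 18.5% chance of at least one spurious rejection, rather than 5%.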
Where is the Kolmogorov-Smirnov test found in SPSS?
From the SPSS menu, select Statistics, Nonparametric Tests, 1-Sample K-S. In the "One-Sample Kolmogorov-Smirnov Test" dialog box that appears, select the desired test distribution (e.g., Normal) and select the desired criterion variable(s) from the variable picklist.
Could goodness-of-fit be assessed graphically instead of by the Kolmogorov-Smirnov test?
Two graphical methods are available, either of which may render K-S testing unnecessary. A graph of empirical by theoretical cumulative distribution functions (CDFs) simply shows the empirical distribution as, say, a dotted line, and the hypothetical distribution, say the normal curve, as a solid line. Alternatively, a quantile-by-quantile plot, such as a plot of observed quantiles against quantiles of the standard normal, forms a 45-degree line when the observed values are in conformity with the hypothetical distribution. From the SPSS menu, select Graphs, Q-Q or Graphs, P-P to obtain these graphs. The SPSS dialog box supports testing the following distributions: normal, exponential, Weibull, Pareto, lognormal, beta, gamma, logistic, Laplace, uniform, half-normal, chi-square, and Student t.
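The coordinates behind a normal quantile-by-quantile plot can be sketched as follows. This is an illustration under stated assumptions, not a description of how SPSS builds its Q-Q plot: the plotting positions (i - 0.5)/n are one common convention among several, the observed values are standardized with the sample mean and standard deviation so that conformity shows up as a 45-degree line, and the function name and nine-case sample are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(sample):
    """Return (theoretical quantile, standardized observed value) pairs."""
    xs = sorted(sample)
    n = len(xs)
    m, s = mean(xs), stdev(xs)
    standard_normal = NormalDist()
    points = []
    for i, x in enumerate(xs, start=1):
        # Plotting position (i - 0.5)/n is an assumed convention
        theoretical = standard_normal.inv_cdf((i - 0.5) / n)
        points.append((theoretical, (x - m) / s))
    return points

# Hypothetical sample (an assumption made for illustration)
sample = [42, 47, 49, 51, 52, 54, 56, 58, 63]
points = normal_qq_points(sample)

for theoretical, observed in points:
    print(f"{theoretical:7.3f} {observed:7.3f}")
```

Plotting the second column against the first gives the Q-Q plot; the closer the points lie to the line where the two columns are equal, the better the agreement with the normal distribution.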
Bibliography
Massey, F. J. Jr. (1951). The Kolmogorov-Smirnov test of goodness of fit. Journal of the American Statistical Association, Vol. 46. The table of critical values of D is found on p. 70.