Thread starter: SPSSCHEN

[Study Material] Tests of Normality in SPSS [Share]

hanszhu posted on 2006-5-1 22:16:00

Dear Experts,

When I need to check normality I run the Kolmogorov-Smirnov test: if Sig. < 0.05 I conclude the distribution is non-normal, otherwise that it is normal. Please let me know whether this is the right way, or whether there is a better one.

Thanks.

Omar.

hanszhu posted on 2006-5-1 22:17:00

Are you using the 1-sample K-S test (Analyze -> Nonparametric Tests ->
1-Sample K-S)? It lacks power to detect non-normality.
You should use either K-S with the Lilliefors correction or the Shapiro-Wilk
test (the latter is considered the best). Both are available
through EXAMINE.
See the following example:
DATA LIST FREE/lead(F8.1).
BEGIN DATA
0.6 2.6 0.1 1.1 0.4 2.0 0.8 1.3 1.2 1.5 3.2 1.7 1.9 1.9 2.2 5.1
0.2 0.3 0.6 0.7 0.8 1.5 1.7 1.8 1.9 1.9 2.0 2.0 2.1 2.8 3.1 3.9
END DATA.
VARIABLE LABEL lead 'Lead concentration (µmol/24 h)'.
* Shapiro-Wilk & K-S(Lilliefors) *.
EXAMINE
VARIABLES=lead
/PLOT BOXPLOT STEMLEAF NPPLOT
/COMPARE GROUP
/STATISTICS DESCRIPTIVES
/CINTERVAL 95
/NOTOTAL.
* One-sample K-S (not corrected) *.
NPAR TESTS
/K-S(NORMAL)= lead.
There is quite a difference in significance between K-S and
K-S(Lilliefors).
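For readers who want to reproduce Marta's comparison outside SPSS, here is a rough Python equivalent (numpy and scipy assumed; not part of the original post). The Lilliefors correction is approximated here by Monte Carlo rather than by SPSS's internal tables, so the p-value will differ slightly from EXAMINE's output:

```python
import numpy as np
from scipy import stats

lead = np.array([0.6, 2.6, 0.1, 1.1, 0.4, 2.0, 0.8, 1.3, 1.2, 1.5, 3.2, 1.7,
                 1.9, 1.9, 2.2, 5.1, 0.2, 0.3, 0.6, 0.7, 0.8, 1.5, 1.7, 1.8,
                 1.9, 1.9, 2.0, 2.0, 2.1, 2.8, 3.1, 3.9])

# Shapiro-Wilk, generally the most powerful of the three tests.
w, p_sw = stats.shapiro(lead)

# One-sample K-S with mean/sd estimated from the sample: the "uncorrected"
# version, whose p-value is inflated when the parameters come from the data.
d_obs, p_ks = stats.kstest(lead, 'norm', args=(lead.mean(), lead.std(ddof=1)))

# Lilliefors correction, approximated by Monte Carlo: rebuild the null
# distribution of the statistic, re-estimating the parameters on each draw.
rng = np.random.default_rng(0)
null = np.array([
    stats.kstest(z, 'norm', args=(z.mean(), z.std(ddof=1)))[0]
    for z in rng.standard_normal((2000, lead.size))
])
p_lf = float(np.mean(null >= d_obs))
```

The corrected p-value (`p_lf`) comes out markedly smaller than the uncorrected one (`p_ks`), which is exactly the difference the post is pointing at.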
HTH
Marta



hanszhu posted on 2006-5-1 22:20:00

Hello:

I need to check for bivariate normality but am unclear about how to perform
the procedure in SPSS. I have 33 variables in a data set in which I want to
run factor analysis, but I know there is positive skewness from the
univariate analysis. Here are the specific questions for which I need
advice.

1. Can anyone tell me how to perform the bivariate analysis? If I run a
regression on each pair of variables and request plots, which plots should I
be concerned with? How do I save the residuals? And then, what should I do
with the residuals?

2. If my sample size is approximately 210 (presumably large), should I
not concern myself with multivariate normality?

I have checked the SPSS archives and was unable to find more specific
information on bivariate normality. Thus, I would greatly appreciate your
insights on the topic.

Thanks,

Sealvie


hanszhu posted on 2006-5-1 22:20:00

I would suggest that you test multivariate normality using Mardia's PK statistic, which is available in the PRELIS package. There is a fairly large literature on the use and interpretation of this statistic.
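PRELIS computes Mardia's coefficients directly; as a sketch for readers without it, the skewness and kurtosis statistics can be written out in Python (numpy/scipy assumed; the function name is mine, and the formulas follow Mardia's 1970 derivation with the biased MLE covariance):

```python
import numpy as np
from scipy import stats

def mardia(X):
    """Mardia's multivariate skewness & kurtosis tests (sketch).
    Returns the two asymptotic p-values."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                       # MLE covariance, as in Mardia (1970)
    D = Xc @ np.linalg.inv(S) @ Xc.T        # d_ij = (x_i - m)' S^-1 (x_j - m)
    b1 = float((D ** 3).sum()) / n ** 2     # multivariate skewness
    b2 = float((np.diag(D) ** 2).mean())    # multivariate kurtosis
    # n*b1/6 is asymptotically chi-square with p(p+1)(p+2)/6 df;
    # b2 is asymptotically normal around p(p+2) with variance 8p(p+2)/n.
    p_skew = stats.chi2.sf(n * b1 / 6.0, p * (p + 1) * (p + 2) / 6.0)
    z_kurt = (b2 - p * (p + 2)) / np.sqrt(8.0 * p * (p + 2) / n)
    p_kurt = 2.0 * stats.norm.sf(abs(z_kurt))
    return float(p_skew), float(p_kurt)
```

Small p-values on either statistic argue against multivariate normality; heavily skewed marginals (as in Sealvie's data) will typically show up in the skewness term first.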

HTH,

KS

For personalized and professional consultation in statistics and research
design, visit
www.statisticsdoc.com



hanszhu posted on 2006-5-1 22:22:00

Sealvie,


1. For an SPSS macro to examine bivariate/multivariate normality, have a
look at:

http://www.columbia.edu/~ld208/


2. Also look at:

http://www.stat.umn.edu/~drak0020/classes/5021/labs/bivnorm/

The author of this site (Douglas Drake) provided me with the following
citations:


@article{best:rayn:1988,
Author = {Best, D. J. and Rayner, J. C. W.},
Title = {A Test for Bivariate Normality},
Year = 1988,
Journal = {Statistics \& Probability Letters},
Volume = 6,
Pages = {407--412},
Keywords = {Goodness-of-fit; Skewness; Kurtosis}
}

@article{maso:youn:1985,
Author = {Mason, Robert L. and Young, John C.},
Title = {Re-examining Two Tests for Bivariate Normality},
Year = 1985,
Journal = {Communications in Statistics, Part A -- Theory and Methods},
Volume = 14,
Pages = {1531--1546},
Keywords = {Goodness-of-fit; Ring test; Line test}
}

@article{pett:1979,
Author = {Pettitt, A. N.},
Title = {Testing for Bivariate Normality Using the Empirical
Distribution Function},
Year = 1979,
Journal = {Communications in Statistics, Part A -- Theory and Methods},
Volume = 8,
Pages = {699--712},
Keywords = {Goodness of fit; Cramer-von Mises}
}

@article{vita:1978,
Author = {Vitale, Richard A.},
Title = {Joint Vs. Individual Normality},
Year = 1978,
Journal = {Mathematics Magazine},
Volume = 51,
Pages = {123--123},
Keywords = {Bivariate normal distribution}
}


@article{mard:1975,
Author = {Mardia, K. V.},
Title = {Assessment of Multinormality and the Robustness of
{H}otelling's $T^2$ Test},
Year = 1975,
Journal = {Applied Statistics},
Volume = 24,
Pages = {163--171},
Keywords = {Bivariate distributions; Mahalanobis distance; Multivariate
kurtosis; Multivariate skewness; Non-normality; Permutation
test}
}


@article{kowa:1970,
Author = {Kowalski, Charles J.},
Title = {The Performance of Some Rough Tests for Bivariate Normality
Before and After Coordinate Transformations to Normality},
Year = 1970,
Journal = {Technometrics},
Volume = 12,
Pages = {517--544},
Keywords = {Goodness of fit}
}


3. http://www.stat.nus.edu.sg/~biman/


4. John Marden wrote a paper on the use of various plots. It was to be
published in Statistical Science some time in 2005.

Bob Green


hanszhu posted on 2006-5-1 22:26:00

Happy Holidays to All:


I have a data set where none of the variables I wish to use in my regression analysis follows the normal distribution. Further, some of these variables have extreme outliers (which may account for the violations of normality). What is the best way to deal with these outliers, short of excluding them from the analysis, given that they account for approx. 8% of the data? And can I still run parametric tests even though the assumption of normality has been violated?

Any help would be appreciated.


hanszhu posted on 2006-5-1 22:34:00

Hi

The variables you are talking about, are they dependent or independent? For regression models, you don't need normally distributed independent (predictor) variables. Moreover, you don't even need the dependent variable to be normally distributed; what you need is that the residuals are normally distributed. Build your model, save the residuals and check their normality (either visually, with a histogram, or formally, with the Shapiro-Wilk test).
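Marta's recipe can be sketched in Python (numpy/scipy assumed; the data here are hypothetical, built so the predictor is skewed but the errors are normal):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.exponential(scale=2.0, size=120)          # skewed IV: that's fine
y = 1.5 + 0.8 * x + rng.normal(0.0, 1.0, 120)     # normal errors

# Fit the model and save the residuals.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

_, p_resid = stats.shapiro(residuals)   # this is the p-value that matters
_, p_x = stats.shapiro(x)               # the IV itself "fails" -- irrelevant
```

The skewed predictor is soundly rejected by Shapiro-Wilk while the residuals pass, which is the point of the reply: test the residuals, not the variables.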

HTH

Marta



hanszhu posted on 2006-5-1 22:36:00

Hello all,

regression analysis has four assumptions: 1) linearity, 2) independence,
3) constant variance and 4) normality.

From a practical viewpoint, how do you test these assumptions? Are there
methods in SPSS that I can use for that? In the experimental procedure I
have ensured that each measurement is independent by randomization. However,
is there a statistical method that can test whether, or even how well, these
assumptions hold in the data? What about the other three assumptions?

Kind Regards,
Karl


hanszhu posted on 2006-5-1 22:37:00

There was a pretty good thread on this general topic, recently, with
subject "Data Screening". You'd do well to look that thread up. I'll
give citations, which are all from that thread; the comments are too
long to post here.

I'll take the assumptions in a convenient order, rather than the order
you've given them.

>4) Normality

The assumption is normal distribution of the residuals, not of the DV
or any IVs. Hector Maletta posted an extensive discussion; date-time is
Wed, 28 Sep 2005 22:08:00 -0300.


>1) the assumption of linearity


Theory may indicate a non-linear relationship, in which case it's
proper to transform the variables so that the theoretically expected
relationship is linear.

If theory is lacking, you usually have to make the assumption of
linearity and live with it, because the data will not show any
deviation from it. That means, however, that within the accuracy of
your measurements, any deviations don't matter.

You can test linearity directly by including higher-order terms,
typically quadratic to start with, in your model, and testing for their
significance as a group. However,
- The quadratic terms can be so highly correlated with the linear terms
that the resulting model can't be estimated. There are formal
procedures to handle this, but if you're using only quadratic terms,
it's usually enough to change the measurement origin of the IVs so
their means are near 0, certainly less than 1 SD.
- Adding quadratic terms, with product terms, adds a lot of degrees of
freedom to the model: n(n+1)/2, if you have n IVs in the model. Very
often you won't have enough data for a model that size.
- Unless you have a pretty high R-square in the linear model (sorry, I
can't give you numbers), you have little hope of 'seeing' non-linear
effects.

See also my posting Wed, 28 Sep 2005 20:17:13 -0400, in the cited
thread. (The discussion of non-linearity starts a ways down the post.)
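The quadratic-terms check described above can be sketched as follows (Python with numpy/scipy assumed, hypothetical single-IV data; the extra-sum-of-squares F-test compares the linear model against the model with the centered quadratic term added):

```python
import numpy as np
from scipy import stats

def rss(X, y):
    """Residual sum of squares of an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(((y - X @ beta) ** 2).sum())

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, 80)
y = 2.0 + 0.5 * x + 0.3 * (x - 5.0) ** 2 + rng.normal(0.0, 1.0, 80)

xc = x - x.mean()                                  # center the IV first, as
X_lin = np.column_stack([np.ones_like(xc), xc])    # advised, to tame the
X_quad = np.column_stack([X_lin, xc ** 2])         # x vs x**2 correlation

df2 = len(y) - X_quad.shape[1]
f = (rss(X_lin, y) - rss(X_quad, y)) / (rss(X_quad, y) / df2)
p = float(stats.f.sf(f, 1, df2))                   # small p => non-linearity
```

With several IVs the added block is all quadratic and product terms, and the F-test has n(n+1)/2 numerator degrees of freedom, which is exactly the data-hungriness warned about above.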


>2) the assumptions of independence


You said you "have ensured that each measurement is independent by
randomization". Can you say more what your study is, and how you drew
the samples? And, given that you drew randomly from your available
data, could you instead have used all your available data?

It's also assumed that the residuals are statistically uncorrelated
with the IVs (not necessarily independent). Nothing much you can do about
that, except adopt the convention that the portion that's correlated
with (explainable by) the IVs is part of the DV, not the residual.

If time is an IV, successive residuals can fail to be independent
because the process hasn't had enough time to change between
measurements. Statistics such as the Durbin-Watson are used for this.
Analogous problems can theoretically arise with very closely-spaced
measurements of other IVs, but I don't think this is considered a
common problem in practice.
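The Durbin-Watson statistic mentioned above is simple enough to compute by hand; a Python sketch (numpy assumed; in SPSS it is, if I recall correctly, available through REGRESSION's /RESIDUALS DURBIN subcommand):

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic: about 2 for uncorrelated residuals,
    toward 0 under positive serial correlation, toward 4 under negative."""
    resid = np.asarray(resid, dtype=float)
    return float(np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2))
```

For example, perfectly alternating residuals like `[1, -1, 1, -1]` give a value of 3, well above 2, signalling negative serial correlation.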

>3) the assumption of constant variance [of the residuals]

I.e., homoscedasticity. Sometimes theory will indicate higher
measurement errors in different parts of the DV range, which can
sometimes be addressed by transformations. Hector discussed this in his
post I cited above, dated Wed, 28 Sep 2005 22:08:00 -0300.

If you have an ANOVA problem, i.e. multiple measures for the same
values of the IVs, you can check. See the CELLPLOTS command in MANOVA
(Advanced Models module). If you have a variable that you suspect is a
measure of the residual variance, you can use WLS (Regression Models
module).

See also Hector's discussion of residuals, twice cited above.


hanszhu posted on 2006-5-1 22:39:00

1) Check visually with plots. Also, if the values of the IV are
repeated several times (e.g. x=2 occurs 3 times, x=3 several times,
and so on), you can run a linearity test using MEANS (I'll give you
more details if you want).

3) You can try White or Breusch-Pagan/Koenker tests
http://www.spsstools.net/Syntax/RegressionRepeatedMeasure/Breusch-PaganAndKoenkerTest.txt
http://www.spsstools.net/Syntax/RegressionRepeatedMeasure/WhiteTestStatisticsAndSignificance.txt

I think both syntaxes are unduly complex (I wasn't a good programmer
when I wrote them, perhaps they can be simplified), but they work.

Technical details for both methods are described here:
http://pages.infinit.net/rlevesqu/spss.htm#Heteroscedasticity
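The linked syntaxes implement these tests in SPSS; as a compact sketch of the Koenker (studentized Breusch-Pagan) form in Python (numpy/scipy assumed, function name and demo data mine), the LM statistic is n times the R-squared of regressing the squared residuals on the design matrix:

```python
import numpy as np
from scipy import stats

def breusch_pagan(X, y):
    """Koenker's studentized Breusch-Pagan test (sketch).
    X must include the intercept column; returns (LM, p-value)."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ beta) ** 2
    # Auxiliary regression: squared residuals on the same design matrix.
    g, *_ = np.linalg.lstsq(X, e2, rcond=None)
    r2 = 1.0 - np.sum((e2 - X @ g) ** 2) / np.sum((e2 - e2.mean()) ** 2)
    lm = n * r2
    df = X.shape[1] - 1
    return float(lm), float(stats.chi2.sf(lm, df))

# Hypothetical demo: error variance grows with x (heteroscedastic by design).
rng = np.random.default_rng(7)
x = rng.uniform(0.0, 10.0, 300)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.2 + x, 300)
X = np.column_stack([np.ones_like(x), x])
lm, p = breusch_pagan(X, y)
```

A small p-value rejects homoscedasticity; White's test works the same way but adds squares and cross-products of the IVs to the auxiliary regression.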

4) Save the residuals:
- If the sample size is big (let's say n > 100), get a histogram with a
normal curve and check visually for any departures from normality.
- If the sample size is smaller, then run EXAMINE and ask for the normality
tests (Shapiro-Wilk & Kolmogorov-Smirnov with Lilliefors correction). Also,
you can take a look at the skewness/kurtosis coefficients.

HTH,
Marta

