Thread starter: SPSSCHEN

[Study materials] Tests of Normality in SPSS

hanszhu posted on 2006-5-1 22:40:00

Hi to everybody

I got a private request for the syntax to run a linearity test in
regression when you have repeated X values. As I thought more people could
be interested, I'm posting it:

(Q) Can you outline how it works? (Unfortunately,
repeating values of the IV aren't very common.)

(A) They aren't, unless you plan them at the design step.

* The following example gives the reaction times to a visual stimulus
in 15 subjects that have taken a certain dose of alcohol (0/40/80 g).

DATA LIST LIST/ id alcohol rtime (3 F4.0).
BEGIN DATA
1 0 3
2 0 1
3 0 2
4 0 4
5 0 2
6 40 5
7 40 3
8 40 4
9 40 6
10 40 6
11 80 7
12 80 5
13 80 6
14 80 8
15 80 7
END DATA.

VAR LABEL rtime 'Reaction time (ms)'
/alcohol 'Alcohol dose (g)'.

* You can do a standard regression analysis, and also, this one *.

MEANS TABLES=rtime BY alcohol
/CELLS MEAN COUNT STDDEV
/STATISTICS LINEARITY .

As you'll see when you run the syntax, you get an ANOVA table where
the between-groups variation is further split into linearity (1 df)
and deviation from linearity (k-2 df, where k is the number of dose
groups). The deviation term is non-significant for these data, so there
is no evidence that the relationship between alcohol dose and reaction
time departs from linearity.
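
For readers who want to check the table by hand, the split described above can be reproduced outside SPSS. What follows is a minimal Python sketch, not part of the original post, that recomputes the sums of squares for the 15 cases listed earlier. The variable names are my own, and it reports F ratios only, so significance still has to be read from an F table or from the SPSS output.

```python
# Reaction times grouped by alcohol dose (g), copied from the DATA LIST block.
groups = {
    0: [3, 1, 2, 4, 2],
    40: [5, 3, 4, 6, 6],
    80: [7, 5, 6, 8, 7],
}

# Flatten into parallel x (dose) and y (reaction time) lists.
x = [dose for dose, ys in groups.items() for _ in ys]
y = [v for ys in groups.values() for v in ys]
n, k = len(y), len(groups)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Ordinary one-way ANOVA sums of squares.
ss_between = sum(len(ys) * (sum(ys) / len(ys) - y_bar) ** 2 for ys in groups.values())
ss_within = sum((v - sum(ys) / len(ys)) ** 2 for ys in groups.values() for v in ys)

# Linearity: the part of ss_between explained by a straight line in dose.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_linearity = s_xy ** 2 / s_xx

# Deviation from linearity: whatever between-groups variation is left over.
ss_deviation = ss_between - ss_linearity

ms_within = ss_within / (n - k)
f_linearity = ss_linearity / ms_within               # 1 df
f_deviation = (ss_deviation / (k - 2)) / ms_within   # k - 2 df

print(f"SS between   = {ss_between:.2f}")
print(f"SS linearity = {ss_linearity:.2f}  F = {f_linearity:.2f}")
print(f"SS deviation = {ss_deviation:.2f}  F = {f_deviation:.2f}")
print(f"SS within    = {ss_within:.2f}  df = {n - k}")
```

With these data the deviation term has an F of about 0.21 on (1, 12) df, well below the tabulated 5% critical value of roughly 4.75, which agrees with the non-significant result described above.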

I have a non-parametric version of this method, based on Kruskal-Wallis
and Cuzick test for monotonic trend, just in case you are interested.

Regards

SPSSCHEN posted on 2006-5-1 22:46:00

Hello

I seem to recall a posting indicating that ANOVA with percentages
required special consideration. I could not locate the thread in the
archives, hence my question. For example, in an educational study
examining differences in the way material is presented, is there a
reason for using either the number correct (or raw score) on an exam or
the percentage correct (entered as a decimal) in the ANOVA analysis? In
the case of the former the data would range from 0 points correct to a
maximum amount if all answers are correct. In the case of percentages
the data would range from 0.0 to 1.0.

Thanks for your help.


Randy Richter

SPSSCHEN posted on 2006-5-1 22:47:00

Randy,

I do not see problems with that use of percentages in the DV of an ANOVA.
The thread you recall was, I believe, about using a binary or dummy variable
as the DV, for which the average is equivalent to the proportion choosing
the "1" alternative instead of the zero.
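
One way to see why the raw score and the proportion correct give the same ANOVA: the proportion is just the raw score divided by a constant, and the F ratio does not change under that kind of rescaling. The Python sketch below is only an illustration; the scores, the three groups, and the 20-point exam length are made up, and `one_way_f` is a helper written for this example.

```python
def one_way_f(groups):
    """Return the one-way ANOVA F ratio for a list of groups of scores."""
    scores = [v for g in groups for v in g]
    grand_mean = sum(scores) / len(scores)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


MAX_POINTS = 20  # hypothetical exam length

# Made-up raw scores for three presentation methods.
raw = [[12, 15, 11, 14], [16, 18, 15, 17], [10, 9, 13, 12]]

# The same scores expressed as proportion correct (0.0 to 1.0).
proportion = [[v / MAX_POINTS for v in g] for g in raw]

f_raw = one_way_f(raw)
f_proportion = one_way_f(proportion)
print(f"F on raw scores:  {f_raw:.4f}")
print(f"F on proportions: {f_proportion:.4f}")
```

Both sums of squares shrink by the same factor (the square of the exam length), so the ratio, and with it the test, is unchanged. This only holds when every subject's score is divided by the same maximum.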

Hector

SPSSCHEN posted on 2006-5-1 22:47:00

There are three things to watch out for that are associated with
assumptions of ANOVA. Normality, of course, but the test is often robust
to this violation. However, independence and homogeneity of variance are
also potential problems. These often go hand in hand with non-normality.
For instance, with percentage data where the mean percent is below
.25 or above .75, the mean and variance may not be independent. Likewise, if
one group has a mean percentage around .5 and the other around .2, there
may be a difference in the variances as well.
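
The link between mean and variance can be made concrete under one simple assumption that the post itself does not state: each percentage is the proportion correct on a fixed number of independent, equally difficult items (a binomial model). The variance of such a proportion is p(1-p)/n, so it is tied to the mean. The item count below is hypothetical.

```python
N_ITEMS = 20  # hypothetical number of exam items


def proportion_variance(p, n_items=N_ITEMS):
    """Variance of a proportion correct under a binomial model."""
    return p * (1 - p) / n_items


# The variance peaks at p = .5 and falls away towards 0 and 1.
for p in (0.10, 0.20, 0.25, 0.50, 0.75, 0.90):
    print(f"mean proportion {p:.2f} -> variance {proportion_variance(p):.5f}")

# Two groups with means around .5 and .2, as in the example above.
variance_ratio = proportion_variance(0.5) / proportion_variance(0.2)
print(f"variance ratio (.5 vs .2): {variance_ratio:.4f}")
```

Under this model the group centred on .5 has a variance about 1.56 times that of the group centred on .2, which is the kind of heterogeneity the post warns about. Real exam data need not follow a binomial model, so this is a guide to what to look for, not a prediction.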

Paul R. Swank, Ph.D.
Professor, Developmental Pediatrics
Director of Research, Center for Improving the Readiness of Children for
Learning and Education (C.I.R.C.L.E.)
Medical School
UT Health Science Center at Houston


SPSSCHEN posted on 2006-5-1 22:51:00

Hi,

I've run a multiple linear regression in SPSS with one dependent variable and two independent variables. All assumptions were satisfied, but R square is very low, about 0.3. I think that is because my variables are not normally distributed, so I was thinking about applying a logarithmic transformation to make them normal and then repeating the regression. But I don't know how to transform them. Also, do I have to test any other assumptions after applying the transformation?

Thanks

SPSSCHEN posted on 2006-5-1 22:55:00

Razan,


1. Your variables do not need to be normally distributed in order to use
regression, and even less so in order to get a high correlation coefficient.
You are confused by the fact that linear regression requires that the
residuals, i.e. the random errors of prediction (differences between predicted
and observed values), have a normal distribution around the regression line.

2. A low or near zero linear [multiple] correlation coefficient may be due
to (a) the absence of any systematic relationship between your IV and DV, or
(b) the existence of a relationship which is non linear. As an example of
(b), if your scatterplot shows a cloud of points with the shape of a U,
there may well be a quadratic relationship even though the linear coefficient
is close to zero.

3. The method of least squares to estimate regression functions is based on
the assumption of a linear relationship between the variables involved. When
the relationship is not linear there are two ways to go: (i) identify the
non-linear function linking the variables, and transform it in some way that
yields a linear function, then apply least squares linear regression; or (ii)
approximate a non linear function by means of non-linear regression or
curve-fitting, which rely on iterative estimation rather than the closed-form
linear least squares solution. Some non linear functions are amenable to
linearization, some are not. For instance, a quadratic equation like
y=a+bX+cX^2 can be linearized if you define a new variable Z=X^2, and use the
linear equation y=a+bX+cZ; likewise the equation y=aX^b can be linearized by
taking logarithms as log y=log a + b(log X).

4. The fact that a certain mathematical function fits your data is no great
deal. You can always find some function that does that. The trick is finding
a function for which you have a theoretical explanation. So it is not
advisable to go around blindly trying different mathematical functions until
any of them "fits". In fact, you may find several, perhaps an infinite
number of functions that reasonably fit the data, and that is arguably worse
than not having any.

5. If no reasonable function fits the shape of the data, perhaps your data
just show little relationship at all between the variables...
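
Points 2 and 3 can be illustrated with a small Python sketch. It is not from the original reply: the data are artificial and noise-free, the quadratic example sets b=0 so that a single-predictor fit on Z is enough, and `ols` is a helper written for this illustration.

```python
import math


def ols(x, y):
    """Least-squares line y = intercept + slope * x; also returns Pearson r."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    r = s_xy / math.sqrt(s_xx * s_yy)
    return intercept, slope, r


# Point 2: a perfect U-shaped relationship, y = 1 + 2*X^2 (artificial data).
x = [-3, -2, -1, 0, 1, 2, 3]
y = [1 + 2 * xi ** 2 for xi in x]
_, _, r_linear = ols(x, y)          # linear correlation of y with X

# Point 3: define Z = X^2 and the same relationship becomes a straight line.
z = [xi ** 2 for xi in x]
a, c, r_quadratic = ols(z, y)       # recovers a = 1 and c = 2

# Point 3 again: y = a*X^b becomes log y = log a + b*log X.
x_pow = [1, 2, 4, 8, 16]
y_pow = [2 * xi ** 1.5 for xi in x_pow]
log_a, b, _ = ols([math.log(v) for v in x_pow], [math.log(v) for v in y_pow])
a_power = math.exp(log_a)           # back-transform the intercept

print(f"r with X: {r_linear:.3f}, r with Z = X^2: {r_quadratic:.3f}")
print(f"quadratic fit: a = {a:.3f}, c = {c:.3f}")
print(f"power fit: a = {a_power:.3f}, b = {b:.3f}")
```

The linear correlation with X is zero although y is completely determined by X, which is the situation described in point 2. With real, noisy data the recovered coefficients would only be estimates, and a logarithmic transformation requires strictly positive values.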

Hector

SPSSCHEN posted on 2006-5-1 22:57:00

Hi Mr. Hector,

First of all thank you very much for your quick response.

  1. I don't want a high correlation coefficient; what I need to raise is the coefficient of determination (R square). As for the residuals, I've already tested their normality and they are normal.
     [Reply] R-squared is the squared correlation coefficient, so both are essentially the same. If the residuals are already normal, there is no need for a log transformation or anything else to make them more normal.
  2. I don't know what you mean by the 2nd point, but I've tested that there is no correlation between the independent variables, i.e. there is no multicollinearity, and the scatter between the DV and each IV is not U-shaped.
     [Reply] What I mean in my second point is that a low R or R-squared may be due to either the absence of any relationship between your DV and the set of IVs, or the presence of a relationship that is not linear. This can be ascertained by plotting predicted against observed values. A formless cloud is the first case; a regular but non-linear shape, e.g. a cloud in the shape of a U, is the second case. In the latter situation you may transform some of the variables to get a linear instead of a non-linear relationship, or you may try non-linear regression or curve fitting.
  3. I'm trying hard not to use a model other than the linear one, so that I don't have to test further assumptions; that's why I'm trying to find a way to solve the problem. Moreover, I don't know how to detect which model would fit.
     [Reply] Models are based on theory. Blindly trying anything that fits is not good advice.
  4. As I mentioned before, I've tested collinearity, but there is one assumption that I wasn't able to test: that the residuals and the independent variables are independent of each other, because I don't have the residuals as a separate variable.
     [Reply] Collinearity might have been one problem, but you evidently do not have it. Perhaps it is simply that your IVs do not predict the DV well. That happens.
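
On the residual check raised in the last point: a residual is simply the observed value minus the predicted value, so it can always be computed as a separate variable once the model is fitted (in SPSS, the REGRESSION command can store residuals as new variables through its /SAVE subcommand). The Python sketch below uses made-up data with a single predictor and a helper, `fit_line`, written for this illustration.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Hypothetical data: one predictor, eight cases.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 3.8, 5.1, 6.3, 6.0, 7.4]

intercept, slope = fit_line(x, y)
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]  # observed minus predicted

# Sum of cross-products between the residuals and the centred predictor.
x_bar = sum(x) / len(x)
cross_product = sum((xi - x_bar) * ei for xi, ei in zip(x, residuals))

print("residuals:", [round(e, 3) for e in residuals])
print(f"sum of residuals: {sum(residuals):.2e}")
print(f"residual-by-predictor cross-product: {cross_product:.2e}")
```

Both printed sums are zero up to rounding. That is a property of least squares with an intercept: the residuals are uncorrelated with the predictors by construction, so a correlation coefficient between them says nothing. The useful check is a plot of the residuals against each predictor, or against the predicted values, looking for curvature or a changing spread.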


Razan

[This post was edited by the author on 2006-5-1 23:05:03]

SPSSCHEN posted on 2006-5-1 22:58:00

Depending on the number of cases you have and the subject matter area, a
multiple correlation of .55 (r**2= .3) could be suspiciously high. What are
your variables? How are they measured? How many cases do you have? How were
they selected?

Art
Art@DrKendall.org
Social Research Consultants
University Park, MD USA Inside the Washington, DC beltway.
(301) 864-5570

[This post was edited by the author on 2006-5-1 22:59:01]
