Thread starter: ReneeBK

[Q&A] Sample Size for Factor Analysis?

Hello, I was asked to do a factor analysis of 40 variables, but I only have 70 cases. I had to increase the iteration limit to 100 to get the program to converge, and I still believe it makes no sense to run a factor analysis with fewer than 2 cases per variable. I was then asked to provide a citation for that. Could someone point me to a source I can cite that discusses the minimum number of cases per variable for factor analysis? Thanks a lot.
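
The question mentions raising the iteration limit before the program converged. To show what such a limit controls, here is a minimal Python sketch of iterated principal-axis factoring, one common extraction method. The poster's software and extraction method are not stated, so this method, the function name `principal_axis`, and the tolerance are assumptions. The starting communalities need an invertible correlation matrix (more cases than variables), and the sketch does not guard against Heywood cases (estimated communalities above 1).

```python
import numpy as np


def principal_axis(R, n_factors, max_iter=100, tol=1e-6):
    """Iterated principal-axis factoring (illustrative sketch).

    Returns (loadings, communalities, iterations_used, converged).
    """
    if max_iter < 1:
        raise ValueError("max_iter must be at least 1")
    R = np.asarray(R, dtype=float)
    # Initial communalities: squared multiple correlations (needs R invertible).
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for it in range(1, max_iter + 1):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)           # put communalities on the diagonal
        vals, vecs = np.linalg.eigh(reduced)    # eigenvalues in ascending order
        top = np.argsort(vals)[::-1][:n_factors]
        loadings = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
        new_h2 = (loadings ** 2).sum(axis=1)    # updated communality estimates
        if np.max(np.abs(new_h2 - h2)) < tol:   # stop once estimates settle
            return loadings, new_h2, it, True
        h2 = new_h2
    return loadings, h2, max_iter, False        # hit the iteration limit


if __name__ == "__main__":
    # Assumed population correlation matrix: 4 items loading 0.8 on one factor.
    lam = np.full(4, 0.8)
    R = np.outer(lam, lam)
    np.fill_diagonal(R, 1.0)
    loadings, h2, iterations, converged = principal_axis(R, n_factors=1)
    print(np.round(np.abs(loadings[:, 0]), 3), iterations, converged)
```

`max_iter` plays the role of the iteration limit the poster raised to 100: if the communality estimates have not settled by then, the function reports `converged=False` instead of a solution.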

Reply #2: ReneeBK, posted 2014-3-29 10:10:25
Comrey & Lee (1992, A First Course in Factor Analysis) give the following guide to sample sizes for factor analysis:

50: very poor
100: poor
200: fair
300: good
500: very good
1000: excellent

Tabachnick & Fidell (Using Multivariate Statistics, 4th ed.) recommend at least 300 cases.
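
To apply these benchmarks to a particular sample, here is a minimal Python sketch. It assumes each listed size is the lower bound for its label (so 70 cases counts as "very poor"); the function names are hypothetical, and the 300-case check simply encodes the Tabachnick & Fidell recommendation as stated above.

```python
# Comrey & Lee (1992) benchmarks as listed above, largest first.
# Assumption: each size is treated as the lower bound for its label.
COMREY_LEE_BENCHMARKS = [
    (1000, "excellent"),
    (500, "very good"),
    (300, "good"),
    (200, "fair"),
    (100, "poor"),
    (50, "very poor"),
]


def comrey_lee_rating(n_cases: int) -> str:
    """Return the Comrey & Lee label for a sample of n_cases."""
    for threshold, label in COMREY_LEE_BENCHMARKS:
        if n_cases >= threshold:
            return label
    return "below the smallest listed size (50)"


def meets_tabachnick_fidell(n_cases: int) -> bool:
    """True if the sample reaches the 300 cases recommended by Tabachnick & Fidell."""
    return n_cases >= 300


if __name__ == "__main__":
    print(comrey_lee_rating(70))        # very poor
    print(meets_tabachnick_fidell(70))  # False
```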

Reply #3: ReneeBK, posted 2014-3-29 10:10:39
Perhaps not the most authoritative citation, but see the APA publication edited by Grimm and Yarnold, Reading and Understanding Multivariate Statistics, 8th ed., 2003, Washington, DC, page 100. Discussing the subjects-to-variables ratio (STV), it states that "the minimum number of observations in one's sample should be at least five times the number of variables."
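
The five-times rule is simple arithmetic. The sketch below applies it to the 40-variable case in the question, together with the 10- and 20-cases-per-variable ratios mentioned elsewhere in this thread. The function names are hypothetical.

```python
import math


def cases_per_variable(n_cases: int, n_vars: int) -> float:
    """Subjects-to-variables (STV) ratio."""
    return n_cases / n_vars


def min_cases_for_stv(n_vars: int, ratio: float) -> int:
    """Smallest whole number of cases giving at least `ratio` cases per variable."""
    return math.ceil(ratio * n_vars)


if __name__ == "__main__":
    print(cases_per_variable(70, 40))  # 1.75
    for ratio in (5, 10, 20):
        print(ratio, min_cases_for_stv(40, ratio))  # 200, 400, 800 cases
```

By the five-times rule, 40 variables would call for at least 200 cases, against the 70 available.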

Reply #4: ReneeBK, posted 2014-3-29 10:11:10
I would look at the article by MacCallum et al. in Psychological Methods, as well as some articles in Multivariate Behavioral Research (MBR), that show problems with rules of thumb for exploratory factor analysis (EFA). One needs to take into account scaling issues, over- and under-determination of factors, communalities (saturation), and so on.

Quoting Robert Marshall: "the minimum number of observations in one's sample should be at least five times the number of variables" (Grimm and Yarnold, p. 100).
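
Two of the quantities named in this reply can be read off a loading matrix: communalities and the number of variables per factor (a crude index of over-determination). The sketch assumes an orthogonal solution, where an item's communality is the sum of its squared loadings; the example loadings are made up for illustration and the function names are hypothetical.

```python
import numpy as np


def communalities(loadings):
    """Row sums of squared loadings: valid for orthogonal factor solutions only."""
    L = np.asarray(loadings, dtype=float)
    return (L ** 2).sum(axis=1)


def variables_per_factor(loadings):
    """Crude over-determination index: number of variables / number of factors."""
    n_vars, n_factors = np.asarray(loadings).shape
    return n_vars / n_factors


if __name__ == "__main__":
    # Made-up loadings for 4 items on 2 orthogonal factors.
    example = [[0.8, 0.1],
               [0.7, 0.2],
               [0.1, 0.6],
               [0.2, 0.5]]
    print(communalities(example))         # approx. 0.65, 0.53, 0.37, 0.29
    print(variables_per_factor(example))  # 2.0
```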

Reply #5: ReneeBK, posted 2014-3-29 10:12:28
I do not remember a specific citation, but the general idea is that factor analysis is an extension of regression, and inference in regression relies on normally distributed estimation errors. The sampling distribution of the estimates approaches a normal curve as N grows (the central limit theorem), or more exactly, as the degrees of freedom grow. The residual degrees of freedom equal the number of cases minus the number of variables minus one, N - k - 1, which in your case is only 70 - 40 - 1 = 29. With so few cases, the margin of error of your estimates will be very wide, and you cannot be sure of their probable true values in the population. This is especially true for minor factors after the first or second, whose loadings will be close to zero, so it is difficult to tell whether they differ from zero in the population.
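
The arithmetic behind this point can be sketched in Python. As a rough proxy for "the margin of error of your estimates", the second function uses the approximate 95% margin around a zero correlation from Fisher's z transformation (standard error 1/sqrt(N - 3)). This concerns a single correlation, not factor loadings, so treat it only as an illustration; the function names are hypothetical.

```python
import math


def residual_df(n_cases: int, n_vars: int) -> int:
    """Degrees of freedom N - k - 1, as used in the reply above."""
    return n_cases - n_vars - 1


def null_correlation_margin(n_cases: int, z_crit: float = 1.96) -> float:
    """Approximate 95% margin around r = 0 via Fisher's z (SE = 1 / sqrt(N - 3))."""
    return math.tanh(z_crit / math.sqrt(n_cases - 3))


if __name__ == "__main__":
    print(residual_df(70, 40))  # 29
    for n in (70, 300):
        print(n, round(null_correlation_margin(n), 2))
```

The margin comes out at roughly ±0.23 for N = 70 versus ±0.11 for N = 300, so with 70 cases a true correlation of zero can easily show up as 0.2 in the sample.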

An old rule of thumb says you need at least 10 cases per variable, and that is the very least. With fewer than 30 to 50 cases, error distributions rarely resemble a normal curve. So my advice is to try a model with fewer variables, possibly a single underlying factor if your 40 variables are mostly explained by one overarching factor. Alternatively, abandon factor analysis altogether in favor of more modest approaches such as a simple summated scale, simple regression, or two- or three-way cross-tabulations. Next time, plan a bigger sample. And then again, do you really have a theory so complex that it requires as many as 40 independent factors? Isaac Newton explained the universe with only two or three variables, and did very well indeed, thank you.
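
One quick way to see whether the variables are "mostly explained by one overarching factor" is to check what share of the total variance the largest eigenvalue of their correlation matrix carries. This is a rough screening heuristic, not a formal test. The function name is hypothetical, and the example matrix comes from an assumed one-factor model with loadings of 0.8.

```python
import numpy as np


def first_component_share(R):
    """Share of total variance carried by the largest eigenvalue of a correlation matrix."""
    eigvals = np.linalg.eigvalsh(np.asarray(R, dtype=float))  # ascending order
    return eigvals[-1] / eigvals.sum()  # the sum equals the number of variables


if __name__ == "__main__":
    # Assumed population model: 4 items, each loading 0.8 on a single factor.
    lam = np.full(4, 0.8)
    R = np.outer(lam, lam)
    np.fill_diagonal(R, 1.0)
    print(round(first_component_share(R), 2))  # 0.73
```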

Reply #6: ReneeBK, posted 2014-3-29 10:13:39
I have been following this discussion with much interest, as I have a similar problem at hand. For years, we have been conducting a consumer satisfaction survey that consists of one page: about 10 questions plus a single open-ended question. Although the questions were intended to probe consumer satisfaction in a number of different areas, the correlations among them are so high that we seem to be tracking only one factor: overall satisfaction.

So we conducted literature reviews and went back to the drawing board, formulating more than 100 questions in 6 broad areas of consumer satisfaction. Our intention was to pilot test these questions with participants, examine the results, throw out the redundant questions (identified through factor analysis), and emerge with, say, 20 questions known to reflect different dimensions of consumer satisfaction. However, our sample size so far is in the pitiful range: perhaps 35 respondents. Needless to say, we have a long way to go. Given our response rates and consumer base, we would be lucky to get more than 100 respondents in a year.

To improve the subjects-to-variables ratio (STV), we need to greatly increase the sample size (which is difficult for us), reduce the number of variables, or both. Our questions are short, simple statements requesting responses on a 5-point Likert scale. Some of the questions are worded in almost identical language, and some of these are almost certainly redundant. Given our relatively small sample size so far, what is the best way to remove redundant questions while retaining maximum diversity of responses?

From one perspective, rank correlations might seem the preferred measure of association, but are Likert scales, analytically speaking, equivalent to rank-order variables? What other measures would be most appropriate? I hesitate to downgrade the measure of association to a categorical one, because that throws away the information on direction and degree. Likewise, I hesitate to upgrade it to a ratio-level measure, because the intervals are clearly arbitrary and not additive.

Intuitively, I am seeking to extract, out of these 100 questions, 4 to 5 groups of 2 to 3 questions each, such that within-group correlations are high but correlations with the other groups are low. The within-group redundancy reinforces the measured degree of satisfaction with that particular factor, and the low between-group correlations ensure that different aspects of satisfaction are represented.

Suggestions, please?
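
For the redundancy question, one option consistent with treating Likert responses as ordinal is a Spearman (rank) correlation matrix, flagging item pairs whose absolute correlation exceeds a cutoff. The sketch below uses only NumPy, giving tied responses their average rank. The 0.9 cutoff, the function names, and the toy data are assumptions for illustration; an item with no variation (all respondents giving the same answer) would produce NaN correlations.

```python
import numpy as np


def average_ranks(x):
    """Rank a 1-D array (1-based), giving tied values the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1  # extend the block of tied values
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman_matrix(responses):
    """Spearman correlations: Pearson correlations of the column-wise ranks."""
    X = np.asarray(responses, dtype=float)
    ranked = np.column_stack([average_ranks(X[:, k]) for k in range(X.shape[1])])
    return np.corrcoef(ranked, rowvar=False)


def redundant_pairs(responses, threshold=0.9):
    """List (i, j, rho) for item pairs with |rho| >= threshold."""
    rho = spearman_matrix(responses)
    p = rho.shape[0]
    return [(i, j, float(rho[i, j]))
            for i in range(p) for j in range(i + 1, p)
            if abs(rho[i, j]) >= threshold]


if __name__ == "__main__":
    # Toy data: item 1 duplicates item 0, item 2 is item 0 reverse-worded.
    X = [[1, 1, 5, 2], [2, 2, 4, 5], [3, 3, 3, 1],
         [4, 4, 2, 4], [5, 5, 1, 3], [3, 3, 3, 2]]
    for i, j, rho in redundant_pairs(X):
        print(i, j, round(rho, 2))
```

Absolute values are used so that reverse-worded near-duplicates are flagged too. With only 35 respondents, these correlations will themselves be noisy.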

Reply #8: ReneeBK, posted 2014-3-29 10:14:03
In this kind of case, my main suggestion is to forget about factor analysis and simply add up the number of "correct" answers. If all questions are highly correlated and clearly measure various aspects of overall satisfaction, the subtle differences in weighting provided by factor analysis would not matter much, and would probably vary from one sample to the next. So go ahead with an unweighted (i.e., equal-weight) scale and relax. You can check whether this simple additive score still correlates well with the individual questions and with other (external) indicators of satisfaction, such as customers returning for more. Assuming all goes well, the simple scale is easier to compute, easier to explain, and avoids the many statistical pitfalls of factor analysis and regression. All it lacks is the false pretense of scientific rigor that comes from mere difficulty or sophistication. Some people live off being difficult and even get famous for it (some postmodern "philosophes", for example), but you had better not care much about that.
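
The equal-weight scale and the suggested check can be sketched as follows: sum each respondent's item scores, then correlate each item with the total of the remaining items. This "corrected" item-total correlation is one common variant of the check, chosen so that an item is not correlated with a total that includes itself. Function names and data are hypothetical.

```python
import numpy as np


def summated_scale(responses):
    """Equal-weight scale: each respondent's item scores added up."""
    return np.asarray(responses, dtype=float).sum(axis=1)


def corrected_item_total(responses):
    """Correlation of each item with the sum of the *other* items."""
    X = np.asarray(responses, dtype=float)
    total = X.sum(axis=1)
    return np.array([np.corrcoef(X[:, k], total - X[:, k])[0, 1]
                     for k in range(X.shape[1])])


if __name__ == "__main__":
    # Hypothetical 5-point responses: 5 respondents x 3 items.
    X = [[1, 1, 2],
         [2, 2, 2],
         [3, 3, 4],
         [4, 4, 4],
         [5, 5, 5]]
    print(summated_scale(X))  # totals 4, 6, 10, 12, 15
    print(corrected_item_total(X))
```

An item with a low or negative corrected item-total correlation is a candidate for review before it is kept in the summed score.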

Reply #9: ReneeBK, posted 2014-3-29 10:23:46
You can try Dwyer's extension analysis. Start by creating a set of homogeneous item packages, or parcels: combine sets of 2 to 4 items into new scales after reviewing the item correlations, grouping the items with the highest inter-item correlations. Then factor analyze the item parcels. This reduces the number of variables in the factor analysis to about 10 to 15 instead of 40, so convergence and iterations should behave better. Rotate, and then use the Dwyer extension procedure described in Gorsuch (1983), Factor Analysis (2nd ed.), pages 236-238. Essentially, the factor solution of the parcels is projected onto the original set of items. You'll get the factor structure matrix, and the pattern matrix if you rotate obliquely, for all 40 items.
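
The parceling step can be sketched as a greedy grouping on the inter-item correlation matrix. This covers only the parcel construction; the Dwyer extension itself, described in Gorsuch, is not reproduced here. The greedy rule is one simple way to "combine the items with the highest inter-item correlations" and is an assumption, not a procedure taken from Gorsuch; the function names are hypothetical.

```python
import numpy as np


def greedy_parcels(R, parcel_size=2):
    """Hypothetical helper: group items into parcels of `parcel_size`.

    Repeatedly seeds a parcel with the most highly correlated unassigned pair,
    then adds the unassigned item with the highest mean correlation to the
    parcel until it is full. Leftover items form one smaller final parcel.
    """
    if parcel_size < 2:
        raise ValueError("parcel_size must be at least 2")
    R = np.asarray(R, dtype=float)
    unassigned = set(range(R.shape[0]))
    parcels = []
    while len(unassigned) >= parcel_size:
        items = sorted(unassigned)
        best = max(((i, j) for a, i in enumerate(items) for j in items[a + 1:]),
                   key=lambda ij: R[ij])
        parcel = list(best)
        unassigned -= set(parcel)
        while len(parcel) < parcel_size:
            k = max(unassigned, key=lambda c: R[c, parcel].mean())
            parcel.append(k)
            unassigned.remove(k)
        parcels.append(sorted(parcel))
    if unassigned:
        parcels.append(sorted(unassigned))
    return parcels


def parcel_scores(responses, parcels):
    """Score each parcel as the mean of its items for every respondent."""
    X = np.asarray(responses, dtype=float)
    return np.column_stack([X[:, p].mean(axis=1) for p in parcels])
```

The parcel scores, rather than the 40 raw items, would then be the input to the factor analysis.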

If you need some background on item parceling, you can find out more about it by searching "item parcels." I know their use is controversial. You can also check up on Andrew Comrey's work in developing his personality inventory and Ray Cattell's work.

Reply #10: ReneeBK, posted 2014-3-29 10:26:17
In addition to the recommended ratios of 10 to 20 people per variable, the following has also been suggested:

"Some Monte Carlo simulation research (Guadagnoli & Velicer, 1988) suggest ... replicable factors tend to be estimated if:
1. factors are each defined by four or more measured variables with structure coefficients each greater than .6 [in absolute value], regardless of sample size; or
2. factors are each defined with 10 or more structure coefficients each around .4 [in absolute value], if sample size is greater than 150; or
3. sample size is at least 300." (Thompson, 2004, p. 24)
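
The three conditions quoted above can be encoded as a quick check on a matrix of structure coefficients. Assumptions: for an orthogonal solution these equal the loadings, "around .4" is read as an absolute value of at least .4, and the function name is hypothetical. It is a screening aid only.

```python
import numpy as np


def guadagnoli_velicer_check(structure, n_cases, strong=0.6, moderate=0.4):
    """Apply the three conditions quoted from Thompson (2004, p. 24).

    structure: (n_variables, n_factors) matrix of structure coefficients.
    Assumption: "around .4" is interpreted as |coefficient| >= moderate.
    """
    S = np.abs(np.asarray(structure, dtype=float))
    n_factors = S.shape[1]
    # 1. Every factor has >= 4 variables with |coefficient| > .6, regardless of N.
    rule1 = all((S[:, f] > strong).sum() >= 4 for f in range(n_factors))
    # 2. Every factor has >= 10 coefficients of about .4 or more, and N > 150.
    rule2 = n_cases > 150 and all((S[:, f] >= moderate).sum() >= 10
                                  for f in range(n_factors))
    # 3. N is at least 300.
    rule3 = n_cases >= 300
    return {"rule1": rule1, "rule2": rule2, "rule3": rule3,
            "any": rule1 or rule2 or rule3}
```

With 70 cases, only the first condition can be met: each factor would need at least four variables with coefficients above .6 in absolute value.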

Linda

Thompson, B. (2004). Exploratory and confirmatory factor analysis: Understanding concepts and applications. Washington, DC: American Psychological Association.
