Original poster: SPSSCHEN

[Discussion] Sample Size for Cluster Analysis

SPSSCHEN posted on 2005-9-25 05:01:00

Hi all, I am running a cluster analysis with 20 variables in a sample of 100 participants. Is the sample size too small? Should I try to reduce the number of variables? This is my first time running cluster analysis. Any help would be greatly appreciated!

Thank you!! Joyce


SPSSCHEN posted on 2005-9-25 05:03:00

There is no specific rule for this. In linear regression, a commonly cited rule of thumb calls for at least 10-20 cases per variable. By that rule, with 100 cases you should use at most 5 variables, possibly stretching to 10.

But it all depends on the variability among your cases. If your 100 cases fall neatly within a few groups, and the variables are highly correlated among themselves, then you may use more variables and still get meaningful results (i.e. meaningful groups of cases). But if your cases are dispersed across all values and combinations of values of the various variables, you may as well form three clusters or thirty clusters, use four variables or forty variables...

The general objective of a cluster analysis is to construct a few groups or clusters that are (a) internally homogeneous and (b) clearly distinct from other groups. If the cases are spread more or less evenly all over the variable space, many will fall in the "gray area", at roughly equal distance from several cluster centers. Assigning those cases to one cluster or another would then be essentially arbitrary, and all solutions would be highly unstable (even a slight change in a case's value on some of the variables would throw it into a different cluster). In that kind of situation, larger samples (and larger cases-to-variables ratios) would be needed.
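To make the two ideas above concrete, here is a rough sketch, not a definitive method. The first function just does the rule-of-thumb arithmetic. The second counts how many cases sit in the "gray area" between their two closest cluster centers. The function names, the toy data, and the 0.8 distance-ratio cutoff are all assumptions chosen for illustration; none of them come from the post or from any standard procedure.

```python
import math


def max_variables(n_cases, cases_per_variable):
    """Largest number of variables allowed by a cases-per-variable rule of thumb."""
    return n_cases // cases_per_variable


def gray_area_fraction(cases, centers, ratio_threshold=0.8):
    """Fraction of cases whose nearest and second-nearest cluster centers
    are almost equally far away.

    A case counts as ambiguous when nearest / second_nearest >= ratio_threshold.
    The 0.8 default is an arbitrary illustrative cutoff, not an established one.
    Requires at least two centers.
    """
    ambiguous = 0
    for case in cases:
        distances = sorted(math.dist(case, center) for center in centers)
        nearest, second = distances[0], distances[1]
        if second == 0 or nearest / second >= ratio_threshold:
            ambiguous += 1
    return ambiguous / len(cases)


# Rule of thumb with 100 cases and 10 to 20 cases per variable.
print(max_variables(100, 20))  # 5
print(max_variables(100, 10))  # 10

# Hypothetical two-variable toy data with two assumed cluster centers.
centers = [(0.0, 0.0), (10.0, 0.0)]
tight = [(0.5, 0.2), (-0.3, 0.1), (9.8, -0.4), (10.2, 0.3)]   # cases hug the centers
spread = [(4.8, 1.0), (5.1, -2.0), (5.0, 0.0), (0.2, 0.1)]    # most cases sit in between

print(gray_area_fraction(tight, centers))   # 0.0
print(gray_area_fraction(spread, centers))  # 0.75
```

With real data you would take the centers from your own cluster solution. A high gray-area fraction would suggest the kind of unstable assignment described above.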

Hector

[This post was edited by the author on 2005-9-25 5:03:39]


suewoe posted on 2005-9-25 05:15:00

I agree with the reply above (post #2).


tianwk posted on 2020-2-14 23:07:40

Thanks for sharing.
