OP: ReneeBK

[Q&A] Weighting variables in two-step cluster analysis


I'm using SPSS to perform two-step cluster analyses. SPSS shows the predictor importance of each variable used in an analysis. Oftentimes, a binary variable like gender (sorry, I'm just keeping it simple!) will be the most important variable in the formation of the clusters, even if you don't want it to be.

Is there a way to weight variables, so that maybe I can downplay, but not eliminate, gender's role in the analysis?

Thank you for the help!


Reply #1, ReneeBK, posted 2014-04-23 02:21:44
One thing to keep in mind before turning to weights is that gender can be considered a "swamping" variable in a two-step cluster analysis. Differences between genders are often large, and thus overpower weaker but still substantively interesting heterogeneity in your data.

Instead of down-weighting gender, you could consider a finite mixture regression model. Finite mixture models perform model-based cluster analysis (clusters are usually assumed to be multivariate Gaussian), and a finite mixture regression model essentially combines a cluster analysis with a regression. In your case, you could use gender as a predictor, perform this analysis, and detect clusters while taking into account the predictive power of gender (as well as other variables of interest). More information can be found in the flexmix R package documentation.
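As a rough Python analogue of this model-based idea (flexmix itself is an R package, and a full mixture-of-regressions is beyond this sketch), a plain Gaussian mixture can be fit with scikit-learn. All data and names below are illustrative, not from the original post:

```python
# Hedged sketch: model-based clustering with a Gaussian mixture, a Python
# analogue of the flexmix idea (flexmix additionally supports mixtures of
# regressions in R). Synthetic, illustrative data only.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two synthetic groups: a binary "gender"-like column plus a continuous one.
X = np.vstack([
    np.column_stack([rng.integers(0, 2, 100), rng.normal(0, 1, 100)]),
    np.column_stack([rng.integers(0, 2, 100), rng.normal(4, 1, 100)]),
])

# Fit a two-component Gaussian mixture and read off soft cluster assignments.
gm = GaussianMixture(n_components=2, random_state=0).fit(X)
labels = gm.predict(X)
print(np.bincount(labels))  # cluster sizes
```

Because the mixture models both variables jointly, a well-separated continuous variable can still define the clusters even when the binary variable carries little information.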


Reply #2, ReneeBK, posted 2014-04-23 02:27:03
I want to assign different weights to the variables in my cluster analysis, but my program (Stata) doesn't seem to have an option for this, so I need to do it manually.

Imagine 4 variables A, B, C, D. The weights for those variables should be

w(A) = 50%
w(B) = 25%
w(C) = 10%
w(D) = 15%

I am wondering whether one of the following two approaches would actually do the trick:

1. First standardize all variables (e.g. by their range), then multiply each standardized variable by its weight, then run the cluster analysis.
2. First multiply all variables by their weights, then standardize them, then run the cluster analysis.

Or are both ideas complete nonsense?

The clustering algorithms I intend to try (three in total) are k-means, weighted-average linkage, and average linkage. I plan to use weighted-average linkage to determine a good number of clusters, which I then plug into k-means.
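A minimal sketch of approach 1 (since Stata lacks a built-in weighting option), using scikit-learn's KMeans in Python; the data and cluster count are illustrative, only the weights come from the question:

```python
# Hedged sketch of approach 1: range-standardize each variable, multiply by
# its weight, then cluster. Data are synthetic; weights are from the question.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Four variables A, B, C, D on deliberately different scales.
X = rng.normal(size=(200, 4)) * np.array([1.0, 10.0, 100.0, 5.0])
weights = np.array([0.50, 0.25, 0.10, 0.15])

# 1. Standardize by range so every column spans [0, 1].
X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
# 2. Apply the weights: column j now contributes proportionally to w_j.
X_w = X_std * weights
# 3. Cluster the weighted data.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_w)
```

The same weighted matrix `X_w` could be exported and fed to any distance-based linkage method, so the trick is not specific to k-means.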


Reply #3, ReneeBK, posted 2014-04-23 02:28:11
One way to assign a weight to a variable is by changing its scale. The trick works for the clustering algorithms you mention, viz. k-means, weighted-average linkage and average-linkage.

Kaufman, Leonard, and Peter J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis (2005), page 11:

The choice of measurement units gives rise to relative weights of the variables. Expressing a variable in smaller units will lead to a larger range for that variable, which will then have a large effect on the resulting structure. On the other hand, by standardizing one attempts to give all variables an equal weight, in the hope of achieving objectivity. As such, it may be used by a practitioner who possesses no prior knowledge. However, it may well be that some variables are intrinsically more important than others in a particular application, and then the assignment of weights should be based on subject-matter knowledge (see, e.g., Abrahamowicz, 1985).

On the other hand, there have been attempts to devise clustering techniques that are independent of the scale of the variables (Friedman and Rubin, 1967). The proposal of Hardy and Rasson (1982) is to search for a partition that minimizes the total volume of the convex hulls of the clusters. In principle such a method is invariant with respect to linear transformations of the data, but unfortunately no algorithm exists for its implementation (except for an approximation that is restricted to two dimensions). Therefore, the dilemma of standardization appears unavoidable at present and the programs described in this book leave the choice up to the user.

Abrahamowicz, M. (1985), The use of non-numerical a priori information for measuring dissimilarities, paper presented at the Fourth European Meeting of the Psychometric Society and the Classification Societies, 2-5 July, Cambridge (UK).

Friedman, H. P., and Rubin, J. (1967), On some invariant criteria for grouping data, J. Amer. Statist. Assoc., 62, 1159-1178.

Hardy, A., and Rasson, J. P. (1982), Une nouvelle approche des problèmes de classification automatique, Statist. Anal. Données, 7, 41-56.
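The book's point that measurement units act as implicit weights can be checked directly: multiplying a variable by w multiplies its contribution to the squared Euclidean distance by w², which is exactly what approach 1 exploits. A minimal numeric check (illustrative values only):

```python
# Numerical check of the quoted point: rescaling a variable changes its
# weight in Euclidean distances. Multiplying a coordinate by w multiplies
# its contribution to squared distance by w**2.
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([3.0, 5.0])
w = 0.5  # down-weight the second variable

d2_plain = (x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2           # 4 + 9
d2_weighted = (x[0] - y[0]) ** 2 + (w * (x[1] - y[1])) ** 2  # 4 + 0.25 * 9

print(d2_plain, d2_weighted)
```

So choosing units (or, equivalently, multiplying standardized columns by weights) is a legitimate way to encode subject-matter importance into any distance-based clustering.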


Reply #4, ReneeBK, posted 2014-04-23 02:29:06
Yes, approach 1 is the right one; it corresponds to what Kaufman and Rousseeuw say in the passage quoted above. Approach 2 would be useless, as the standardization removes the weights :)

Franck Dernoncourt
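Why approach 2 is a no-op can be verified numerically: z-standardizing after multiplying by a positive weight w returns the same scores, because both the deviations from the mean and the standard deviation scale by w. A small sketch with illustrative data (the same cancellation holds for range standardization):

```python
# Numerical check that approach 2 is a no-op: standardizing AFTER weighting
# gives back the same standardized scores, since mean deviations and the
# standard deviation both scale by the (positive) weight w.
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(5.0, 2.0, size=1000)  # illustrative variable
w = 0.25                             # illustrative weight

z = (a - a.mean()) / a.std()
z_after_weighting = (w * a - (w * a).mean()) / (w * a).std()

print(np.allclose(z, z_after_weighting))
```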

