Original poster: 羊乖乖

[Q&A] Handling missing values when computing correlation coefficients

Original post
羊乖乖, posted 2015-11-13 11:13:13

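I am computing correlation coefficients for several columns of data in R.
For handling missing values, the use argument of cor() gives four options (apart from the default "everything"):

With use = "all.obs", any missing value raises an error.

With use = "complete.obs", missing entries are handled by casewise deletion ("If use is "complete.obs" then missing values are handled by casewise deletion (and if there are no complete cases, that gives an error)."). But the resulting correlation matrix is complete, with no missing values. So what method was used to fill in the missing entries?

With use = "na.or.complete", the result is the same as above, except that when there are no complete cases at all the result is NA rather than an error.

With use = "pairwise.complete.obs", the result differs from the previous two ("if use has the value "pairwise.complete.obs" then the correlation or covariance between each pair of variables is computed using all complete pairs of observations on those variables. This can result in covariance or correlation matrices which are not positive semi-definite, as well as NA entries if there are no complete pairs for that pair of variables."), but the result is also complete.

My question: how are the missing values handled during the calculation? Are they replaced by something like the column mean, median, or mode? Or are they filled in by some other method?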





2
万人往LVR (verified professional), posted 2015-11-13 11:53:27
?cor
See the Details section.

3
羊乖乖, posted 2015-11-13 13:50:00
万人往LVR wrote on 2015-11-13 11:53:
?cor
See the Details section.
Right, okay, I'll take a look. Thanks!

4
羊乖乖, posted 2015-11-13 14:13:36
万人往LVR wrote on 2015-11-13 11:53:
?cor
See the Details section.
Sorry to bother you again, but where do I find this Details section?

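5
jiangbeilu (verified student), posted 2015-11-13 18:42:43
羊乖乖 wrote on 2015-11-13 14:13:
Sorry to bother you again, but where do I find this Details section?
With the pairs... option, the missing values are left out of the calculation and only the complete cases are used. In other words, the missing values are dropped before the calculation; they are not replaced by any other value.
Here is the Details section of ?cor: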
Details

For cov and cor one must either give a matrix or data frame for x or give both x and y.

The inputs must be numeric (as determined by is.numeric: logical values are also allowed for historical compatibility): the "kendall" and "spearman" methods make sense for ordered inputs but xtfrm can be used to find a suitable prior transformation to numbers.

var is just another interface to cov, where na.rm is used to determine the default for use when that is unspecified. If na.rm is TRUE then the complete observations (rows) are used (use = "na.or.complete") to compute the variance. Otherwise, by default use = "everything".

If use is "everything", NAs will propagate conceptually, i.e., a resulting value will be NA whenever one of its contributing observations is NA. If use is "all.obs", then the presence of missing observations will produce an error. If use is "complete.obs" then missing values are handled by casewise deletion (and if there are no complete cases, that gives an error). "na.or.complete" is the same unless there are no complete cases, that gives NA. Finally, if use has the value "pairwise.complete.obs" then the correlation or covariance between each pair of variables is computed using all complete pairs of observations on those variables. This can result in covariance or correlation matrices which are not positive semi-definite, as well as NA entries if there are no complete pairs for that pair of variables. For cov and var, "pairwise.complete.obs" only works with the "pearson" method. Note that (the equivalent of) var(double(0), use = *) gives NA for use = "everything" and "na.or.complete", and gives an error in the other cases.

The denominator n - 1 is used which gives an unbiased estimator of the (co)variance for i.i.d. observations. These functions return NA when there is only one observation (whereas S-PLUS has been returning NaN), and fail if x has length zero.

For cor(), if method is "kendall" or "spearman", Kendall's tau or Spearman's rho statistic is used to estimate a rank-based measure of association. These are more robust and have been recommended if the data do not necessarily come from a bivariate normal distribution. For cov(), a non-Pearson method is unusual but available for the sake of completeness. Note that "spearman" basically computes cor(R(x), R(y)) (or cov(., .)) where R(u) := rank(u, na.last = "keep"). In the case of missing values, the ranks are calculated depending on the value of use, either based on complete observations, or based on pairwise completeness with reranking for each pair.

Scaling a covariance matrix into a correlation one can be achieved in many ways, mathematically most appealing by multiplication with a diagonal matrix from left and right, or more efficiently by using sweep(.., FUN = "/") twice. The cov2cor function is even a bit more efficient, and provided mostly for didactical reasons.
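
The behaviour described in Details can be checked on a small example. The sketch below uses a made-up data frame (df and its values are assumptions for illustration, not the original poster's data) and compares each use setting with a manual deletion of the missing rows. If nothing is imputed, the manual versions should agree with what cor() returns.

```r
# Toy data with one missing value per column (made up for illustration).
df <- data.frame(
  x = c(1, 2, 3, 4, 5, NA),
  y = c(2, 1, 4, NA, 5, 6),
  z = c(5, 3, NA, 2, 1, 4)
)

# "everything" (the default): an NA in either column makes that entry NA.
r_everything <- cor(df)

# "complete.obs": rows containing any NA are dropped first (casewise deletion),
# so the result should match cor() applied to na.omit(df).
r_complete <- cor(df, use = "complete.obs")
r_manual   <- cor(na.omit(df))

# "pairwise.complete.obs": each pair of columns uses the rows where both
# values are present; here the x-y entry is rebuilt by hand.
r_pairwise <- cor(df, use = "pairwise.complete.obs")
ok_xy      <- complete.cases(df$x, df$y)
r_xy       <- cor(df$x[ok_xy], df$y[ok_xy])

print(all.equal(r_complete, r_manual))
print(all.equal(r_pairwise["x", "y"], r_xy))
```

Both comparisons rely only on deleting rows; no mean, median, or mode is substituted anywhere.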

6
zq19900310, posted 2015-11-16 19:48:17
You can use the knnImputation(data, k=...) function from the DMwR package to impute the missing values.

7
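羊乖乖, posted 2015-11-19 09:03:52
jiangbeilu wrote on 2015-11-13 18:42:
With the pairs... option, the missing values are left out of the calculation and only the complete cases are used. In other words, the missing values are dropped ...
If they were removed, why is the computed result complete? There are correlation coefficients where the missing values were, too. Sorry, I am a complete beginner with a lot of questions; please bear with me.
Thanks for the reply!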

8
羊乖乖, posted 2015-11-19 09:04:31
zq19900310 wrote on 2015-11-16 19:48:
You can use the knnImputation(data, k=...) function from the DMwR package to impute the missing values.
Thanks for the help!

9
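jiangbeilu (verified student), posted 2015-11-19 11:11:51
羊乖乖 wrote on 2015-11-19 09:03:
If they were removed, why is the computed result complete? There are correlation coefficients where the missing values were, too. Sorry, I am a complete beginner. ...
For details, see my post: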
https://bbs.pinggu.org/thread-3992878-1-1.html
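
On why the result has no gaps even though rows were removed: the correlation matrix has one row and one column per variable, not per observation, so deleting observations only reduces the sample size behind each entry. A minimal sketch on made-up data (df, n_complete, and n_pairwise are illustrative names, not from the thread) that counts how many rows each setting actually uses:

```r
# Toy data with one missing value per column (made up for illustration).
df <- data.frame(
  x = c(1, 2, 3, 4, 5, NA),
  y = c(2, 1, 4, NA, 5, 6),
  z = c(5, 3, NA, 2, 1, 4)
)

# Three variables give a 3 x 3 matrix, however many rows are dropped.
r_complete <- cor(df, use = "complete.obs")
print(dim(r_complete))

# Rows used under casewise deletion: only those with no NA at all.
n_complete <- sum(complete.cases(df))

# Rows used per pair under pairwise deletion: the cross-product of the
# 0/1 "is observed" indicator matrix counts jointly observed rows.
observed   <- !is.na(as.matrix(df))
n_pairwise <- crossprod(observed * 1)

print(n_complete)
print(n_pairwise)
```

The off-diagonal counts in n_pairwise can differ from pair to pair, which is also why a pairwise matrix is not guaranteed to be positive semi-definite.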

10
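哈哈MOON, posted 2019-1-20 22:55:44
What about pcor()? How should variables with missing values be handled there? Can something like the pairwise deletion in cor() be used? I could not find an option for it.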
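
One possible workaround, offered as a sketch rather than a confirmed feature of pcor(): the post does not say which package pcor() comes from, so the code below avoids it and derives partial correlations in base R from a correlation matrix that cor() has already computed with the chosen use setting. partial_from_cor is a hypothetical helper name and the data are simulated. It uses the standard identity that, for P the inverse of the correlation matrix, the partial correlation of i and j given all other variables is -P[i, j] / sqrt(P[i, i] * P[j, j]).

```r
# Hypothetical helper (not part of any package): partial correlations
# from a correlation matrix R via its inverse.
partial_from_cor <- function(R) {
  P   <- solve(R)                          # inverse correlation matrix
  out <- -P / sqrt(outer(diag(P), diag(P)))
  diag(out) <- 1
  out
}

# Simulated data with a few missing values (assumption for illustration).
set.seed(1)
df <- data.frame(x = rnorm(50), y = rnorm(50), z = rnorm(50))
df$y <- df$y + 0.5 * df$x
df$x[c(3, 17)] <- NA
df$z[c(8, 17, 40)] <- NA

# Casewise deletion: equivalent to dropping incomplete rows first.
pc_casewise <- partial_from_cor(cor(df, use = "complete.obs"))

# Pairwise deletion: each entry of the input matrix uses its own rows.
pc_pairwise <- partial_from_cor(cor(df, use = "pairwise.complete.obs"))

print(round(pc_casewise, 3))
print(round(pc_pairwise, 3))
```

As the Details text warns, a pairwise matrix may not be positive semi-definite, in which case solve() can fail or the result can fall outside [-1, 1]; the casewise version does not have that problem but discards more rows.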
