Original poster: oliyiyi

Ten Simple Rules for Effective Statistical Practice: An Overview

oliyiyi posted on 2016-6-26 14:58:52


An overview of 10 simple rules to follow to ensure effective statistical data analysis.

By Matthew Mayo, KDnuggets.

On June 9, 2016, in the open-access journal PLOS Computational Biology, an article titled "Ten Simple Rules for Effective Statistical Practice" was published. In it, authors Robert E. Kass, Brian S. Caffo, Marie Davidian, Xiao-Li Meng, Bin Yu, and Nancy Reid laid out a statistical data analysis code, one which, if followed, should help lead to accurate and useful results.

While the article appears in a computational biology publication, and references relevant topics within that field, it also rightfully points out that the rules apply to any "science," which the article defines as "investigations using data to study questions of interest."

All direct quotes (obviously) and ideas are attributable to the authors of the article. This post will quickly summarize the general content, including the gist of the rules, leaving an in-depth investigation to the interested reader. For more detail and finer points, read the PLOS article.

Rule 1: Statistical Methods Should Enable Data to Answer Scientific Questions

Inexperienced users of statistics tend to take for granted the link between data and scientific issues and, as a result, may jump directly to a technique based on data structure rather than scientific goal.

The example given in the article relates to tabular microarray gene expression data: an analyst might look for a statistical method by asking, "Which test?" when they should, instead, start with the scientific question: "Where are the differentiated genes?" From this underlying question, the researcher could then employ a statistical test which they deemed appropriate to answer said question.

In other words, instead of asking which test should be employed in a given situation, the authors argue that the better way to proceed is to focus on the goal, and let the most appropriate test arise organically. Clearly, experience is the key to success for this rule.

Rule 2: Signals Always Come with Noise

Other times variability may be annoying, such as when we get three different numbers when measuring the same thing three times. This latter variability is usually called “noise,” in the sense that it is either not understood or thought to be irrelevant.

One of the goals of statistical analysis is to assess the signal in the data against a background of irrelevant variability, or noise. This is especially relevant in today's world of Big Data: if small amounts of data contain noise that must be accounted for, massive amounts of data certainly do not contain less, and their size does not make the noise any less of an issue.
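As a rough illustration of the point about large data sets, the sketch below simulates repeated measurements of one quantity. The true value, the noise level, and the function name simulate_measurements are all assumptions made up for this example, not anything taken from the article.

```python
import random
import statistics


def simulate_measurements(true_value, noise_sd, n, seed):
    """Return n noisy measurements of the same underlying quantity."""
    rng = random.Random(seed)  # private generator so the run is repeatable
    return [rng.gauss(true_value, noise_sd) for _ in range(n)]


# Hypothetical setup: the signal is 10.0, the noise has standard deviation 2.0.
small = simulate_measurements(10.0, 2.0, n=30, seed=0)
large = simulate_measurements(10.0, 2.0, n=100_000, seed=0)

for name, data in [("small", small), ("large", large)]:
    # The mean estimates the signal; the standard deviation estimates the noise.
    print(name, len(data),
          round(statistics.mean(data), 3),
          round(statistics.stdev(data), 3))
```

In this toy setup the larger sample pins down the mean more tightly, but the spread of the individual measurements stays near 2.0: more data does not make each observation any less noisy.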

Rule 3: Plan Ahead, Really Ahead

[R]ather than focusing on a specific detail in the design of the experiment, someone with a lot of statistical experience is likely to step back and consider many aspects of data collection in the context of overall goals and may start by asking, “What would be the ideal outcome of your experiment, and how would you interpret it?”

The moral of Rule 3 is that early preparation saves time in the long run: asking design questions up front leads to simpler, and often more rigorous, subsequent analysis.

Rule 4: Worry about Data Quality

This only makes sense: GIGO (garbage in, garbage out). We have all heard it before.

[T]he complexity of modern data collection requires many assumptions about the function of technology, often including data pre-processing technology.

Data { preparation | munging | cleaning | wrangling } often leads to the discovery of data-related quality concerns, and brings other issues to light (misspelled variations of identical categorical data; divergent techniques by different data recorders; what to do about missing values?). You have heard this ad nauseam ("Data prep takes 80% of your time!"), but it bears repeating once again in this context.
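The kinds of problems listed above (spelling variants of the same category, missing values) can be surfaced with very little code. The following is a minimal sketch on invented data; the records, the KNOWN_MISSPELLINGS table, and the normalize_label helper are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical raw categorical values, as different recorders might enter them.
raw_species = ["Mouse", "mouse ", "MOUSE", "rat", "Rat", "", None, "mosue"]

# Hypothetical lookup table of known misspellings, maintained by the analyst.
KNOWN_MISSPELLINGS = {"mosue": "mouse"}


def normalize_label(value):
    """Map a raw label to a canonical form, or None if it is missing."""
    if value is None:
        return None
    cleaned = value.strip().lower()
    if not cleaned:
        return None  # treat empty strings as missing
    return KNOWN_MISSPELLINGS.get(cleaned, cleaned)


cleaned = [normalize_label(v) for v in raw_species]
counts = Counter(v for v in cleaned if v is not None)
n_missing = sum(v is None for v in cleaned)

print("category counts:", dict(counts))
print("missing values:", n_missing)
```

Counting categories before and after normalization is a cheap way to spot variants that should have been identical. What to do with the missing values remains a judgment call that depends on the scientific question.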

Rule 5: Statistical Analysis Is More Than a Set of Computations

Statistical software provides tools to assist analyses, not define them.

Statistical software is a means, not an end. It is a tool meant to assist analytical processes in the investigation of scientific questions, and losing sight of this fact can be detrimental.

On the other hand, algorithmic analysis can significantly enhance reproducibility, the importance of which should not be overlooked.

Rule 6: Keep it Simple

All else being equal, simplicity trumps complexity.

While this is a really simplistic argument, it's also difficult to argue with. Start simple, and add complexity as necessary. A sound implementation of simple statistical methods can often trump unnecessary complexity, and lead to useful, consistent, understandable results.

Rule 7: Provide Assessments of Variability

A basic purpose of statistical analysis is to help assess uncertainty, often in the form of a standard error or confidence interval, and one of the great successes of statistical modeling and inference is that it can provide estimates of standard errors from the same data that produce estimates of the quantity of interest.

Reporting the results of statistical analysis comes with the responsibility of identifying the appropriate uncertainty. Any repeated data collection would involve variability, which would lead to uncertainty in the conclusions. At the very least, sharing these points of potential uncertainty is useful for planning future work.
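As a small worked example of reporting an estimate together with its uncertainty, the sketch below computes a sample mean, its standard error, and an approximate 95% confidence interval. The measurements are invented, the helper name mean_with_uncertainty is hypothetical, and the interval relies on a normal approximation (z = 1.96); for a sample this small a t-based interval would be somewhat wider.

```python
import math
import statistics


def mean_with_uncertainty(values, z=1.96):
    """Return (mean, standard error, approximate confidence interval)."""
    n = len(values)
    mean = statistics.mean(values)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    se = statistics.stdev(values) / math.sqrt(n)
    return mean, se, (mean - z * se, mean + z * se)


# Hypothetical repeated measurements of the same quantity.
measurements = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]

mean, se, (low, high) = mean_with_uncertainty(measurements)
print(f"mean = {mean:.3f}, SE = {se:.3f}, approx. 95% CI = ({low:.3f}, {high:.3f})")
```

Reporting the interval alongside the point estimate lets a reader judge how much a repeat of the data collection might plausibly move the result.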

Rule 8: Check Your Assumptions

Every statistical inference involves assumptions, which are based on substantive knowledge and some probabilistic representation of data variation—this is what we call a statistical model.

Assumptions such as linear relationships and the statistical independence of multiple observations must be scrutinized and validated, as should measurement biases and assumptions related to how missing values are dealt with, among others. Doing so attempts to explain innate volatility, which exists whether or not it is acknowledged. At an absolute minimum, visual tools can help check how well models fit the data.
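One simple way to check a linearity assumption is to fit the line and look at what is left over. The sketch below fits a least-squares line to data that are deliberately curved, then inspects the residuals. The data and the fit_line helper are assumptions for illustration, and a residual plot would normally replace the printed signs.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data that violate the linearity assumption on purpose.
xs = list(range(11))
ys = [x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# If the linear model were adequate, residual signs would look patternless.
print("residual signs:", "".join("+" if r > 0 else "-" for r in residuals))
```

Here the residuals are positive at both ends and negative in the middle. That systematic pattern indicates the straight-line model is missing curvature, even though the fit itself runs without complaint.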

Rule 9: When Possible, Replicate!

Statisticians tend to be aware of the most obvious kinds of data snooping, such as choosing particular variables for a reported analysis, and there are methods that can help adjust results in these cases.
...
The only truly reliable solution to the problem posed by data snooping is to record the statistical inference procedures that produced the key results, together with the features of the data to which they were applied, and then to replicate the same analysis using new data.

Related to this rule, the authors offer a great analogy: data snooping is like drawing a bullseye around your findings, whereas the correct process is to measure how well your observations stack up against a predetermined bullseye. It's not only about exposing others, however; it's also about performing and reporting your own analysis in a way that allows for replication. Ideally, replication is performed by an independent investigator, on different data sets. Replication also often introduces modifications to the original experiment.
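The bullseye analogy can be made concrete with a toy simulation, offered here only as a sketch. Everything in it is an assumption for illustration: the outcome and all 200 candidate variables are pure noise, the sample size is 20, and pearson is a hand-written helper. The analysis picks the variable that correlates best with the outcome, then re-measures that same variable on fresh data.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def noise(rng, n):
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


rng = random.Random(9)
n_samples, n_variables = 20, 200

# "Discovery" data: an outcome and many candidate variables, all unrelated noise.
outcome = noise(rng, n_samples)
candidates = [noise(rng, n_samples) for _ in range(n_variables)]
correlations = [pearson(c, outcome) for c in candidates]

# Data snooping: report only the variable that happens to look best.
best = max(range(n_variables), key=lambda i: abs(correlations[i]))
r_discovery = correlations[best]

# Replication: collect new data for that one pre-specified variable.
r_replication = pearson(noise(rng, n_samples), noise(rng, n_samples))

print(f"selected variable {best}: discovery r = {r_discovery:.2f}")
print(f"same analysis on new data: r = {r_replication:.2f}")
```

Because the winner was chosen after looking at the data, its discovery correlation is inflated by the selection itself. The replication uses a procedure fixed in advance, so it is not subject to that inflation.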

Rule 10: Make Your Analysis Reproducible

[G]iven the same set of data, together with a complete description of the analysis, it should be possible to reproduce the tables, figures, and statistical inferences.

Rule 10 is closely related to Rule 9, even if it does not go as far. When it is not practical to have independent investigators replicate results on new data, a detailed description and systematic outline of the experiments, sufficient to reproduce the results, is ideal.
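In code, one modest step toward reproducibility is to fix and record every source of randomness and every analysis parameter next to the results. The sketch below does this for a bootstrap estimate of a standard error. The bootstrap is chosen here only as an example of a randomized analysis, and the data, the seed, and the run_analysis function are all hypothetical.

```python
import json
import random
import statistics


def run_analysis(data, seed, n_resamples=1000):
    """Bootstrap the standard error of the mean and record how it was done."""
    rng = random.Random(seed)  # all randomness flows from the recorded seed
    boot_means = [
        statistics.mean(rng.choices(data, k=len(data)))
        for _ in range(n_resamples)
    ]
    return {
        "seed": seed,
        "n_resamples": n_resamples,
        "mean": statistics.mean(data),
        "bootstrap_se": statistics.stdev(boot_means),
    }


# Hypothetical data set.
data = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]

first = run_analysis(data, seed=2016)
second = run_analysis(data, seed=2016)

print(json.dumps(first, indent=2))
print("identical on rerun:", first == second)
```

Storing the seed and parameters in the output means anyone with the same data and code can regenerate the same numbers, which is the narrower goal this rule describes.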

Note: The content of this post is based on the PLOS Computational Biology article described above, and all credit for the ideas contained within belongs to its authors.
