Original poster: karleenchan

[Other] What is a p-value anyway? (A very good book)


Original post
karleenchan posted on 2015-2-22 03:11:18


What_is_a_p-value_anyway.pdf (9.16 MB, requires 10 forum coins)

Statistics stands on two pillars: estimation and inference. Pretty much anything you work on in stats, you end up either estimating something or inferring something. If you take a random sample of people who have taken a stats101 course at some point in their lives and ask them what the course was all about, the most likely answer would be, "It was something to do with p-values". Statistics at its core is about comparing sets of numbers with each other, with theoretical models, and with past experience. But most introductory textbooks are full of scary formulae and distribution tables that students are expected to use. I think if you are a teacher introducing statistics to a new batch of students, it will do a world of good to dramatize one specific act: walk into the class and tear out all the pages in the appendix that hold these tables and arcane formulae, which only scare people away from developing a statistical mindset. It will at least drive home the point that there is no formal textbook for interpreting real-life data. Why do you think textbooks make assumptions about the distribution of the data? Pause for a few seconds and think about it. Well, one of the main reasons is that unless you make some assumptions, you cannot fill up the textbook with neat formulae. Unless you assume a certain distribution, you cannot write down a neat formula for an estimate, or for a confidence interval, and so on. What’s the use of those formulae? Not much.

As an example, let’s say you record your evening commute times from office to home every day for a month or two. You want to see whether your average commute time on Monday is less than on Friday. What would you do? Well, you can calculate the mean of the commute times on both days. But what you are doing there is estimating the average time. What you intended was inference, i.e., deciding whether your commute time on Monday really is less than Friday’s. If you open up a stats book, it will tell you to use some formula that involves the means of the commute times, the pooled standard deviation, etc. From that you form a t statistic and check whether the corresponding p-value is less than 0.05. The whole procedure described in the book is brimming with assumptions. Why should one assume the same variance across the two samples? OK, if you don’t assume that, there is another formula that makes the adjustment. Either use this formula or that formula... argh. There is a fundamental problem with this approach: it relies on parametric statistics to determine the critical value at which the observed t becomes significant. In these days of abundant and cheap computing power, there is little reason to follow such a textbook approach. One can use nonparametric, distribution-free methods even for a test as simple as the one above. This was unthinkable a few decades ago, when computing power and software were expensive. For the above example, you can generate, say, 500 resampled data sets and get a practical answer to the question. In today’s world of free open source software, it would be unthinkable for someone to open a textbook to answer the question mentioned above.
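The resampling idea for the commute example can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the book’s procedure: the commute times are made-up numbers, the function name `permutation_p_value` is hypothetical, and the technique chosen here is a one-sided permutation test on the difference in means, using the 500 resamples mentioned above.

```python
import random
from statistics import mean


def permutation_p_value(monday, friday, n_resamples=500, seed=0):
    """One-sided permutation test: is the Monday mean lower than the Friday mean?

    Returns the observed difference in means (Monday minus Friday) and an
    approximate p-value obtained by shuffling the day labels.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean(monday) - mean(friday)
    pooled = list(monday) + list(friday)
    n_monday = len(monday)

    at_least_as_extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the day labels are exchangeable,
        # so reassign them at random and recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_monday]) - mean(pooled[n_monday:])
        if diff <= observed:
            at_least_as_extreme += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (at_least_as_extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Hypothetical commute times in minutes (illustrative data, not from the post).
monday = [32, 35, 31, 38, 34, 33, 36, 30]
friday = [41, 39, 45, 37, 43, 40, 44, 42]

observed, p_value = permutation_p_value(monday, friday)
print(f"observed difference: {observed:.2f} min, one-sided p-value: {p_value:.3f}")
```

No distribution table or pooled-variance formula is needed: the p-value is simply the share of label shufflings that produce a Monday-minus-Friday difference at least as negative as the one observed.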

Books such as this one are helpful to the general public: people who don’t run statistical analyses in their day-to-day work but have to interpret statistical results in their professional or personal lives. The book does not have a single formula, yet it tries to impart more knowledge than most of the boring, sleep-inducing textbooks one comes across. It weaves together some important stats101 concepts by telling 34 stories, stories that one can easily read and whose lessons one can remember. Stories are always a great way to teach, learn, and understand stuff. Let me attempt to list the concepts that these 34 stories cover:


  • Basic difference between estimation and inference
  • In some situations the mean is better than the median, while in others it is the other way around
  • For skewed data, median and interquartile range summaries give a better understanding of the data than mean and standard deviation
  • What is skewness and how do you identify it in the data?
  • Mean and standard deviation tell you everything about the data (if it comes from a normal distribution). Technically they are called sufficient statistics. Histograms do the job where mean and median are not enough.
  • Story that hints at the utility of nonparametric regression
  • Normal distribution and its parameters
  • If data doesn’t look normal, you can take a log transform and work with that. Why does a log transform make the data normal in many situations? Verbalizing it without any formulae is where the stories do a good job.
  • How does one check whether data is from a normal distribution?
  • What does the standard error of an estimate mean? Well, actually, modern nonparametric statistics has algorithms to figure out the standard error of the standard error of an estimate. That’s a mouthful, but there are situations where it matters.
  • What do you understand by a confidence interval?
  • What is a statistical tie?
  • How does one verbalize a p-value?
  • Story of a dry toothbrush to illustrate the basic idea of a p-value
  • What does one mean by a null hypothesis?
  • Difference between a t test and a Wilcoxon test. What are parametric statistics and nonparametric statistics?
  • Concepts such as sample size, precision, statistical power, Type I and Type II errors
  • Story to convey the usage of univariate regression, multivariate regression and logistic regression.
  • Multiple regression is not a magic wand that you can use to churn out models. It has a ton of assumptions, and the more you are aware of them, the less crap you dish out and the less crap you take from news, articles, and papers that wear a statistical garb.
  • When a child coughs, the mother overreacts and the father underreacts. What are the consequences? A story that illustrates the concepts of sensitivity and specificity of a test, i.e., the probability that the test shows positive/negative given that the patient has the disease/no disease. If a doctor has to discuss a test result with a patient, sensitivity and specificity are not much help. One is supposed to talk about the probability that the patient has the disease/no disease given that the test shows positive/negative, called the positive predictive value and negative predictive value.
  • How does a decision tree work?
  • A story that analyzes the blunders one sees in academic papers that report a barrage of p-values for everything
  • Weeding out unnecessary p-values: a paper that tests 150-odd patients and ends up reporting 126 separate p-values, i.e., almost one p-value per patient! The height of lunacy.
  • A story that drives home the point that chi-square and ANOVA provide inference and not estimation. Similarly correlation provides estimation and not inference. Any statistical test that provides p-values is basically an inferential procedure and one cannot use it for estimation
  • Words of caution in the context of regression
  • Explaining the idea of "regression to the mean" using a simple story
  • Misapplication of conditional probability in O.J. Simpson’s case and Sally Clark’s case
  • Dangers of multiple hypothesis testing
  • A story that reiterates that p-values are about inference and not about estimation
  • A story about statistical errors creeping in because of not "starting the clock" simultaneously for the test and control groups
  • There is no one right way of doing statistics. It all depends on the problem, purpose, and context.
  • Importance of reproducibility in statistics. This is a topic that is dear to my heart. Maybe 5-10 years back there was no good infrastructure for it. But now, thanks to the amazing efforts of the R and Python communities, creating reproducible research has become easy.
  • Statistics has to link math to whatever field you are working in. This linkage is what makes working in stats such a fun thing. Here is a wonderful illustration that is spot on about the real world and the statistical world. Universities, schools, and textbooks focus on the right side of the picture. They teach you all the math, probability, and models to equip you well enough that you can go into the real world and make all the connections.
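The point in the list above about sensitivity and specificity versus predictive values can be made concrete with a short calculation. This is a minimal sketch using Bayes’ rule; the function name `predictive_values` and the numbers (90% sensitivity, 95% specificity, 1% prevalence) are assumptions chosen for illustration and do not come from the book.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Turn test characteristics into what a patient actually wants to know.

    sensitivity: P(test positive | disease)
    specificity: P(test negative | no disease)
    prevalence:  P(disease) in the population being tested
    Returns (PPV, NPV) = (P(disease | positive), P(no disease | negative)).
    """
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence

    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test: 90% sensitive, 95% specific, disease prevalence of 1%.
ppv, npv = predictive_values(sensitivity=0.90, specificity=0.95, prevalence=0.01)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

With these assumed numbers, a positive result still leaves the patient more likely healthy than ill, because false positives from the large healthy group outnumber true positives. That is why a doctor should talk about predictive values and not about sensitivity and specificity.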



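Rated by 1 user (forum coins, helpfulness index, reason)
99rabbit: forum coins +10, helpfulness index +1, reason: reward for actively uploading good material

Total: forum coins +10, helpfulness index +1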

2
songlinjl (verified-transaction user) posted on 2015-2-22 05:25:12 from mobile
karleenchan posted on 2015-2-22 03:11
Statistics stands on two pillars, estimation and inference. Pretty much anything you work on s ...
Excellent book.

3
nathan9800 (user without verified transactions) posted on 2015-2-24 15:00:36
Good book.

4
Joseph_KII (user without verified transactions, employment verified, student verified) posted on 2015-2-24 21:42:24
Hmm, not bad, worth a look.

5
爱萌 (verified-transaction user) posted on 2015-2-25 15:22:50
Very good. Explaining the p-value clearly is no easy feat.

6
Enthuse (verified-transaction user) posted on 2015-2-26 00:10:44
Thanks.

7
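小皇 (user without verified transactions) posted on 2015-2-26 20:05:22
I’d like to read it, but I don’t have any forum coins, so I’ll just have to borrow it from the library. Sob.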

8
lkwokchu (verified-transaction user) posted on 2015-8-20 21:50:48
Thanks for sharing.

9
xiangyukuang (user without verified transactions) posted on 2016-1-20 06:51:13
Bump.

10
王教授卐 (user without verified transactions, student verified) posted on 2016-1-20 16:47:16
Taking a look...
