Original poster: jose.liupei

On nonlinearity in a regression model

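Original post
jose.liupei posted on 2014-9-23 11:50:13
Bounty: 10 forum coins
A question: in a regression model (say y=a+bx+e) I want to allow for nonlinearity, so I add the square of x, and the equation becomes y=a+bx+cx^2+e.
Without the nonlinear term, regressing y=a+bx+e gives a coefficient b that is significant and negative;
with the nonlinear term, regressing y=a+bx+cx^2+e gives a coefficient b that is insignificant but positive, while the coefficient c on the squared term is significant and negative.

How should this be interpreted? Is there a nonlinear effect or not? Do I need to run any other tests?
Many thanks.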

2 (marked as best answer)
colinwang (enterprise-verified) posted on 2014-9-23 11:50:14
jose.liupei posted on 2014-9-28 00:03
Thanks for your detailed answers.

But I do not quite understand what you mentioned about "As you in ...
Hi Jo,

Sorry for the confusion. I did NOT mean that you NEED to increase the sample size. What I was trying to say is this: hypothetically, if you increased the sample size to a very large number, for instance 10000, you would see a significant p-value for practically every coefficient whose true value is not exactly zero (compare how sample-size calculations work).

So in your case you cannot base your decision on the p-value alone, since the p-value depends not only on the strength of the association between Y and X but also on whether the sample is large enough to reveal that association.

With your 300 observations, the sample is big enough to show Y~X and Y~X2 separately. However, it might not be big enough to support Y~(X, X2) with both terms together. The underlying cause might be collinearity. X2 is derived from X, so when you use X2 to explain Y, the contribution of X can get crowded out in estimation, because MLE is very sensitive to correlation between predictors. That's why I suggested other approaches to model selection.
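To make the collinearity point concrete, here is a minimal sketch. The data are made up (300 non-negative values drawn uniformly, only to mimic the sample size mentioned in this thread), and centering x before squaring is a common illustration added here, not something proposed in the posts above.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 300 non-negative observations (assumption, not the poster's data).
x = rng.uniform(0.0, 10.0, size=300)

# Correlation between the raw regressor and its square.
corr_raw = np.corrcoef(x, x**2)[0, 1]

# Centering x before squaring usually removes most of that correlation.
xc = x - x.mean()
corr_centered = np.corrcoef(xc, xc**2)[0, 1]

print(f"corr(x, x^2), raw x:      {corr_raw:.3f}")
print(f"corr(x, x^2), centered x: {corr_centered:.3f}")
```

Centering changes what b means (it becomes the slope at the mean of x rather than at x = 0), but it leaves the fitted curve and the coefficient on the squared term unchanged.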

Let's say you settle on Y~X2. If x is greater than or equal to 0, you don't have a U shape but only the right half of a U. This is very easy to interpret. On that range X2 can be regarded as a monotone transformation of X, so you can draw the linear association between Y and X2 and then map it back to Y as a function of X.
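As a worked sketch of the "right half of a U" point, using the coefficients from the equations in this thread (the symbol x^* for the turning point is notation introduced here):

```latex
\begin{align*}
\text{Model } y = a + b x^2 + e:\quad
  & \frac{\partial\, \mathbb{E}[y \mid x]}{\partial x} = 2 b x,
  && \text{which has the sign of } b \text{ for all } x > 0; \\
\text{Model } y = a + b x + c x^2 + e:\quad
  & \frac{\partial\, \mathbb{E}[y \mid x]}{\partial x} = b + 2 c x,
  && x^{*} = -\frac{b}{2c}.
\end{align*}
```

In the first model the turning point sits at x = 0, so with x >= 0 the fitted relation is monotone. In the second model, b is the slope at x = 0, and with b > 0 and c < 0 as in the original post, x^* is positive; whether the sample shows an inverted U or only one branch depends on where x^* falls relative to the observed range of x.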

If you still have questions, feel free to email me at colinwang@hotmail.co.uk. Also, thanks to statax for the backup!

3
jose.liupei posted on 2014-9-23 21:12:17, from mobile
Looking for an answer, thanks!

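4
xuruilong100 posted on 2014-9-23 22:37:48
Look at the results of y=a+bx^2+e. If b is strongly significant, you could consider dropping the linear term in x and examining only the nonlinear x^2 term.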

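5
jose.liupei posted on 2014-9-23 22:59:50
xuruilong100 posted on 2014-9-23 22:37
Look at the results of y=a+bx^2+e. If b is strongly significant, you could consider dropping the linear term in x and examining only the nonlinear x^2 term.
Then how should that result be interpreted? Or what does it mean in economic terms? Thanks.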

6
colinwang (enterprise-verified) posted on 2014-9-26 00:15:30
First, let's agree that this is still a linear model (linear in the parameters), just with a non-linear relation between Y and X.

Now to your question. Linear regression, as a parametric method, is very sensitive to its assumptions regardless of which algorithm is used for the maximisation, especially assumptions about scale, distribution, outliers, etc.

When you fit y=a+bx+e, the result suggests a significant linear relation between y and x at the given sample size (I assume it is not overpowered).

For y=a+bx+cx^2+e, you also find an association between y and x^2. Note that even though the linear term in x is not statistically significant here, you CANNOT conclude that x is irrelevant to y. If you increased the sample size, I can assure you that the coefficient would become significant again at some point, unless its true value is exactly zero.
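A small simulation can illustrate the sample-size argument. This is only a sketch under made-up assumptions: a true slope of 0.05, standard normal x and errors, and the hypothetical helper name slope_t_stat. None of these come from the poster's data.

```python
import numpy as np

def slope_t_stat(n, beta=0.05, seed=0):
    """t-statistic of the OLS slope in y = a + beta*x + e for one simulated sample of size n."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = 1.0 + beta * x + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])           # intercept column plus x
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS estimates
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - 2)               # unbiased error-variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)          # covariance matrix of the estimates
    return coef[1] / np.sqrt(cov[1, 1])

# The same small effect is tested at three sample sizes.
for n in (300, 10_000, 1_000_000):
    print(n, round(slope_t_stat(n), 2))
```

The effect size is identical in all three runs; only the standard error shrinks (roughly with the square root of n), which is why the p-value alone says little about how strong the relation is.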

So now your question becomes clear: which model is better?
y=a+bx+e
y=a+bx+cx^2+e
y=a+bx^2+e

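There are a couple of options for model comparison. R-squared is a simple one, but remember to penalise for the degrees of freedom you use (adjusted R-squared). A likelihood-ratio test between nested models can be very informative and quantitative; fit the models by ordinary maximum likelihood for this, because restricted (REML) likelihoods are not comparable across models with different regressors. You can also check the residuals, use stepwise selection, draw on prior subject-matter knowledge, etc. I could keep going all day. So focus on your background hypothesis and choose the approach that fits it best.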
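Here is a minimal sketch of two of these options (adjusted R-squared and a likelihood-ratio test) for the three candidate models. The data are simulated under an assumed purely quadratic relation, and fit_ols and lr_test_one_term are hypothetical helper names; with real data you would substitute your own x and y. Note that y ~ x and y ~ x^2 are not nested in each other, so each is tested only against the full model.

```python
import math
import numpy as np

def fit_ols(X, y):
    """Return (RSS, adjusted R^2, Gaussian log-likelihood); X must include the intercept column."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    tss = float(np.sum((y - y.mean()) ** 2))
    adj_r2 = 1.0 - (rss / (n - k)) / (tss / (n - 1))
    # Log-likelihood evaluated at the ML estimate of the error variance (rss / n).
    loglik = -0.5 * n * (math.log(2 * math.pi * rss / n) + 1.0)
    return rss, adj_r2, loglik

def lr_test_one_term(loglik_small, loglik_big):
    """Likelihood-ratio test for dropping one term; p-value from chi-square with 1 df."""
    stat = 2.0 * (loglik_big - loglik_small)
    # For 1 degree of freedom, P(chi2 > s) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(max(stat, 0.0) / 2.0))

# Hypothetical data: 300 non-negative x values, true relation purely quadratic.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=300)
y = 2.0 - 0.3 * x**2 + rng.normal(scale=3.0, size=300)

ones = np.ones_like(x)
models = {
    "y ~ x": np.column_stack([ones, x]),
    "y ~ x + x^2": np.column_stack([ones, x, x**2]),
    "y ~ x^2": np.column_stack([ones, x**2]),
}
fits = {name: fit_ols(X, y) for name, X in models.items()}
for name, (rss, adj_r2, loglik) in fits.items():
    print(f"{name:12s} adj R2 = {adj_r2:.3f}  loglik = {loglik:.1f}")

# Compare each one-term-smaller model against the full quadratic model.
full = fits["y ~ x + x^2"][2]
for small in ("y ~ x", "y ~ x^2"):
    stat, p = lr_test_one_term(fits[small][2], full)
    print(f"full vs {small:8s} LR = {stat:.2f}, p = {p:.4f}")
```

A small p-value in the "full vs y ~ x" row says the squared term adds explanatory power; a large one in the "full vs y ~ x^2" row says the linear term can be dropped without a detectable loss of fit.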

7
jose.liupei posted on 2014-9-28 00:03:23
colinwang posted on 2014-9-26 00:15
First, let's agree that this is still a linear model (linear in the parameters), just with a non-linear relation between Y and X.

Let' ...
Thanks for your detailed answers.

But I do not quite understand what you mentioned about "As you increasing the sample size, I'm assure you that the coefficient will be significant again at certain level." As I understand it, including x^2 in the equation does not increase the sample size.

For example, if I have 300 observations, I first estimate the equation y=a+bx+e. Then I add x^2 to the equation, which becomes y=a+bx+cx^2+e. But in this new equation I still have 300 observations. How can the sample size be increased?

And how should the results from y=a+bx^2+e be explained in economic, accounting, or financial practice, assuming x is equal to or greater than 0 (this is the usual case for accounting or financial variables, for example total assets, leverage ratio, export ratio, etc.)? Since y=a+bx^2+e displays a U-shaped relationship between x and y with the threshold (or turning point) at 0, we could discuss the different trends or effects below and above 0. But when x can only be equal to or greater than 0, how can the results be explained?

Thank you very much!

8
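statax posted on 2014-9-28 22:21:01
What the post at floor 5 (报纸) means is that if the sample is not large, the squared term may not come out significant when you add it, but as the sample size keeps growing, all the variables may become significant. You are comparing these two regressions on your existing sample; if that sample is not large, the comparison does not carry much credibility.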
