Original poster: cmmddy

[Programming help] Is there an ma command in Stata?

Original post
cmmddy posted on 2021-6-19 16:21:03

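As shown in the attached image, this is part of the robustness check in a paper's Stata code. The original dependent variable is PM2.5 (a proxy for home-country environmental quality) and the explanatory variable is lnOFDI2 (the stock of firms' outward foreign direct investment). In this part the dependent variable is replaced with SO2 emissions. I have no problem with the first half, but what does the part underlined in red in the second half, something like egen xma=ma(x), mean? Is there an ma function? The same operation appears again when the dependent variable is replaced later on. Any help would be appreciated, thank you! [Attachment: 论坛.png]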

2
wdlbcj (verified student) posted on 2021-6-19 20:27:21
I'm not sure whether it takes the median or the maximum.

3
917968079 posted on 2021-6-19 22:40:54
It should be a moving average.

4
cmmddy posted on 2021-6-20 12:41:35
wdlbcj posted on 2021-6-19 20:27
I'm not sure whether it takes the median or the maximum.
Thank you~ May I ask why?

5
cmmddy posted on 2021-6-20 12:42:05
917968079 posted on 2021-6-19 22:40
It should be a moving average.
Thank you for your answer~ May I ask why?

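6
qiangli posted on 2021-6-20 13:05:38
cmmddy posted on 2021-6-19 16:21
As shown in the attached image, this is part of the robustness check in a paper's Stata code. The original dependent variable is PM2.5 (a proxy for home-country environmental quality) and the explanatory variable is ln ...
I think someone asked about this before.
It should be a command from an earlier version.
It still works now,
but you cannot find it in the help files.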

7
917968079 posted on 2021-6-20 13:36:07
cmmddy posted on 2021-6-20 12:42
Thank you for your answer~ May I ask why?
You can find the details here:
https://www.stata.com/support/faqs/statistics/moving-averages-and-panel-data/

8
蓝色 posted on 2021-6-20 13:39:26
https://www.stata.com/support/faqs/statistics/moving-averages-and-panel-data/

9
蓝色 posted on 2021-6-20 13:41:26

Note: This FAQ is for users of Stata 7.

It is not relevant for Stata 8, which includes the tssmooth command for calculating moving averages and other kinds of smooth summary.

The following material is based on exchanges on Statalist.

Stata 7: How can I calculate moving averages for panel data?

Title: Stata 7: Moving averages for panel data

Authors: Nicholas J. Cox, Durham University, UK; Christopher Baum, Boston College


egen, ma() and its limitations

Stata’s most obvious command for calculating moving averages is the ma() function of egen. Given an expression, it creates a #-period moving average of that expression. By default, # is taken as 3. # must be odd.

However, as the manual entry indicates, egen, ma() may not be combined with by varlist:, and, for that reason alone, it is not applicable to panel data. In any case, it stands outside the set of commands specifically written for time series; see time series for details.
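For readers on Stata 8 or later, the note at the top of this FAQ points to tssmooth instead. The following is a minimal sketch, under the assumption of a single (non-panel) yearly series, of how a three-period centered average like the egen xma = ma(x) line asked about in this thread might be written with tssmooth ma. The toy data and the names year, x, xma, and xma_check are illustrative only and do not come from the paper under discussion.

```stata
* Hypothetical toy series; year, x, xma and xma_check are illustrative names
clear
input year x
2001 10
2002 20
2003 30
2004 40
2005 50
end

* tssmooth requires the data to be tsset first
tsset year

* window(1 1 1): 1 lagged term, the current observation, 1 forward term
tssmooth ma xma = x, window(1 1 1)

* Hand-written centered three-period average for comparison
generate xma_check = (L1.x + x + F1.x) / 3

list year x xma xma_check
```

The two columns should agree for the interior years. At the first and last year the generate version is missing, while tssmooth ma follows its own rules for the ends of the series, so the results there may differ; help tssmooth ma describes the details.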

Alternative approaches

To calculate moving averages for panel data, there are at least two choices. Both depend upon the dataset having been tsset beforehand. This is very much worth doing: not only can you save yourself repeatedly specifying panel variable and time variable, but Stata behaves smartly given any gaps in the data.

1. Write your own definition using generate

Using time-series operators such as L. and F., give the definition of the moving average as the argument to a generate statement. If you do this, you are, naturally, not limited to the equally weighted (unweighted) centered moving averages calculated by egen, ma().

For example, equally-weighted three-period moving averages would be given by

. generate moveave1 = (F1.myvar + myvar + L1.myvar) / 3

and some weights can easily be specified:

. generate moveave2 = (F1.myvar + 2 * myvar + L1.myvar) / 4

You can, of course, specify an expression such as log(myvar) instead of a variable name such as myvar.

One big advantage of this approach is that Stata automatically does the right thing for panel data: leading and lagging values are worked out within panels, just as logic dictates they should be. The most notable disadvantage is that the command line can get rather long if the moving average involves several terms.
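As a small illustration of leads and lags being worked out within panels, here is a sketch on a made-up panel of two units observed over five years. The data and the variable names id and year are assumptions for the example; only myvar and moveave1 follow the FAQ.

```stata
* Hypothetical toy panel: 2 units (id) observed over 5 years
clear
input id year myvar
1 2001 10
1 2002 20
1 2003 30
1 2004 40
1 2005 50
2 2001  5
2 2002  7
2 2003  9
2 2004 11
2 2005 13
end

* Declare panel and time variables so that L. and F. respect panel boundaries
tsset id year

* Centered, equally weighted three-period moving average
generate moveave1 = (F1.myvar + myvar + L1.myvar) / 3

list id year myvar moveave1, sepby(id)
```

In the listing, moveave1 should be missing in the first and last year of each unit, because the lag or lead needed there does not exist within that panel. In particular, the 2001 value for unit 2 does not borrow the 2005 value of unit 1.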

Another example is a one-sided moving average based only on previous values. This could be useful for generating an adaptive expectation of what a variable will be based purely on information to date: what could someone forecast for the current period based on the past four values, using a fixed weighting scheme? (A 4-period lag might be especially commonly used with quarterly time series.)

. generate moveave3 = 0.4 * L1.myvar + 0.3 * L2.myvar + 0.2 * L3.myvar + 0.1 * L4.myvar

2. Use egen, filter() from SSC

Use the community-contributed egen function filter() from the egenmore package on SSC. In Stata 7 (updated after 14 November 2001), you can install this package by

. ssc inst egenmore

after which help egenmore points to details on filter(). The two examples above would be rendered

. egen moveave1 = filter(myvar), coef(1 1 1) lags(-1/1) normalise
. egen moveave2 = filter(myvar), coef(1 2 1) lags(-1/1) normalise

(In this comparison the generate approach is perhaps more transparent, but we will see an example of the opposite in a moment.) The lags are a numlist, leads being negative lags: in this case -1/1 expands to -1 0 1 or lead 1, lag 0, lag 1. The coefficients, another numlist, multiply the corresponding lagging or leading items: in this case those items are F1.myvar, myvar and L1.myvar. The effect of the normalise option is to scale each coefficient by the sum of the coefficients so that coef(1 1 1) normalise is equivalent to coefficients of 1/3 1/3 1/3 and coef(1 2 1) normalise is equivalent to coefficients of 1/4 1/2 1/4.
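The arithmetic behind normalise can be checked without installing anything. The sketch below, on a made-up yearly series, compares the weighted average written with raw coefficients 1 2 1 divided by their sum against the same average written with the scaled weights 1/4 1/2 1/4. The data and the name moveave2w are assumptions for illustration.

```stata
* Hypothetical toy series
clear
input year myvar
2001 10
2002 20
2003 40
2004 30
2005 50
end
tsset year

* Raw coefficients 1 2 1, divided by their sum (4)
generate moveave2  = (F1.myvar + 2 * myvar + L1.myvar) / 4

* The same filter with the coefficients already scaled to sum to 1
generate moveave2w = 0.25 * F1.myvar + 0.5 * myvar + 0.25 * L1.myvar

* The two definitions should agree up to floating-point rounding
assert abs(moveave2 - moveave2w) < 1e-6 if !missing(moveave2)

list year myvar moveave2 moveave2w
```

If the assert line passes silently, the two formulations are numerically equivalent on this series, which is what scaling each coefficient by the sum of the coefficients implies.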

You must specify not only the lags but also the coefficients. Because egen, ma() provides the equally weighted case, the main rationale for egen, filter() is to support the unequally weighted case, for which you must specify coefficients. It could also be said that obliging users to specify coefficients is a little extra pressure on them to think about what coefficients they want. The main justification for equal weights is, we guess, simplicity, but equal weights have lousy frequency domain properties, to mention just one consideration.

The third example above could be

. egen moveave3 = filter(myvar), coef(0.4 0.3 0.2 0.1) lags(1/4)

or

. egen moveave3 = filter(myvar), coef(4 3 2 1) lags(1/4) normalise

either of which is just about as complicated as the generate approach. There are cases in which egen, filter() gives a simpler formulation than generate. If you want a nine-term binomial filter, which climatologists find useful, then

. egen binomial9 = filter(myvar), coef(1 8 28 56 70 56 28 8 1) lags(-4/4)
> normalise

looks perhaps less horrible than, and easier to get right than,

. gen binomial9 = (F4.myvar + 8 * F3.myvar + 28 * F2.myvar + 56 * F1.myvar +
> 70 * myvar + 56 * L1.myvar + 28 * L2.myvar + 8 * L3.myvar + L4.myvar) / 256
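The nine coefficients are the binomial coefficients C(8, k), which is where the divisor 256 comes from. A short sketch using Stata's comb() function lists them and adds them up:

```stata
* Binomial coefficients C(8, k) for k = 0, ..., 8, and their total
local total = 0
forvalues k = 0/8 {
    local c = comb(8, `k')
    display "k = `k'   coefficient = `c'"
    local total = `total' + `c'
}
display "sum of coefficients = `total'"
```

The coefficients should come out as 1 8 28 56 70 56 28 8 1 and the total as 256, matching both the coef() list and the divisor in the generate version.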

Just as with the generate approach, egen, filter() works properly with panel data. In fact, as stated above, it depends upon the dataset having been tsset beforehand.

A graphical tip

After calculating your moving averages, you will probably want to look at a graph. The community-contributed command tsgraph is smart about tsset datasets. Install it in an up-to-date Stata 7 by ssc inst tsgraph.

What about subsetting with if?

None of the above examples make use of if restrictions. In fact egen, ma() will not allow if to be specified. Occasionally people want to use if when calculating moving averages, but its use is a little more complicated than usual.

What would you expect from a moving average calculated with if? Let us identify two possibilities:

  • Weak interpretation: I don’t want to see any results for the excluded observations.
  • Strong interpretation: I don’t even want you to use the values for the excluded observations.

Here is a concrete example. Suppose as a consequence of some if condition, observations 1-42 are included but not observations 43 on. But the moving average for 42 will depend, among other things, on the value for observation 43 if the average extends backwards and forwards and is of length at least 3, and it will similarly depend on some of the observations 44 onwards in some circumstances.

Our guess is that most people would go for the weak interpretation, but, whether or not that is correct, egen, filter() does not support if either. You can always ignore what you don’t want or even set unwanted values to missing afterwards by using replace.
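A minimal sketch of the weak interpretation, assuming a made-up yearly series and a purely illustrative exclusion rule (year > 2004): the moving average is first computed from all observations and the unwanted results are then set to missing.

```stata
* Hypothetical toy series
clear
input year myvar
2001 10
2002 20
2003 30
2004 40
2005 50
2006 60
end
tsset year

* Moving average computed from all observations, including those to be hidden
generate moveave1 = (F1.myvar + myvar + L1.myvar) / 3

* Weak interpretation: blank out the results for the excluded observations
* (the condition year > 2004 is only an example)
replace moveave1 = . if year > 2004

list year myvar moveave1
```

Note that the 2004 result still uses the 2005 value of myvar, which is exactly what separates the weak interpretation from the strong one.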

A note on missing results at the ends of series

Because moving averages are functions of lags and leads, egen, ma() produces missing where the lags and leads do not exist, at the beginning and end of the series. An option nomiss forces the calculation of shorter, uncentered moving averages for the tails.

In contrast, neither generate nor egen, filter() does, or allows, anything special to avoid missing results. If any of the values needed for calculation is missing, then that result is missing. It is up to users to decide whether and what corrective surgery is required for such observations, presumably after looking at the dataset and considering any underlying ‘science’ that can be brought to bear.
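One possible form of such corrective surgery, sketched here as an assumption rather than as anything the FAQ prescribes, is to average whatever neighbouring values exist. The example uses the egen function rowmean() from more recent versions of Stata, on a made-up panel; the names id, year, lag1, lead1, and moveave_tails are illustrative.

```stata
* Hypothetical toy panel: 2 units (id) observed over 5 years
clear
input id year myvar
1 2001 10
1 2002 20
1 2003 30
1 2004 40
1 2005 50
2 2001  5
2 2002  7
2 2003  9
2 2004 11
2 2005 13
end
tsset id year

* Store the lag and lead as ordinary variables (missing at the panel ends)
generate lag1  = L1.myvar
generate lead1 = F1.myvar

* rowmean() averages the nonmissing values in each observation, so the
* first and last year of each unit get a shorter, two-term average
egen moveave_tails = rowmean(lag1 myvar lead1)

drop lag1 lead1
list id year myvar moveave_tails, sepby(id)
```

The same rule also applies to missing values or gaps in the middle of a series, so whether a shorter average is acceptable there is a decision that still rests with the user.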

10
蓝色 posted on 2021-6-20 13:42:28

