How to remove duplicated data for further testing

t818 (original poster) posted on 2010-7-23 02:55:52

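A question: in an event study, after I generate the cumulative abnormal return (CAR) with the commands below, every row for a given id holds the same value (that is, the CAR is repeated on each row). How can I keep just one CAR value per id so that I can run a t test and similar tests? Thanks!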
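* abnormal return, defined only inside the event window
gen abnormal_return=ret-predicted_return if event_window==1
* CAR per id over dif in [-365, 0); bysort sorts by id first, total() is the documented name of egen's sum()
bysort id: egen car = total(abnormal_return) if dif>=-365 & dif<0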

t818 (original poster) posted on 2010-7-23 03:06:39

When using egen, is there a way to output just one cumulative abnormal return per id directly? Thanks!

houquan posted on 2010-7-23 08:53:52

help collapse
---------------------------------------------------------------------------------------------------------------------------------------------------------

Title

[D] collapse -- Make dataset of summary statistics

Syntax

      collapse clist [if] [in] [weight] [, options]

where clist is either

      [(stat)] varlist [ [(stat)] ... ]

      [(stat)] target_var=varname [target_var=varname ...] [ [(stat)] ...]

or any combination of the varlist or target_var forms, and stat is one of

      mean        means (default)
      median      medians
      p1          1st percentile
      p2          2nd percentile
      ...         3rd-49th percentiles
      p50         50th percentile (same as median)
      ...         51st-97th percentiles
      p98         98th percentile
      p99         99th percentile
      sd          standard deviations
      semean      standard error of the mean (sd/sqrt(n))
      sebinomial  standard error of the mean, binomial (sqrt(p(1-p)/n))
      sepoisson   standard error of the mean, Poisson (sqrt(mean))
      sum         sums
      rawsum      sums, ignoring optionally specified weight
      count       number of nonmissing observations
      max         maximums
      min         minimums
      iqr         interquartile range
      first       first value
      last        last value
      firstnm     first nonmissing value
      lastnm      last nonmissing value

If stat is not specified, mean is assumed.

options       description
---------------------------------------------------------------------------------------------------------------------------------------------------
Options
   by(varlist) groups over which stat is to be calculated
   cw          casewise deletion instead of all possible observations

+ fast          do not restore the original dataset should the user press Break; programmer's command
---------------------------------------------------------------------------------------------------------------------------------------------------
+ fast is not shown in the dialog box.
varlist and varname in clist may contain time-series operators; see tsvarlist.
aweights, fweights, iweights, and pweights are allowed; see weight, and see Weights below.  pweights may not be used with sd, semean, sebinomial,
   or sepoisson.  iweights may not be used with semean, sebinomial, or sepoisson.  aweights may not be used with sebinomial or sepoisson.

Menu

Data > Create or change data > Other variable-transformation commands > Make dataset of means, medians, etc.

Description

collapse converts the dataset in memory into a dataset of means, sums, medians, etc.  clist must refer to numeric variables exclusively.

Note: See [D] contract if you want to collapse to a dataset of frequencies.

Options

by(varlist) specifies the groups over which the means, etc., are to be calculated.  If this option is not specified, the resulting dataset will
      contain 1 observation.  If it is specified, varlist may refer to either string or numeric variables.

cw specifies casewise deletion.  If cw is not specified, all possible observations are used for each calculated statistic.

The following option is available with collapse but is not shown in the dialog box:

fast specifies that collapse not restore the original dataset should the user press Break.  fast is intended for use by programmers.

Weights

collapse allows all four weight types; the default is aweights.  Weight normalization impacts only the sum, count, sd, semean, and sebinomial
statistics.

Here are the definitions for count and sum with weights:

   count:
      unweighted:                 _N, the number of physical observations
      aweight:                    _N, the number of physical observations
      fweight, iweight, pweight:  sum(w_j), the sum of user-specified weights
   sum:
      unweighted:                 sum(x_j), the sum of the variable
      aweight:                    sum(v_j*x_j); v_j = weights normalized to sum to _N
      fweight, iweight, pweight:  sum(w_j*x_j); w_j = user-supplied weights

The sd statistic with weights returns the bias-corrected standard deviation, which is based on the factor sqrt(N/(N-1)), where N is the number of
observations. Statistics sd, semean, sebinomial, and sepoisson are not allowed with pweighted data.  Otherwise, the statistic is changed by the
weights through the computation of the count (N), as outlined above.

For instance, consider a case in which there are 25 physical observations in the dataset and a weighting variable that sums to 57.  In the
unweighted case, the weight is not specified, and N = 25.  In the analytically weighted case, N is still 25; the scale of the weight is irrelevant.
In the frequency-weighted case, however, N = 57, the sum of the weights.

The rawsum statistic with aweights ignores the weight, with one exception:  observations with zero weight will not be included in the sum.
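
The weight definitions above can be checked on a tiny made-up dataset. The sketch below is only an illustration: the variables x and w, their values, and the scalar names are hypothetical and not part of the help file. With weights 2, 3, and 5 (summing to 10), the definitions imply that the fweight count is 10 and the fweight sum is 23, while the aweight count stays at 3 and the aweight sum uses weights rescaled to sum to 3.

```stata
* Hypothetical data: three observations with integer weights summing to 10
clear
input x w
1 2
2 3
3 5
end

* Frequency weights: count = sum(w_j), sum = sum(w_j*x_j)
preserve
collapse (count) n = x (sum) s = x [fw = w]
scalar n_fw = n[1]
scalar s_fw = s[1]
restore

* Analytic weights: count = _N, sum = sum(v_j*x_j) with v_j = w_j*3/10,
* and rawsum ignores the weights entirely
preserve
collapse (count) n = x (sum) s = x (rawsum) raw = x [aw = w]
scalar n_aw = n[1]
scalar s_aw = s[1]
scalar raw_aw = raw[1]
restore

display "fweight: count = " n_fw ", sum = " s_fw
display "aweight: count = " n_aw ", sum = " s_aw ", rawsum = " raw_aw
```

The preserve/restore pairs are there only so that both collapses start from the same three rows; in real work a single collapse is usually enough.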

Examples

-----------------------------------------------------------------------------------------------------------------------------------------------------
Setup
      . webuse college
      . describe
      . list

Create dataset containing the 25th percentile of gpa for each year
      . collapse (p25) gpa [fw=number], by(year)

List the result
      . list

-----------------------------------------------------------------------------------------------------------------------------------------------------
Setup
      . webuse college, clear

Create dataset containing the mean and median of gpa and hour for each year, and store median of gpa and hour in medgpa and medhour, respectively
      . collapse (mean) gpa hour (median) medgpa=gpa medhour=hour [fw=number], by(year)

List the result
      . list

-----------------------------------------------------------------------------------------------------------------------------------------------------
Setup
      . webuse college, clear

Create dataset containing the count of gpa and hour and the minimums of gpa and hour, and store the minimums in mingpa and minhour, respectively
      . collapse (count) gpa hour (min) mingpa=gpa minhour=hour [fw=number], by(year)

List the result
      . list

-----------------------------------------------------------------------------------------------------------------------------------------------------
Setup
      . webuse college, clear
      . replace gpa = . in 2/4

Create dataset containing the mean of gpa and hour for each year, but ignore all observations that have missing values when calculating the means
      . collapse (mean) gpa hour [fw=number], by(year) cw

List the result
      . list
-----------------------------------------------------------------------------------------------------------------------------------------------------
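
Applied to the question in this thread, collapse with by(id) leaves one row per id, so the repeated values disappear by construction. The following is a minimal sketch under stated assumptions, not the original poster's actual workflow: the toy data are made up, the variable names id, dif, ret, and predicted_return are taken from the question, and the event_window condition is left out for brevity (every row is treated as being inside the window).

```stata
* Toy event-study data: 3 ids, 4 days each (all values are made up)
clear
input id dif ret predicted_return
1 -3 0.02 0.01
1 -2 0.01 0.01
1 -1 0.03 0.01
1  0 0.05 0.01
2 -3 0.00 0.01
2 -2 0.02 0.01
2 -1 0.01 0.01
2  0 0.04 0.01
3 -3 0.03 0.01
3 -2 0.01 0.01
3 -1 0.02 0.01
3  0 0.00 0.01
end

gen abnormal_return = ret - predicted_return

* One row per id: car is the sum of abnormal_return over dif in [-365, 0)
collapse (sum) car = abnormal_return if dif >= -365 & dif < 0, by(id)

list

* One-sample t test of the mean CAR against zero
ttest car == 0
```

Note that collapse replaces the dataset in memory, so it may be worth saving the data or wrapping the step in preserve/restore first. An id with no observations inside the window drops out of the collapsed dataset entirely.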

Also see

Manual:  [D] collapse

   Help:  [D] contract, [D] egen, [D] statsby, [R] summarize
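
If the original observations need to stay in memory, egen (listed above) offers an alternative to collapse: its tag() function marks exactly one observation per group, and the test can then be restricted to the tagged rows. This is a sketch on made-up data, assuming abnormal_return has already been computed; the variable name one_per_id is hypothetical.

```stata
* Toy data: 3 ids, 3 days each, abnormal returns already computed (made up)
clear
input id dif abnormal_return
1 -2  0.01
1 -1  0.02
1  0  0.04
2 -2 -0.01
2 -1  0.03
2  0  0.00
3 -2  0.02
3 -1  0.01
3  0  0.05
end

* CAR per id over dif in [-365, 0); the same value is repeated on every such row
bysort id: egen car = total(abnormal_return) if dif >= -365 & dif < 0

* tag() sets 1 on one observation per id among rows where car is nonmissing,
* and 0 everywhere else
egen one_per_id = tag(id) if !missing(car)

list id car if one_per_id == 1

* The t test now uses one CAR per id and leaves the dataset untouched
ttest car == 0 if one_per_id == 1
```

Compared with collapse, this keeps the daily rows available for later steps at the cost of carrying an extra indicator variable.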

We all love to instruct, though we can teach only what is not worth knowing. -- J. Austen
