Original poster: rurusoso

[Data management help] A question about merging data

11
rurusoso posted on 2014-3-16 22:51:29
I probably got something wrong just now = =
I'm still not very familiar with Stata, sorry for the trouble.

So should I use the duplicates drop command next?

12
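蓝色 posted on 2014-3-16 22:54:00
First take a look at the duplicates subcommands and their options.

They help you check whether the data contain duplicate observations.
Only if there are none can the two datasets be merged one-to-one.
I couldn't download that cv data, so I can't verify whether it has a problem.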





Title

    [D] duplicates -- Report, tag, or drop duplicate observations


Syntax

    Report duplicates

        duplicates report [varlist] [if] [in]


    List one example for each group of duplicates

        duplicates examples [varlist] [if] [in] [, options]


    List all duplicates

        duplicates list [varlist] [if] [in] [, options]


    Tag duplicates

        duplicates tag [varlist] [if] [in] , generate(newvar)


    Drop duplicates

        duplicates drop [if] [in]

        duplicates drop varlist [if] [in] , force


    options                  Description
    --------------------------------------------------------------------------------------------------
    Main
      compress               compress width of columns in both table and display formats
      nocompress             use display format of each variable
      fast                   synonym for nocompress; no delay in output of large datasets
      abbreviate(#)          abbreviate variable names to # characters; default is ab(8)
      string(#)              truncate string variables to # characters; default is string(10)

    Options
      table                  force table format
      display                force display format
      header                 display variable header once; default is table mode
      noheader               suppress variable header
      header(#)              display variable header every # lines
      clean                  force table format with no divider or separator lines
      divider                draw divider lines between columns
      separator(#)           draw a separator line every # lines; default is separator(5)
      sepby(varlist)         draw a separator line whenever varlist values change
      nolabel                display numeric codes rather than label values

    Summary
      mean[(varlist)]        add line reporting the mean for each of the (specified) variables
      sum[(varlist)]         add line reporting the sum for each of the (specified) variables
      N[(varlist)]           add line reporting the number of nonmissing values for each of the
                               (specified) variables
      labvar(varname)        substitute Mean, Sum, or N for varname in last row of table

    Advanced
      constant[(varlist)]    separate and list variables that are constant only once
      notrim                 suppress string trimming
      absolute               display overall observation numbers when using by varlist:
      nodotz                 display numerical values equal to .z as field of blanks
      subvarname             substitute characteristic for variable name in header
      linesize(#)            columns per line; default is linesize(79)
    --------------------------------------------------------------------------------------------------


Menu

    Data > Data utilities > Manage duplicate observations


Description

    duplicates reports, displays, lists, tags, or drops duplicate observations, depending on the
    subcommand specified.  Duplicates are observations with identical values either on all variables
    if no varlist is specified or on a specified varlist.

    duplicates report produces a table showing observations that occur as one or more copies and
    indicating how many observations are "surplus" in the sense that they are the second (third, ...)
    copy of the first of each group of duplicates.

    duplicates examples lists one example for each group of duplicated observations.  Each example
    represents the first occurrence of each group in the dataset.

    duplicates list lists all duplicated observations.

    duplicates tag generates a variable representing the number of duplicates for each observation.
    This will be 0 for all unique observations.

    duplicates drop drops all but the first occurrence of each group of duplicated observations.  The
    word drop may not be abbreviated.

    Any observations that do not satisfy specified if and/or in conditions are ignored when you use
    report, examples, list, or drop.  The variable created by tag will have missing values for such
    observations.
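
The subcommands described above can be combined into a quick check before a one-to-one merge. What follows is a minimal sketch on toy data; the variable names company, region, year, and sales are hypothetical and do not come from the poster's files.

```stata
* Hypothetical toy data: the same company appears in two regions in 2012.
clear
input str10 company str5 region int year double sales
"Acme"  "North" 2012 10
"Acme"  "South" 2012 12
"Acme"  "North" 2013 11
"Birch" "North" 2012  7
end

* Count surplus observations when only company and year define a record.
duplicates report company year

* Flag each observation with the number of other observations sharing its key
* (0 means the key is unique for that observation).
duplicates tag company year, generate(dup)

* Inspect the flagged rows before deciding whether anything should be dropped.
sort company year region
list company region year sales if dup > 0, sepby(company year)
```

Looking at the tagged rows first, rather than going straight to duplicates drop, makes it easier to tell genuine repeats from records that differ on a variable left out of the key.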






13
rurusoso 发表于 2014-3-16 22:55:42
額  抱歉  我剛剛開原始檔來看

那是同公司名稱在不同的區域下的資料

所以這個理論上是我必須保留的資料....

14
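蓝色 posted on 2014-3-16 23:00:01
Then add a region code to the key when you merge:
merge 1:1 company region year using datasetB.dta

As long as company, region, and year together identify exactly one observation, there is no problem.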
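
The uniqueness condition in this reply can be checked explicitly before merging. This is a sketch under assumptions: the two small datasets, the temporary file, and the variable names company, region, year, sales, and assets are all hypothetical stand-ins for the poster's data.

```stata
* Hypothetical using dataset (stands in for dataset B).
clear
input str10 company str5 region int year double assets
"Acme"  "North" 2012 100
"Acme"  "South" 2012 120
"Birch" "North" 2012  70
end
tempfile B
save `B'

* Hypothetical master dataset (stands in for dataset A).
clear
input str10 company str5 region int year double sales
"Acme"  "North" 2012 10
"Acme"  "South" 2012 12
"Birch" "North" 2012  7
end

* isid exits with an error unless the listed variables
* uniquely identify every observation in the data in memory.
isid company region year

* merge 1:1 also requires the key to be unique in the using dataset.
merge 1:1 company region year using `B'

* _merge shows which observations matched and which came from one file only.
tabulate _merge
```

Running this in a do-file keeps the temporary file available for the whole run; with real files, the `B' macro would be replaced by the actual file name.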

15
rurusoso posted on 2014-3-16 23:17:18
Sorry, my head has been spinning these past few days.

I should have matched on isincode and year instead...
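
A merge keyed on isincode and year might look like the sketch below. Only the variable names isincode and year come from the thread; the toy values, the sales and assets variables, and the temporary file are assumptions made for illustration.

```stata
* Hypothetical using file keyed by isincode and year.
clear
input str12 isincode int year double assets
"XX0000000001" 2012 100
"XX0000000001" 2013 105
"XX0000000002" 2012  70
end
isid isincode year
tempfile bfile
save `bfile'

* Hypothetical master file with the same key.
clear
input str12 isincode int year double sales
"XX0000000001" 2012 10
"XX0000000001" 2013 11
"XX0000000002" 2012  7
end
isid isincode year

* assert(match) stops with an error if any observation fails to match;
* nogenerate suppresses the _merge variable.
merge 1:1 isincode year using `bfile', assert(match) nogenerate
```

If unmatched observations are expected in the real data, dropping assert(match) and tabulating _merge is the gentler choice.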

16
rurusoso posted on 2014-3-16 23:24:59
OK, it's done. Thank you all for your help!

17
蓝色 posted on 2014-3-17 08:19:56
So, the earlier you provide the original data, the easier the problem is to solve.
