Original poster: Imasasor

How to find duplicate ID numbers

11
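yongyitian posted on 2013-10-30 08:23:11
Imasasor posted on 2013-10-29 16:09
With two SET statements, how many pointers and how many PDVs are there?
I would not dare claim many years of experience, or that I have truly figured out the PDV.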

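As I understand it, a DATA step has only one PDV, which holds all the variable names and the data being processed.
There can be several pointers. For example, the DATA step's own pointer can point at a particular part of the program, such as the first SET statement, the DO loop, or the RUN statement. The SET statement's pointer, point=, points at a given observation number, and the DO loop's pointer points at the current value of the loop index.

In the program above, when the DATA step pointer reaches the DO loop, the data read by the first SET statement is held and the statements inside the DO loop execute. On each iteration of the loop, the second SET statement reads one observation from the second dataset, the one specified by point=, and then the IF statement executes.

While the DO loop runs, the DATA step pointer stays on the DO loop, while the second SET statement moves through the observations of the second dataset as the loop index changes and reads each one.

While the DO loop runs, the values in the PDV that the SET statement before the loop read from dataset 1 stay unchanged, while the values tied to the inside of the loop keep changing. Observations that meet the condition are written to the result dataset by the OUTPUT statement.

The variable names of both datasets are placed in the PDV before the DATA step starts executing (that is, at compile time). If the two datasets share a variable name, the value read by the second SET statement overwrites the value read by the first SET statement.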
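
The point about same-named variables being overwritten can be seen in a small sketch. The datasets a, b, and demo and all variable names here are hypothetical, chosen only for illustration; they are not from the thread.

```sas
/* Hypothetical datasets a and b; both contain a variable named y */
data a;
  x = 1; y = 10; output;
  x = 2; y = 20; output;
run;

data b;
  y = 99;
run;

data demo;
  set a;             /* sequential read: fills x and y from a       */
  y_from_a = y;      /* save a's value before it is overwritten     */
  p = 1;
  set b point=p;     /* direct read of b's observation 1: replaces y */
run;

proc print data=demo; run;
```

If run as written, demo should contain two observations, with y equal to 99 in both and y_from_a holding 10 and 20. The step ends when the first SET statement runs out of observations in a, which is also what ends the program in this post.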

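Seen this way, the code above is a bit hard to follow, so I modified it slightly, which may make it clearer. Because the DO loop starts at _n_+1, it only finds cases where the repeated value in id2 appears after the same value in id1.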
data tem;
  input id1 id2;
  cards;
1 2
2 1
4 7
5 8
6 9
8 5
;
run;

/* For each row, scan the later rows and output those whose id2 equals this row's id1 */
data repeated;
  set tem (rename=(id1=_id1 id2=_id2));  /* current row, renamed so the second SET does not overwrite it */
  n = _n_;
  do i = n+1 to nobs;
    set tem nobs=nobs point=i;           /* direct read of observation i */
    if _id1 = id2 then output;
  end;
  keep id1 id2;
run;

/* Remove the flagged rows from the original data */
proc sql;
  create table want as
    select *
    from tem
    except select * from repeated;
quit;

12
yongyitian posted on 2013-10-30 08:32:27
邓贵大 posted on 2013-10-29 22:12
No, your code is OK.
I was just saying the solution may not be unique based on Imasasor's descrip ...
You are absolutely right. I tried looping through all observations in the dataset, but that did not produce the wanted results. Therefore I started the DO loop from _n_+1, which limits it to removing records whose ID2 value occurs after the same value in ID1. The code does not remove a record whose ID2 value occurs before the same value in ID1.

13
playmore posted on 2013-10-30 08:45:54
I wrote one using a hash object. Does this look OK?
data tem;
  input id1 id2;
  cards;
1 2
2 1
4 7
5 8
6 9
8 5
;

data tem1(rename=(id1=id2 id2=id1));      /* swap the names back on output */
  if _n_=1 then do;
    declare hash h(dataset:"tem");        /* load every id1 of tem as a key */
    h.definekey('id1');
    h.definedata('id2');
    h.definedone();
  end;
  set tem(rename=(id1=id2 id2=id1));      /* here id1 holds the original id2 */
  if h.find() NE 0 then output;           /* keep rows whose original id2 never appears as an id1 */
run;

14
bobguy posted on 2013-10-30 10:01:22
You can load all the data into a temporary array and maneuver easily with an array.

data tem;
  array id(6,3) _temporary_;
  infile cards eof=end;
  input ;
  id(_N_,1)=input(scan(_infile_,1), best.);
  id(_N_,2)=input(scan(_infile_,2), best.);
  return;

end:

  do i=1 to 6-1;
    do j=i+1 to 6;
      if id(i,1)=id(j,2) then id(j,3)=0;
    end;
  end;
  do i=1 to 6;
    if id(i,3) ne 0 then do;
      id1=id(i,1);
      id2=id(i,2);
      output;
    end;
  end;
  keep id:;
cards;
1 2
2 1
4 7
5 8
6 9
8 5
;

proc print;run;

15
邓贵大 posted on 2013-10-30 20:37:19
Still not quite right, folks! Try it on this dataset if you like:
3 4
6 5
5 7
4 6
1 2
2 1
Be still, my soul: the hour is hastening on
When we shall be forever with the Lord.
When disappointment, grief and fear are gone,
Sorrow forgot, love's purest joys restored.

16
Eternal0601 posted on 2013-10-30 21:22:55
bobguy posted on 2013-10-30 10:01
You can load all the data into a temporary array and maneuver easily with an array.

data tem;
A humble question: what is end: doing here? All I have seen before is the pattern go to labelname ... labelname:.
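
For context: in the quoted program, end: is an ordinary statement label, and eof=end on the INFILE statement tells SAS to jump to that label when INPUT finds no more records, much like an implicit GO TO. Below is a minimal sketch of the same pattern. The dataset name totals, the label finish, and the variable names are made up for illustration, and the explicit STOP is added here as a precaution rather than taken from the quoted program.

```sas
data totals;
  infile cards eof=finish;  /* jump to finish: when INPUT hits end of file */
  input x;
  total + x;                /* sum statement: total is retained across rows */
  return;                   /* end this iteration and read the next line    */

finish:
  output;                   /* reached once, after the last data line       */
  stop;                     /* precaution: end the DATA step explicitly     */
  keep total;
cards;
1
2
3
;
run;

proc print data=totals; run;
```

Because the step contains an explicit OUTPUT statement, the RETURN does not write an observation on each iteration, so totals ends up with a single row holding the sum of the three input values.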

17
1033096528 posted on 2013-11-1 12:22:08
As a beginner, I think everyone above looks very impressive. Quietly drifting away...

18
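龙潭丰乐 (verified student) posted on 2013-11-2 16:06:27
playmore posted on 2013-10-30 08:45
I wrote one using a hash object. Does this look OK?
This is not right. You did not take into account the order in which the id1 values appear; you simply put every id1 into the hash as a key.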

19
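playmore posted on 2013-11-2 22:01:17
龙潭丰乐 posted on 2013-11-2 16:06
This is not right. You did not take into account the order in which the id1 values appear; you simply put every id1 into the hash as a key.
Did the OP's post say anything about order?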

20
邓贵大 posted on 2013-11-3 10:00:31
Let's forget about the order of input or the uniqueness of the solution and rephrase the question as follows.
Find a subset A of the set of ordered pairs S={(a,b): 1<=a,b<=n and a, b are not equal} such that,
(1) (a,b) and (c,a) cannot both belong to A, for any a,b,c; (this condition is stated by the OP)
(2) A is maximal in the sense that there does not exist a subset A' of S that also satisfies (1) and has A as a proper subset. (The OP didn't mention this, but I think it's necessary; otherwise you could always return only one record and claim you got the right answer.)
The question is also equivalent to finding a maximal subgraph of a directed graph by deleting edges so that no vertex has both nonzero indegree and nonzero outdegree.
I don't have a solution off the top of my head, but you can check your program against the two conditions above.
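
One way to run such a check is sketched below with PROC SQL. It rests on some assumptions: S is taken to be the input dataset tem (here the six-row test data from post 15), no row has id1 equal to id2, and want is a hypothetical candidate answer typed in by hand. The table names conflicts and addable are also made up for illustration.

```sas
/* Input pairs: the six-row test dataset from post 15 */
data tem;
  input id1 id2;
  cards;
3 4
6 5
5 7
4 6
1 2
2 1
;
run;

/* Hypothetical candidate answer to be checked */
data want;
  input id1 id2;
  cards;
3 4
6 5
5 7
1 2
;
run;

proc sql;
  /* Condition (1): rows (a,b) and (c,a) must not both be kept */
  create table conflicts as
    select x.id1 as a, x.id2 as b, y.id1 as c
    from want as x, want as y
    where x.id1 = y.id2;

  /* Condition (2): no dropped row can be added back without a conflict */
  create table addable as
    select t.id1, t.id2
    from tem as t
    where not exists (select 1 from want as w
                      where w.id1 = t.id1 and w.id2 = t.id2)
      and t.id1 not in (select id2 from want)
      and t.id2 not in (select id1 from want);
quit;
```

A candidate passes when both tables are empty. For the candidate above, conflicts should contain one row (a=5, b=7, c=6), because 5 appears as id1 in (5,7) and as id2 in (6,5), so this candidate fails condition (1).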
