[Help] How to remove duplicate records

#1 (original poster)

cynthialam, posted 2012-1-10 10:17:27

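Suppose the data look like this:
Var1 Var2
a b
a c
b c
b a

In this data set the first record (a b) and the last record (b a) mean the same thing, so they can be treated as duplicates. How can I delete such records?

Any help appreciated!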

#2

cynthialam, posted 2012-1-10 10:46:32

Still no answers?

#3

桶桶nancy, posted 2012-1-10 10:55:16

I. The table has a primary key
a. A unique id column (single-column primary key)
delete from table
where id not in
( select max(id) from table group by col1,col2,col3... )
The columns after group by are the ones that define a duplicate. If only col1 is listed,
rows with the same col1 value count as the same record.

b. A composite primary key
Suppose col1+ ', '+col2+ ', '...col5 is the composite key
select * from table where col1+ ', '+col2+ ', '...col5 in ( select max(col1+ ', '+col2+ ', '...col5) from table
group by col1,col2,col3,col4 having count(*)> 1 )
The columns after group by are the ones that define a duplicate.
If only col1 is listed, rows with the same col1 value count as the same record.

or
select * from table where exists (select 1 from table x where table.col1 = x.col1 and
table.col2= x.col2 group by x.col1,x.col2 having count(*) > 1)

c. Compare all columns
select * into #aa from table group by id1,id2,....
delete from table
insert into table
select * from #aa

II. The table has no primary key

a. Use a temporary table
select identity(int,1,1) as id,* into #temp from ta
delete from #temp
where id not in
(  select max(id) from #temp group by col1,col2,col3... )
delete from ta
insert into ta(...)
   select ..... from #temp

b. Alter the table structure (add a unique column)
alter table tablename add newfield int identity(1,1)
delete from tablename
where newfield not in
( select min(newfield) from tablename group by <all columns except newfield> )

alter table tablename drop column newfield


#5

cynthialam, posted 2012-1-10 13:35:05

Thanks~
But I don't quite follow it, especially the part about max...
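
On the max part, a small illustration may help. Grouping by the columns that define a duplicate and taking max(id) yields exactly one id per group, the largest one, and every row whose id is not in that list is deleted. The sketch below restates this in SAS PROC SQL rather than the SQL dialect used in #3. The data set t, its id column and the helper table keep_ids are hypothetical names invented for the example.

```sas
/* Hypothetical data: ids 1 and 3 are exact duplicates on (col1, col2) */
data t;
  input id col1 $ col2 $;
  cards;
1 a b
2 a c
3 a b
4 b c
;
run;

proc sql;
  /* One surviving id per (col1, col2) group: the largest id in the group */
  create table keep_ids as
  select max(id) as id
  from t
  group by col1, col2;

  /* Delete every row whose id is not one of the survivors */
  delete from t
  where id not in (select id from keep_ids);
quit;
```

Here the group (a, b) contains ids 1 and 3, so max(id) is 3 and row 1 is deleted; ids 2, 3 and 4 remain. Using min(id) instead would keep the first row of each group rather than the last. Note that this only catches exact duplicates, not the reversed pairs (a b versus b a) from the original question.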

#6

shenliang_111, posted 2012-1-10 14:34:37

data a;
  input var1 $ var2 $;
  cards;
a b
a c
b c
b a
d c
e f
t g
c d
;
run;

/* method one, learned from hopewell */
data _null_;
  length var1 $8. var2 $8.;
  if _n_=1 then do;
    declare hash h();
    h.definekey('var1','var2');
    h.definedata('var1','var2');
    h.definedone();
    call missing(var1,var2);
  end;
  set a end=last;
  /* find() returns nonzero when the key is absent: add the pair only if
     neither (var1,var2) nor (var2,var1) is already in the hash */
  if h.find(key:var1,key:var2) and h.find(key:var2,key:var1)
    then h.add();
  if last then h.output(dataset:'result');
run;

/* method two */
/* give every row an id, then turn each row into two (id, value) rows */
data a;
  set a;
  id+1;
run;
proc transpose data=a out=aa(rename=(_name_=var));
  by id;
  var var1 var2;
run;
/* drop an id when some earlier id shares both of its values */
proc sql noprint;
  create table temp(drop=sum) as
  select distinct * from
    (select a.*,sum(case when(missing(b.col1)) then 0 else 1 end) as sum
     from aa a left join aa b
     on a.id gt b.id and a.col1=b.col1
     group by a.id,b.id) c
  group by id
  having sum(case when(sum=2) then 1 else 0 end) eq 0
  order by id;
quit;
/* back to one row per id with columns var1 and var2 */
proc transpose data=temp out=result2(drop=id _name_ rename=(col1=var1 col2=var2));
  by id;
  var col1;
run;
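
A possible variation on method one, offered as a sketch rather than as the poster's code: the hash object's check() method tests whether a key exists without loading stored data back into var1 and var2, and writing each accepted row with output keeps the rows in the order they were read. The data set names pairs and result_check are made up for the example.

```sas
data pairs;
  input var1 $ var2 $;
  cards;
a b
a c
b c
b a
;
run;

data result_check;
  if _n_ = 1 then do;
    declare hash h();
    h.definekey('var1', 'var2');
    h.definedone();
  end;
  set pairs;
  /* check() returns 0 when the key is already stored, nonzero otherwise */
  if h.check(key: var1, key: var2) ne 0
     and h.check(key: var2, key: var1) ne 0 then do;
    h.add();   /* remember this pair, using the current var1 and var2 */
    output;    /* keep the first occurrence, in input order */
  end;
run;
```

With the four rows above, result_check holds a b, a c and b c; the row b a is skipped because the pair (a, b) is already stored.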


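#7

mymine, posted 2012-1-10 15:08:37

Here is a version with simple logic that deletes every row that has a duplicate (both copies are removed).

data a;
  input var1 $ var2 $;
  cards;
a b
a c
b c
b a
d c
e f
t g
c d
;
run;

/* key in the original order, plus the row number */
data a1;
  set a;
  var3=compress(var1||'*'||var2);
  nn=_n_;
run;
/* key in the reversed order, same row number */
data a2;
  set a;
  var3=compress(var2||'*'||var1);
  nn=_n_;
run;
data aa;
  set a1 a2;
run;

/* a row whose reversed pair also exists loses one of its two keys here */
proc sort data=aa out=aa nodupkey;
  by var3;
run;
/* keep only the rows that still have both keys */
proc sql;
  create table ab as
  select distinct var1,var2
  from aa group by nn
  having n(var1)=2;
quit;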

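#8

mymine, posted 2012-1-11 14:27:41

A version that keeps one row out of each set of duplicates.

data a;
  input var1 $ var2 $;
  cards;
a b
a c
b c
b a
d c
e f
t g
c d
;
run;

/* key in the original order, plus the row number */
data a1;
  set a;
  var3=compress(var1||'*'||var2);
  nn=_n_;
run;
/* key in the reversed order, same row number */
data a2;
  set a;
  var3=compress(var2||'*'||var1);
  nn=_n_;
run;
data aa;
  set a1 a2;
run;

/* rows that are duplicates of each other end up with the same n1 and n2 */
proc sql;
  create table ab as
  select distinct nn,var1,var2,max(nn) as n1,min(nn) as n2
  from aa group by var3;
quit;

/* keep one row per (n1, n2), then restore the original order */
proc sort data=ab out=ab nodupkey;
  by n1 n2;
run;
proc sort data=ab;
  by nn;
run;

data ab;
  set ab;
  drop nn n1 n2;
run;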

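#9

sushe1527, posted 2012-1-11 17:22:24

If the order of the two values within a record does not matter, the following also works:
data a;
  input Var1$ Var2$;
  /* sort the two values within each row, so b a becomes a b */
  call sortc(of var1-var2);
  cards;
a b
a c
b c
b a
;
run;
proc sort data=a out=final nodupkey;
  by var1 var2;
run;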
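
One thing to be aware of with call sortc is that it rewrites var1 and var2 in place, so a row read as b a is stored as a b. If the original orientation should be preserved, a minimal sketch is to sort copies of the two values and deduplicate on the copies. The names pairs, keyed, final_keep, k1 and k2 below are invented for the example.

```sas
data pairs;
  input var1 $ var2 $;
  cards;
b a
a c
b c
a b
;
run;

data keyed;
  set pairs;
  length k1 k2 $8;
  /* copies of the two values, put in alphabetical order */
  k1 = var1;
  k2 = var2;
  call sortc(k1, k2);
run;

/* one row per sorted key; var1 and var2 keep their original values */
proc sort data=keyed out=final_keep(drop=k1 k2) nodupkey;
  by k1 k2;
run;
```

With this input, final_keep contains b a, a c and b c: the first row read for each key survives as written, and the later a b is dropped.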


#10

tj0412ymy, posted 2012-1-11 17:56:47

DATA Test;
  input Var1 $ Var2 $;
  cards;
a b
a c
b c
b a
;
run;
proc sql;
  select *
  from Test
  except
  /* rows that have a reversed counterpart; both copies are removed */
  select a.*
  from Test as a, Test as b
  where a.var1 eq b.var2 and a.var2 eq b.var1
  ;
quit;
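
The query above removes both a b and b a. If one copy of each pair should be kept instead, one possible tweak (a sketch, not the poster's code) is to subtract only the copy whose var1 sorts after its var2. The names pairs and kept are made up for the example.

```sas
data pairs;
  input var1 $ var2 $;
  cards;
a b
a c
b c
b a
;
run;

proc sql;
  create table kept as
  select * from pairs
  except
  /* rows with a reversed counterpart, restricted to the copy whose
     var1 sorts after var2, so only that copy is removed */
  select a.*
  from pairs as a, pairs as b
  where a.var1 eq b.var2 and a.var2 eq b.var1
    and a.var1 gt a.var2;
quit;
```

For this input, kept holds a b, a c and b c. Because except without all works on distinct rows, exact duplicates in the input are collapsed as well.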
