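* The menus (Data > Identify Duplicate Cases) can check for duplicates using a single key variable (such as ID), or using two variables together (such as name and age in part 2 below).
* Duplicate records are then deleted with Select Cases. The procedure is slightly counterintuitive: you do not select the records to delete, you select the records to keep, so take care.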
* Example data illustrating both cases: ID 4 appears twice, and the pair (li, 45) appears twice
ID | name | age |
1 | chen | 24 |
2 | li | 45 |
3 | li | 45 |
4 | he | 43 |
4 | John | 65 |
5 | tom | 23 |
6 | lily | 67 |
.
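* A minimal sketch for loading the example table above as the active dataset, so the
  syntax below can be run directly. The formats (F4.0, A8, F3.0) are assumptions
  chosen for illustration and are not part of the original post.
DATA LIST LIST /ID (F4.0) name (A8) age (F3.0).
BEGIN DATA
1 chen 24
2 li 45
3 li 45
4 he 43
4 John 65
5 tom 23
6 lily 67
END DATA.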
* 1. Check for duplicates using ID as the key variable.
SORT CASES BY ID(A).
MATCH FILES
  /FILE=*
  /BY ID
  /FIRST=PrimaryFirst
  /LAST=PrimaryLast.
DO IF (PrimaryFirst).
COMPUTE MatchSequence=1-PrimaryLast.
ELSE.
COMPUTE MatchSequence=MatchSequence+1.
END IF.
LEAVE MatchSequence.
FORMATS MatchSequence (f7).
COMPUTE InDupGrp=MatchSequence>0.
SORT CASES InDupGrp(D).
MATCH FILES
  /FILE=*
  /DROP=PrimaryFirst InDupGrp MatchSequence.
VARIABLE LABELS PrimaryLast 'Indicator of each last matching case as Primary'.
VALUE LABELS PrimaryLast 0 'Duplicate Case' 1 'Primary Case'.
VARIABLE LEVEL PrimaryLast (ORDINAL).
FREQUENCIES VARIABLES=PrimaryLast.
EXECUTE.
* Delete the duplicate records by keeping only the primary case in each group.
FILTER OFF.
USE ALL.
SELECT IF (PrimaryLast = 1).
EXECUTE.
* 2. Define duplicate records by name and age together.
SORT CASES BY name(A) age(A).
MATCH FILES
  /FILE=*
  /BY name age
  /FIRST=PrimaryFirst
  /LAST=PrimaryLast.
DO IF (PrimaryFirst).
COMPUTE MatchSequence=1-PrimaryLast.
ELSE.
COMPUTE MatchSequence=MatchSequence+1.
END IF.
LEAVE MatchSequence.
FORMATS MatchSequence (f7).
COMPUTE InDupGrp=MatchSequence>0.
SORT CASES InDupGrp(D).
MATCH FILES
  /FILE=*
  /DROP=PrimaryFirst InDupGrp MatchSequence.
VARIABLE LABELS PrimaryLast 'Indicator of each last matching case as Primary'.
VALUE LABELS PrimaryLast 0 'Duplicate Case' 1 'Primary Case'.
VARIABLE LEVEL PrimaryLast (ORDINAL).
FREQUENCIES VARIABLES=PrimaryLast.
EXECUTE.
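* The original stops here after flagging the name/age duplicates. As a sketch, the
  same deletion step used in part 1 could be applied, assuming the last case in
  each name/age group is the one to keep.
FILTER OFF.
USE ALL.
SELECT IF (PrimaryLast = 1).
EXECUTE.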


