data three;
set one two;
by var;
run;
The SAS help states:
The values of the variables in the program data vector are set to missing each time SAS starts to read a new
data set and when the BY group changes. (SAS language reference 9.2, P362)
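However, in my tests I found that, apart from the initial setting of the PDV to missing, the values were not set to missing either when SAS started reading another data set or when the BY group changed.
Here is my test: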
data one;
  x=1; output;
  x=1; output;
  x=3; output;
run;
data two;
  x=2; output;
  x=2; output;
  x=4; output;
run;
proc sort data=one;
  by x;
run;
proc sort data=two;
  by x;
run;
/* Interleave one and two by x; print the PDV before and after each read */
data three;
  put "before set:" _all_;
  set one two;
  by x;
  put "after set:" _all_;
run;
149 put "before set:" _all_;
150 set one two;
151 by x;
152 put "after set:" _all_;
153 run;
before set:x=. FIRST.x=1 LAST.x=1 _ERROR_=0 _N_=1
after set:x=1 FIRST.x=1 LAST.x=0 _ERROR_=0 _N_=1
before set:x=1 FIRST.x=1 LAST.x=0 _ERROR_=0 _N_=2
after set:x=1 FIRST.x=0 LAST.x=1 _ERROR_=0 _N_=2
before set:x=1 FIRST.x=0 LAST.x=1 _ERROR_=0 _N_=3
after set:x=2 FIRST.x=1 LAST.x=0 _ERROR_=0 _N_=3
before set:x=2 FIRST.x=1 LAST.x=0 _ERROR_=0 _N_=4
after set:x=2 FIRST.x=0 LAST.x=1 _ERROR_=0 _N_=4
before set:x=2 FIRST.x=0 LAST.x=1 _ERROR_=0 _N_=5
after set:x=3 FIRST.x=1 LAST.x=1 _ERROR_=0 _N_=5
before set:x=3 FIRST.x=1 LAST.x=1 _ERROR_=0 _N_=6
after set:x=4 FIRST.x=1 LAST.x=1 _ERROR_=0 _N_=6
before set:x=4 FIRST.x=1 LAST.x=1 _ERROR_=0 _N_=7
NOTE: There were 3 observations read from the data set WORK.ONE.
NOTE: There were 3 observations read from the data set WORK.TWO.
NOTE: The data set WORK.THREE has 6 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
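In the lines marked in red, the BY group changes and SAS has to switch to the other data set to read the next observation (for example at _N_=3, where x goes from 1 to 2). Yet the values in the PDV are retained; they are not set to missing.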
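One possible way to probe this further: in the test above both data sets contain only x, so any reset that happens while the SET statement executes would be overwritten at once by the value just read, and a PUT placed before SET could not show it. The sketch below is not from the original test. It assumes two hypothetical data sets, one_y and two_z, that each carry one variable the other lacks, so a reset (if it occurs) becomes visible in the PUT after SET.

```sas
/* Hypothetical data sets (names and values are assumptions for illustration). */
/* Both are created already in ascending order of x, so no PROC SORT is needed. */
data one_y;
  x=1; y='a'; output;
  x=3; y='c'; output;
run;
data two_z;
  x=2; z='b'; output;
  x=4; z='d'; output;
run;

/* Interleave by x. If the PDV is reset when the BY group changes,      */
/* y should print as blank on observations read from two_z, and z      */
/* should print as blank on observations read from one_y, instead of   */
/* carrying over the value from the previous observation.              */
data three_yz;
  set one_y two_z;
  by x;
  put "after set:" _all_;
run;
```

Comparing y and z across consecutive log lines shows whether the previous value was carried over or cleared.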
Likewise, for MERGE with a BY statement:
The SAS help states:
When SAS has read all observations in a
BY group from all data sets, it sets all variables in the program data vector
(except those created by SAS) to missing (SAS language reference 9.2, P373)
data a;
  x=1; y='a1'; output;
  x=2; y='a2'; output;
  x=3; y='a3'; output;
run;
data b;
  x=1; z='b1'; output;
  x=2; z='b2'; output;
  x=3; z='b3'; output;
run;
/* Sort the data sets that are actually merged (a and b) */
proc sort data=a;
  by x;
run;
proc sort data=b;
  by x;
run;
/* Match-merge a and b by x; print the PDV before and after each read */
data c;
  put "before set:" _all_;
  merge a b;
  by x;
  put "after set:" _all_;
run;
198 data c;
199 put "before set:" _all_;
200 merge a b;
201 by x;
202 put "after set:" _all_;
203 run;
before set:x=. y= z= FIRST.x=1 LAST.x=1 _ERROR_=0 _N_=1
after set:x=1 y=a1 z=b1 FIRST.x=1 LAST.x=1 _ERROR_=0 _N_=1
before set:x=1 y=a1 z=b1 FIRST.x=1 LAST.x=1 _ERROR_=0 _N_=2
after set:x=2 y=a2 z=b2 FIRST.x=1 LAST.x=1 _ERROR_=0 _N_=2
before set:x=2 y=a2 z=b2 FIRST.x=1 LAST.x=1 _ERROR_=0 _N_=3
after set:x=3 y=a3 z=b3 FIRST.x=1 LAST.x=1 _ERROR_=0 _N_=3
before set:x=3 y=a3 z=b3 FIRST.x=1 LAST.x=1 _ERROR_=0 _N_=4
NOTE: There were 3 observations read from the data set WORK.A.
NOTE: There were 3 observations read from the data set WORK.B.
NOTE: The data set WORK.C has 3 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
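cpu time 0.01 seconds
As shown above, the lines marked in red are again places where the BY group changes, yet the values are still retained and are not set to missing.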
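The same kind of check can be sketched for MERGE. In the test above every BY group has an observation in both a and b, so every variable is overwritten on each read and a reset could not be seen. The example below is only an illustration under assumed data: a_gap and b_gap are hypothetical data sets in which some BY groups are missing from one side.

```sas
/* Hypothetical data sets (names and values are assumptions for illustration). */
/* a_gap has no observation for x=3; b_gap has no observation for x=2.         */
data a_gap;
  x=1; y='a1'; output;
  x=2; y='a2'; output;
run;
data b_gap;
  x=1; z='b1'; output;
  x=3; z='b3'; output;
run;

/* Match-merge by x. If the PDV is reset at each new BY group,         */
/* z should print as blank for x=2 and y should print as blank for     */
/* x=3, rather than keeping the values from the previous BY group.     */
data c_gap;
  merge a_gap b_gap;
  by x;
  put "after merge:" _all_;
run;
```

If y or z kept its previous value for the BY group where its data set contributes nothing, that would support the retain interpretation; a blank would point to a reset taking place inside the MERGE statement itself.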