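bayes posted on 2011-6-17 11:42
6# bobguy
The key to the problem may not be the 8 million observations, but rather that a and b have too many distinct values.
I looked at your program. These two statements:
a=ceil(ranuni(123)*1e3);
b=ceil(ranuni(123)*1e3);
give a and b only 1,000 distinct values each, whereas my actual data has more than 1 million each. That is why you could run the simulation while my case is too large to process.
What if you change those two statements to:
a=ceil(ranuni(123)*1e6);
b=ceil(ranuni(123)*1e6);
and see whether the simulation still runs?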
The same result can be obtained with a DATA step plus PROC SORT. BTW, if a and b each have 1 million possible values, then with 8 million observations in total almost every (a, b) combination is very likely to be unique.
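A rough back-of-the-envelope check of the uniqueness remark, as a sketch only: it assumes a and b are independent and uniform over 1 million values each, which real data (and a pseudo-random stream) need not satisfy. Here n is the number of observations and N the number of possible (a, b) combinations; both symbols are introduced just for this estimate.

```latex
% Assumption: idealized independent, uniform draws of a and b.
% n = 8 \times 10^{6} observations, N = 10^{6} \times 10^{6} = 10^{12} possible (a, b) pairs.
\[
\mathbb{E}[\text{coinciding pairs}]
  \approx \binom{n}{2}\,\frac{1}{N}
  = \frac{n(n-1)}{2N}
  \approx \frac{(8\times 10^{6})^{2}}{2\times 10^{12}}
  = 32
\]
```

So under these assumptions only a few dozen of the 8 million rows would be expected to share a key, a fraction on the order of 4 in a million. An actual run can differ from this idealized figure, but either way the combined key is close to unique.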
170 options FULLSTIMER;
171
172 data t1;
173 do i=1 to 8e6;
174 a=ceil(ranuni(123)*1e6);
175 b=ceil(ranuni(123)*1e6);
176 output;
177 end;
178 drop i;
179 run;
NOTE: The data set WORK.T1 has 8000000 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 1.43 seconds
user cpu time 1.24 seconds
system cpu time 0.18 seconds
Memory 180k
OS Memory 6520k
Timestamp 6/17/2011 11:38:10 PM
180
181 proc sort data=t1 out=t2 nodupkey;
182 by a b;
183 run;
NOTE: There were 8000000 observations read from the data set WORK.T1.
NOTE: 0 observations with duplicate key values were deleted.
NOTE: The data set WORK.T2 has 8000000 observations and 2 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 4.21 seconds
user cpu time 6.83 seconds
system cpu time 0.67 seconds
Memory 66535k
OS Memory 71996k
Timestamp 6/17/2011 11:38:14 PM
184
185 data t3;
186 set t2;
187 by a b;
188 if first.b then cnt=0;
189 cnt+1;
190 if last.b then output;
191 run;
NOTE: There were 8000000 observations read from the data set WORK.T2.
NOTE: The data set WORK.T3 has 8000000 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 1.90 seconds
user cpu time 1.35 seconds
system cpu time 0.53 seconds
Memory 215k
OS Memory 6520k
Timestamp 6/17/2011 11:40:07 PM
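One caveat about the steps in the log: NODUPKEY keeps only one row per (a, b) key, so by the time the counting step runs, every group has exactly one row and cnt is always 1. In this simulation nothing was deleted, so it makes no difference, but on data with repeated pairs the counts would be lost. If the goal is the frequency of each (a, b) pair, a minimal variant might look like the following. The dataset names t2_all and t3_counts are made up for illustration, and I have not timed this version.

```sas
/* Sketch: count how often each (a, b) pair occurs in t1.        */
/* t2_all and t3_counts are hypothetical names for illustration. */
proc sort data=t1 out=t2_all;   /* no NODUPKEY, so repeated pairs are kept */
  by a b;
run;

data t3_counts;
  set t2_all;
  by a b;
  if first.b then cnt = 0;  /* reset at the start of each (a, b) group */
  cnt + 1;                  /* sum statement: cnt is retained across rows */
  if last.b then output;    /* one row per distinct pair, with its count */
run;
```

The counting step itself is the same as in the log. The only change is that the sort no longer discards duplicates before they can be counted.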