Original poster: redaring

[Original post] 20 forum coins reward, urgent help! Probable disk full condition.

11
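soporaeternus2 posted on 2010-1-24 22:16:02
I can't remember whether a FAT32 disk can now hold a single file larger than 4 GB...
My guess is that the original poster's data file is not on the C drive.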

12
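redaring posted on 2010-1-25 01:19:12
Thank you, everyone above!
After many attempts, it turned out that the disk format really was the problem. My C drive was FAT32; after I converted it to NTFS, the job ran.
Many people may never have handled data this large, so I hope my question helps others later and spares them the mess I got into.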
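
A rough way to spot this kind of problem in advance is to estimate a data set's size and compare it with the FAT32 per-file ceiling of 4 GiB minus 1 byte (4,294,967,295 bytes). The sketch below is only an illustration: it assumes the per-file limit was the cause here, and `work.big` is a hypothetical stand-in, not the data set from this thread. The estimate ignores page overhead and any temporary utility files PROC SORT creates, so the real footprint can be larger.

```sas
/* Hypothetical stand-in for a large data set (assumption, not from the thread). */
data work.big;
  length x $200;
  do key = 1 to 1000;
    x = 'a';
    output;
  end;
run;

/* Estimate the data size as observations * observation length
   and flag anything above the FAT32 single-file limit. */
proc sql;
  select memname,
         nobs,
         obslen,
         nobs * obslen as approx_bytes format=comma20.,
         case when calculated approx_bytes > 4294967295 then 'yes'
              else 'no'
         end as over_fat32_limit
  from dictionary.tables
  where libname = 'WORK' and memname = 'BIG';
quit;
```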

13
bobguy posted on 2010-1-25 02:56:50
I am glad that you solved the problem. I was about to suggest some other approaches. We don't have to hang ourselves on one tree; SAS offers many other options. Here are a couple:

1) Index the big file, which is 'cheaper' than a sort.
2) Extract only the keys from the big file, plus the observation number (_N_), and save them as a keys-only file. This file will be much smaller. Sort the smaller file by keys + _N_, then create the new file in that order, using point access (POINT=) to fetch each full row from the big file.
According to the log, the second approach is faster than a plain sort by a factor of 2+. That is a surprise. Your case may vary.
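
The "factor of 2+" can be checked from the real-time figures in the log that follows. A small sketch of the arithmetic, using only numbers from that log (the variable names are made up for illustration): the keys-only route takes about 28 seconds against about 72 seconds for the direct sort, a ratio of roughly 2.5. The 22.89 seconds spent creating T1 is common to both routes and is left out.

```sas
data _null_;
  /* view definition + keys-only sort + POINT= rebuild, in seconds */
  keys_route = 0.03 + 0.21 + 27.89;
  /* direct sort of the wide file: 1:11.86 expressed in seconds */
  full_sort  = 60 + 11.86;
  ratio      = full_sort / keys_route;
  put keys_route= full_sort= ratio=;
run;
```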

/* Build a wide test data set: 10,000 observations, x1-x2000 plus key, a, b. */
data t1;
  retain x1-x2000 '222222222222';
  do i=1 to 10000;
    key=ceil(ranuni(99)*10000);
    a=ranuni(99); b=ranuni(99);
    output;
  end;
  drop i;
run;

NOTE: The data set WORK.T1 has 10000 observations and 2003 variables.
NOTE: DATA statement used (Total process time):
      real time           22.89 seconds
      cpu time            1.29 seconds


/* Keys-only view: keep just KEY and record each row's original position (_N_). */
data tmp/view=tmp;
  set t1(keep=key);
  original_ord=_n_;
run;

NOTE: DATA STEP view saved on file WORK.TMP.
NOTE: A stored DATA STEP view cannot run under a different operating system.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.00 seconds


/* Sort the small keys-only file by key, then by original position. */
proc sort data=tmp out=tmp_srt; by key original_ord; run;

NOTE: There were 10000 observations read from the data set WORK.TMP.
NOTE: View WORK.TMP.VIEW used (Total process time):
      real time           0.17 seconds
      cpu time            0.17 seconds

NOTE: There were 10000 observations read from the data set WORK.T1.
NOTE: The data set WORK.TMP_SRT has 10000 observations and 2 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.21 seconds
      cpu time            0.21 seconds


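/* Rebuild the wide file in sorted order: read the sorted keys,
   then fetch each full row from T1 by observation number. */
data t2;
  set tmp_srt;
  set t1 point=original_ord;
run;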

NOTE: The variable original_ord exists on an input data set, but was also specified in an I/O
      statement option.  The variable will not be included on any output data set.
NOTE: There were 10000 observations read from the data set WORK.TMP_SRT.
NOTE: The data set WORK.T2 has 10000 observations and 2003 variables.
NOTE: DATA statement used (Total process time):
      real time           27.89 seconds
      cpu time            1.82 seconds


/* For comparison: sort the wide file directly. */
proc sort data=t1 out=t3; by key; run;

NOTE: There were 10000 observations read from the data set WORK.T1.
NOTE: The data set WORK.T3 has 10000 observations and 2003 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           1:11.86
      cpu time            3.95 seconds
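
The log above covers only approach 2. For approach 1, a minimal sketch of what the index route might look like is given below. It was not part of the original run, so no timings are claimed for it. `work.big` and `work.by_key` are hypothetical names; in the setting above the index would be created on T1 instead.

```sas
/* Hypothetical small stand-in for the big, unsorted data set. */
data work.big;
  do i = 1 to 1000;
    key = ceil(ranuni(99) * 1000);
    a = ranuni(99);
    output;
  end;
  drop i;
run;

/* Build a simple index on KEY; the data set itself is not rewritten. */
proc datasets library=work nolist;
  modify big;
  index create key;
quit;

/* With a BY statement, SAS can use the index to return rows in KEY order
   even though the data set is not physically sorted. */
data work.by_key;
  set work.big;
  by key;
run;
```

Reading through an index avoids the sort's temporary utility file, but it accesses rows in random order, so on a very large file it may well be slower than a sort. It is worth timing on your own data.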

14
pos1623 posted on 2010-1-26 00:32:43
This problem is so hard!
