Original poster: redaring

How to import a txt file that uses two different delimiters

11
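bobguy posted on 2009-12-9 12:14:25
redaring posted on 2009-12-8 13:48
The data files are in txt format. The first line is tab-delimited and every following line is comma-delimited. On import I want the first line to supply the variable names and all other lines to be read as data, but I don't know how to set the delimiter. I tried proc import, but delimiter only lets me specify one. Since I have several hundred data files, I can't go through them one by one replacing the tabs in the first line with commas.
I only started learning SAS a few days ago and still don't understand many of the basics, so I would appreciate any pointers. Thank you!

Here is what I wrote myself, but it does not produce the result I want: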
proc import datafile="d:\testfile.txt" out=testfile;
delimiter=',';
getnames=yes;
run;
If there are many files and they are already plain text (ASCII), it is much faster to use the standard SAS INFILE and INPUT statements. You can use the FIRSTOBS= option to start reading at the second line, where your data starts. SAS provides many ways to deal with input text files.
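
For a single file, that idea could look roughly like the sketch below. It assumes one header line followed by comma-delimited rows with four fields; the path is the one from the question, and the data set name and the variable names x1-x4 are placeholders, not something taken from the real data.

```sas
/* Sketch only: one_file and x1-x4 are placeholder names. */
data one_file;
   /* firstobs=2 skips the tab-delimited header line.            */
   /* dsd reads comma-separated values and treats two commas     */
   /* in a row as a missing value.                               */
   /* truncover keeps a short row from pulling in the next line. */
   infile 'd:\testfile.txt' firstobs=2 dsd truncover;
   input x1 x2 x3 $ x4;
run;

proc print data=one_file; run;
```

Because the header line is skipped rather than parsed, the variable names have to be written in the INPUT statement by hand in this approach.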

If all files have the same format, in the following sense:
file1
**************************
var1    var2    var3    var4
1,2,c,5
**************************
file2
*************************
var1    var2    var3    var4
1,2,c,5
36,4,d,6
**************************

Then you may consider using the FILEVAR= option in the INFILE statement.

Here is a sample program.

*************************;
data _null_;
  infile 'c:\downloads\test*.txt'  ;
  input ;
  put _infile_;
run;

%let loc= c:\downloads\;

filename fnlist pipe "dir &loc.test*.txt /w";

data test;
   length fn $128  x1  x2  x4 8 x3 $1;;
   infile fnlist;
   input fn @@;
   if upcase(substr(fn,1,4))='TEST' then do;
      filen="&loc"||fn;
      infile dummy filevar=filen firstobs=2  dsd end=eof truncover;
      do while(not eof);
         input x1 x2 x3 x4;
         output;
      end;
    end;
    *drop fn;
run;

proc print; run;
*********************

Here is the log:

251  data _null_;
252    infile 'c:\downloads\test*.txt'  ;
253    input ;
254    put _infile_;
255  run;

NOTE: The infile 'c:\downloads\test*.txt' is:
      File Name=c:\downloads\test.txt,
      File List=c:\downloads\test*.txt,RECFM=V,
      LRECL=256

var1    var2    var3    var4
1,2,c,5
NOTE: The infile 'c:\downloads\test*.txt' is:
      File Name=c:\downloads\test2.txt,
      File List=c:\downloads\test*.txt,RECFM=V,
      LRECL=256

var1    var2    var3    var4
1,2,c,5
NOTE: The infile 'c:\downloads\test*.txt' is:
      File Name=c:\downloads\test3.txt,
      File List=c:\downloads\test*.txt,RECFM=V,
      LRECL=256

var1    var2    var3    var4
1,2,c,5
36,4,d,6
NOTE: 2 records were read from the infile 'c:\downloads\test*.txt'.
      The minimum record length was 7.
      The maximum record length was 19.
NOTE: 2 records were read from the infile 'c:\downloads\test*.txt'.
      The minimum record length was 7.
      The maximum record length was 19.
NOTE: 3 records were read from the infile 'c:\downloads\test*.txt'.
      The minimum record length was 7.
      The maximum record length was 19.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


256
257  %let loc= c:\downloads\;
258
259  filename fnlist pipe "dir &loc.test*.txt /w";
260
261  data test;
262     length fn $128  x1  x2  x4 8 x3 $1;;
263     infile fnlist;
264     input fn @@;
265     if upcase(substr(fn,1,4))='TEST' then do;
266        filen="&loc"||fn;
267        infile dummy filevar=filen firstobs=2  dsd end=eof truncover;
268        do while(not eof);
269           input x1 x2 x3 x4;
270           output;
271        end;
272      end;
273      *drop fn;
274  run;

NOTE: The infile FNLIST is:
      Unnamed Pipe Access Device,
      PROCESS=dir c:\downloads\test*.txt /w,RECFM=V,
      LRECL=256

NOTE: The infile DUMMY is:
      File Name=c:\downloads\test.txt,
      RECFM=V,LRECL=256

NOTE: The infile DUMMY is:
      File Name=c:\downloads\test2.txt,
      RECFM=V,LRECL=256

NOTE: The infile DUMMY is:
      File Name=c:\downloads\test3.txt,
      RECFM=V,LRECL=256

NOTE: 8 records were read from the infile FNLIST.
      The minimum record length was 0.
      The maximum record length was 50.
NOTE: 1 record was read from the infile DUMMY.
      The minimum record length was 7.
      The maximum record length was 7.
NOTE: 1 record was read from the infile DUMMY.
      The minimum record length was 7.
      The maximum record length was 7.
NOTE: 2 records were read from the infile DUMMY.
      The minimum record length was 7.
      The maximum record length was 8.
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set WORK.TEST has 4 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.07 seconds
      cpu time            0.03 seconds


275
276  proc print; run;

NOTE: There were 4 observations read from the data set WORK.TEST.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


*******************************

Here is the listing:

Obs       fn        x1    x2    x4    x3

  1     test.txt      1     2     5    c
  2     test2.txt     1     2     5    c
  3     test3.txt     1     2     5    c
  4     test3.txt    36     4     6    d

12
zespri posted on 2009-12-9 12:23:18
9# redaring


An ASCII comma is '2C'x and an ASCII tab is '09'x, so you could use:
      INFILE fileref DLM='2C09'x;

or

       DATA _NULL_; CALL SYMPUT('TAB','09'x); RUN;

         INFILE fileref DLM=",&TAB";
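
Put into complete DATA steps, the two-delimiter idea might look like the sketch below. It assumes exactly four columns per file; the fileref rawtxt, the data set names, and the variable names are placeholders chosen for illustration.

```sas
/* Sketch only: rawtxt, header, body, name1-name4 and x1-x4 are placeholders. */
filename rawtxt 'd:\testfile.txt';

/* Read only the header line; both comma and tab count as delimiters. */
data header;
   infile rawtxt dlm='2C09'x obs=1 truncover;
   length name1-name4 $32;
   input name1-name4;
run;

/* Read the data lines with the same delimiter list, starting at line 2. */
data body;
   infile rawtxt dlm='2C09'x dsd firstobs=2 truncover;
   input x1 x2 x3 $ x4;
run;
```

Since one delimiter list covers both line types, the same INFILE options work for the header and the data, and the names captured in header can then be used to rename the columns of body.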

13
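redaring posted on 2009-12-9 13:39:28
Thanks!
bobguy's program let me finish my work more directly and quickly. Just as you said, all of my files have the same format. I am just starting out, so I will study your program carefully and hope I can soon write something like it myself to solve the problems I run into.

zespri's explanation is very clear and to the point; I learned something new. Thanks!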

14
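lwien007 posted on 2009-12-9 13:54:43
'2C09'x treats both the comma and the tab as delimiters: 2C is the comma and 09 is the tab, the hexadecimal values of their ASCII codes.
If the original poster wants to replace the delimiter directly in the data files, the regular-expression functions that SAS provides can do it.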
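filename dst 'd:\ts.txt';
data _null_;
        length varname $1000;
        /* sharebuffers lets INFILE and FILE update the same file in place; */
        /* obs=1 limits the step to the header line                         */
        infile dst sharebuffers obs=1 truncover;
        input varname $char1000.;        /* read the whole header line */
        re=prxparse("s/\t/,/");
        call prxchange(re,-1,varname);   /* -1 replaces every tab */
        file dst;                        /* same file as the INFILE statement */
        put varname;
run;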
Of course you can also modify the files directly with Perl; the result is the same, and Perl has a Windows version too.

15
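huangruiji110 posted on 2011-8-16 12:38:12
bobguy posted on 2009-12-9 12:14
If the files are many and already in text(ascii), then it is much fast to use SAS standard inf ...
How can I put each file into a separate SAS data set? For example, I have 100 files (A1, A2, …, A100); how can I import these 100 files into SAS in a loop? Any expert advice would be appreciated.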

16
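huangruiji110 posted on 2011-8-16 13:00:54
bobguy posted on 2009-12-9 12:14
If the files are many and already in text(ascii), then it is much fast to use SAS standard inf ...
How can I import multiple files in a loop? For example, I have 100 files (A1, A2, A3, ..., A100). Importing them one at a time is too tedious; how can I import all 100 at once, without merging them?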
