楼主: warrenzhang
6029 9

急!请教一个关于sas导入excel文件的问题 [推广有奖]

  • 0关注
  • 3粉丝

已卖:290份资源

副教授

62%

还不是VIP/贵宾

-

威望
0
论坛币
4239 个
通用积分
2.3755
学术水平
64 点
热心指数
76 点
信用等级
54 点
经验
623 点
帖子
864
精华
0
在线时间
686 小时
注册时间
2007-3-9
最后登录
2011-8-4

楼主
warrenzhang 发表于 2010-4-23 21:50:46 |AI写论文

+2 论坛币
k人 参与回答

经管之家送您一份

应届毕业生专属福利!

求职就业群
赵安豆老师微信:zhaoandou666

经管之家联合CDA

送您一个全额奖学金名额~ !

感谢您参与论坛问题回答

经管之家送您两个论坛币!

+2 论坛币
我用proc import导入excel文件。这个excel文件性质不太好,第一个变量x前500条为空,第二个变量y前750条为空,但是三个变量都是数值型的。导入程序如下:

PROC IMPORT OUT= WORK.a
            DATAFILE= "Z:\stockstudy\test.xls"
            DBMS=EXCEL REPLACE;
     RANGE="Sheet1$";
     GETNAMES=YES;
     MIXED=NO;
     SCANTEXT=YES;
     USEDATE=YES;
     SCANTIME=YES;
RUN;

问题在于生成的数据集中第一、第二个变量被sas搞成字符串型,而不是我所需要的数值型。有人了解如何解决吗?谢谢!!!

数据test.xls已附上。
二维码

扫码加我 拉你入群

请注明:姓名-公司-职位

以便审核进群资格,未注明则拒绝

关键词:EXCEL xcel exce cel replace excel import 字符串 EXCEL 程序

沙发
guoluo 发表于 2010-4-23 22:41:07
把Excel文件另存为CSV文件,然后

PROC IMPORT OUT= WORK.a
            DATAFILE= "Z:\stockstudy\test.xls"
            DBMS=CSV REPLACE;
  GUESSINGROWS=1000;
RUN;

藤椅
crackman 发表于 2010-4-23 22:59:07
我判断一下你这个问题估计是因为
在读入数据的时候,SAS对于初始变量默认为数值型缺失值 即为一个点。
初始变量的长度是根据第一个观测对象,相应变量的长度来决定的,如果没有其他格式化语句的话
第一个观测的三个变量值分别为空格 空格 0.666
在SAS里面,空格是字符变量的缺失值,那么第一个变量和第二个变量 SAS系统认为是字符型变量,默认长度为2,所以对于后面的值来说就全部为字符缺失了
给你一个建议就是 你在EXCEL第一个观测里面 把前两个变量值填上,再导入就没得问题了

板凳
warrenzhang 发表于 2010-4-23 23:02:29
3# crackman
你的方法可行。只是问题是我实际所用数据有200多列,都有那种问题。一个一个改工作量太大。不过还是谢谢你!

报纸
tonywfu 发表于 2010-4-23 23:03:04
也可以用 SAS Import Wizard, 选 User-defined formats, 自己改成数值型

地板
warrenzhang 发表于 2010-4-23 23:10:59
1# warrenzhang

首先感谢楼上2位的热心帮助!!!

终于找到问题的根源了,原来是微软jet engine的设置问题。
参见SAS帮助:


MIXED=YES | NO; (available only for DBMS=EXCEL) specifies whether to convert numeric data into character data values for a column that contains mixed data types. The Microsoft Jet/Excel engine handles this option.
YESassigns a SAS character type for the column and converts all numeric data values to character data values when mixed data types are found. When you specify MIXED=YES, the connection is set in import mode and no updates are allowed.
CAUTION:Due to a limitation in the Microsoft Jet/Excel engine, using the MIXED=YES option might result in improper text variable lengths. It is strongly recommended that you first follow the steps below and check that you are receiving the results that you expect before you continue.   From Microsoft Windows, follow these steps to change the value of TypeGuessRows.
  • Start Run In the Run window dialog box, enter Regedt32 and select OK.
  • Open the following key in the Registry Editor window:
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel
  • Double-click TypeGuessRows.
  • In the Edit DWORD dialog box under Base, select Decimal, enter 0 in the Value field, and select OK.
  • Exit from the Registry Editor window.
NOassigns numeric or character type for the column, depending on the majority of the type data that is found. Numeric data in a character column is imported as missing values, and character data in a numeric column is imported as missing values.
The following registry settings can affect the behavior of the MIXED= option. They are located in the [\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel] key of the Microsoft Windows registry.
Registry Settings for the MIXED OptionTypeGuessRowsan integer type with a default value of 8. You can use the number of rows in the worksheet range in scans to determine column types. If you set this type to 0, all rows in the range are checked. Microsoft states that the valid range of TypeGuessRows is 0-16. However, you could set as high as 16384, and it would still operate correctly.CAUTION:Not only can changing the TypeGuessRows value cause a scan to fail if you set it higher than 16384, but it affects any software that uses the Microsoft Jet provider to access Excel file data, including accessing Excel data in a Microsoft Access database. The TypeGuessRows value is registered with and controlled by Microsoft. It is therefore recommended that you set the value to 0.   ImportMixedTypesa string type with a default value of Text. If a column contains more than one type of data while scanning TypeGuessRows rows, the type of the column is determined to be Text if the value of the setting is Text. If the value of the setting is Majority, the most common type in the column determines the column type. For the MIXED=YES option to work correctly, you should you change TypeGuessingRows to 0 in the Microsoft Windows registry so that all the rows in the specified range are scanned. As a result, when you use MIXED=YES, the Jet provider always assigns character type for columns with data of mixed data types and converts numeric data to character data.


也就是说,默认状态MS Jet是把混合型数据当成字符串(text)处理的,因此如果import的选项mixed=yes打开的话,就会有可能出现上述结果。根据提示,应该把注册表项TypeGuessingRows改0,ImportMixedTypes改Majority。这样做才能从根本上解决问题。


PS。强烈鄙视微软!!!鄙视EXCEL!!!遗憾的是,许多公司都在把excel当数据库用,简直是莫名其妙。用csv格式多好啊,xls记录2维表格非常臃肿,数据处理也不方便。

7
warrenzhang 发表于 2010-4-23 23:12:52
5# tonywfu
不好意思,从来没找到过那一项。您能不能发个截图过来?谢谢!

8
crackman 发表于 2010-4-23 23:17:48
PROC IMPORT OUT= WORK.a
            DATAFILE= "Z:\stockstudy\test.xls"
            DBMS=EXCEL REPLACE;
     RANGE="Sheet1$";
     GETNAMES=YES;
     MIXED=NO;/*把NO改成YES*/
     SCANTEXT=YES;
     USEDATE=YES;
     SCANTIME=YES;
RUN;
data a;
set a;
x=input(x,f5.3);
y=input(y,f5.3);
z=z;
run;

OK了

9
chishuai1226 发表于 2012-4-5 17:43:00
Thanks for helping me,it's very useful.

10
Michael1941 发表于 2016-8-26 21:10:46
warrenzhang 发表于 2010-4-23 23:10
1# warrenzhang

首先感谢楼上2位的热心帮助!!!
我也查到了这个帮助,也和楼主一样的修改了注册表,但是还是不行呢?怎么回事?

您需要登录后才可以回帖 登录 | 我要注册

本版微信群
加好友,备注cda
拉您进交流群
GMT+8, 2026-1-1 10:12