数据库:ORACLE
网络方式:VPN
需求:读取多张含json的数据的表,当前数据库内包含大量数据,只需要处理近一周的数据即可。
核心问题:数据读取太慢!
小障碍:1.因数据库内无索引,部分表无可用时间字段进行筛选。
2.VPN网络不稳定导致不能测试代码读数效率。
求教可以有效提高SAS读取数据的效率的方式。
个人已尝试多种方式,但是效率颠覆之前的一些认知,并未有效提高效率,方式如下:
方式1:*col2为案件的通用编号
- connect to oracle as data (user=XXX password=XXXXXX path=XXXX);
- create table TEMP as
- select * from connection to data(
- select * from(
- select a.col1,
- c.col2,
- row_number()over(partition by c.col2 order by c.col2,c.time desc) as rm
- from AA.TAB1 a
- inner join AA.TAB2 b on a.KEY1=b.KEY1
- inner join AA.TAB3 c on c.KEY2=b.KEY2
- where b.id is not null
- and to_char(c.time,'yyyy-mm-dd')>=to_char(sysdate-1,'yyyy-mm-dd')
- )where rm=1
- );
- disconnect from data;
- quit;
方式2:
- libname AA oracle user=XXX password=XXXXXX path='XXXX' schema=AA;
- create table TEMP as
- select a.col1,
- c.col2,
- c.time
- from AA.TAB1 a
- inner join AA.TAB2 b on a.KEY1=b.KEY1
- inner join AA.TAB3 c on c.KEY2=b.KEY2
- where b.id is not null
- and to_char(c.time,'yyyy-mm-dd')>=to_char(sysdate-1,'yyyy-mm-dd');
- quit;
- proc sort data=TEMP;by KEY2 descending time;run
- data TEMP ;
- set TEMP;
- by KEY2;
- if first.KEY2;
- run;
方式3:
- libname AA oracle user=XXX password=XXXXXX path='XXXX' schema=AA;
- data TAB1;
- set AA.TAB1(keep=KEY1 col1);
- run;
- data TAB2;
- set AA.TAB2(
- keep=KEY1 KEY2
- where=(id^='')
- );
- run;
- data TAB3;
- set AA.TAB3(
- keep=KEY2 col2 time
- where=(to_char(time,'yyyy-mm-dd')>=to_char(sysdate-1,'yyyy-mm-dd'))
- );
- run;
- proc sort data=TAB3;by KEY2 descending time;run
- data TAB3 ;
- set TAB3;
- by KEY2;
- if first.KEY2;
- run;
- proc sql;
- create table TEMP as
- select a.col1,
- c.col2
- from TAB1 a
- inner join TAB2 b on a.KEY1=b.KEY1
- inner join TAB3 c on c.KEY2=b.KEY2;
- quit;
- data TAB2;
- set AA.TAB2(
- keep=KEY1 KEY2
- firstobs=10000000 *保证前10000000条数据不涉及当前需求
- );
- where id^=''
- run;
- data TAB2;
- set AA.TAB2(keep=KEY1 KEY2);
- _n_>=10000000 *保证前10000000条数据不涉及当前需求
- where id^=''
- run;
求解惑,谢谢!!