楼主: cufe万事通
3953 6

SAS里用sql语句between and实现的匹配功能能否用hash实现? [推广有奖]

  • 0关注
  • 2粉丝

博士生

70%

还不是VIP/贵宾

-

威望
0
论坛币
0 个
通用积分
7.9251
学术水平
1 点
热心指数
1 点
信用等级
0 点
经验
27043 点
帖子
105
精华
0
在线时间
503 小时
注册时间
2011-9-23
最后登录
2023-12-9

楼主
cufe万事通 发表于 2013-10-28 09:15:13 |AI写论文

+2 论坛币
k人 参与回答

经管之家送您一份

应届毕业生专属福利!

求职就业群
赵安豆老师微信:zhaoandou666

经管之家联合CDA

送您一个全额奖学金名额~ !

感谢您参与论坛问题回答

经管之家送您两个论坛币!

+2 论坛币
SAS里有两表要合并,表一中有字段a,b;表二中有字段c;两表合并的条件是c的值在a和b之间。我只知道sql的between and语句可以实现,但运行速度很慢,不知道hash能否实现相同功能呢?求指点。。。
二维码

扫码加我 拉你入群

请注明:姓名-公司-职位

以便审核进群资格,未注明则拒绝

关键词:Between sql语句 HASH Has Bet between sql语句

沙发
邓贵大 发表于 2013-10-28 12:10:21
depends on you data
if those [a,b] intervals are mutually exclusive, there're ways to fine-tune the join.
on the other hand, if there are many overlaps, it's beyond me.
Be still, my soul: the hour is hastening on
When we shall be forever with the Lord.
When disappointment, grief and fear are gone,
Sorrow forgot, love's purest joys restored.

藤椅
yongyitian 发表于 2013-10-28 13:02:19
Try if this code
The results may be slightly different from that of sql, due to the replicate.
  1. data a1(keep=ID a b) b1(keep= ID c c1);
  2.     do Id = 1 to 10;
  3.            a = int(ranuni(12345)*50);
  4.            b = int(ranuni(12345)*50);
  5.            output a1;
  6.          end;
  7.     do Id = 1 to 10;
  8.            c = int(ranuni(12345)*50);
  9.            c1 = int(ranuni(12345)*50);
  10.            output b1;
  11.          end;
  12. run;

  13. data ab;
  14.   if _n_ = 1 then do;
  15.       set b1;
  16.       declare hash h(dataset: 'b1');
  17.        h.definekey('c');
  18.        h.definedata('c', 'c1');
  19.        h.definedone();
  20.      declare hiter iter('h');   
  21.   end;
  22.     set a1;
  23.       rc = iter.first();
  24.     do while (rc=0);
  25.       if (a <= c <=b) or (b<=c <= a) then output;
  26.       rc = iter.next();
  27.     end;
  28. run;
复制代码
已有 1 人评分学术水平 热心指数 信用等级 收起 理由
Tigflanker + 1 + 1 + 1 观点有启发

总评分: 学术水平 + 1  热心指数 + 1  信用等级 + 1   查看全部评分

板凳
cufe万事通 发表于 2013-10-28 14:42:08
yongyitian 发表于 2013-10-28 13:02
Try if this code
The results may be slightly different from that of sql, due to the replicate.
谢谢。确实达到了目的,还有一个问题:hash里使用不等式条件是不是比使用等式条件慢?

报纸
邓贵大 发表于 2013-10-28 20:50:45
"If neither contributing dataset is sorted, but one of them does fit into memory, PROC SQL will choose to process the query with a Hash Join (sqxjhsh)" (http://support.sas.com/techsup/technote/ts553.html)
Make sure you have enough memory, and the SQL join will be as efficient as Yongyitian's program.
The option is buffersize (http://support.sas.com/resources ... ings10/139-2010.pdf)
In either method, a Cartisian product is involved, so they should be spending very similar amount of time. It's not a problem as long as you can get your results in a reasonable amount of time.
Be still, my soul: the hour is hastening on
When we shall be forever with the Lord.
When disappointment, grief and fear are gone,
Sorrow forgot, love's purest joys restored.

地板
邓贵大 发表于 2013-10-28 20:51:08
"If neither contributing dataset is sorted, but one of them does fit into memory, PROC SQL will choose to process the query with a Hash Join (sqxjhsh)" (http://support.sas.com/techsup/technote/ts553.html)
Make sure you have enough memory, and the SQL join will be as efficient as Yongyitian's program.
The option is buffersize (http://support.sas.com/resources ... ings10/139-2010.pdf)
In either method, a Cartisian product is involved, so they should be spending very similar amount of time. It's not a problem as long as you can get your results in a reasonable amount of time.
已有 1 人评分学术水平 热心指数 收起 理由
zhou.wen + 1 + 1 观点有启发

总评分: 学术水平 + 1  热心指数 + 1   查看全部评分

Be still, my soul: the hour is hastening on
When we shall be forever with the Lord.
When disappointment, grief and fear are gone,
Sorrow forgot, love's purest joys restored.

7
cufe万事通 发表于 2013-11-3 11:14:45
thank you

您需要登录后才可以回帖 登录 | 我要注册

本版微信群
加好友,备注cda
拉您进交流群
GMT+8, 2026-1-1 07:25