楼主: liujx80
4718 19

真正SAS编程高手请指教 [推广有奖]

11
liujx80 发表于 2007-9-5 23:20:00
I see what you are doing. You are geniue in math. You transfer the overload of crossing two datasets to yourself. This datasets have around 400 different certificates. And each assignment could have up to 13 qualified certs.According to your coding scheme, your last coding part will be terribly extensive and we have to have a very very good memory to map the relationship. Your 'if /else ' statements will be trully hard. I guess you have extensive programming experience except for SAS and I cannot do that. And I guess your code will run very fast.

I strongly suggest 'Ban Zhu' gives your a big bonus. It will be best if 'Ban Zhu' could also give me some bonus.


12
kofsphere 发表于 2007-9-6 00:43:00

I dont think it could be that terrible, as SAS can do everything
but you do need another algo to match the 400 certs

13
liujx80 发表于 2007-9-6 01:08:00
I'm trying to find another way to solve the problem. Basically I'm trying to obtain the result with crossing dataset calculation.
Kind of like 'data test;
set cert;
do m = 1 to nobs;
set assign point=m nobs=nobs;
....................
'
I'm still working on that. I'll post my code when it's done.

How many SAS expererience do you have? what about other programming experience ?

14
liujx80 发表于 2007-9-6 02:56:00
My Code:

/* certificate dataset: one ssn has multiple certificates */
data cert;
input ssn (certArea1-certArea3) ($) ;
datalines;
1 32 11 .
2 32 . 23
3 23 21 32
4 30 24 25
;
/*assignment dataset: one assignment has multiple qualified certificates */
data assign;
input assignment $ (required_cert1-required_cert2) ($) ;
datalines;
a 11 .
b 30 11
c 23 .
d 32 23
;

/* we first count the number of assignments one teacher can teach with his/her multiple certificates */
data cert1(keep=ssn count );
set cert;
count = 0;
do m = 1 to nobs;
set assign point=m nobs=nobs;
array cert(3) certArea1-certArea3;
* array ctype(30) certType1-certType30;
* array cdate(30) certDate1-certDate30;
array require(2) required_cert1 - required_cert2;

do i = 1 to 3;
if cert(i) ne "" then do;
do j = 1 to 2;
if require(j) ne "" then do;
if cert(i) eq require(j) then count+1;
end;
end;
end;
end;
end;
run;

/* we put the number of counts for each ssn into the certificate dataset */
data cert2;
merge cert(in=a) cert1(in=b);
by ssn;
if a and b;
run;

/* we go through the certificate and assignment dataset again to get the total
* available certificates for each assignment */
data test2(keep=ssn assignment total );
set cert2;
do m = 1 to nobs;
set assign point=m nobs=nobs;
total=0;
array cert(3) certArea1-certArea3;
array require(2) required_cert1 - required_cert2;

do i = 1 to 3;
if cert(i) ne "" then do;
do j = 1 to 2;
if require(j) ne "" then do;
if cert(i) eq require(j) then total+(1/count);
end;
end;
end;
end;
output;
end;
run;
/* the final result */
proc means data = test2 sum maxdec=3;
class assignment;
var total;
title "Final result: the available teachers for each assignment. ";
run;

15
kofsphere 发表于 2007-9-6 05:45:00

看来你以上的codes是要一一比较?
但是不知道当array到了30,而且有50w行之后是否要run很久。。。

此外,回到我上面写的code,你即使不要另外的algo来做素数的替换,我估计了下,就算你完全手动指定(指定400个连续的素数,然后手动输入if/else语句,其实就是改动数字,其余都copy paste)也就大概30mins左右(当然过程是很boring的),但是这样可以避免你上面的do语句,特别是array和obs数目增加后的重复计算量。。。time和effort是会有一个trade off :(

[此贴子已经被作者于2007-9-6 5:56:46编辑过]

16
liujx80 发表于 2007-9-6 08:46:00
In your code '
if mod(prod,19)=0 and mod(prod,7) ne 0 then qualifiedd= 1 ;
else if mod(prod,19)ne 0 and mod(prod,7) =0 then qualifiedd= 1 ;
else if mod(prod,19)=0 and mod(prod,7) =0 then qualifiedd= 2 ;
else qualifiedd = 0 ;'

what should this code be if one assignment, let's supposed to be 'A1', has 13 qualified certificate?

I'm always screwed up with all these if/else.


17
liujx80 发表于 2007-9-6 10:19:00
one negative about your code is that if the certificate code or qualified cert for each assignment has any change, if you have to go through the if/then again. that's not good.

18
liujx80 发表于 2007-9-6 10:29:00
In other programming language, we could calculate the big O stuff. Can we do that in SAS?

19
eijuhz 发表于 2007-9-6 17:30:00
kofsphere热心回答他人疑问,给予200现金奖励

20
kofsphere 发表于 2007-9-6 19:24:00

如果在完整的dataset中,cert的数量很多,例如13个,
那么就不能在用上面的if/else语句(否则是谁都会被折腾死)

我的方法取决与你的assignments有多少obs

1 把assignment transpose一次,然后和cert表merge,最后的表每个id对应所有的assign,这样的坏处是cert表会翻番,比如当assignment有50 个obs,(即50个科目),那么cert表就会达到2500w 行(这个merge我想大概会历时10mins~20mins);

接下来,判断a b c ... 的值就不再用if/else对每个数判断,因为这时每一个id将对应所有的assign,所以每个值将只需判断一次,结果取1和0(不能整除即为0),最后a b c 对应的值占到改行总和的比例即为其真实的贡献值

另一种方法的算法和1相同,但不用merge2个表,而是在处理cert表的datastep中手动定义array,这些array指定assignment中的值,e.g,array array_assign1[30]= ... ; 但坏处是仅仅当assignment的obs不多的时候才比较适宜

您需要登录后才可以回帖 登录 | 我要注册

本版微信群
加好友,备注cda
拉您进交流群
GMT+8, 2026-1-16 06:49