楼主: liujx80
4689 19

真正SAS编程高手请指教 [推广有奖]

  • 0关注
  • 0粉丝

高中生

60%

还不是VIP/贵宾

-

威望
0
论坛币
34 个
通用积分
0
学术水平
0 点
热心指数
0 点
信用等级
0 点
经验
730 点
帖子
47
精华
0
在线时间
4 小时
注册时间
2007-9-4
最后登录
2017-6-3

楼主
liujx80 发表于 2007-9-5 01:27:00 |AI写论文

+2 论坛币
k人 参与回答

经管之家送您一份

应届毕业生专属福利!

求职就业群
赵安豆老师微信:zhaoandou666

经管之家联合CDA

送您一个全额奖学金名额~ !

感谢您参与论坛问题回答

经管之家送您两个论坛币!

+2 论坛币
In USA, one teacher must have some certificates to be assigned to teach a subject. One certificate could be used in several assignments, for example, teaching math needs either cert1 or cert2 or cert3.
There are two datasets, ‘certificate’ which shows how many certificates one 'id' has, at most 30 certs.
the ‘assignment’shows what type of certificates one assignment needs. If one assignment have more than 2 qualified_cert,
which means either of them could be counted.

data certificate;
input id (certArea1-certArea30) ($) ;
datalines;
1 32 11 . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2 32 . 23 . . . . . . . . . . . . . . . . . . . . . . . . . . .
3 23 21 32 . . . . . . . . . . . . . . . . . . . . . . . . . . .
4 30 24 25 . . . . . . . . . . . . . . . . . . . . . . . . . . .
;
run;
data assignment;
input assign $ (qualified_cert1- qualified_cert13) ($) ;
datalines;
a 11 . . . . . . . . . . . .
b 30 11 . . . . . . . . . . .
c 23 . . . . . . . . . . . .
d 32 23 . . . . . . . . . . .
;
run;

My question is, how to calculate how many people have the required certificate for
each assignment. If one person can be qualified for two jobs, then count ‘1/2’for each
assignment.

Basically, I have to first output each value of cert1-cert30 for each observation,
then use that value to check if this certificate meets the requirement of
assignment(go through the assignment dataset). To avoid count one person with
multiple certificates as multiple person, we have to go through the assignment
dataset with all his/her certificates for one teacher to get the total number of his
match, then we go through the assignment dataset again.

Would any body tell me how to write this code?


二维码

扫码加我 拉你入群

请注明:姓名-公司-职位

以便审核进群资格,未注明则拒绝

关键词:SAS编程 编程高手 请指教 certificates Certificate 高手 SAS 指教

回帖推荐

kofsphere 发表于5楼  查看完整内容

希望是你要求的结果。 这里必须要说明一下,因为没有看你的sample,所以做这个假设- - - 科目种类,即23,32这样的,代表math或者eng的代码种类的数量是有限的!(或者说,即使数量很大,也有办法替换)- - - 虽然我也相信不会出现几百种科目-_-||| 那么: 首先,替换你的raw data values,比如23 32这类的,既然他们没有算术上的意义,那么一律替换为除1以外的素数,举个例子,如果整个dataset中出现过的科目种类是11,21,23, ...

本帖被以下文库推荐

沙发
liujx80 发表于 2007-9-5 01:32:00
any one trying to help must keep one fact in mind that 'certificate' dataset is a over 500,000 records dataset. So optimization must be considered.
I tried one code with a lot of MACRO which took me 6-hour CPU time. So I don't think my code is usable and that's why I'm asking help for master of SAS.

藤椅
kofsphere 发表于 2007-9-5 03:44:00

对于你的dataset有点疑惑,eg:

data certificate;
input id (certArea1-certArea30) ($) ;
datalines;
1 32 11 . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2 32 . 23 . . . . . . . . . . . . . . . . . . . . . . . . . . .
3 23 21 32 . . . . . . . . . . . . . . . . . . . . . . . . . . .
4 30 24 25 . . . . . . . . . . . . . . . . . . . . . . . . . . .

如上,比如certArea1,为何它对应的cert的种类会不同?即,certarea1 在第二个第三个obs里面分别是32,23 -是说

第二三个人在certarea1里面分别取得了32和23分数么?(但你用的又是char)

板凳
liujx80 发表于 2007-9-5 04:11:00
thanks for reading my question.

The following explanation might be helpful. You can assume certArea1 is the first Certificate he/she obtained. and CertArea30 is the last one.
So id ='1' obtained his certificate which is '32' ,for example ,32='math', id='3' obtained his certificate which is '23', for example 23='english'.

Let's put it in another way, the value of certArea1-certArea30 specify the area of certificate.

Please let me know if I haven't put it clear.

报纸
kofsphere 发表于 2007-9-5 09:49:00

希望是你要求的结果。

这里必须要说明一下,因为没有看你的sample,所以做这个假设- - - 科目种类,即23,32这样的,代表math或者eng的代码种类的数量是有限的!(或者说,即使数量很大,也有办法替换)- - - 虽然我也相信不会出现几百种科目-_-|||

那么:

首先,替换你的raw data values,比如23 32这类的,既然他们没有算术上的意义,那么一律替换为除1以外的素数,举个例子,如果整个
dataset中出现过的科目种类是11,21,23,32,那么,就把它们全部对应替换为3,5,7,11,如果科目种类更多,那么就以此替换下去,换成质数。同时,把certarea1-certarea30里面所有的missing values替换为 1.这个在data step中即使是50w obs相信也可以很快完成。

然后,在新的dataset中增加一个variable,比如叫做prod,作为certarea1-certarea30的乘积。

这样一来,只要各个assignment中的值能够被prod整除,就说明改obs 可以做这个assignment从而避免了你文中提到的用2个表的数据做cross comparison,也不存在人数重复计算的问题。

但是由于没有你的dataset sample,所以我不把整个code写完,下面是部分草稿,仅为了阐述我上面说的。

data certificate ;
input id (certArea1-certArea30) ($) ;
datalines ;
1 11 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2 11 . 5 . . . . . . . . . . . . . . . . . . . . . . . . . . .
3 5 13 11 . . . . . . . . . . . . . . . . . . . . . . . . . . .
4 7 17 19 . . . . . . . . . . . . . . . . . . . . . . . . . . .
;
run ;
data assignment ;
input assign $ (certArea1-certArea30) ($) ;
datalines ;
a 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
b 7 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . .
c 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
d 11 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . .
;
run ; *这里我直接替换了数据 ;
data certificate1 ;
set certificate ;
array certa[30] certarea1 - certarea30 ;
array certa_n[30] certarea_n1 - certarea_n30 ;
do i= 1 to 30 ;
certa_n= input( certa, $8.) ;
end ;
drop i certarea1 - certarea30 ;
run ;

* 把char换成numeric ;

data certificate2 ;
set certificate1 ;
array certa_n[30] certarea_n1 - certarea_n30 ;
do i= 1 to 30 ;
if certa_n=. then certa_n= 1 ;
end ;
prod = 1 ;
do j =1 to 30 ;

prod =prod*certa_n[j] ;
end ;
keep id prod ;
run ; *加了一个var叫做prod ;
data certificate3 ;
set certificate2 ;
if mod(prod,7)=0 or mod(prod,3)=0 then qualified= 1 ;
else qualified = 0 ;
run ;

* 如果可以整除,那么qualified写为1 ,(这里我简略了下,就直接用assign a来测试了下。)



[此贴子已经被作者于2007-9-5 9:59:20编辑过]

已有 1 人评分经验 论坛币 收起 理由
bakoll + 3 + 10 精彩帖子

总评分: 经验 + 3  论坛币 + 10   查看全部评分

地板
liujx80 发表于 2007-9-5 11:12:00
I can see you really have strong math foundation. But I'm not sure if your result is correct.

Let's put it simpler. Let's suppose we only have four persons 'id=1,2,3,4. Each person has up to 3 certificates .
there are only four assignments and each assignment has up to 2 qualified certificates.

data certificate;
input id (certArea1-certArea3) ($) ;
datalines;
1 32 11 .
2 32 . 23
3 23 21 32
4 30 24 25
;
run;
data assignment;
input assign $ (qualified_cert1- qualified_cert2) ($) ;
datalines;
a 11 .
b 30 11
c 23 .
d 32 23
;
run;

Let's do it one by one;
for 'id=1', he has two certs '32','11', and can meet the requirement of three assignments 'a' , 'b','d'. Thus any assignment
counting 'id=1' will be 1/3. now :
a: 1/3
b: 1/3
c: 0
d: 1/3

for 'id=2', he has two certs '32' '23', and can meet the requirement of two assignments 'c' 'd', but three qualified certs,
c:23, d:32 , d:23. thus any assignment counting 'id=2' will be 1/3. now:
a: 1/3
b: 1/3
c: 1/3
d: 1/3+1/3+1/3= 1

after 'id=3', the assignment will be :
a:1/3
b: 1/3
c: 1/3+1/3= 2/3
d: 1+1/3+1/3=1.67

after 'id=4', the assignment will be:
a:1/3
b: 1/3+1= 1.3
c: 2/3
d: 1.67

So is your result the same?


7
kofsphere 发表于 2007-9-5 19:30:00

上面说的只在于回答‘My question is, how to calculate how many people have the required certificate for
each assignment.’

如果你要求 If one person can be qualified for two jobs, then count ‘1/2’for each
assignment.这个稍作修改也容易,

此外,刚刚发现,你误解了我的意思,我并没有把每个obs放到4个assign里面去比较!!而是
反过来把各个assign去和每个obs比较整除,并赋上对应的值0 or 1

首先,是上面的codes得出的结构表:
1 11 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2 11 . 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3 5 13 11 . . . . . . . . . . . . . . . . . . . . . . . . . . . 715
4 7 17 19 . . . . . . . . . . . . . . . . . . . . . . . . . . . 2261
这里最后一列是前面非missing value的乘积,
那么,在比较这个表中最后一列是否能被assignment表中各个assign对应值整除后,
可以得到追加的一列qualified,1代表可以做,2代表不可以,如下

1 11 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 1
2 11 . 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 0
3 5 13 11 . . . . . . . . . . . . . . . . . . . . . . . . . . . 715 0
4 7 17 19 . . . . . . . . . . . . . . . . . . . . . . . . . . . 2261 0 (具体参照我修改的那个assignment表,obs a)

那么现在,如果你要求的不是‘人次’而是具体平均后的’人数‘,那么只需要加入几列,如下

obs a b c d sum _a _b _c _d
1 11 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 1 1 0 1 3 1/3 1/3 0 1/3
2 11 . 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 0 0 1 1 2 0 0 1/2 1/2
3 5 13 11 . . . . . . . . . . . . . . . . . . . . . . . . . . . 715 1 0 1 1 3 1/3 0 1/3 1/3
4 7 17 19 . . . . . . . . . . . . . . . . . . . . . . . . . . . 2261 0 1 0 0 1 0 1 0 0

在这个数据结构中,比如说上面的红字,a b c 对应的数各自代表obs1 可以教授的科目,因为obs1可以同时教授a和b,d 那么他对a,b d
3门科目的贡献值就是各1/3,即_a 和_b _d在obs1时为1/3, 最后对_a _b _c _d 这样的列求和,即是最后的结果,(如果到此我对你的问题没有其他误解的话)

btw,如果这个问题中,科目真的有成百上千个,那么也只需要加一段code,使得这些科目能够对应的和质数进行替换

具体code你写起来应该很容易,恕我偷个懒了



[此贴子已经被作者于2007-9-5 19:51:44编辑过]

8
liujx80 发表于 2007-9-5 21:23:00
I can see your way is of greate ingenuity. Howevery, I cannot see what's happening .
I think we all appreicate that if you could spend some time writing the code.
all you need to do is use the following dataset:
data certificate;
input id (certArea1-certArea3) ($) ;
datalines;
1 32 11 .
2 32 . 23
3 23 21 32
4 30 24 25
;
run;
data assignment;
input assign $ (qualified_cert1- qualified_cert2) ($) ;
datalines;
a 11 .
b 30 11
c 23 .
d 32 23
;
run;

and give me a final dataset with the following result:
a: 0.33
b: 1.33
c: 0.67
d: 1.67


9
kofsphere 发表于 2007-9-5 22:40:00

既然你要求重复情况也多次计算,我改了下_a - _d的计数,所有代码如下:

data certificate ;
input id (certArea1-certArea30) ($) ;
datalines ;
1 32 11 . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2 32 . 23 . . . . . . . . . . . . . . . . . . . . . . . . . . .
3 23 21 32 . . . . . . . . . . . . . . . . . . . . . . . . . . .
4 30 24 25 . . . . . . . . . . . . . . . . . . . . . . . . . . .
;
run ;
data assignment ;
input assign $ (certArea1-certArea30) ($) ;
datalines ;
a 11 . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
b 30 11 . . . . . . . . . . . . . . . . . . . . . . . . . . . .
c 23 . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
d 32 23 . . . . . . . . . . . . . . . . . . . . . . . . . . . .
;
run ;
data certificate ;
set certificate ;
array cert[30] certarea1-certarea30 ;
do i= 1 to 30 ;
if cert= 11 then cert= 3 ;
if cert= 21 then cert= 5 ;
if cert= 23 then cert= 7 ;
if cert= 24 then cert= 11 ;
if cert= 25 then cert= 13 ;
if cert= 30 then cert= 17 ;
if cert= 32 then cert= 19 ;
end ;
run ; * replace raw values with prime numbers ;
data assignment ;
set assignment ;
array cert[30] certarea1-certarea30 ;
do i= 1 to 30 ;
if cert= 11 then cert= 3 ;
if cert= 21 then cert= 5 ;
if cert= 23 then cert= 7 ;
if cert= 24 then cert= 11 ;
if cert= 25 then cert= 13 ;
if cert= 30 then cert= 17 ;
if cert= 32 then cert= 19 ;
end ;
run ; * replace raw values with prime numbers ;

data certificate1 ;
set certificate ;
array certa[30] certarea1 - certarea30 ;
array certa_n[30] certarea_n1 - certarea_n30 ;
do i= 1 to 30 ;
certa_n= input( certa, $8.) ;
end ;
drop i certarea1 - certarea30 ;
run ;* transfer to numeric values ;

data certificate2 ;
set certificate1 ;
array certa_n[30] certarea_n1 - certarea_n30 ;
do i= 1 to 30 ;
if certa_n=. then certa_n= 1 ;
end ;
prod = 1 ;
do j =1 to 30 ;
prod =prod*certa_n[j] ;
end ;
keep id prod ;
run ; * substitude missing values, and create the product value for each id;


data certificate3 ;
set certificate2 ;

if mod(prod,3)=0 then qualifieda= 1 ;
else qualifieda = 0 ;

if mod(prod,17)=0 and mod(prod,3) ne 0 then qualifiedb= 1 ;
else if mod(prod,17)ne 0 and mod(prod,3) =0 then qualifiedb= 1 ;
else if mod(prod,17)=0 and mod(prod,3) =0 then qualifiedb= 2 ;
else qualifiedb = 0 ;

if mod(prod,7)=0 then qualifiedc= 1 ;
else qualifiedc = 0 ;

if mod(prod,19)=0 and mod(prod,7) ne 0 then qualifiedd= 1 ;
else if mod(prod,19)ne 0 and mod(prod,7) =0 then qualifiedd= 1 ;
else if mod(prod,19)=0 and mod(prod,7) =0 then qualifiedd= 2 ;
else qualifiedd = 0 ;

sum= qualifieda+qualifiedb+qualifiedc+qualifiedd ;
_a= qualifieda/sum ;
_b=qualifiedb/sum ;
_c=qualifiedc/sum ;
_d= qualifiedd/sum ;
run ; * determine the contribution values based on the mod function ;
proc means noprint ;
output out= final
sum= ;
var _a _b _c _d ;

run ;

10
kofsphere 发表于 2007-9-5 22:41:00

The SAS System
_TYPE_ _FREQ_ _a _b _c _d
0 4 0.33333 1.33333 0.66667 1.66667

[此贴子已经被作者于2007-9-5 22:45:07编辑过]

您需要登录后才可以回帖 登录 | 我要注册

本版微信群
加好友,备注cda
拉您进交流群
GMT+8, 2025-12-26 08:22