How can I implement this program? It is fairly complicated

#1 (original poster)

hqs811 posted on 2014-8-19 11:05:07

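I have the following data with three columns: title, authors (individual names separated by |), and number_authors.

Title      Authors                             Number_authors
Title 1    Name A | Name B                     2
Title 2    Name A | Name B | Name C            3
Title 3    Name A | Name C | Name E | Name Z   4
TITLE 4    NAME A                              1
TITLE 5    NAME F | NAME Z                     2
..
There are roughly 20,000 observations, where
1. title is unique;
2. number_authors ranges from 1 to 200.

For each observation I want to generate a set of five variables, at_least_x_authors_repeat, where x takes the integer values 1 to 5 and each variable is 0 or 1.
That is: at_least_1_authors_repeat, at_least_2_authors_repeat, at_least_3_authors_repeat, at_least_4_authors_repeat and at_least_5_authors_repeat.
The variables describe how many of an observation's authors are repeated elsewhere in the data.

Example for at_least_2_authors_repeat: Title 3 has four authors, Name A, Name C, Name E and Name Z. If at least two of these names also appear together in some other observation, then at_least_2_authors_repeat = 1. If no two of them appear together in any other observation, then at_least_2_authors_repeat = 0.
In the data above, A and C both appear in Title 2, so at_least_2_authors_repeat is 1 for both Title 2 and Title 3.
Likewise, at_least_3_authors_repeat requires at least three names to appear together in another observation.

How should this be implemented? I have been racking my brain over it for a long time and still have no idea where to start. Any pointers would be appreciated. Thanks!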

#2

420948492 posted on 2014-8-19 13:35:24

Fairly complicated.

#3

zhengbo8 posted on 2014-8-19 13:42:23

I think this calls for a hash table; it would be easier to implement with one.
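
A rough illustration of the hash-table idea, written in Python rather than SAS so that it can be run as-is. It assumes the reading of the question in which at_least_x_authors_repeat is 1 when some other title shares at least x authors with this one. The function name `repeat_flags` and the `rows` dictionary are made up for illustration; the sample rows are the ones from the question.

```python
from collections import Counter, defaultdict

def repeat_flags(rows, max_x=5):
    """rows maps title -> '|'-separated author string; returns title -> [flag_1..flag_max_x]."""
    # Normalise names: trim blanks and upper-case, so "Name A" matches "NAME A".
    authors = {t: {a.strip().upper() for a in s.split("|") if a.strip()}
               for t, s in rows.items()}
    # Hash table (inverted index): author -> set of titles the author appears in.
    index = defaultdict(set)
    for t, names in authors.items():
        for a in names:
            index[a].add(t)
    flags = {}
    for t, names in authors.items():
        # Count how many authors each other title shares with title t.
        shared = Counter()
        for a in names:
            for other in index[a]:
                if other != t:
                    shared[other] += 1
        best = max(shared.values(), default=0)
        flags[t] = [int(best >= x) for x in range(1, max_x + 1)]
    return flags

rows = {
    "Title 1": "Name A | Name B",
    "Title 2": "Name A | Name B | Name C",
    "Title 3": "Name A | Name C | Name E | Name Z",
    "Title 4": "NAME A",
    "Title 5": "NAME F | NAME Z",
}
flags = repeat_flags(rows)
for title, f in flags.items():
    print(title, f)
```

On the sample rows this gives at_least_2_authors_repeat = 1 for Title 2 and Title 3, in line with the example in the question. Only titles that share at least one author are ever compared, which is what keeps the lookup cheap compared with checking every pair of titles.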

#4

DataAnalysis007 posted on 2014-8-19 13:43:58

Paid programming available, at a good price.

#5

pobel posted on 2014-8-19 14:25:10

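The two informats in the INPUT statement do not display correctly on the forum.

data test;
  /* Reconstructed INPUT: the & modifier lets a value contain single blanks, */
  /* so fields in the data lines are separated by two or more blanks.        */
  input Title & $10. Authors & $60. Number_authors;
  authors=upcase(authors);
  cards;
Title 1  Name A | Name B  2
Title 2  Name A | Name B | Name C  3
Title 3  Name A | Name F | Name B | Name Z  4
Title 4  NAME A  1
Title 5  NAME F | NAME T  2
Title 6  NAME F | NAME Z | Name A | Name N | Name B | Name X  6
;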

*** Max number of authors;
proc sql noprint;
select distinct max(number_authors) into: maxn
   from test;
quit;
%let maxn=&maxn;
%put *&maxn*;

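*** Get 1-5 authors for each title;
*** Note: the loop visits all 2**n-1 subsets and BINARYw. allows w up to 64, so this is only practical for small number_authors;
data test1;
  array author(&maxn) $10;
  set test;

  comb=2**number_authors-1;
  fmt="Binary"||cats(number_authors)||".";

  do i=1 to comb;
    k=0;
    binary=reverse(putn(i,fmt));
    call missing(of author1-author&maxn);
    do j=1 to number_authors;
      if substr(binary,j,1)="1" then do;
        k+1;
        author(k)=left(scan(authors,j,"|"));
      end;
    end;
    /* Reversed variable order pushes blanks to the end and sorts the names descending */
    call sortc(of author&maxn-author1);
    /* Larger subsets would only duplicate a 5-author subset once truncated by KEEP */
    if k<=5 then output;
  end;
  keep author1-author5 title;
run;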

*** AT_LEAST_x_AUTHORS_REPEAT;
proc sql;
  create table test2 as
    select distinct author1, author2, author3, author4, author5, title,
           count(distinct title) as titlenum
      from test1
      group by author1, author2, author3, author4, author5
      order by author1, author2, author3, author4, author5;
quit;

data test3;
  set test2;
  by author1-author5;
  array repeat_(5);

  tmp=catx("*", of author1-author5);
  varn=count(tmp,"*")+1;
  repeat_(varn)=(titlenum>1);
run;

proc sql;
  /* COALESCE turns the missing flags of titles with fewer than x authors into 0 */
  create table author_repeat as
    select distinct title,
           coalesce(max(repeat_1),0) as at_least_1_authors_repeat,
           coalesce(max(repeat_2),0) as at_least_2_authors_repeat,
           coalesce(max(repeat_3),0) as at_least_3_authors_repeat,
           coalesce(max(repeat_4),0) as at_least_4_authors_repeat,
           coalesce(max(repeat_5),0) as at_least_5_authors_repeat
      from test3
      group by title
      order by title;
quit;

*** Merge;
data wanted;
merge test author_repeat;
      by title;
run;
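
The subset step in the program above grows as 2**number_authors, which becomes unworkable long before the 200 authors mentioned in the question. Since only groups of up to five authors matter, the enumeration can be capped at size 5. The following is a Python sketch of that same subset-key idea, not a translation of the SAS code; the function name `subset_flags` is hypothetical and the rows are the test data from this post.

```python
from collections import defaultdict
from itertools import combinations

def subset_flags(rows, max_x=5):
    """rows maps title -> '|'-separated author string; returns title -> [flag_1..flag_max_x]."""
    # Sorted, de-duplicated, upper-cased names give each subset one canonical key.
    authors = {t: sorted({a.strip().upper() for a in s.split("|") if a.strip()})
               for t, s in rows.items()}
    # Map every author subset of size 1..max_x to the titles that contain it.
    titles_by_subset = defaultdict(set)
    for t, names in authors.items():
        for k in range(1, min(max_x, len(names)) + 1):
            for combo in combinations(names, k):
                titles_by_subset[combo].add(t)
    # A subset of size k found in more than one title sets flag k for each of those titles.
    flags = {t: [0] * max_x for t in authors}
    for combo, titles in titles_by_subset.items():
        if len(titles) > 1:
            for t in titles:
                flags[t][len(combo) - 1] = 1
    return flags

rows = {
    "Title 1": "Name A | Name B",
    "Title 2": "Name A | Name B | Name C",
    "Title 3": "Name A | Name F | Name B | Name Z",
    "Title 4": "NAME A",
    "Title 5": "NAME F | NAME T",
    "Title 6": "NAME F | NAME Z | Name A | Name N | Name B | Name X",
}
flags = subset_flags(rows)
for title, f in flags.items():
    print(title, f)
```

Title 3 and Title 6 share four authors, so their first four flags are 1. Even with the cap, a single title with 200 authors has about 2.5 billion five-author subsets, so for very long author lists counting shared authors between pairs of titles may scale better.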

#6

freerunning_sky posted on 2014-8-19 15:31:39

I can only write simple programs, and I am not sure whether this gives the result you want.

data test;
length Title $20. Authors $200.;
input Title $ Authors $ Number_authors ;
authors=upcase(authors);
cards;
Title1 NameA|NameB 2
Title2 NameA|NameB|NameC 3
Title3 NameA|NameF|NameB|NameZ 4
Title4 NAMEA 1
Title5 NAMEF|NAMET 2
Title6 NAMEF|NAMEZ|NameA|NameN|NameB|NameX 6
;
run;
data test1;
set test;
length Name $20.;
do i=1 to number_authors;
Name=scan(Authors,i,"|");
output;
end;
keep title Name;
proc sort nodupkey;by name title;
run;
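proc sql noprint;
/* cnt = number of titles each author appears in; index = 1 if at least two */
create table test2 as
select distinct a.*,count(*) as cnt,(count(*)>=2) as index from test1 a
group by name;
/* repeat_cnt = number of a title's authors who appear in at least two titles */
create table test3 as
select distinct a.title,sum(index) as repeat_cnt from test2 a
group by title
order by title;
quit;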
data want;
merge test test3;
by title;
array repeat(*) at_least_1_authors_repeat at_least_2_authors_repeat at_least_3_authors_repeat
at_least_4_authors_repeat at_least_5_authors_repeat;
do i=1 to dim(repeat);
if i<=repeat_cnt then repeat(i)=1;else
repeat(i)=0;
end;
drop i repeat_cnt;
run;
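
For comparison, the logic of the program above can be sketched in Python: count the number of titles each author appears in, then for each title count the authors who appear in at least two titles. Note that this treats authors individually; whether the repeated authors appear together in one other title is not checked. The helper name `per_author_flags` is hypothetical, and the rows are the test data from this post.

```python
from collections import Counter

def per_author_flags(rows, max_x=5):
    """rows maps title -> '|'-separated author string; returns title -> [flag_1..flag_max_x]."""
    authors = {t: {a.strip().upper() for a in s.split("|") if a.strip()}
               for t, s in rows.items()}
    # Number of distinct titles each author appears in.
    n_titles = Counter(a for names in authors.values() for a in names)
    flags = {}
    for t, names in authors.items():
        # Authors of this title who also appear in some other title.
        repeat_cnt = sum(1 for a in names if n_titles[a] >= 2)
        flags[t] = [int(x <= repeat_cnt) for x in range(1, max_x + 1)]
    return flags

rows = {
    "Title1": "NameA|NameB",
    "Title2": "NameA|NameB|NameC",
    "Title3": "NameA|NameF|NameB|NameZ",
    "Title4": "NAMEA",
    "Title5": "NAMEF|NAMET",
    "Title6": "NAMEF|NAMEZ|NameA|NameN|NameB|NameX",
}
flags = per_author_flags(rows)
for title, f in flags.items():
    print(title, f)
```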

#7

jingju11 posted on 2014-8-20 05:28:14

Remarkably similar to the program in the post above. Jingju
Also read my program about McNemar's test here.
(Incorrect program deleted.)

#8

pobel posted on 2014-8-20 07:25:27

jingju11 posted on 2014-8-20 05:28
Remarkably similar to the program in the post above. Jingju
Also read my program about McNemar's test here.

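#9

jingju11 posted on 2014-8-20 08:01:56

pobel posted on 2014-8-20 07:25
I don't quite follow the final PROC STDIZE.
But pulling out individual authors on their own probably cannot guarantee, apart from at_least_1_authors_repeat, ...

Perhaps I did not understand the question exactly. The assumption here is that authors are not repeated within a title.
For each author, count the number of different titles she appears in. A count of 2, for example, means she has written two books. Take authors B and C of title 2: each has written two books, so at_least_1_ and at_least_2_ are both 1.
title2 name = B cGT1 = 1
title2 name = C cGT1 = 1
Here, we need to order cGT1 from large to small.
After the transpose, at_least_1_authors_repeat = 1 and at_least_2_authors_repeat = 1. The rest are missing (.).

Did you find any contradictory example?
Jingju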

#10

jingju11 posted on 2014-8-20 08:06:36

For example, for title1:
(1) Count first:
title1 name = a counts = 1 cGT1 = 0
title1 name = b counts = 2 cGT1 = 1
Say title1 has two authors, A and B, where B wrote 2 books while A wrote 1 book (only title1). You would expect only at_least_1_authors_repeat = 1.
(2) After sorting in descending order:
name = b cGT1 = 1 (if counts > 1)
name = a cGT1 = 0
(3) After the transpose:
at_least_1_authors_repeat = 1
at_least_2_authors_repeat = 0
at_least_3_authors_repeat = .
at_least_4_authors_repeat = .
at_least_5_authors_repeat = .
(4) After refilling with 0:
at_least_1_authors_repeat = 1
at_least_2_authors_repeat = 0
at_least_3_authors_repeat = 0
at_least_4_authors_repeat = 0
at_least_5_authors_repeat = 0
Put simply, for a given title this counts how many of its authors have written at least two books. If there are three, for example, then _1_ to _3_ are all 1 and the rest are 0.
Jingju
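
On the question of a contradictory example, one can be built by hand. The three-title data set below is made up for illustration: authors A and B of title X each appear in a second title, but never together. Counting repeated authors one by one then gives at_least_2_authors_repeat = 1 for X, whereas the reading in the original question (at least two names appearing together in another observation) gives 0. A small Python sketch, with all variable names hypothetical:

```python
from collections import Counter

# Hypothetical data: A and B co-author X, and each also appears in one other title.
rows = {"X": {"A", "B"}, "Y": {"A", "C"}, "Z": {"B", "D"}}

# Reading 1: count the authors of X who appear in at least two titles.
n_titles = Counter(a for names in rows.values() for a in names)
repeated_authors = sum(1 for a in rows["X"] if n_titles[a] >= 2)

# Reading 2: largest number of authors X shares with any single other title.
max_shared = max(len(rows["X"] & names) for t, names in rows.items() if t != "X")

at_least_2_per_author = int(repeated_authors >= 2)
at_least_2_together = int(max_shared >= 2)
print(at_least_2_per_author, at_least_2_together)  # prints "1 0"
```

The two readings agree on at_least_1_authors_repeat and can only differ from x = 2 upward.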
