How to import MEDLINE article data using INPUT and INFILE

1 (original post)
Imasasor posted on 2013-8-1 21:37:51
Bounty: 1888 forum coins
Attachment: medline.txt (116.96 KB)

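The attachment is a set of MEDLINE results that I downloaded from PubMed: 37 English-language articles (called records here). In the file, each record starts at a "PMID-" line.
I want to build a SAS data set with one observation per record (that is, per article). It must contain at least these variables: PMID, TI, AB, and MH. Any others are optional.
These four are the article's ID, title, abstract, and MeSH terms. Each variable's value is the text after the dash "-"; some values run over several lines.

Also, a record may have several MH lines, and I need all of them merged into a single variable.

How can I do this import efficiently with INFILE and INPUT? Any pointers from the experts would be welcome.

Many thanks. The forum coins are a small token of my appreciation.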
PMID- 23495623
OWN - NLM
STAT- MEDLINE
DA  - 20130318
DCOM- 20130411
IS  - 1049-510X (Print)
IS  - 1049-510X (Linking)
VI  - 23
IP  - 1
DP  - 2013 Winter
TI  - Depression and type 2 diabetes among Alaska Native primary care patients.
PG  - 56-64
AB  - OBJECTIVES: To assess whether type 2 diabetes mellitus (DM2) and DM2
      complications are associated with presence and severity of depression among
      Alaska Native and American Indian people (AN/Als). DESIGN: Retrospective,
      cross-sectional analysis of medical records. SETTING: Southcentral Foundation
      Primary Care Center (SCF-PCC) in Anchorage, Alaska. PARTICIPANTS: Total of 23,529
      AN/AI adults. PRIMARY OUTCOME MEASURES: Patient Health Questionnaire (PHQ) scores
      (0-9 negative, 10-14 mild, 15-19 moderate, 20+ severe) and DSM-IV depression
      diagnosis. RESULTS: DM2 prevalence was 6% (n=1,526). Of those with DM2, 19% (n =
      292) had one or more DM2 complications and average HbAlc was 7.1%. Prevalence of
      depression diagnosis was similar between AN/Als with and without DM2 (P = .124).
      Among those screened for depression (n = 12,280), there were similar rates of PHQ
      severity between those without and with DM2; respectively 4% (n = 452) vs 4% (n =
      42) mild, 4% (n = 404) vs 3% (n = 29) moderate, and 4% (n = 354) vs 4% (n = 38)
      severe. In multivariable logistic regression, DM2 was not associated with PHQ
      severity (OR 1.02, 95% CI 0.81-1.27) or depression diagnosis (OR 1.27, 95% CI
      1.00-1.62). Increased odds of depression and higher depression severity were
      associated with female sex, younger age, being unmarried, substance
      abuse/dependence, and increased ambulatory visits. Depression was associated with
      number of other chronic conditions among AN/Als with DM2 but not with number of
      complications. CONCLUSIONS: Presence and severity of depression among AN/Al
      primary care patients was not significantly associated with DM2 nor DM2
      complications, despite a slightly higher rate of depression diagnosis among those
      with DM2.
AD  - Research Department, Southcentral Foundation, Anchorage, Alaska 99508, USA.
      ddillard@scf.cc
FAU - Dillard, Denise A
AU  - Dillard DA
FAU - Robinson, Renee F
AU  - Robinson RF
FAU - Smith, Julia J
AU  - Smith JJ
FAU - Khan, Burhan A
AU  - Khan BA
FAU - Dubois, Edward W
AU  - Dubois EW
FAU - Mau, Marjorie K
AU  - Mau MK
LA  - eng
GR  - P20 MD000173/MD/NIMHD NIH HHS/United States
PT  - Journal Article
PT  - Research Support, N.I.H., Extramural
PL  - United States
TA  - Ethn Dis
JT  - Ethnicity & disease
JID - 9109034
SB  - IM
MH  - Adolescent
MH  - Adult
MH  - Alaska
MH  - Depression/epidemiology/*ethnology
MH  - Diabetes Mellitus, Type 2/*complications/epidemiology/*ethnology
MH  - Female
MH  - Humans
MH  - *Indians, North American
MH  - Logistic Models
MH  - Male
MH  - Middle Aged
MH  - Primary Health Care
MH  - Young Adult
EDAT- 2013/03/19 06:00
MHDA- 2013/04/12 06:00
CRDT- 2013/03/19 06:00
PST - ppublish
SO  - Ethn Dis. 2013 Winter;23(1):56-64.

PMID- 22089223
OWN - NLM
STAT- MEDLINE
DA  - 20111117
DCOM- 20120319
IS  - 1760-4788 (Electronic)
IS  - 1279-7707 (Linking)
VI  - 15
IP  - 9
DP  - 2011 Nov
TI  - Older people with diabetes have higher risk of depression, cognitive and
      functional impairments: implications for diabetes services.
PG  - 751-5
AB  - OBJECTIVES: To examine the relationship between diabetes and impairments in
      functional and cognitive status as well as depression in older people. DESIGN:
      Cross-sectional study. SETTING: Elderly Health Centres (EHC) in Hong Kong.
      PARTICIPANTS: 66,813 older people receiving baseline assessment at EHC in 1998 to
      2001. MEASUREMENTS: Diabetes status was defined by self-report and blood glucose
      tests. Functional status was assessed by 5 items of instrumental activities of
      daily living (IADL) and 7 items of activities of daily living (ADL). Cognitive
      status was screened by the Abbreviated Mental Test-Hong Kong version (AMT).
      Depressive symptoms were screened by the Geriatric Depression Scale-Chinese
      version (GDS). RESULTS: Among the subjects, 10.4% reported having regular
      treatment for diabetes, 3.4% had diabetes but were not receiving regular
      treatment, and 86.2% did not have diabetes. After controlling for age, sex and
      education level, those having regular treatment for diabetes were 1.7 times more
      likely (OR=1.65, 95% CI: 1.51-1.80) to have functional impairment, 1.3 times more
      likely (OR=1.28, 95% CI: 1.11-1.48) to have cognitive impairment and 1.3 times
      more likely (OR=1.35, 95% CI: 1.25-1.46) to have depression, than older people
      without diabetes. CONCLUSION: Older people with diabetes may be less capable of
      managing the disease than the younger ones as a result of increased risk of both
      physical and cognitive impairment. This study provided further evidence for the
      need of an international consensus statement regarding care of diabetes in older
      people.
AD  - Faculty of Social Sciences, The University of Hong Kong, Hong Kong.
      phchau@graduate.hku.hk
FAU - Chau, P H
AU  - Chau PH
FAU - Woo, J
AU  - Woo J
FAU - Lee, C H
AU  - Lee CH
FAU - Cheung, W L
AU  - Cheung WL
FAU - Chen, J
AU  - Chen J
FAU - Chan, W M
AU  - Chan WM
FAU - Hui, L
AU  - Hui L
FAU - McGhee, S M
AU  - McGhee SM
LA  - eng
PT  - Journal Article
PT  - Research Support, Non-U.S. Gov't
PL  - France
TA  - J Nutr Health Aging
JT  - The journal of nutrition, health & aging
JID - 100893366
SB  - IM
MH  - Activities of Daily Living/psychology
MH  - Aged
MH  - Aged, 80 and over
MH  - Cognition Disorders/*epidemiology/psychology
MH  - Cross-Sectional Studies
MH  - Depression/*epidemiology/psychology
MH  - Diabetes Mellitus/*epidemiology/psychology
MH  - Educational Status
MH  - Female
MH  - Geriatric Assessment/*statistics & numerical data
MH  - Hong Kong/epidemiology
MH  - Humans
MH  - Logistic Models
MH  - Male
MH  - Prevalence
MH  - Risk Factors
EDAT- 2011/11/18 06:00
MHDA- 2012/03/20 06:00
CRDT- 2011/11/18 06:00
PST - ppublish
SO  - J Nutr Health Aging. 2011 Nov;15(9):751-5.

PMID- 21357362
OWN - NLM
STAT- MEDLINE
DA  - 20110301
DCOM- 20110608
LR  - 20130630
IS  - 1935-5548 (Electronic)
IS  - 0149-5992 (Linking)
VI  - 34
IP  - 3
DP  - 2011 Mar
TI  - Prevalence of depression in individuals with impaired glucose metabolism or
      undiagnosed diabetes: a systematic review and meta-analysis of the European
      Depression in Diabetes (EDID) Research Consortium.
PG  - 752-62
LID - 10.2337/dc10-1414 [doi]
AB  - OBJECTIVE: Meta-analyses have shown that the risk for depression is elevated in
      type 2 diabetes. Whether this risk in individuals with impaired glucose
      metabolism (IGM) or undiagnosed diabetes (UDD) is elevated relative to normal
      glucose metabolism (NGM) or decreased relative to previously diagnosed type 2
      diabetes (PDD) has not been the subject of a systematic review/meta-analysis.
      This study examined the prevalence of depression in IGM and UDD subjects relative
      to each other and to NGM and PDD subjects by reviewing the literature and
      conducting a meta-analysis of studies on this topic. RESEARCH DESIGN AND METHODS:
      EMBASE and MEDLINE databases were searched for articles published up to May 2010.
      All studies that compared the prevalence of depression in subjects with IGM and
      UDD were included. Odds ratios (ORs) were calculated using fixed and
      random-effects models. RESULTS: The meta-analysis showed that the risk for
      depression was not increased in IGM versus NGM subjects (OR 0.96, 95% CI
      0.85-1.08). Risk for depression did not differ between individuals with UDD and
      individuals with either NGM (OR 0.94, 95% CI 0.71-1.25) or IGM (OR 1.16, 95% CI
      0.88-1.54). Finally, individuals with IGM or UDD both had a significantly lower
      risk of depression than individuals with PDD (OR 0.59, 95% CI 0.48-0.73, and OR
      0.57, 95% CI 0.45-0.74, respectively). CONCLUSIONS: Results of this meta-analysis
      show that the risk of depression is similar for NGM, IGM, and UDD subjects. PDD
      subjects have an increased risk of depression relative to IGM and UDD subjects.
AD  - School of Psychology, University of Birmingham, Birmingham, UK. f.pouwer@uvt.nl
FAU - Nouwen, Arie
AU  - Nouwen A
FAU - Nefs, Giesje
AU  - Nefs G
FAU - Caramlau, Isabela
AU  - Caramlau I
FAU - Connock, Martin
AU  - Connock M
FAU - Winkley, Kirsty
AU  - Winkley K
FAU - Lloyd, Cathy E
AU  - Lloyd CE
FAU - Peyrot, Mark
AU  - Peyrot M
FAU - Pouwer, Francois
AU  - Pouwer F
CN  - European Depression in Diabetes Research Consortium
LA  - eng
PT  - Journal Article
PT  - Meta-Analysis
PT  - Review
PL  - United States
TA  - Diabetes Care
JT  - Diabetes care
JID - 7805975
RN  - 50-99-7 (Glucose)
SB  - IM
MH  - Depression/*epidemiology
MH  - Diabetes Mellitus/*diagnosis/*psychology
MH  - Glucose/metabolism
MH  - Glucose Intolerance/*psychology
MH  - Humans
PMC - PMC3041222
OID - NLM: PMC3041222
EDAT- 2011/03/02 06:00
MHDA- 2011/06/09 06:00
CRDT- 2011/03/02 06:00
AID - 34/3/752 [pii]
AID - 10.2337/dc10-1414 [doi]
PST - ppublish
SO  - Diabetes Care. 2011 Mar;34(3):752-62. doi: 10.2337/dc10-1414.
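
The sample suggests a fixed layout: the field tag in columns 1-4, a dash in column 5, the text from column 7, and continuation lines with those first columns left blank. That can be checked before any real parsing. What follows is only a sketch under that assumption; the data set name tagcount and the variables tag and dash are made up for illustration, and the in-line data is a shortened excerpt of the first record.

```sas
/* Tally the tags that start a field; continuation and blank lines are skipped */
data tagcount;
    infile datalines truncover;
    /* Columns 1-4: field tag; column 5: dash. Both are blank on continuation lines */
    input tag $char4. dash $char1.;
    if dash = '-';   /* keep only lines that start a new field */
datalines;
PMID- 23495623
TI  - Depression and type 2 diabetes among Alaska Native primary care patients.
AB  - OBJECTIVES: To assess whether type 2 diabetes mellitus (DM2) and DM2
      complications are associated with presence and severity of depression
MH  - Adolescent
MH  - Adult
;
run;

/* One row per tag with the number of lines carrying it */
proc freq data=tagcount;
    tables tag / nocum nopercent;
run;
```

To run the same check on the real file, replace `datalines` in the INFILE statement with the file path and drop the in-line data. A PMID count of 37 would then match the number of records stated in the post.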


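2
hopewell posted on 2013-8-1 21:37:52
data Ben(drop=key);
    /* TI needs more than $200: joined across its three lines, the third sample title is just over 200 characters */
    length key $5 id 8 pmid $20 ti $500 ab $3000 mh $500;
    retain key pmid ti ab mh;
    infile "c:\medline.txt" pad end=last;
    input @;    /* load the line into _infile_ without reading any variables */
    /* Columns 1-5 hold the tag; they are blank on continuation lines, so keep the previous tag */
    key=coalescec(substr(_infile_,1,5),key);
    _infile_=substr(_infile_,7);    /* the value starts in column 7 */
    select(key);
        when('PMID-') do;
            /* A new record begins: write out the previous one, then reset */
            if id ge 1 then do;
                output;
                call missing(pmid,ti,ab,mh);
            end;
            id+1; pmid=_infile_;
        end;
        /* Append rather than assign, so a value spanning several lines is kept whole */
        when('TI  -') ti=catx(' ',ti,_infile_);
        when('AB  -') ab=catx(' ',ab,_infile_);
        /* All MH lines of a record go into one variable, separated by || */
        when('MH  -') mh=catx('||',mh,_infile_);
        otherwise;
    end;
    if last then output;    /* write the final record */
run;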

3
Imasasor posted on 2013-8-1 22:13:18
The original poster (me) was clearly overthinking this; it is not that complicated.
/* Step 1: split each line into tag (columns 1-4), dash (column 5) and text (from column 7) */
data a;
infile "E:\medline.txt" pad truncover lrecl=400;
input name $1-4 gang $5 content $ 7-400;
run;

/* Step 2: drop the blank lines between records */
data b;
set a;
if name="" and gang="" and content="" then delete;
run;

/* Step 3: continuation lines have a blank tag; carry the previous tag forward */
data c;
set b;
retain name1;
if name^="" then name1=name;
else name=name1;
run;

/* Step 4: keep only the four fields of interest */
data d;
set c(drop=name1);
where name in ("PMID","TI","AB","MH");
run;

/* Step 5: join each run of consecutive lines with the same tag into one string */
data e(drop=gang content);
set d;
by name notsorted;
length text $3000;
retain text;
if first.name then text=content;
else text=catx("||",text,content);
if last.name then output;
run;

/* Step 6: number the records; each PMID row starts a new one */
data f;
set e;
retain num 0;
if name="PMID" then num+1;
run;

/* Step 7: one row per record, one column per tag */
proc transpose data=f out=g;
var text;
id name;
by num;
run;

/* Step 8: TI and AB pieces are continuation lines, so join them with spaces; MH keeps "||" */
data h;
set g;
ti=tranwrd(ti,"||"," ");
ab=tranwrd(ab,"||"," ");
drop _name_;
run;

proc export data=h outfile="E:\medline.xls" replace;
run;
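
Both replies so far pack every MeSH term of a record into one delimited string, which is what the question asked for. If the individual terms are needed again later, the string can be split back into one row per term. This is a rough sketch, not part of either reply: the data set names demo and mh_long and the variable term are hypothetical, and the single demo row reuses a few MH values from the first sample record.

```sas
/* Demo input: one record whose MH terms are joined with || */
data demo;
    length pmid $20 mh $500;
    pmid = '23495623';
    mh = 'Adolescent||Adult||Alaska||Depression/epidemiology/*ethnology';
run;

/* One output row per MeSH term */
data mh_long(keep=pmid term);
    set demo;
    length term $200;
    /* By default countw and scan treat a run of consecutive '|' as one delimiter */
    do i = 1 to countw(mh, '|');
        term = scan(mh, i, '|');
        output;
    end;
run;
```

Because `'|'` is given as the only delimiter, the slashes, commas and asterisks inside a MeSH term stay intact, and the same step works whether the terms were joined with `||` or with a single `|`.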

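4
你的太阳 posted on 2013-8-1 23:05:46
You could try the following. Some variable lengths in the program may need adjusting, but it does basically what you described.

filename medline "C:\medline.txt";
/* read the file: one observation per line, with the whole line kept in raw */
data a;
infile medline truncover;
informat raw $200.;
input raw;
raw = _infile_;   /* the full line, leading blanks included */
N = _n_;          /* line number, used later to keep the original order */
run;
data b;
set a;
/* declare all lengths up front; increase them if your data needs it */
informat pmid $30. desc $5000. ti $500. mh $500.;
retain pmid desc ti mh " " flag1 flag2 flag3 0;
/* processing for PMID: a new record starts here, so clear the values carried over from the previous one */
if substr(raw,1,5) = 'PMID-' then do;
  pmid = strip(substr(raw,7));
  call missing(desc, ti, mh);
end;
/* processing for description (AB); the block is assumed to end at the AD line, as in the sample */
if substr(raw,1,5) = 'AB  -' then do;
  desc = substr(raw,7);
  flag1 = 1;
end;
if substr(raw,1,5) = 'AD  -' then flag1=0;
/* continuation lines start with six blanks */
if desc ne " " and substr(raw,1,6)='      ' and flag1=1 then  desc = strip(desc)||' '||strip(raw);
/* processing for TI; the block is assumed to end no later than the AB line */
if substr(raw,1,5) = 'TI  -' then do;
  ti = substr(raw,7);
  flag2 = 1;
end;
if substr(raw,1,5) = 'AB  -' then flag2 = 0;
if ti ne " " and substr(raw,1,6)='      ' and flag2=1 then  ti = strip(ti)||' '||strip(raw);
/* processing for MH: collected between the SB line and the EDAT line */
/* the split character is |, you can change it to the one that you want */
if substr(raw,1,5) = 'SB  -' then do;
  mh = "Flag";   /* placeholder so that every term is appended the same way */
  flag3 = 1;
end;
if substr(raw,1,5) = 'EDAT-' then flag3 = 0;
if mh ne " " and flag3=1 and substr(raw,1,5) = 'MH  -' then mh = strip(mh)||'|'||strip(substr(raw,7));

run;
/* write the sorted subset to a new data set so that b is left intact */
proc sort data = b(where=(pmid ne ' ')) out = b_sorted;
by pmid n;
run;
/* the last line of each record carries the fully accumulated values */
data c;
set b_sorted;
by pmid n;
if last.pmid;
mh = substr(mh,6);   /* drop the leading "Flag|" placeholder */
keep pmid desc mh ti;
run;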
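
On the point that some lengths may need adjusting: plain assignment and the || operator cut a value off at the declared length without any message, so a too-short length is easy to miss. One rough way to spot it is to flag values that fill their variable completely. The sketch below is an illustration only; the data set demo and its values are made up, and TI is deliberately declared too narrow.

```sas
/* Demo data: the second title does not fit into $20 and is cut off */
data demo;
    length pmid $8 ti $20;
    pmid = '1'; ti = 'Short title';                              output;
    pmid = '2'; ti = 'A title that is longer than twenty chars'; output;
run;

data _null_;
    set demo;
    /* lengthn() is the length without trailing blanks, vlength() the declared length */
    if lengthn(ti) = vlength(ti) then put 'TI may be truncated: ' pmid=;
run;
```

Run against the real result data set, the same test can be repeated for each long variable. It is a heuristic: a truncated value whose last kept character happens to be a blank will not be flagged.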


5
webgu posted on 2013-8-2 21:28:44
Heh, this reminds me of an 89-line program I wrote about three years ago, around 2010, just to handle this problem. Looking at it now, it was really clumsy.

6
henryyhl posted on 2013-8-2 21:37:13
Learned something. Bookmarking this in case I ever need it. Thanks, everyone.

7
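Imasasor posted on 2013-8-3 11:12:41
Quoting webgu (2013-8-2 21:28): "Heh, this reminds me of an 89-line program I wrote about three years ago, around 2010, just to handle this problem. Looking at it now, it was really clumsy."
So the moderator has worked on this too?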

8
jolterheadmmtt posted on 2013-8-4 14:09:29
I really learned a lot.

9
jolterheadmmtt posted on 2013-8-4 14:11:04
!!! Bookmarked.

10
liu5355776 posted on 2014-3-1 09:29:27
good
