Original poster: Imasasor

[Original post] Waiting for an expert: how to count the frequency of each letter in an article

#1 (original post)
Imasasor, posted 2012-9-5 12:40:43
Bounty: 10 forum coins
Last edited by wanghaidong918 on 2013-1-13 03:48

Take the following English article:
It is Teacher's Day today. On this special occasion I would like to extend my heartfelt congratulations to all teachers, "Happy Teacher's Day!"    Of all teachers who have taught me since my early childhood, the most unforgettable one is my first English teacher in college, Ms. Zhang. It is she who has aroused my keen interest in the learning of English and helped me realize the importance of self-reliance. Born into a poor farmer's family in a mountainous area and educated in relatively primitive surroundings, I found myself lagging far behind in the first class in college, which happened to be Ms. Zhang's English class. I was really discouraged and frustrated, so I decided to drop out. Ms. Zhang was so keenly insightful that she had noticed my embarrassment in class. After class, she called me into the Teacher's Room and discussed the situation with me, earnestly and kindly, citing the example of Robinson Crusoe to motivate me to go ahead in spite of all kinds of difficulties. "Be a man and rely on yourself," she nudged me. The next time we met, she brought me a simplified version of Robinson Crusoe and recommended that I finish reading it in a week and write a book report. Under her consistent and patient guidance, not only has my English been greatly improved, but my confidence and courage enhanced considerably.    "Rely on yourself and be a man," Ms. Zhang's inspiring words have been echoing in my mind. I will work harder and try my utmost to lay a solid foundation for my future career. Only by so doing can I repay Ms. Zhang's kindness and live up to her expectations of me, that is, to become a useful person and contribute to society.


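The text contains all kinds of characters. If I want to count how many times each of the 26 English letters (ignoring case) occurs in this article, and its frequency, can SAS do that? If so, how? I would be very grateful if an expert were willing to write the program. I do not have many forum coins, so the bounty is only 10; this is mainly for discussion and learning.
Second question: can SAS list all the words that occur in the text, and how many times each word occurs?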

Best answer: ziyenano

#2
ziyenano, posted 2012-9-5 12:40:44
Last edited by ziyenano on 2012-9-5 15:44

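/* Read each line of the file as a single value. '@' is used as the
   delimiter only because it never occurs in the text. Raise lrecl
   if a line is longer than 2000 characters. */
data ex;
  infile "e:\x.txt" delimiter='@' lrecl=2000;
  input x : $5000.;
run;

/* Count each letter, ignoring case: one output row per letter
   for every line read above */
data ex1;
  set ex;
  array letters(26) $1 _temporary_
    ('a','b','c','d','e','f','g','h','i','j','k','l','m',
     'n','o','p','q','r','s','t','u','v','w','x','y','z');
  length name $1;
  do i=1 to dim(letters);
    name=letters(i);
    count=count(lowcase(x),name);
    output;
  end;
  drop x i;
run;

/* Read the text word by word (blank-delimited), lowercase it
   and remove all punctuation */
data ex2;
  infile "e:\x.txt" lrecl=2000;
  input x : $20. @@;
  x=compress(lowcase(x),'','p');
  if x ne '';   /* drop tokens that were punctuation only */
run;

/* Number of occurrences of each word */
proc sql;
  create table ex3 as
  select x, count(*) as n from ex2 group by x;
quit;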
That is the general idea; some details may still need polishing.
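The question also asked for frequencies, which the steps above do not compute. A minimal sketch of one way to add them, assuming ex1 has the columns name and count as above and covers all 26 letters (with fewer letters the rate would only be a share among the letters listed). The table name letter_freq and the column names n and rate are hypothetical:

```sas
/* Sum the per-line counts for each letter and divide by the
   total number of letters counted in ex1. */
proc sql;
  create table letter_freq as
  select name,
         sum(count) as n,
         calculated n / (select sum(count) from ex1) as rate format=percent8.2
  from ex1
  group by name
  order by name;
quit;
```

Because ex1 holds one row per letter per input line, the sum over name also handles a text file that spans several lines.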


#3
Imasasor, posted 2012-9-5 16:40:42
Quoting ziyenano (2012-9-5 15:26):
data ex;
infile "e:\x.txt" delimiter='@' lrecl=2000;
input x:$5000.;
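Expert, what does delimiter='@' in the infile statement mean? It looks as if it forces SAS to go on and read the next line of data. But it no longer works once the data is moved into a cards block. Why? Also, even with a txt file, if the text is split into several paragraphs there is no guarantee that all of it ends up in a single observation. I would appreciate your advice.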

#4
Imasasor, posted 2012-9-5 16:50:32
Quoting ziyenano (2012-9-5 15:26):
data ex;
infile "e:\x.txt" delimiter='@' lrecl=2000;
input x:$5000.;
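One more thing: is there a way to remove punctuation only at the beginning and end of a string, without removing punctuation in the middle?
For example, the apostrophe inside student's should not be removed.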

#5
466046020, posted 2012-9-5 17:01:33
Last edited by 466046020 on 2012-9-5 17:16
data test;
  input text & $260.;
cards;
It is Teacher's Day today. On this special occasion I would like to extend my heartfelt congratulations to all teachers, "Happy Teacher's Day!"
Of all teachers who have taught me since my early childhood, the most unforgettable one is my first English teacher in college, Ms. Zhang.
It is she who has aroused my keen interest in the learning of English and helped me realize the importance of self-reliance.
Born into a poor farmer's family in a mountainous area and educated in relatively primitive surroundings, I found myself lagging far behind in the first class in college,
which happened to be Ms. Zhang's English class. I was really discouraged and frustrated, so I decided to drop out. Ms. Zhang was so keenly insightful that she had noticed my embarrassment in class.
After class, she called me into the Teacher's Room and discussed the situation with me, earnestly and kindly, citing the example of Robinson Crusoe to motivate me to go ahead in spite of all kinds of difficulties.
"Be a man and rely on yourself," she nudged me. The next time we met, she brought me a simplified version of Robinson Crusoe and recommended that I finish reading it in a week and write a book report.
Under her consistent and patient guidance, not only has my English been greatly improved, but my confidence and courage enhanced considerably. "Rely on yourself and be a man,"
Ms. Zhang's inspiring words have been echoing in my mind. I will work harder and try my utmost to lay a solid foundation for my future career.
Only by so doing can I repay Ms. Zhang's kindness and live up to her expectations of me, that is, to become a useful person and contribute to society.
;
run;

/* Count the occurrences of one letter (case-insensitive) */
%macro count_alpha(alpha);
  data count_&alpha;
    set test end=last;
    index=0;
    /* search repeatedly, each time starting after the previous hit */
    do until(index=0);
      index=find(text,"&alpha",'i',index+1);
      if index ne 0 then count+1;
    end;
    alpha="&alpha";
    if last then output;
    keep alpha count;
  run;
%mend count_alpha;

/* Count all letters */
%macro count_all;
  data count_all;
    set test end=last;
    index=0;
    do until(index=0);
      index=anyalpha(text,index+1);
      if index ne 0 then count_all+1;
    end;
    if last then output;
    keep count_all;
  run;
%mend count_all;

/* Compute and print the results */
%macro calculate;
  /* Count each letter: byte(65)..byte(90) is A..Z */
  %do i=65 %to 90;
    %count_alpha(%sysfunc(byte(&i)))
  %end;
  /* Count the total number of letters */
  %count_all
  /* Compute the frequencies */
  data all_alpha;
    title "Letter counts and frequencies";
    file print;
    if _n_=1 then set count_all;
    set
    %do i=65 %to 90;
      count_%sysfunc(byte(&i))
    %end;
    ;
    rate=count/count_all;
    put "Letter " alpha "count: " count ", frequency: " rate;
  run;
  /* Delete the 26 intermediate data sets */
  proc datasets lib=work nolist;
    delete
    %do i=65 %to 90;
      count_%sysfunc(byte(&i))
    %end;
    ;
  quit;
%mend calculate;

%calculate

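I just thought this up quickly and implemented it. The program creates 26 data sets, which is entirely unnecessary, but I could not be bothered to change it!
After reading the other replies I can see there is still a lot to optimize, and that there are many SAS functions I did not even know about. Sad!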
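For comparison, the 26 intermediate data sets can be avoided by keeping the counters in an array. The following is only a sketch under stated assumptions: it reads the data set test and its variable text from the program above, it assumes an ASCII session (so that 'A' to 'Z' are codes 65 to 90), and the names letter_stats, n, total, pos and k are hypothetical:

```sas
data letter_stats;
  set test end=last;
  array n(26) _temporary_ (26*0);   /* one counter per letter, kept across rows */
  retain total 0;
  do pos = 1 to length(text);
    /* map 'A'..'Z' (after upcasing) to 1..26; ASCII assumed */
    k = rank(upcase(char(text, pos))) - 64;
    if 1 <= k <= 26 then do;
      n(k) = n(k) + 1;
      total = total + 1;
    end;
  end;
  /* after the last row, write one observation per letter */
  if last then do k = 1 to 26;
    alpha = byte(64 + k);
    count = n(k);
    rate  = count / total;
    output;
  end;
  keep alpha count rate;
run;
```

The text is scanned once per row instead of 27 times, and the result is a single data set with 26 observations. If the text contained no letters at all, total would be 0 and rate would be missing.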

[Attached screenshot: QQ截图20120905165411.jpg]


#6
ziyenano, posted 2012-9-5 17:02:11
Last edited by ziyenano on 2012-9-5 17:02

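In the txt file the data is on a single line; with @ as the delimiter, the whole line is read into one observation. It should also work with cards: set the delimiter to @ or any other symbol, as long as it is a character that does not occur in the text.
If the txt file has several paragraphs and you want them all in one observation, you can use line pointers.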
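Line pointers (/ or #n in the input statement) need the number of lines to be known in advance. As an alternative for an unknown number of lines, here is a minimal sketch that swaps in a different technique: it reads one line per iteration and concatenates the lines into a retained variable. The file path is the one used earlier in the thread; the names one_obs and line are hypothetical:

```sas
data one_obs;
  infile "e:\x.txt" lrecl=5000 truncover end=last;
  length line $5000 x $32767;
  retain x;
  input line $char5000.;      /* read one whole line, blanks included */
  x = catx(' ', x, line);     /* append it, separated by one blank */
  if last then output;        /* write a single observation at end of file */
  keep x;
run;
```

catx skips blank arguments, so empty lines between paragraphs do not add extra separators. The combined text is limited to 32767 characters, the maximum length of a SAS character variable.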

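#7
ziyenano, posted 2012-9-5 17:04:44
Quoting Imasasor (2012-9-5 16:50):
One more thing: is there a way to remove punctuation only at the beginning and end of a string, without removing punctuation in the middle?
For example, the one inside student's ...
Handle these details with a regular expression.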
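A possible form of that regular expression, as a sketch rather than the poster's own code: prxchange removes every run of non-alphanumeric characters at the start or at the end of each word and leaves the middle untouched, so teacher's and self-reliance keep their punctuation. The file path is the one used earlier in the thread; the data set names words and word_counts, the length of 40, and the choice to treat everything except letters and digits as punctuation are assumptions:

```sas
data words;
  infile "e:\x.txt" lrecl=5000;
  length x $40;
  input x : $40. @@;          /* one blank-delimited token per observation */
  /* strip() removes the trailing blanks first so that $ anchors on the
     last real character; -1 means replace every match */
  x = lowcase(prxchange('s/^[^A-Za-z0-9]+|[^A-Za-z0-9]+$//', -1, strip(x)));
  if x ne '';                 /* drop tokens that were punctuation only */
run;

proc sql;
  create table word_counts as
  select x, count(*) as n
  from words
  group by x
  order by n desc;
quit;
```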

#8
ziyenano, posted 2012-9-5 17:08:12
Last edited by ziyenano on 2012-9-5 17:08
Quoting 466046020 (2012-9-5 17:01):
I learned something.