Post #13
jjtww
Posted on 2013-12-28 15:06:38
SAS gets surprisingly complicated when it has to handle a low-level problem like this one. In Python it is much simpler:
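    def Distinct(strs, w={}):
        # Caution: the default dict w={} is created once and shared by every
        # call that omits w, so counts carry over between such calls.
        token = w
        for s in strs:          # each string of a list, or each character of one string
            for ch in s:
                if ch in token:
                    token[ch] += 1
                else:
                    token[ch] = 1
        return token

    strings = ['cbba', 'wcc', 'rrtw', '45rwe', 'afsaf', 'uityikj', 'fdasfo', '!@#$%']
    for s in strings:
        w = Distinct(strings)   # collect every character that occurs in any string
        for key in w:
            w[key] = 0          # reset all counts to zero
        print(Distinct(s, w))   # count the characters of this string only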
Output:
    {'!': 0, '#': 0, '%': 0, '$': 0, '5': 0, '4': 0, '@': 0, 'a': 1, 'c': 1, 'b': 2, 'e': 0, 'd': 0, 'f': 0, 'i': 0, 'k': 0, 'j': 0, 'o': 0, 's': 0, 'r': 0, 'u': 0, 't': 0, 'w': 0, 'y': 0}
    {'!': 0, '#': 0, '%': 0, '$': 0, '5': 0, '4': 0, '@': 0, 'a': 0, 'c': 2, 'b': 0, 'e': 0, 'd': 0, 'f': 0, 'i': 0, 'k': 0, 'j': 0, 'o': 0, 's': 0, 'r': 0, 'u': 0, 't': 0, 'w': 1, 'y': 0}
    {'!': 0, '#': 0, '%': 0, '$': 0, '5': 0, '4': 0, '@': 0, 'a': 0, 'c': 0, 'b': 0, 'e': 0, 'd': 0, 'f': 0, 'i': 0, 'k': 0, 'j': 0, 'o': 0, 's': 0, 'r': 2, 'u': 0, 't': 1, 'w': 1, 'y': 0}
    {'!': 0, '#': 0, '%': 0, '$': 0, '5': 1, '4': 1, '@': 0, 'a': 0, 'c': 0, 'b': 0, 'e': 1, 'd': 0, 'f': 0, 'i': 0, 'k': 0, 'j': 0, 'o': 0, 's': 0, 'r': 1, 'u': 0, 't': 0, 'w': 1, 'y': 0}
    {'!': 0, '#': 0, '%': 0, '$': 0, '5': 0, '4': 0, '@': 0, 'a': 2, 'c': 0, 'b': 0, 'e': 0, 'd': 0, 'f': 2, 'i': 0, 'k': 0, 'j': 0, 'o': 0, 's': 1, 'r': 0, 'u': 0, 't': 0, 'w': 0, 'y': 0}
    {'!': 0, '#': 0, '%': 0, '$': 0, '5': 0, '4': 0, '@': 0, 'a': 0, 'c': 0, 'b': 0, 'e': 0, 'd': 0, 'f': 0, 'i': 2, 'k': 1, 'j': 1, 'o': 0, 's': 0, 'r': 0, 'u': 1, 't': 1, 'w': 0, 'y': 1}
    {'!': 0, '#': 0, '%': 0, '$': 0, '5': 0, '4': 0, '@': 0, 'a': 1, 'c': 0, 'b': 0, 'e': 0, 'd': 1, 'f': 2, 'i': 0, 'k': 0, 'j': 0, 'o': 1, 's': 1, 'r': 0, 'u': 0, 't': 0, 'w': 0, 'y': 0}
    {'!': 1, '#': 1, '%': 1, '$': 1, '5': 0, '4': 0, '@': 1, 'a': 0, 'c': 0, 'b': 0, 'e': 0, 'd': 0, 'f': 0, 'i': 0, 'k': 0, 'j': 0, 'o': 0, 's': 0, 'r': 0, 'u': 0, 't': 0, 'w': 0, 'y': 0}
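For comparison, the same per-string table can probably be built more directly with `collections.Counter`. The following is only a sketch, assuming Python 3; the helper name `per_string_counts` is made up for illustration and is not part of the code above. Keys come out in sorted order rather than in the Python 2 dictionary order shown in the output.

```python
from collections import Counter

def per_string_counts(strings):
    """Return one dict per string, with a zero entry for every
    character that occurs anywhere in `strings`."""
    alphabet = sorted(set("".join(strings)))
    rows = []
    for s in strings:
        counts = Counter(s)
        # A Counter returns 0 for characters it has not seen.
        rows.append({ch: counts[ch] for ch in alphabet})
    return rows

strings = ['cbba', 'wcc', 'rrtw', '45rwe', 'afsaf', 'uityikj', 'fdasfo', '!@#$%']
for row in per_string_counts(strings):
    print(row)
```

Because every row is built from a fresh `Counter`, no zeroing step is needed between strings.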
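Next, the Distinct function can be used to count how often each character occurs in an English novel.
As a simple test, I took the first paragraph of Chapter 1 of Boule de Suif (BALL-OF-FAT):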
    str2 = ('For many days now the fag-end of the army had been straggling through the town.They were not troops,'
            'but a disbanded horde.The beards of the men were long and filthy,their uniforms in tatters,and they advanced '
            'at an easy pace without flag or regiment.All seemed worn-out and back-broken,incapable of a thought or a '
            'resolution,marching by habit solely, and falling from fatigue as soon as they stopped.In short,they were '
            'a mobilized,pacific people,bending under the weight of the gun;some little squads on the alert,easy to '
            'take alarm and prompt in enthusiasm,ready to attack or to flee;and in the midst of them,some red '
            'breeches,the remains of a division broken up in a great battle;some somber artillery men in line with '
            'these varied kinds of foot soldiers;and,sometimes the brilliant helmet of a dragoon on foot who followed '
            'with difficulty the shortest march of the lines.')
    print(Distinct(str2))
Output:
    {'!': 1, ' ': 138, '#': 1, '%': 1, '$': 1, '-': 3, '\xac': 14, '\xae': 5, '5': 0, '4': 0, '\xbb': 4, 'A': 1, '@': 1, 'F': 1, 'I': 1, '\xa3': 23, 'T': 2, 'a': 58, 'c': 11, 'b': 16, 'e': 87, 'd': 33, 'g': 17, 'f': 23, 'i': 44, 'h': 40, 'k': 6, 'j': 0, 'm': 24, 'l': 31, 'o': 61, 'n': 50, 'q': 1, 'p': 11, 's': 38, 'r': 42, 'u': 14, 't': 67, 'w': 12, 'v': 3, 'y': 15, 'z': 1}
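As the output shows, the most frequent character in the passage is the space, with 138 occurrences, followed by the letter e, with 87.
Two things in this output deserve a caveat. The entries for '!', '#', '$', '%', '@', '4', '5' and 'j' do not come from the passage: Distinct was called without a second argument, so it reused the default dictionary w={} left over from the earlier calls. The keys '\xa3', '\xac', '\xae' and '\xbb' look like the two-byte GBK encoding of full-width commas, periods and semicolons (14 + 5 + 4 = 23 occurrences of the lead byte '\xa3'), which Python 2 counts byte by byte.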
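Since `Distinct` is declared with `w={}`, every call that omits the second argument shares one dictionary, and counts from earlier calls carry over. A minimal sketch of one way to avoid this, assuming Python 3, is shown here; the lowercase name `distinct` and the parameter `counts` are illustrative and not from the original code. It is run on a short fragment of the passage instead of the full paragraph.

```python
from collections import Counter

def distinct(strings, counts=None):
    """Count characters; a fresh dict is created when `counts` is omitted."""
    if counts is None:
        counts = {}
    for s in strings:
        for ch in s:
            counts[ch] = counts.get(ch, 0) + 1
    return counts

first = distinct(['!@#$%'])
second = distinct('the fag-end of the army')
print('!' in second)   # False: nothing carried over from the first call

# Counter produces the same counts and can rank them directly.
top = Counter('the fag-end of the army').most_common(2)
print(top)             # [(' ', 4), ('e', 3)]
```

Passing an explicit dictionary still accumulates into it, so the zero-filled table from the earlier loop would keep working with this version.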