[Q&A] Chunking English text


Original post

clay444 posted on 2018-1-2 23:46:17

Bounty: 100 forum coins

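import nltk

# NLTK's tokenizer and tagger data must be downloaded once with nltk.download();
# the exact resource names depend on the NLTK version.
text = "the little yellow dog barked at the cat."
sens = nltk.sent_tokenize(text)                              # split into sentences
words = [nltk.word_tokenize(sentence) for sentence in sens]  # tokenize each sentence
tags = [nltk.pos_tag(tokens) for tokens in words]            # POS-tag each sentence
# NP chunk: optional determiner or possessive pronoun (PRP$ in the Penn Treebank
# tagset that pos_tag uses), any number of adjectives, then a noun;
# or a run of proper nouns.
grammar = r"""
  NP: {<DT|PRP\$>?<JJ>*<NN>}
      {<NNP>+}
"""
cp = nltk.RegexpParser(grammar)
result = cp.parse(tags[0])  # chunk the first (and only) sentence
print(result)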

The output is:
(S
  (NP the/DT little/JJ yellow/JJ dog/NN)
  barked/VBD
  at/IN
  (NP the/DT cat/NN)
  ./.)

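This result isn't useful to me. What I want is ['the little yellow dog', 'barked', 'at', 'the cat'], because I want to do a frequency analysis of the chunks. How can I convert the resulting Tree into that form? Or is there another way to do this?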
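One possible approach is to walk the top-level children of the Tree returned by `cp.parse`: each child is either a chunk subtree (an `nltk.Tree` such as NP) or a plain `(word, tag)` tuple that was left outside any chunk. A minimal sketch, assuming NLTK 3.x; the helper name `chunk_strings` and the choice to drop punctuation-only tokens such as "." are illustrative assumptions, not part of NLTK. The tags are written out by hand (matching the output above) so the example runs without downloading tagger models.

```python
from collections import Counter

from nltk import RegexpParser, Tree

# Same chunk grammar as above (PRP$ is the Penn Treebank possessive-pronoun tag).
grammar = r"""
  NP: {<DT|PRP\$>?<JJ>*<NN>}
      {<NNP>+}
"""
cp = RegexpParser(grammar)


def chunk_strings(tree):
    """Hypothetical helper: one string per top-level child of a chunked Tree.

    Chunk subtrees become the space-joined words they cover; tokens outside
    any chunk stay as single words.
    """
    phrases = []
    for node in tree:
        if isinstance(node, Tree):
            # A chunk such as NP: its leaves are (word, tag) tuples.
            words = [word for word, _tag in node.leaves()]
        else:
            # An unchunked (word, tag) tuple.
            word, _tag = node
            words = [word]
        # Assumption: tokens with no letters or digits (e.g. ".") are dropped.
        words = [w for w in words if any(ch.isalnum() for ch in w)]
        if words:
            phrases.append(" ".join(words))
    return phrases


# Hand-written tags; in practice they come from nltk.pos_tag as in the question.
tagged = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
          ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN"),
          (".", ".")]
result = cp.parse(tagged)
print(chunk_strings(result))
# ['the little yellow dog', 'barked', 'at', 'the cat']

# Chunk frequencies: update one Counter with each parsed sentence.
freq = Counter()
freq.update(chunk_strings(result))
```

For a longer text, parse each sentence produced by `sent_tokenize` and call `freq.update(chunk_strings(...))` once per sentence; `freq.most_common(10)` then lists the most frequent chunks.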


