人大经济论坛 (Renmin University Economics Forum) › Forum › Econometrics and Statistics Forum, Zone 5 › Econometrics and Statistical Software › WinBUGS and Other Software


Original poster: Lisrelchen


[Text Mining]TextBlob: Simplified Text Processing


Original post

Lisrelchen posted on 2017-7-6 23:16:31


TextBlob: Simplified Text Processing
Homepage: https://textblob.readthedocs.io/
TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.
from textblob import TextBlob

text = '''
The titular threat of The Blob has always struck me as the ultimate movie
monster: an insatiably hungry, amoeba-like mass able to penetrate
virtually any safeguard, capable of--as a doomed doctor chillingly
describes it--"assimilating flesh on contact.
Snide comparisons to gelatin be damned, it's a concept with the most
devastating of potential consequences, not unlike the grey goo scenario
proposed by technological theorists fearful of
artificial intelligence run rampant.
'''

blob = TextBlob(text)
blob.tags           # [('The', 'DT'), ('titular', 'JJ'), ...]

blob.noun_phrases   # WordList(['titular threat', 'blob',
                    #           'ultimate movie monster',
                    #           'amoeba-like mass', ...])

for sentence in blob.sentences:
    print(sentence.sentiment.polarity)
# 0.060
# -0.341

blob.translate(to="es")  # 'La amenaza titular de The Blob...'
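The polarity values printed above are floats in the range [-1.0, 1.0]. To give a feel for what such a score is, here is a toy lexicon-averaging scorer that needs only the standard library. It is not how TextBlob computes sentiment (its default analyzer comes from pattern and is considerably more elaborate); the tiny `TOY_LEXICON` and the `toy_polarity` helper are hypothetical.

```python
import re

# Hypothetical word -> polarity scores, for illustration only.
TOY_LEXICON = {
    "good": 0.7,
    "great": 0.8,
    "hungry": -0.2,
    "doomed": -0.6,
    "devastating": -0.9,
}


def toy_polarity(text: str) -> float:
    """Average the lexicon scores of known words; 0.0 if no word is known."""
    # Lowercase and pull out words, keeping hyphenated words together.
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    if not scores:
        return 0.0
    # Every lexicon score lies in [-1.0, 1.0], so their mean does too.
    return sum(scores) / len(scores)


print(toy_polarity("a doomed doctor with devastating news"))
```

Words missing from the lexicon are simply ignored, which is one of the reasons real analyzers use much larger lexicons plus extra rules.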

TextBlob stands on the giant shoulders of NLTK and pattern, and plays nicely with both.
Features
Noun phrase extraction
Part-of-speech tagging
Sentiment analysis
Classification (Naive Bayes, Decision Tree)
Language translation and detection powered by Google Translate
Tokenization (splitting text into words and sentences)
Word and phrase frequencies
Parsing
n-grams
Word inflection (pluralization and singularization) and lemmatization
Spelling correction
Add new models or languages through extensions
WordNet integration
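Two of the features above, word and phrase frequencies and n-grams, are simple enough to sketch with the standard library alone. TextBlob exposes them on a blob as `word_counts` and `ngrams(n=3)`; the regex tokenizer below is a simplification that will not split text exactly as TextBlob does, and the helper names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens; a simplified stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+(?:['-][a-z0-9]+)*", text.lower())


def word_counts(text: str) -> Counter:
    """Frequency of each lowercased token."""
    return Counter(tokenize(text))


def ngrams(tokens: list[str], n: int = 3) -> list[tuple[str, ...]]:
    """All contiguous windows of n tokens, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


text = "Now is better than never. Although never is often better than right now."
tokens = tokenize(text)
print(word_counts(text).most_common(3))
print(ngrams(tokens, n=3)[:2])
```

If `n` exceeds the number of tokens, the range is empty and `ngrams` returns an empty list rather than raising.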

Get it now
$ pip install -U textblob
$ python -m textblob.download_corpora

Examples
See more examples at the Quickstart guide.

Documentation
Full documentation is available at https://textblob.readthedocs.io/.

Requirements
Python >= 2.7 or >= 3.4

Project Links
Docs: https://textblob.readthedocs.io/
Changelog: https://textblob.readthedocs.io/en/latest/changelog.html
PyPI: https://pypi.python.org/pypi/TextBlob
Issues: https://github.com/sloria/TextBlob/issues

License
MIT licensed. See the bundled LICENSE file for more details.


Post #2

Lisrelchen posted on 2017-7-6 23:17:26

Create a TextBlob
First, the import.
>>> from textblob import TextBlob
Let’s create our first TextBlob.
>>> wiki = TextBlob("Python is a high-level, general-purpose programming language.")


Post #3

Lisrelchen posted on 2017-7-6 23:17:48

Part-of-speech Tagging
Part-of-speech tags can be accessed through the tags property.
>>> wiki.tags
[('Python', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('high-level', 'JJ'), ('general-purpose', 'JJ'), ('programming', 'NN'), ('language', 'NN')]

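Because `tags` is a plain list of `(word, tag)` tuples using Penn Treebank tags, ordinary Python is enough to post-process it. The sketch below reuses the output shown above as literal data (so it runs without TextBlob installed); `words_with_tag_prefix` is a hypothetical helper that keeps words whose tag starts with a given prefix.

```python
# Output of wiki.tags copied from the example above, used as literal data.
tags = [('Python', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('high-level', 'JJ'),
        ('general-purpose', 'JJ'), ('programming', 'NN'), ('language', 'NN')]


def words_with_tag_prefix(tagged, prefix):
    """Return the words whose Penn Treebank tag starts with `prefix`."""
    return [word for word, tag in tagged if tag.startswith(prefix)]


print(words_with_tag_prefix(tags, "NN"))  # nouns: NN, NNS, NNP, NNPS
print(words_with_tag_prefix(tags, "JJ"))  # adjectives: JJ, JJR, JJS
```

Matching on the prefix rather than the exact tag groups singular, plural and proper nouns together.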

Post #4

Lisrelchen posted on 2017-7-6 23:18:17

Noun Phrase Extraction
Similarly, noun phrases are accessed through the noun_phrases property.
>>> wiki.noun_phrases
WordList(['python'])


Post #5

MouJack007 posted on 2017-7-6 23:43:32

Thanks to the original poster for sharing!

Post #6

MouJack007 posted on 2017-7-6 23:43:49

Post #7

Lisrelchen posted on 2017-7-7 00:00:46

Word Inflection and Lemmatization
Each word in TextBlob.words or Sentence.words is a Word object (a subclass of unicode, i.e. str on Python 3) with useful methods, e.g. for word inflection.
>>> sentence = TextBlob('Use 4 spaces per indentation level.')
>>> sentence.words
WordList(['Use', '4', 'spaces', 'per', 'indentation', 'level'])
>>> sentence.words[2].singularize()
'space'
>>> sentence.words[-1].pluralize()
'levels'

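Since a Word is a string subclass with extra methods, the idea can be sketched in a few lines. The suffix rules below are hypothetical and deliberately naive (regular English plurals only); TextBlob's own inflection code, adapted from pattern, also handles irregular forms such as octopus/octopodes.

```python
import re


class Word(str):
    """A str subclass with naive inflection helpers (illustrative only)."""

    def pluralize(self) -> "Word":
        if re.search(r"(s|x|z|ch|sh)$", self):
            return Word(self + "es")            # box -> boxes
        if re.search(r"[^aeiou]y$", self):
            return Word(self[:-1] + "ies")      # city -> cities
        return Word(self + "s")                 # level -> levels

    def singularize(self) -> "Word":
        if re.search(r"[^aeiou]ies$", self):
            return Word(self[:-3] + "y")        # cities -> city
        if re.search(r"(ss|x|z|ch|sh)es$", self):
            return Word(self[:-2])              # boxes -> box
        if self.endswith("s") and not self.endswith("ss"):
            return Word(self[:-1])              # spaces -> space
        return Word(self)                       # already singular


words = [Word(w) for w in "Use 4 spaces per indentation level".split()]
print(words[2].singularize(), words[-1].pluralize())
```

Returning `Word` from each method keeps the helpers chainable, and because `Word` subclasses `str`, it still works anywhere a plain string is expected.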

Post #8

Lisrelchen posted on 2017-7-7 00:01:39

WordNet Integration
You can access the synsets for a Word via the synsets property or the get_synsets method, optionally passing in a part of speech.
>>> from textblob import Word
>>> from textblob.wordnet import VERB
>>> word = Word("octopus")
>>> word.synsets
[Synset('octopus.n.01'), Synset('octopus.n.02')]
>>> Word("hack").get_synsets(pos=VERB)
[Synset('chop.v.05'), Synset('hack.v.02'), Synset('hack.v.03'), Synset('hack.v.04'), Synset('hack.v.05'), Synset('hack.v.06'), Synset('hack.v.07'), Synset('hack.v.08')]

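WordNet synset names follow the `lemma.pos.nn` convention visible above: the lemma, a one-letter part-of-speech code, and a two-digit sense number. A small hypothetical helper that splits such names apart, using strings from the output above as literal data, needs only the standard library:

```python
from typing import NamedTuple

# Part-of-speech codes used in WordNet synset names.
POS_NAMES = {"n": "noun", "v": "verb", "a": "adjective",
             "s": "adjective satellite", "r": "adverb"}


class SynsetName(NamedTuple):
    lemma: str
    pos: str
    sense: int


def parse_synset_name(name: str) -> SynsetName:
    """Split 'hack.v.02' into ('hack', 'verb', 2).

    rsplit works from the right, so any dot inside the lemma stays with it.
    """
    lemma, pos, sense = name.rsplit(".", 2)
    return SynsetName(lemma, POS_NAMES[pos], int(sense))


names = ["chop.v.05", "hack.v.02", "octopus.n.01"]
for parsed in map(parse_synset_name, names):
    print(parsed)
```

An unknown part-of-speech code raises `KeyError`, which surfaces malformed names early instead of silently mislabeling them.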

Post #9

Lisrelchen posted on 2017-7-7 00:03:20

WordLists
A WordList is just a Python list with additional methods.
>>> animals = TextBlob("cat dog octopus")
>>> animals.words
WordList(['cat', 'dog', 'octopus'])
>>> animals.words.pluralize()
WordList(['cats', 'dogs', 'octopodes'])

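Because a WordList is "just a Python list with additional methods", the pattern is easy to reproduce. The `ToyWordList` class below is hypothetical and simplified, not TextBlob's class. It adds a case-insensitive `count` and `upper`/`lower` helpers that return the same list type. TextBlob's own WordList offers a similar `count(..., case_sensitive=False)`; check its API reference for the exact method set.

```python
class ToyWordList(list):
    """A list of strings with a few extra, WordList-style helpers."""

    def upper(self) -> "ToyWordList":
        return ToyWordList(w.upper() for w in self)

    def lower(self) -> "ToyWordList":
        return ToyWordList(w.lower() for w in self)

    def count(self, word: str, case_sensitive: bool = False) -> int:
        """Count occurrences of `word`, ignoring case unless asked not to."""
        if case_sensitive:
            return super().count(word)
        target = word.lower()
        return sum(1 for w in self if w.lower() == target)


animals = ToyWordList(["cat", "Dog", "octopus", "dog"])
print(animals.count("dog"))                       # ignores case
print(animals.count("dog", case_sensitive=True))  # exact matches only
print(animals.upper())
```

Subclassing `list` means indexing, iteration and `len()` work unchanged; only the added or overridden methods behave differently.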

Post #10

Lisrelchen posted on 2017-7-7 00:04:10

Spelling Correction
Use the correct() method to attempt spelling correction.
>>> b = TextBlob("I havv goood speling!")
>>> print(b.correct())
I have good spelling!
Word objects have a spellcheck() method that returns a list of (word, confidence) tuples with spelling suggestions.
>>> from textblob import Word
>>> w = Word('falibility')
>>> w.spellcheck()
[('fallibility', 1.0)]

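TextBlob's documentation describes its spelling correction as based on Peter Norvig's "How to Write a Spelling Corrector", as implemented in the pattern library. The sketch below follows that idea: generate every string one edit away, keep candidates found in a word-frequency table, and rank them by frequency. The tiny `WORD_FREQ` table is hypothetical (a real corrector counts words in a large corpus), Norvig's version also considers candidates two edits away, and treating confidence as each candidate's share of the candidates' total frequency is an assumption of this sketch.

```python
import string
from collections import Counter

# Hypothetical word frequencies; a real corrector counts words in a large corpus.
WORD_FREQ = Counter({"have": 50, "good": 40, "spelling": 5, "fallibility": 1,
                     "hive": 2, "food": 10, "gold": 3})


def edits1(word: str) -> set[str]:
    """All strings one deletion, transposition, replacement or insertion away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def suggestions(word: str) -> list[tuple[str, float]]:
    """(candidate, confidence) pairs, most frequent candidate first."""
    word = word.lower()
    if word in WORD_FREQ:
        return [(word, 1.0)]
    known = [w for w in edits1(word) if w in WORD_FREQ]
    if not known:
        return [(word, 0.0)]  # no candidate found: keep the word as-is
    total = sum(WORD_FREQ[w] for w in known)
    # Sort by descending frequency, then alphabetically for a stable order.
    ranked = sorted(known, key=lambda w: (-WORD_FREQ[w], w))
    return [(w, WORD_FREQ[w] / total) for w in ranked]


def correct(word: str) -> str:
    """Return the top-ranked suggestion."""
    return suggestions(word)[0][0]


print(" ".join(correct(w) for w in ["havv", "goood", "speling"]))
print(suggestions("falibility"))
```

With only one-edit candidates the search stays cheap (a few hundred strings per word), at the cost of missing misspellings that are two or more edits away.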
