Original poster: Lisrelchen

[Text Mining] TextBlob: Simplified Text Processing


#1 (original post)
Lisrelchen, posted 2017-7-6 23:16:31

TextBlob: Simplified Text Processing

Homepage: https://textblob.readthedocs.io/

TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.

from textblob import TextBlob

text = '''
The titular threat of The Blob has always struck me as the ultimate movie
monster: an insatiably hungry, amoeba-like mass able to penetrate
virtually any safeguard, capable of--as a doomed doctor chillingly
describes it--"assimilating flesh on contact.
Snide comparisons to gelatin be damned, it's a concept with the most
devastating of potential consequences, not unlike the grey goo scenario
proposed by technological theorists fearful of
artificial intelligence run rampant.
'''

blob = TextBlob(text)
blob.tags           # [('The', 'DT'), ('titular', 'JJ'), ...]
blob.noun_phrases   # WordList(['titular threat', 'blob',
                    #            'ultimate movie monster',
                    #            'amoeba-like mass', ...])

for sentence in blob.sentences:
    print(sentence.sentiment.polarity)
# 0.060
# -0.341

blob.translate(to="es")  # 'La amenaza titular de The Blob...'

TextBlob stands on the giant shoulders of NLTK and pattern, and plays nicely with both.

Features
  • Noun phrase extraction
  • Part-of-speech tagging
  • Sentiment analysis
  • Classification (Naive Bayes, Decision Tree)
  • Language translation and detection powered by Google Translate
  • Tokenization (splitting text into words and sentences)
  • Word and phrase frequencies
  • Parsing
  • n-grams
  • Word inflection (pluralization and singularization) and lemmatization
  • Spelling correction
  • Add new models or languages through extensions
  • WordNet integration
Get it now

$ pip install -U textblob
$ python -m textblob.download_corpora

Examples

See more examples at the Quickstart guide.
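Two of the features listed above, n-grams and word frequencies, are simple enough to sketch with the standard library alone. The helper names below (`words`, `ngrams`, `word_counts`) are illustrative assumptions and this is not TextBlob's implementation; TextBlob exposes the same ideas as `blob.ngrams(n=3)` and `blob.word_counts`, and its tokenizer is more careful than the regular expression used here.

```python
import re
from collections import Counter


def words(text):
    """Split text into word tokens (letters, digits, apostrophes, hyphens)."""
    return re.findall(r"[A-Za-z0-9'-]+", text)


def ngrams(tokens, n=3):
    """Return every run of n consecutive tokens, in order."""
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]


def word_counts(tokens):
    """Count tokens case-insensitively."""
    return Counter(token.lower() for token in tokens)


tokens = words("Now is better than never. Now is good.")
print(ngrams(tokens, n=3))                  # six overlapping trigrams
print(word_counts(tokens).most_common(2))   # the two most frequent words
```

When the token list is shorter than `n`, `ngrams` returns an empty list rather than raising an error.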

Documentation

Full documentation is available at https://textblob.readthedocs.io/.

Requirements
  • Python >= 2.7 or >= 3.4
License

MIT licensed. See the bundled LICENSE file for more details.



#2
Lisrelchen, posted 2017-7-6 23:17:26
Create a TextBlob
First, the import.

>>> from textblob import TextBlob

Let’s create our first TextBlob.

>>> wiki = TextBlob("Python is a high-level, general-purpose programming language.")

#3
Lisrelchen, posted 2017-7-6 23:17:48
Part-of-speech Tagging
Part-of-speech tags can be accessed through the tags property.

>>> wiki.tags
[('Python', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('high-level', 'JJ'), ('general-purpose', 'JJ'), ('programming', 'NN'), ('language', 'NN')]
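The tags are Penn Treebank codes (for example `NN` for a singular noun, `NNP` for a proper noun, `JJ` for an adjective). As a small sketch of what can be done with the result, the snippet below filters and groups the tagged words. It works on the list copied from the output above, so it runs without TextBlob; the helper names `words_with_tag_prefix` and `group_by_tag` are made up for this illustration.

```python
from collections import defaultdict

# Tagger output copied from the post above; with TextBlob installed,
# `wiki.tags` would supply the same kind of list of (word, tag) pairs.
tags = [('Python', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('high-level', 'JJ'),
        ('general-purpose', 'JJ'), ('programming', 'NN'), ('language', 'NN')]


def words_with_tag_prefix(tagged, prefix):
    """Keep the words whose tag starts with prefix (e.g. 'NN' matches NN and NNP)."""
    return [word for word, tag in tagged if tag.startswith(prefix)]


def group_by_tag(tagged):
    """Map each tag to the list of words that carry it, in original order."""
    groups = defaultdict(list)
    for word, tag in tagged:
        groups[tag].append(word)
    return dict(groups)


nouns = words_with_tag_prefix(tags, "NN")
adjectives = words_with_tag_prefix(tags, "JJ")
print(nouns)        # ['Python', 'programming', 'language']
print(adjectives)   # ['high-level', 'general-purpose']
print(group_by_tag(tags))
```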

#4
Lisrelchen, posted 2017-7-6 23:18:17
Noun Phrase Extraction
Similarly, noun phrases are accessed through the noun_phrases property.

>>> wiki.noun_phrases
WordList(['python'])

#5
MouJack007, posted 2017-7-6 23:43:32
Thanks to the original poster for sharing!

#6
MouJack007, posted 2017-7-6 23:43:49

#7
Lisrelchen, posted 2017-7-7 00:00:46
Words Inflection and Lemmatization
Each word in TextBlob.words or Sentence.words is a Word object (a subclass of unicode) with useful methods, e.g. for word inflection.

>>> sentence = TextBlob('Use 4 spaces per indentation level.')
>>> sentence.words
WordList(['Use', '4', 'spaces', 'per', 'indentation', 'level'])
>>> sentence.words[2].singularize()
'space'
>>> sentence.words[-1].pluralize()
'levels'

#8
Lisrelchen, posted 2017-7-7 00:01:39
WordNet Integration
You can access the synsets for a Word via the synsets property or the get_synsets method, optionally passing in a part of speech.

>>> from textblob import Word
>>> from textblob.wordnet import VERB
>>> word = Word("octopus")
>>> word.synsets
[Synset('octopus.n.01'), Synset('octopus.n.02')]
>>> Word("hack").get_synsets(pos=VERB)
[Synset('chop.v.05'), Synset('hack.v.02'), Synset('hack.v.03'), Synset('hack.v.04'), Synset('hack.v.05'), Synset('hack.v.06'), Synset('hack.v.07'), Synset('hack.v.08')]
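The synset names printed above follow a `lemma.pos.number` pattern: a lemma, a one-letter part-of-speech code (`n` for noun, `v` for verb), and a sense number. As far as I understand it, a synset is named after one of its member lemmas, which would explain why `chop.v.05` shows up in the results for "hack". The sketch below only parses those name strings so the output is easier to read; `SynsetName` and `parse_synset_name` are hypothetical helpers, not part of TextBlob or NLTK.

```python
from typing import NamedTuple


class SynsetName(NamedTuple):
    lemma: str   # the word the synset is named after
    pos: str     # part-of-speech code, e.g. 'n' or 'v'
    sense: int   # sense number for that lemma and part of speech


def parse_synset_name(name):
    """Split a name such as 'hack.v.02' into lemma, POS code and sense number."""
    # Split from the right so a lemma that itself contains dots stays intact.
    lemma, pos, sense = name.rsplit(".", 2)
    return SynsetName(lemma, pos, int(sense))


# Names copied from the output shown in the post above.
for name in ['octopus.n.01', 'octopus.n.02', 'chop.v.05', 'hack.v.02']:
    print(parse_synset_name(name))
```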

#9
Lisrelchen, posted 2017-7-7 00:03:20
WordLists
A WordList is just a Python list with additional methods.

>>> animals = TextBlob("cat dog octopus")
>>> animals.words
WordList(['cat', 'dog', 'octopus'])
>>> animals.words.pluralize()
WordList(['cats', 'dogs', 'octopodes'])
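To make "a Python list with additional methods" concrete, here is a minimal sketch of the same idea using a plain `list` subclass. `MiniWordList` is a hypothetical stand-in written for this thread, not TextBlob's actual `WordList` class, and its two extra methods are chosen only for illustration.

```python
class MiniWordList(list):
    """Hypothetical, simplified list-with-extras container."""

    def upper(self):
        """Return a new MiniWordList with every word upper-cased."""
        return MiniWordList(word.upper() for word in self)

    def count(self, word, case_sensitive=False):
        """Count occurrences of word, ignoring case unless asked not to."""
        if case_sensitive:
            return super().count(word)
        return sum(1 for item in self if item.lower() == word.lower())


animals = MiniWordList(["cat", "dog", "octopus", "Cat"])
animals.append("bird")        # ordinary list methods still work
print(animals[:2])            # so does slicing
print(animals.upper())        # extra method added by the subclass
print(animals.count("cat"))   # case-insensitive by default
```

Because the class inherits from `list`, anything that accepts a list (iteration, `len`, indexing) accepts it unchanged.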

#10
Lisrelchen, posted 2017-7-7 00:04:10
Spelling Correction
Use the correct() method to attempt spelling correction.

>>> b = TextBlob("I havv goood speling!")
>>> print(b.correct())
I have good spelling!

Word objects have a spellcheck() method that returns a list of (word, confidence) tuples with spelling suggestions.

>>> from textblob import Word
>>> w = Word('falibility')
>>> w.spellcheck()
[('fallibility', 1.0)]
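The TextBlob documentation says its spelling correction is based on Peter Norvig's "How to Write a Spelling Corrector". The sketch below shows the core of that approach under simplifying assumptions: it only looks at candidates one edit away, and `WORD_COUNTS` is a tiny made-up frequency table standing in for counts learned from a large corpus. The function name `suggest` is hypothetical and this is not TextBlob's code.

```python
import string

# Toy frequency table (an assumption for this sketch); a real corrector
# would learn these counts from a large body of text.
WORD_COUNTS = {"spelling": 5, "spewing": 1, "good": 8, "have": 9,
               "fallibility": 1}


def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def suggest(word):
    """Return (candidate, confidence) pairs, most likely candidate first."""
    word = word.lower()
    if word in WORD_COUNTS:
        return [(word, 1.0)]          # known word: nothing to correct
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    if not candidates:
        return [(word, 0.0)]          # no known word within one edit
    total = sum(WORD_COUNTS[w] for w in candidates)
    ranked = sorted(candidates, key=lambda w: (-WORD_COUNTS[w], w))
    # Confidence is each candidate's share of the candidates' total count.
    return [(w, WORD_COUNTS[w] / total) for w in ranked]


print(suggest("falibility"))   # [('fallibility', 1.0)]
print(suggest("speling"))      # 'spelling' ranked ahead of 'spewing'
```

With this toy table, "falibility" has a single known neighbour, so it gets confidence 1.0, the same shape of result as the `spellcheck()` output above.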
