Original poster: Lisrelchen

[Text Mining] TextBlob: Simplified Text Processing

11
Lisrelchen posted on 2017-7-7 00:04:44
Get Word and Noun Phrase Frequencies

There are two ways to get the frequency of a word or noun phrase in a TextBlob. The first is through the word_counts dictionary. The second is the count() method of the word list, for example monty.words.count('ekki').

>>> from textblob import TextBlob
>>> monty = TextBlob("We are no longer the Knights who say Ni. "
...                  "We are now the Knights who say Ekki ekki ekki PTANG.")
>>> monty.word_counts['ekki']
3
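To make the counting behaviour concrete, here is a minimal standard-library sketch of case-insensitive word counting. It is not TextBlob's implementation: the helper name count_words and the \w+ tokenization are assumptions made purely for illustration.

```python
import re
from collections import Counter

def count_words(text):
    """Count lowercase word tokens (hypothetical helper, not part of TextBlob)."""
    # Assumption: a "word" is a run of letters, digits or underscores.
    return Counter(re.findall(r"\w+", text.lower()))

text = ("We are no longer the Knights who say Ni. "
        "We are now the Knights who say Ekki ekki ekki PTANG.")
counts = count_words(text)
print(counts["ekki"])     # 3: "Ekki" and "ekki" are folded together
print(counts["knights"])  # 2
```

A Counter returns 0 for a word that never occurs, so lookups do not need a membership check first.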

12
Lisrelchen posted on 2017-7-7 00:06:06
Translation and Language Detection

New in version 0.5.0.

TextBlobs can be translated between languages.

>>> from textblob import TextBlob
>>> en_blob = TextBlob(u'Simple is better than complex.')
>>> en_blob.translate(to='es')
TextBlob("Simple es mejor que complejo.")

If no source language is specified, TextBlob will attempt to detect the language. You can also specify the source language explicitly, as in the next example. translate() raises TranslatorError if the TextBlob cannot be translated into the requested language, or NotTranslated if the translated result is the same as the input string.

>>> chinese_blob = TextBlob(u"美丽优于丑陋")
>>> chinese_blob.translate(from_lang="zh-CN", to='en')
TextBlob("Beauty is better than ugly")

You can also attempt to detect a TextBlob’s language using TextBlob.detect_language().

>>> b = TextBlob(u"بسيط هو أفضل من مجمع")
>>> b.detect_language()
'ar'

13
Lisrelchen posted on 2017-7-7 00:18:08
Parsing

Use the parse() method to parse the text.

>>> from textblob import TextBlob
>>> b = TextBlob("And now for something completely different.")
>>> print(b.parse())
And/CC/O/O now/RB/B-ADVP/O for/IN/B-PP/B-PNP something/NN/B-NP/I-PNP completely/RB/B-ADJP/O different/JJ/I-ADJP/O ././O/O
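Each token in the parse output above appears to follow a word/POS-tag/chunk-tag/PNP-tag layout (the format used by the pattern library's parser, as far as I understand it). A small sketch, assuming that four-field layout, splits the string into tuples without needing TextBlob installed. split_parsed is a hypothetical helper name, not a TextBlob function.

```python
def split_parsed(parsed):
    """Split a tagged string into (word, pos, chunk, pnp) tuples.

    Hypothetical helper; assumes every token has exactly four
    slash-separated fields, as in the output shown above.
    """
    # Split from the right so only the last three slashes separate fields.
    return [tuple(token.rsplit("/", 3)) for token in parsed.split()]

parsed = ("And/CC/O/O now/RB/B-ADVP/O for/IN/B-PP/B-PNP "
          "something/NN/B-NP/I-PNP completely/RB/B-ADJP/O "
          "different/JJ/I-ADJP/O ././O/O")
tokens = split_parsed(parsed)
for word, pos, chunk, pnp in tokens:
    print(word, pos, chunk, pnp)
```

Splitting from the right also handles the final "././O/O" token, where the word and its tag are both a period.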

14
ReneeBK posted on 2017-7-7 00:19:55
n-grams

The TextBlob.ngrams() method returns a list of WordList objects, each containing n successive words.

>>> from textblob import TextBlob
>>> blob = TextBlob("Now is better than never.")
>>> blob.ngrams(n=3)
[WordList(['Now', 'is', 'better']), WordList(['is', 'better', 'than']), WordList(['better', 'than', 'never'])]
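The sliding-window idea behind n-grams can be sketched in a few lines of plain Python. This is an illustration under my own assumptions (whitespace tokenization, punctuation already removed, plain lists instead of WordList), not TextBlob's actual code.

```python
def ngrams(words, n=3):
    """Return every run of n consecutive words as a list of lists."""
    # A window starting at index i fits only while i + n <= len(words).
    return [words[i:i + n] for i in range(len(words) - n + 1)]

words = "Now is better than never".split()
print(ngrams(words, n=3))
```

When n is larger than the number of words, the range is empty and the function returns an empty list.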

15
ReneeBK posted on 2017-7-7 00:20:38
Get Start and End Indices of Sentences

Use sentence.start and sentence.end to get the indices where a sentence starts and ends within a TextBlob.

>>> from textblob import TextBlob
>>> zen = TextBlob("Beautiful is better than ugly. "
...                "Explicit is better than implicit. "
...                "Simple is better than complex.")
>>> for s in zen.sentences:
...     print(s)
...     print("---- Starts at index {}, Ends at index {}".format(s.start, s.end))
...
Beautiful is better than ugly.
---- Starts at index 0, Ends at index 30
Explicit is better than implicit.
---- Starts at index 31, Ends at index 64
Simple is better than complex.
---- Starts at index 65, Ends at index 95
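The start and end values are character offsets into the original string, with the end index exclusive. A rough standard-library sketch reproduces the same offsets for this text. The regex-based splitting and the helper name sentence_spans are assumptions for illustration; TextBlob's own sentence tokenizer is more sophisticated.

```python
import re

def sentence_spans(text):
    """Return (start, end) index pairs for each sentence.

    Hypothetical helper; assumes a sentence runs from a non-space
    character up to the next '.', '!' or '?'.
    """
    return [m.span() for m in re.finditer(r"\S[^.!?]*[.!?]", text)]

text = ("Beautiful is better than ugly. "
        "Explicit is better than implicit. "
        "Simple is better than complex.")
spans = sentence_spans(text)
for start, end in spans:
    print(text[start:end], (start, end))
```

Slicing the text with text[start:end] gives back each sentence, and the one-character gap between one end and the next start is the separating space.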

16
钱学森64 posted on 2017-7-7 01:10:01
Thanks for sharing.
