Thread starter: ReneeBK
2206 views · 11 replies

[GitHub] Python Natural Language Processing


1
ReneeBK, posted 2017-8-13 02:54:38

Python Natural Language Processing

Hidden content in this post:

https://github.com/PacktPublishing/Python-Natural-Language-Processing


This is the code repository for Python Natural Language Processing, published by Packt. It contains all the supporting project files necessary to work through the book from start to finish.

About the Book

This book starts by laying the foundations of Natural Language Processing and gives you a better understanding of freely available corpora and the different types of datasets. After this, you will know how to choose a dataset for a natural language processing application, find the right NLP techniques to process the sentences in that dataset, and understand their structure. You will also learn how to tokenize different parts of sentences and ways to analyze them.
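As a quick taste of the tokenization and sentence analysis the book covers, here is a minimal NLTK sketch; it is not taken from the book's repository, and the sample sentence is made up.

# Split raw text into sentences, then into word tokens, and tag each token
# with a part of speech. Requires the 'punkt' and 'averaged_perceptron_tagger'
# NLTK data packages.
import nltk

nltk.download('punkt', quiet=True)
nltk.download('averaged_perceptron_tagger', quiet=True)

text = "NLP is fun. It helps computers understand human language."
for sentence in nltk.sent_tokenize(text):
    tokens = nltk.word_tokenize(sentence)
    print(nltk.pos_tag(tokens))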

Instructions and Navigation

All of the code is organized into folders. Each folder starts with a number followed by the application name. For example, Chapter02.

The code will look like the following:

import nltk
from nltk.corpus import brown as cb
from nltk.corpus import gutenberg as cg
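For instance, once these imports succeed you can poke around the two corpora along the following lines; this is just an illustrative sketch, assuming the Brown and Gutenberg corpus data have already been fetched with nltk.download.

# Inspect the Brown and Gutenberg corpora imported above.
import nltk
from nltk.corpus import brown as cb
from nltk.corpus import gutenberg as cg

nltk.download('brown', quiet=True)       # fetch the corpus data on first run
nltk.download('gutenberg', quiet=True)

print(cb.categories()[:5])               # a few Brown corpus categories
print(cb.words(categories='news')[:10])  # sample tokens from the 'news' category
print(cg.fileids()[:5])                  # a few Gutenberg file IDs
print(cg.words('austen-emma.txt')[:10])  # sample tokens from Jane Austen's Emma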

Let's discuss some prerequisites for this book. Don't worry, it's not math or statistics; basic Python coding syntax is all you need to know. Apart from that, you need Python 2.7.x or Python 3.5.x installed on your computer, and any Linux operating system is recommended. The list of Python dependencies can be found in the GitHub repository at https://github.com/jalajthanaki/NLPython/blob/master/pip-requirements.txt. Now let's look at the hardware required for this book. A computer with 4 GB of RAM and at least a two-core CPU is good enough to execute the code, but for the machine learning and deep learning examples you may want more RAM, perhaps 8 GB or 16 GB, and additional computational power from one or more GPUs.
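If you want to sanity-check your machine against these prerequisites before starting, a small script along the following lines will do; the thresholds simply mirror the numbers above and are not from the book.

# Minimal environment check: Python 2.7.x or 3.5+, and at least a two-core CPU.
from __future__ import print_function
import multiprocessing
import sys

print("Python version:", sys.version.split()[0])
if not (sys.version_info[:2] == (2, 7) or sys.version_info[:2] >= (3, 5)):
    print("Warning: the book targets Python 2.7.x or 3.5.x")

cores = multiprocessing.cpu_count()
print("CPU cores:", cores)
if cores < 2:
    print("Warning: at least a two-core CPU is recommended")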

Suggestions and Feedback

Click here if you have any feedback or suggestions.




2
ReneeBK, posted 2017-8-13 02:55:25
# Various ways to scrape a page; here I'm using my own blog page.

import requests
from bs4 import BeautifulSoup


def get_the_page_by_beautifulsoup():
    page = requests.get("https://simplifydatascience.wordpress.com/about/")
    # print(page.status_code)   # HTTP status of the response
    # print(page.content)       # raw HTML of the page
    soup = BeautifulSoup(page.content, 'html.parser')
    # print(soup.prettify())    # display the HTML source in a readable format
    paragraphs = soup.find_all('p')
    for p in paragraphs[:4]:    # print the text of the first four <p> elements
        print(p.get_text())


if __name__ == "__main__":
    get_the_page_by_beautifulsoup()

3
ReneeBK, posted 2017-8-13 02:56:24
# This script shows how stemming is performed using the NLTK and Polyglot libraries.
# It is part of morphological analysis.

from nltk.stem import PorterStemmer
from polyglot.text import Word

word = "unexpected"
text = "disagreement"
text1 = "disagree"
text2 = "agreement"
text3 = "quirkiness"
text4 = "historical"
text5 = "canonical"
text6 = "happiness"
text7 = "unkind"
text8 = "dogs"
text9 = "expected"
words_derv = ["happiness", "unkind"]
word_infle = ["dogs", "expected"]
words = ["unexpected", "disagreement", "disagree", "agreement",
         "quirkiness", "canonical", "historical"]


def stemmer_porter():
    port = PorterStemmer()
    print("\nDerivational Morphemes")
    print(" ".join([port.stem(i) for i in text6.split()]))
    print(" ".join([port.stem(i) for i in text7.split()]))
    print("\nInflectional Morphemes")
    print(" ".join([port.stem(i) for i in text8.split()]))
    print(" ".join([port.stem(i) for i in text9.split()]))
    print("\nSome examples")
    print(" ".join([port.stem(i) for i in word.split()]))
    print(" ".join([port.stem(i) for i in text.split()]))
    print(" ".join([port.stem(i) for i in text1.split()]))
    print(" ".join([port.stem(i) for i in text2.split()]))
    print(" ".join([port.stem(i) for i in text3.split()]))
    print(" ".join([port.stem(i) for i in text4.split()]))
    print(" ".join([port.stem(i) for i in text5.split()]))


def polyglot_stem():
    print("\nDerivational Morphemes using the polyglot library")
    for w in words_derv:
        w = Word(w, language="en")
        print("{:<20}{}".format(w, w.morphemes))
    print("\nInflectional Morphemes using the polyglot library")
    for w in word_infle:
        w = Word(w, language="en")
        print("{:<20}{}".format(w, w.morphemes))
    print("\nSome Morpheme examples using the polyglot library")
    for w in words:   # the longer mixed list of example words
        w = Word(w, language="en")
        print("{:<20}{}".format(w, w.morphemes))


if __name__ == "__main__":
    stemmer_porter()
    polyglot_stem()

4
ReneeBK, posted 2017-8-13 02:58:00
# This script shows how tokenization and lemmatization are performed using NLTK.
# It is part of lexical analysis.
from nltk.tokenize import word_tokenize
from nltk.stem.wordnet import WordNetLemmatizer


def wordtokenization():
    content = """Stemming is funnier than a bummer says the sushi loving computer scientist.
    She really wants to buy cars. She told me angrily. It is better for you.
    Man is walking. We are meeting tomorrow. You really don't know..!"""
    print(word_tokenize(content))


def wordlemmatization():
    wordlemma = WordNetLemmatizer()
    print(wordlemma.lemmatize('cars'))
    print(wordlemma.lemmatize('walking', pos='v'))
    print(wordlemma.lemmatize('meeting', pos='n'))
    print(wordlemma.lemmatize('meeting', pos='v'))
    print(wordlemma.lemmatize('better', pos='a'))
    print(wordlemma.lemmatize('is', pos='v'))
    print(wordlemma.lemmatize('funnier', pos='a'))
    print(wordlemma.lemmatize('expected', pos='v'))
    print(wordlemma.lemmatize('fantasized', pos='v'))


if __name__ == "__main__":
    wordtokenization()
    print("\n")
    print("----------Word Lemmatization----------")
    wordlemmatization()

5
oyjy1986, posted 2017-8-13 03:19:12 (from mobile)
Take a look

6
langaoz, posted 2017-8-13 04:01:06

Thanks

7
cszcszcsz, posted 2017-8-13 05:38:39
Thanks for sharing!

8
军旗飞扬, posted 2017-8-13 06:28:20
Thanks to the OP for sharing!

9
MouJack007, posted 2017-8-13 07:30:07
Thanks to the OP for sharing!

10
MouJack007, posted 2017-8-13 07:31:37
