Thread starter: ReneeBK
2206 views · 11 replies

[GitHub] Python Natural Language Processing


1
ReneeBK, posted 2017-8-13 02:54:38

Python Natural Language Processing

Hidden content in this post:

https://github.com/PacktPublishing/Python-Natural-Language-Processing


This is the code repository for Python Natural Language Processing, published by Packt. It contains all the supporting project files necessary to work through the book from start to finish.

About the Book

This book starts by laying the foundations of Natural Language Processing and gives you a better understanding of freely available corpora and the different types of datasets. After this, you will know how to choose a dataset for a natural language processing application, find the right NLP techniques to process the sentences in that dataset, and understand their structure. You will also learn how to tokenize different parts of sentences and ways to analyze them.
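As a quick taste of the tokenization and sentence analysis the book covers, here is a minimal NLTK sketch; it is not taken from the book's repository, and the sample sentence is made up.

# Split raw text into sentences, then into word tokens, and tag each token
# with a part of speech. Requires the 'punkt' and 'averaged_perceptron_tagger'
# NLTK data packages.
import nltk

nltk.download('punkt', quiet=True)
nltk.download('averaged_perceptron_tagger', quiet=True)

text = "NLP is fun. It helps computers understand human language."
for sentence in nltk.sent_tokenize(text):
    tokens = nltk.word_tokenize(sentence)
    print(nltk.pos_tag(tokens))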

Instructions and Navigation

All of the code is organized into folders. Each folder starts with a number followed by the application name. For example, Chapter02.

The code will look like the following:

import nltk
from nltk.corpus import brown as cb
from nltk.corpus import gutenberg as cg
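For instance, once these imports succeed you can poke around the two corpora along the following lines; this is just an illustrative sketch, assuming the Brown and Gutenberg corpus data have already been fetched with nltk.download.

# Inspect the Brown and Gutenberg corpora imported above.
import nltk
from nltk.corpus import brown as cb
from nltk.corpus import gutenberg as cg

nltk.download('brown', quiet=True)       # fetch the corpus data on first run
nltk.download('gutenberg', quiet=True)

print(cb.categories()[:5])               # a few Brown corpus categories
print(cb.words(categories='news')[:10])  # sample tokens from the 'news' category
print(cg.fileids()[:5])                  # a few Gutenberg file IDs
print(cg.words('austen-emma.txt')[:10])  # sample tokens from Jane Austen's Emma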

Let's discuss some prerequisites for this book. Don't worry, it's not math or statistics; basic Python coding syntax is all you need to know. Apart from that, you need Python 2.7.x or Python 3.5.x installed on your computer, and any Linux operating system is recommended. The list of Python dependencies can be found in the GitHub repository at https://github.com/jalajthanaki/NLPython/blob/master/pip-requirements.txt. Now let's look at the hardware required for this book. A computer with 4 GB of RAM and at least a two-core CPU is good enough to execute the code, but for the machine learning and deep learning examples you may want more RAM, perhaps 8 GB or 16 GB, and additional computational power from one or more GPUs.
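If you want to sanity-check your machine against these prerequisites before starting, a small script along the following lines will do; the thresholds simply mirror the numbers above and are not from the book.

# Minimal environment check: Python 2.7.x or 3.5+, and at least a two-core CPU.
from __future__ import print_function
import multiprocessing
import sys

print("Python version:", sys.version.split()[0])
if not (sys.version_info[:2] == (2, 7) or sys.version_info[:2] >= (3, 5)):
    print("Warning: the book targets Python 2.7.x or 3.5.x")

cores = multiprocessing.cpu_count()
print("CPU cores:", cores)
if cores < 2:
    print("Warning: at least a two-core CPU is recommended")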

Suggestions and Feedback

Click here if you have any feedback or suggestions.




2
ReneeBK, posted 2017-8-13 02:55:25
# Various ways to scrape a page; here I'm using my own blog page.

import requests
from bs4 import BeautifulSoup


def get_the_page_by_beautifulsoup():
    page = requests.get("https://simplifydatascience.wordpress.com/about/")
    # print(page.status_code)   # HTTP status of the response
    # print(page.content)       # raw HTML of the page
    soup = BeautifulSoup(page.content, 'html.parser')
    # print(soup.prettify())    # display the HTML source in a readable format
    paragraphs = soup.find_all('p')
    for p in paragraphs[:4]:    # print the text of the first four <p> elements
        print(p.get_text())


if __name__ == "__main__":
    get_the_page_by_beautifulsoup()

3
ReneeBK, posted 2017-8-13 02:56:24
# This script shows how stemming is performed using the NLTK and Polyglot libraries.
# It is part of morphological analysis.

from nltk.stem import PorterStemmer
from polyglot.text import Word

word = "unexpected"
text = "disagreement"
text1 = "disagree"
text2 = "agreement"
text3 = "quirkiness"
text4 = "historical"
text5 = "canonical"
text6 = "happiness"
text7 = "unkind"
text8 = "dogs"
text9 = "expected"
words_derv = ["happiness", "unkind"]
word_infle = ["dogs", "expected"]
words = ["unexpected", "disagreement", "disagree", "agreement",
         "quirkiness", "canonical", "historical"]


def stemmer_porter():
    port = PorterStemmer()
    print("\nDerivational Morphemes")
    print(" ".join([port.stem(i) for i in text6.split()]))
    print(" ".join([port.stem(i) for i in text7.split()]))
    print("\nInflectional Morphemes")
    print(" ".join([port.stem(i) for i in text8.split()]))
    print(" ".join([port.stem(i) for i in text9.split()]))
    print("\nSome examples")
    print(" ".join([port.stem(i) for i in word.split()]))
    print(" ".join([port.stem(i) for i in text.split()]))
    print(" ".join([port.stem(i) for i in text1.split()]))
    print(" ".join([port.stem(i) for i in text2.split()]))
    print(" ".join([port.stem(i) for i in text3.split()]))
    print(" ".join([port.stem(i) for i in text4.split()]))
    print(" ".join([port.stem(i) for i in text5.split()]))


def polyglot_stem():
    print("\nDerivational Morphemes using the polyglot library")
    for w in words_derv:
        w = Word(w, language="en")
        print("{:<20}{}".format(w, w.morphemes))
    print("\nInflectional Morphemes using the polyglot library")
    for w in word_infle:
        w = Word(w, language="en")
        print("{:<20}{}".format(w, w.morphemes))
    print("\nSome Morpheme examples using the polyglot library")
    for w in words:   # the longer mixed list of example words
        w = Word(w, language="en")
        print("{:<20}{}".format(w, w.morphemes))


if __name__ == "__main__":
    stemmer_porter()
    polyglot_stem()

4
ReneeBK, posted 2017-8-13 02:58:00
# This script shows how tokenization and lemmatization are performed using NLTK.
# It is part of lexical analysis.
from nltk.tokenize import word_tokenize
from nltk.stem.wordnet import WordNetLemmatizer


def wordtokenization():
    content = """Stemming is funnier than a bummer says the sushi loving computer scientist.
    She really wants to buy cars. She told me angrily. It is better for you.
    Man is walking. We are meeting tomorrow. You really don't know..!"""
    print(word_tokenize(content))


def wordlemmatization():
    wordlemma = WordNetLemmatizer()
    print(wordlemma.lemmatize('cars'))
    print(wordlemma.lemmatize('walking', pos='v'))
    print(wordlemma.lemmatize('meeting', pos='n'))
    print(wordlemma.lemmatize('meeting', pos='v'))
    print(wordlemma.lemmatize('better', pos='a'))
    print(wordlemma.lemmatize('is', pos='v'))
    print(wordlemma.lemmatize('funnier', pos='a'))
    print(wordlemma.lemmatize('expected', pos='v'))
    print(wordlemma.lemmatize('fantasized', pos='v'))


if __name__ == "__main__":
    wordtokenization()
    print("\n")
    print("----------Word Lemmatization----------")
    wordlemmatization()

5
oyjy1986, posted 2017-8-13 03:19:12 (from mobile)
Take a look

6
langaoz, posted 2017-8-13 04:01:06

Thanks

7
cszcszcsz, posted 2017-8-13 05:38:39
Thanks for sharing!

8
军旗飞扬, posted 2017-8-13 06:28:20
Thanks to the OP for sharing!

9
MouJack007, posted 2017-8-13 07:30:07
Thanks to the OP for sharing!

10
MouJack007, posted 2017-8-13 07:31:37
