
Python Social Media Analytics

Original poster: Nicolle, posted on 2018-6-15 08:08:55


#2

cszcszcsz, posted on 2018-6-15 08:16:35

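Stream API
In [ ]:
import json

import requests
from pymongo import MongoClient
from requests_oauthlib import OAuth1

# Local MongoDB database and collection that will hold the raw tweets.
client = MongoClient('mongodb://localhost:27017/')
db = client['test']
collection = db['test']

url = 'https://stream.twitter.com/1.1/statuses/filter.json'
# consumer_key, consumer_secret, access_token and access_token_secret must
# already hold your own Twitter API credentials.
auth = OAuth1(consumer_key, consumer_secret, access_token, access_token_secret)
pms = {'track': 'premier league -filter:retweets AND -filter:replies', 'lang': 'en'}
res = requests.post(url, auth=auth, params=pms, stream=True)

for line in res.iter_lines():
    if line:  # skip the blank keep-alive lines
        tweet = json.loads(line)
        try:
            collection.insert_one(tweet)  # insert() was removed in PyMongo 4
        except Exception:
            pass  # skip documents MongoDB refuses to store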
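The `track` string above borrows `-filter:retweets` and `-filter:replies` from the search syntax. As far as I know, the streaming endpoint's `track` parameter only matches phrases, so those operators may not exclude anything. Below is a minimal sketch of doing the same filtering on the client, assuming the v1.1 tweet JSON layout (`retweeted_status`, `in_reply_to_status_id`). `is_original_tweet` and `parse_stream_line` are hypothetical helper names, not part of the original post.

```python
import json


def is_original_tweet(tweet):
    """True if the tweet is neither a retweet nor a reply (v1.1 field names assumed)."""
    if 'retweeted_status' in tweet:
        return False
    if tweet.get('in_reply_to_status_id') is not None:
        return False
    return True


def parse_stream_line(line):
    """Decode one line of the stream; return a tweet dict to store, or None to skip."""
    if not line:
        return None  # blank keep-alive line
    try:
        payload = json.loads(line)
    except ValueError:
        return None  # not valid JSON
    if not isinstance(payload, dict) or 'id' not in payload:
        return None  # control messages (e.g. limit notices) carry no tweet id
    return payload if is_original_tweet(payload) else None
```

Inside the loop, `tweet = parse_stream_line(line)` followed by an `if tweet is not None:` check before storing would replace the bare `json.loads` call.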


#3

suzhzh, posted on 2018-6-15 08:39:35

Customized sentiment analysis
In [75]:
import pandas as pd

# The dataset must be tagged (labelled) beforehand.
dataset = pd.read_pickle('tagged.pickle')
classes = ['pos', 'neu', 'neg']
# First 80 rows for training, the next 16 for testing.
train_data = dataset['final'][0:80]
train_labels = dataset['label'][0:80]
test_data = dataset['final'][80:96]
test_labels = dataset['label'][80:96]
# Each row of 'final' is presumably a list of tokens; join it back into one string.
train_data = list(train_data.apply(' '.join))
test_data = list(test_data.apply(' '.join))
In [77]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

vectorizer = TfidfVectorizer(min_df=5, max_df=0.8, sublinear_tf=True, use_idf=True)
train_vectors = vectorizer.fit_transform(train_data)
test_vectors = vectorizer.transform(test_data)
# Fit a multinomial Naive Bayes classifier and report its accuracy on the test set
nb = MultinomialNB()
nb.fit(train_vectors, train_labels).score(test_vectors, test_labels)
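With only 16 test rows and three classes, the accuracy returned by `score` is easier to judge next to a trivial baseline. Below is a minimal sketch, in plain Python, of the majority-class baseline: always predict the most frequent training label and measure the accuracy on the test labels. `majority_baseline_accuracy` is a hypothetical helper, and the labels in the example are made up for illustration.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Return (majority label, accuracy of always predicting that label)."""
    train_labels = list(train_labels)
    test_labels = list(test_labels)
    if not train_labels or not test_labels:
        raise ValueError("both label sequences must be non-empty")
    # Most frequent label in the training split
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == majority_label)
    return majority_label, hits / len(test_labels)


# Made-up labels, for illustration only
label, accuracy = majority_baseline_accuracy(
    ['pos', 'neu', 'neu', 'neg'],
    ['neu', 'pos', 'neu', 'neu'],
)
print(label, accuracy)  # neu 0.75
```

The same call should work on the `train_labels` and `test_labels` Series above; a classifier that does not beat this number has probably learned little from the text.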


#4

suzhzh, posted on 2018-6-15 08:39:36

Many thanks


#5

HappyAndy_Lo, posted on 2018-6-15 08:48:27

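Named entity recognition (NER)
In [ ]:
from collections import Counter

import numpy as np
import pandas as pd
from nltk.tag import StanfordNERTagger

# Path to the Stanford 3-class model; the Stanford NER jar must also be
# reachable (via CLASSPATH or the path_to_jar argument).
st = StanfordNERTagger('path_to_your_folder/english.all.3class.distsim.crf.ser.gz')
# `sentence` (a string) and `tweets` (an iterable of strings) are assumed
# to be defined earlier in the notebook.
st.tag(sentence.split())

entities = []
for r in tweets:
    lst_tags = st.tag(r.split())
    for tup in lst_tags:
        if tup[1] != 'O':  # keep only tokens tagged as part of an entity
            entities.append(tup)
In [ ]:
# Assumed step, not shown in the post: build the DataFrame queried below
# from the (word, tag) tuples collected above.
df_entities = pd.DataFrame(entities, columns=['word', 'ner'])
organizations = df_entities[df_entities['ner'].str.contains("ORGANIZATION")]
cnt = Counter(organizations['word'])
cnt.most_common(10)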
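Counting single tokens splits multi-word names, so "Manchester United" would be tallied as two separate words. Below is a minimal sketch that merges consecutive tokens sharing the same non-`O` tag before counting, assuming the tagger output is a list of `(word, tag)` tuples as above. `merge_entities` is a hypothetical helper, and the sample tags are hand-written, not real tagger output.

```python
from collections import Counter
from itertools import groupby


def merge_entities(tagged_tokens):
    """Join runs of adjacent tokens with the same non-'O' tag into one entity."""
    merged = []
    for tag, run in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if tag != 'O':
            merged.append((' '.join(word for word, _ in run), tag))
    return merged


# Hand-written example tags, not real tagger output
sample = [('Manchester', 'ORGANIZATION'), ('United', 'ORGANIZATION'),
          ('beat', 'O'), ('Arsenal', 'ORGANIZATION'), ('in', 'O'),
          ('London', 'LOCATION')]
merged_entities = merge_entities(sample)
org_counts = Counter(word for word, tag in merged_entities if tag == 'ORGANIZATION')
```

One limitation of this approach: two distinct entities of the same type that happen to sit next to each other are merged too, because the 3-class tags carry no begin/inside markers.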


#6

albertwishedu, posted on 2018-6-15 08:55:59

Getting the data
In [1]:
import json
from io import StringIO

import pandas as pd
import requests

# api_key is assumed to hold your own YouTube Data API v3 key.

def get_statistics(video_id):
    """Return the contentDetails and statistics of one video."""
    url = "https://www.googleapis.com/youtube/v3/videos"
    pms = {'key': api_key, 'id': video_id, 'part': 'contentDetails,statistics'}
    res = requests.get(url, params=pms)
    return res.json()
In [3]:
def get_channel_videos(channel_id='UC-2Y8dQb0S6DtpxNgAKoJKA'):
    """Return a DataFrame with up to 50 of the channel's most viewed videos."""
    url = "https://www.googleapis.com/youtube/v3/search"
    pms = {'type': 'video', 'key': api_key, 'channelId': channel_id,
           'part': 'snippet', 'order': 'viewCount', 'maxResults': 50}
    res = requests.get(url, params=pms)
    print("Connection status: %s" % res.status_code)
    data = res.json()
    lst = []
    for video in data['items']:
        video_stats = get_statistics(video['id']['videoId'])
        stats = video_stats['items'][0]['statistics']
        lst.append({
            'channelTitle': video['snippet']['channelTitle'],
            'title': video['snippet']['title'],
            'publishedAt': video['snippet']['publishedAt'],
            'videoId': video['id']['videoId'],
            # .get() returns None when a counter is hidden or disabled
            'viewCount': stats.get('viewCount'),
            'commentCount': stats.get('commentCount'),
            'likeCount': stats.get('likeCount'),
            'dislikeCount': stats.get('dislikeCount'),
        })
    # read_json infers numeric and date types from the JSON text
    return pd.read_json(StringIO(json.dumps(lst)))
In [5]:
url = 'https://www.googleapis.com/youtube/v3/videos'
pms = {'part': 'snippet,statistics', 'id': 'YQUpg795iBo', 'key': 'xxx'}
res = requests.get(url, params=pms)
data = res.json()
In [6]:
url = 'https://www.googleapis.com/youtube/v3/commentThreads'
video_id = 'YQUpg795iBo'  # assumed: the same video as in the previous cell
full_data = []  # accumulated comment threads
page = None  # token of the next page; requests omits parameters set to None
while True:
    pms = {'part': 'snippet', 'videoId': video_id, 'maxResults': 100,
           'key': api_key, 'pageToken': page}
    res = requests.get(url, params=pms)
    print("Connection status: %s" % res.status_code)
    data = res.json()
    full_data.extend(data['items'])
    print("Just downloaded: %s, Total: %s" % (len(data['items']), len(full_data)))
    if 'nextPageToken' not in data:
        break  # last page reached
    page = data['nextPageToken']
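`full_data` ends up as a list of nested comment-thread resources. Below is a minimal sketch of flattening them into one row per top-level comment, assuming the field layout of the YouTube Data API v3 `commentThreads` response (`snippet.topLevelComment.snippet`). `flatten_threads` is a hypothetical helper, and the sample item is hand-written rather than downloaded.

```python
def flatten_threads(items):
    """Return one flat dict per top-level comment in a list of commentThread items."""
    rows = []
    for item in items:
        thread = item.get('snippet', {})
        top = thread.get('topLevelComment', {})
        comment = top.get('snippet', {})
        rows.append({
            'commentId': top.get('id'),
            'author': comment.get('authorDisplayName'),
            'text': comment.get('textDisplay'),
            'likeCount': comment.get('likeCount'),
            'publishedAt': comment.get('publishedAt'),
            'replyCount': thread.get('totalReplyCount'),
        })
    return rows


# Hand-written item that mimics the assumed response layout
sample_items = [{
    'snippet': {
        'totalReplyCount': 2,
        'topLevelComment': {
            'id': 'abc123',
            'snippet': {
                'authorDisplayName': 'someone',
                'textDisplay': 'Great match!',
                'likeCount': 5,
                'publishedAt': '2018-06-01T12:00:00Z',
            },
        },
    },
}]
rows = flatten_threads(sample_items)
```

The resulting list of dicts can be passed straight to `pd.DataFrame(...)`; missing fields come through as `None` instead of raising a `KeyError`.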


#7

(verified student), posted on 2018-6-15 09:51:41

Good book, learned a lot.


#8

小陆家嘴, posted on 2018-6-15 10:15:29

Thanks for sharing.


#9

啸傲江弧, posted on 2018-6-15 10:30:44


#10

lhf8059, posted on 2018-6-15 10:42:23

Taking a look.
