[Blog post] A Beginner's Guide to Tweet Analytics with Pandas

#1 (original post) · Lisrelchen · posted 2017-4-16 01:38:24

import pandas as pd

def load_tweets(tweet_file):
    """Load and process a Twitter Analytics data file."""
    # Read tweet data (obtained from Twitter Analytics)
    tweet_df = pd.read_csv(tweet_file)
    # Drop irrelevant columns (positions 13 through 39)
    tweet_df = tweet_df.drop(tweet_df.columns[13:40], axis=1)
    return tweet_df
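
Dropping columns by position depends on the exact column layout of the export. As a minimal sketch of an alternative, the columns can be selected by name instead. This assumes the export contains the column names used elsewhere in this thread ('Tweet text', 'time', 'impressions', 'retweets', 'likes'); the helper name load_tweets_by_name and the inline sample data are hypothetical.

```python
import io
import pandas as pd

# Column names used by the analysis in this thread. Treat this list as an
# assumption about the export, not as a complete schema.
KEEP_COLUMNS = ['Tweet text', 'time', 'impressions', 'retweets', 'likes']

def load_tweets_by_name(tweet_file, keep=KEEP_COLUMNS):
    """Load a tweet export and keep only the named columns (hypothetical helper)."""
    return pd.read_csv(tweet_file, usecols=keep)

# Tiny made-up sample standing in for a real export file
sample_csv = io.StringIO(
    'Tweet text,time,impressions,retweets,likes,promoted impressions\n'
    '"Hello #pandas",2017-03-01 12:00 +0000,100,2,5,0\n'
    '"Thanks @kdnuggets",2017-03-02 18:30 +0000,250,7,9,0\n'
)
sample_df = load_tweets_by_name(sample_csv)
print(sample_df.columns.tolist())
```

Selecting by name fails loudly if an expected column is missing, which is usually easier to debug than a silently shifted column position.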

http://www.kdnuggets.com/2017/03/beginners-guide-tweet-analytics-pandas.html

#2 · Lisrelchen · posted 2017-4-16 01:39:00

Basic Tweet Stats
Given the columns shown by running head() on the dataset, and a rough intuition of which tweet metrics would be useful, we will grab the following stats:
Retweets - Mean RTs per tweet & top 5 RTed tweets
Likes - Mean likes per tweet & top 5 liked tweets
Impressions - Mean impressions per tweet & top 5 tweets with most impressions
# Total tweets
print('Total tweets this period:', len(tweet_df.index), '\n')

# Retweets: sort descending so that rows 0-4 are the top 5
tweet_df = tweet_df.sort_values(by='retweets', ascending=False)
tweet_df = tweet_df.reset_index(drop=True)
print('Mean retweets:', round(tweet_df['retweets'].mean(), 2), '\n')
print('Top 5 RTed tweets:')
print('------------------')
for i in range(5):
    print(tweet_df.loc[i, 'Tweet text'], '-', tweet_df.loc[i, 'retweets'])
print('\n')

# Likes
tweet_df = tweet_df.sort_values(by='likes', ascending=False)
tweet_df = tweet_df.reset_index(drop=True)
print('Mean likes:', round(tweet_df['likes'].mean(), 2), '\n')
print('Top 5 liked tweets:')
print('-------------------')
for i in range(5):
    print(tweet_df.loc[i, 'Tweet text'], '-', tweet_df.loc[i, 'likes'])
print('\n')

# Impressions
tweet_df = tweet_df.sort_values(by='impressions', ascending=False)
tweet_df = tweet_df.reset_index(drop=True)
print('Mean impressions:', round(tweet_df['impressions'].mean(), 2), '\n')
print('Top 5 tweets with most impressions:')
print('-----------------------------------')
for i in range(5):
    print(tweet_df.loc[i, 'Tweet text'], '-', tweet_df.loc[i, 'impressions'])
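
The three blocks above repeat the same sort-and-print pattern. As a hedged sketch of a more compact alternative, pandas' nlargest can pull the top rows for any metric without re-sorting the whole DataFrame. The helper name top_n and the sample data are made up for illustration; the column names follow those used above.

```python
import pandas as pd

def top_n(df, metric, n=5):
    """Return the n rows with the largest `metric`, keeping text and metric only."""
    return df.nlargest(n, metric)[['Tweet text', metric]]

# Made-up sample data for illustration
sample_df = pd.DataFrame({
    'Tweet text': ['a', 'b', 'c', 'd', 'e', 'f'],
    'retweets': [3, 9, 1, 7, 5, 0],
    'likes': [10, 2, 8, 4, 6, 1],
    'impressions': [100, 900, 300, 700, 500, 50],
})

for metric in ['retweets', 'likes', 'impressions']:
    print('Mean', metric + ':', round(sample_df[metric].mean(), 2))
    print(top_n(sample_df, metric).to_string(index=False), '\n')
```

Unlike range(5) indexing, nlargest simply returns fewer rows when the dataset has fewer than n tweets.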

#3 · Lisrelchen · posted 2017-4-16 01:40:14

Top #Hashtags and @Mentions
It's no secret that hashtags play an important role in Twitter, and mentions can also help grow your network and influence. Together they help put the 'social' in social networking, transforming platforms like Twitter from passive experiences to very active ones. With that, getting a handle on the most social aspect of this social network can be a helpful endeavour.
import operator
import string

# Hashtags & mentions
tag_dict = {}
mention_dict = {}
# Translation table that deletes all punctuation, including '#' and '@'
strip_punct = str.maketrans('', '', string.punctuation)

for tweet_text in tweet_df['Tweet text']:
    tweet_tokenized = tweet_text.lower().split()
    for word in tweet_tokenized:
        # Hashtags - tokenize and build dict of tag counts
        if word[0:1] == '#' and len(word) > 1:
            key = word.translate(strip_punct)
            tag_dict[key] = tag_dict.get(key, 0) + 1
        # Mentions - tokenize and build dict of mention counts
        if word[0:1] == '@' and len(word) > 1:
            key = word.translate(strip_punct)
            mention_dict[key] = mention_dict.get(key, 0) + 1

# The 10 most popular tags and counts
top_tags_sorted = sorted(tag_dict.items(), key=operator.itemgetter(1), reverse=True)[:10]
print('Top 10 hashtags:')
print('----------------')
for tag, count in top_tags_sorted:
    print(tag, '-', count)

# The 10 most popular mentions and counts
top_mentions_sorted = sorted(mention_dict.items(), key=operator.itemgetter(1), reverse=True)[:10]
print('\nTop 10 mentions:')
print('----------------')
for mention, count in top_mentions_sorted:
    print(mention, '-', count)
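
The counting above can also be sketched with the standard library's collections.Counter and a regular expression. This is an alternative, not the original author's method, and it tokenizes slightly differently: a tag ends at the first non-word character, so '#data-science' counts as 'data' rather than 'datascience'. The helper name count_tokens and the sample tweets are made up for illustration.

```python
import re
from collections import Counter

def count_tokens(texts, prefix):
    """Count words introduced by `prefix` ('#' or '@'), case-insensitively."""
    # (?<!\w) skips a prefix that appears mid-word, e.g. in an email address
    pattern = re.compile(r'(?<!\w)' + re.escape(prefix) + r'(\w+)')
    counts = Counter()
    for text in texts:
        counts.update(pattern.findall(text.lower()))
    return counts

# Made-up tweets for illustration
sample_tweets = [
    'Learning #Pandas with @kdnuggets!',
    'More #pandas and #python tips',
    'Thanks @KDnuggets for the guide',
]
tag_counts = count_tokens(sample_tweets, '#')
mention_counts = count_tokens(sample_tweets, '@')
print(tag_counts.most_common(10))
print(mention_counts.most_common(10))
```

Counter.most_common(10) replaces the manual sort-and-slice step.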

#4 · Lisrelchen · posted 2017-4-16 01:41:19

Time-series Analysis
Finally, let's have a look at some very basic temporal data. We will check mean impressions for tweets based, independently, on both the hour of day and the day of week that they are tweeted. I caution (once again) that this is based on very little data, and so nothing useful will likely be gleaned. However, given much larger amounts of tweet data, entire social media campaigns are planned around this kind of analysis.
While this is based on impressions, it could just as reasonably (and easily) be based on engagements, or RTs, or whatever else you please. Working in advertising, and promoting tweets? Maybe you are more interested in some of those promotion* metrics we hacked off the dataset at the start.
We have to convert the Twitter-supplied date field to a legitimate Python datetime object, bin the data based on which hourly slot it falls into, identify days of week, and then capture this data in a couple of additional columns in the DataFrame, which we will pillage for stats afterward.
# Time-series impressions (DOW, HOD, etc.) (0 = Monday ... 6 = Sunday)
gmt_offset = -4

# Create proper datetime column, apply local GMT offset
tweet_df['ts'] = pd.to_datetime(tweet_df['time'])
tweet_df['ts'] = tweet_df['ts'] + pd.to_timedelta(gmt_offset, unit='h')

# Add hour of day and day of week columns
tweet_df['hod'] = tweet_df['ts'].dt.hour
tweet_df['dow'] = tweet_df['ts'].dt.dayofweek

hod_dict = {}   # hour of day -> total impressions
hod_count = {}  # hour of day -> number of tweets
dow_dict = {}   # day of week -> total impressions
dow_count = {}  # day of week -> number of tweets
weekday_dict = {0: 'Mon', 1: 'Tue', 2: 'Wed', 3: 'Thu', 4: 'Fri', 5: 'Sat', 6: 'Sun'}

# Process tweets, collect stats
for hod, dow, imp in zip(tweet_df['hod'], tweet_df['dow'], tweet_df['impressions']):
    hod_dict[hod] = hod_dict.get(hod, 0) + int(imp)
    hod_count[hod] = hod_count.get(hod, 0) + 1
    dow_dict[dow] = dow_dict.get(dow, 0) + int(imp)
    dow_count[dow] = dow_count.get(dow, 0) + 1

print('Average impressions per tweet by hour tweeted:')
print('----------------------------------------------')
for hod in sorted(hod_dict):
    print(hod, '-', hod + 1, ':', round(hod_dict[hod] / hod_count[hod], 2), '=>', hod_count[hod], 'tweets')

print('\nAverage impressions per tweet by day of week tweeted:')
print('-----------------------------------------------------')
for dow in sorted(dow_dict):
    print(weekday_dict[dow], ':', round(dow_dict[dow] / dow_count[dow], 2), '=>', dow_count[dow], 'tweets')
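
The per-hour and per-day averages can also be computed with groupby instead of hand-built dictionaries. The following is a minimal sketch under the same assumptions as the code above (a 'time' column with timestamps and an 'impressions' column); the sample rows are made up, and the fixed -4 hour offset is kept only for consistency with the original code, so it ignores daylight saving changes.

```python
import pandas as pd

# Made-up sample data; column names follow those used above
sample_df = pd.DataFrame({
    'time': ['2017-03-06 13:15 +0000', '2017-03-06 13:45 +0000',
             '2017-03-07 02:30 +0000'],
    'impressions': [100, 300, 50],
})

gmt_offset = -4  # hours, same fixed offset as above
ts = pd.to_datetime(sample_df['time'], utc=True) + pd.to_timedelta(gmt_offset, unit='h')
sample_df['hod'] = ts.dt.hour        # hour of day, 0-23
sample_df['dow'] = ts.dt.day_name()  # e.g. 'Monday'

# Mean impressions and number of tweets per hour of day and per day of week
by_hour = sample_df.groupby('hod')['impressions'].agg(['mean', 'count'])
by_day = sample_df.groupby('dow')['impressions'].agg(['mean', 'count'])
print(by_hour)
print(by_day)
```

Note that the third sample tweet, sent at 02:30 UTC on Tuesday, lands on Monday evening once the offset is applied, which is why the offset has to be applied before extracting the hour and day.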

#5 · (verified student) · posted 2017-4-16 01:50:54

Thanks for sharing.

#6 · MouJack007 · posted 2017-4-16 03:41:04

Thanks to the original poster for sharing!

#7 · MouJack007 · posted 2017-4-16 03:41:28

#8 · franky_sas · posted 2017-4-16 13:41:29
