So, given the data shown by running head() on the dataset above, and a rough intuition of which tweet metrics would be useful, we will grab the following stats:
- Retweets: mean RTs per tweet and the top 5 most-retweeted tweets
- Likes: mean likes per tweet and the top 5 most-liked tweets
- Impressions: mean impressions per tweet and the top 5 tweets with the most impressions
```python
# Total tweets (print() call syntax for Python 3)
print('Total tweets this period:', len(tweet_df.index), '\n')
```
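The per-metric stats in the list above could be computed along these lines. This is a minimal sketch, not the original code: the lowercase `retweets`, `likes` and `impressions` column names are assumed to match the Twitter analytics export, and the small DataFrame is hypothetical sample data so the snippet runs on its own. `DataFrame.nlargest` returns the rows with the largest values in a column, largest first.

```python
import pandas as pd

# Hypothetical sample data; a real export is assumed to have lowercase
# 'retweets', 'likes' and 'impressions' columns plus 'Tweet text'.
tweet_df = pd.DataFrame({
    'Tweet text': ['first', 'second', 'third', 'fourth', 'fifth', 'sixth'],
    'retweets': [2, 0, 5, 1, 3, 4],
    'likes': [10, 1, 7, 2, 8, 3],
    'impressions': [900, 120, 1500, 300, 700, 650],
})

stats = {}
for metric in ['retweets', 'likes', 'impressions']:
    # Mean value of the metric per tweet
    mean_val = tweet_df[metric].mean()
    # The 5 rows with the largest values of the metric, largest first
    top5 = tweet_df.nlargest(5, metric)[['Tweet text', metric]]
    stats[metric] = (mean_val, top5)
    print(f'Mean {metric} per tweet: {mean_val:.2f}')
    print(top5.to_string(index=False), '\n')
```

Looping over the metric names keeps the three stats symmetric; adding `'engagements'` (also assumed to be an export column) would only mean extending the list.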
It's no secret that hashtags play an important role on Twitter, and mentions can also help grow your network and influence. Together they help put the 'social' in social networking, turning platforms like Twitter from passive experiences into very active ones. So getting a handle on the most social aspect of this social network can be a worthwhile endeavour.
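```python
# Hashtags & mentions
tag_dict = {}
mention_dict = {}

for i in tweet_df.index:
    # .ix was removed in pandas 1.0; .loc does the same label-based lookup
    tweet_text = tweet_df.loc[i, 'Tweet text']
    tweet = tweet_text.lower()
    tweet_tokenized = tweet.split()

    for word in tweet_tokenized:
        # Hashtags - tokenize and build dict of tag counts
        if word.startswith('#') and len(word) > 1:
            tag_dict[word] = tag_dict.get(word, 0) + 1
        # Mentions - build dict of mention counts the same way
        elif word.startswith('@') and len(word) > 1:
            mention_dict[word] = mention_dict.get(word, 0) + 1
```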
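The loop above splits on whitespace, so a token like `#pandas,` keeps its trailing comma and is counted separately from `#pandas`. A more compact alternative, sketched below under a simplifying assumption, uses `re` and `collections.Counter`: it treats a tag or mention as `#` or `@` followed by word characters. That is looser than Twitter's own entity rules (for example, it would pick up `@example` from an email address), and the `tweets` list is hypothetical sample text standing in for the `Tweet text` column.

```python
import re
from collections import Counter

# Hypothetical tweets standing in for the 'Tweet text' column
tweets = [
    'Loving #Python and #pandas today, thanks @alice!',
    'More #python: data wrangling with @Alice and @bob.',
    'No tags in this one',
]

# Assumption: a tag/mention is '#' or '@' followed by word characters.
# Trailing punctuation that str.split() would keep is dropped.
TAG_RE = re.compile(r'#(\w+)')
MENTION_RE = re.compile(r'@(\w+)')

tag_counts = Counter()
mention_counts = Counter()
for text in tweets:
    lowered = text.lower()  # case-insensitive counting, as in the loop above
    tag_counts.update('#' + t for t in TAG_RE.findall(lowered))
    mention_counts.update('@' + m for m in MENTION_RE.findall(lowered))

# Most frequent tags and mentions, highest count first
print(tag_counts.most_common(5))
print(mention_counts.most_common(5))
```

`Counter.most_common(n)` also replaces the manual sorting step you would otherwise need to rank a plain dict of counts.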
Finally, let's have a look at some very basic temporal data. We will check mean impressions for tweets based on the hour of day and, separately, on the day of week they are tweeted. I caution (once again) that this is based on very little data, so nothing useful will likely be gleaned. With much larger amounts of tweet data, however, this is the kind of analysis entire social media campaigns are planned around.
While this is based on impressions, it could just as reasonably (and easily) be based on engagements, or RTs, or whatever else you please. Working in advertising, and promoting tweets? Maybe you are more interested in some of those promotion* metrics we hacked off the dataset at the start.
We have to convert the Twitter-supplied date field to a legitimate Python datetime object, bin the data by the hourly slot each tweet falls into, identify the day of week, and then capture this data in a couple of additional columns in the DataFrame, which we will pillage for stats afterward.
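The steps just described (convert to datetime, bin by hour, identify the weekday, store both as new columns, then average a metric per group) might look roughly like this in pandas. It is a sketch under stated assumptions rather than the original code: the `time` column name and the `YYYY-MM-DD HH:MM +0000` timestamp format are assumed to match the export, the rows are hypothetical, and the helper `mean_by` is a hypothetical name whose `metric` parameter lets you swap impressions for retweets or any other column.

```python
import pandas as pd

# Hypothetical sample rows; the 'time' column name and the
# "YYYY-MM-DD HH:MM +0000" format are assumptions about the export.
tweet_df = pd.DataFrame({
    'time': ['2016-01-18 09:15 +0000', '2016-01-18 09:45 +0000',
             '2016-01-19 14:05 +0000', '2016-01-23 14:30 +0000'],
    'impressions': [100, 300, 50, 150],
    'retweets': [1, 3, 0, 2],
})

# Convert the timestamp strings into real (UTC) datetime values
tweet_df['time'] = pd.to_datetime(tweet_df['time'], format='%Y-%m-%d %H:%M %z')

# Bin each tweet into its hourly slot (0-23) and name its day of week.
# Hours are in UTC here; .dt.tz_convert(...) could shift them to a local zone first.
tweet_df['hour'] = tweet_df['time'].dt.hour
tweet_df['weekday'] = tweet_df['time'].dt.day_name()

def mean_by(df, group_col, metric='impressions'):
    """Mean of `metric` for each value of `group_col`, highest first (hypothetical helper)."""
    return df.groupby(group_col)[metric].mean().sort_values(ascending=False)

by_hour = mean_by(tweet_df, 'hour')
by_day = mean_by(tweet_df, 'weekday')
print(by_hour)
print(by_day)
```

Basing the breakdown on retweets instead is then just `mean_by(tweet_df, 'hour', 'retweets')`.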