Original poster: Lisrelchen

Text Mining: Ukraine Tweet Network Analysis in R

Lisrelchen posted on 2015-1-22 07:15:39


#Ukraine Tweets as a Network
There were certain key terms in the tweets that connected the #Ukraine tweets together. Removing them would improve our ability to see underlying connections (besides the obvious), and simplify the network graph. So here I chose to remove "ukraine", "prorussian", and "russia".

Last time, to create an adjacency matrix for the terms, we multiplied the term-document matrix by its transpose. Here we perform the same multiplication in the opposite order to create an adjacency matrix for the tweets (documents): the transpose of the tweet matrix is multiplied by the tweet matrix, so that the tweets (docs) index both the rows and the columns of the result.
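As a rough illustration of the two multiplication orders, here is a minimal sketch on a made-up term-document matrix. The terms, tweet numbers, and counts are invented for the example and are not taken from the #Ukraine data.

```r
# Toy term-document matrix: rows are terms, columns are tweets (docs).
# All names and counts here are hypothetical.
tdm <- matrix(c(1, 0, 1,
                1, 1, 0,
                0, 1, 1,
                0, 0, 2),
              nrow = 4, byrow = TRUE,
              dimnames = list(Terms = c("crimea", "kiev", "nato", "putin"),
                              Docs  = c("1", "2", "3")))

# term-term adjacency: terms x terms
term.adj <- tdm %*% t(tdm)

# tweet-tweet adjacency: docs x docs
tweet.adj <- t(tdm) %*% tdm

dim(term.adj)   # 4 x 4
dim(tweet.adj)  # 3 x 3
tweet.adj
```

The only difference between the two results is which dimension survives the multiplication: the first keeps the terms, the second keeps the tweets.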

Tweet Adjacency Matrix Code:
# Tweet Network Analysis ####
load("ukraine.tdm.RData")
# remove common terms to simplify graph and find
# relationships between tweets beyond keywords
ukraine.m <- as.matrix(ukraine.tdm)
idx <- which(dimnames(ukraine.m)$Terms %in% c("ukraine", "prorussian", "russia"))
ukraine.tweetm <- ukraine.m[-idx,]
# build tweet-tweet adjacency matrix
ukraine.tweetm <- t(ukraine.tweetm) %*% ukraine.tweetm
ukraine.tweetm[5:10,5:10]
#     Docs
# Docs 5 6 7 8 9 10
#   5  0 0 0 0 0  0
#   6  0 2 0 0 1  0
#   7  0 0 1 0 0  0
#   8  0 0 0 0 0  0
#   9  0 1 0 0 4  0
#   10 0 0 0 0 0  0


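The tweet adjacency matrix shows the terms that two tweets have in common. For example, tweet 9 has 1 term in common with tweet 6. The matrix is symmetric, so the number is the same whether you start at tweet 9 or at tweet 6. Strictly speaking, each entry is the sum over terms of the product of the two tweets' term counts, which equals the number of shared terms when no term occurs more than once in a tweet.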
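To see how raw counts and shared-term counts can differ, the following sketch converts a made-up matrix to 0/1 before multiplying. The data are invented, and binarizing is shown only as one possible variant, not as a step taken in this post.

```r
# Toy term-document matrix with one repeated term (hypothetical data).
tdm <- matrix(c(2, 1, 0,
                1, 0, 1,
                0, 1, 1),
              nrow = 3,
              dimnames = list(Terms = c("a", "b", "c"),
                              Docs  = c("1", "2", "3")))

# adjacency from raw counts: sums of products of term counts
raw.adj <- t(tdm) %*% tdm

# adjacency from presence/absence: number of distinct shared terms
bin <- (tdm > 0) * 1
shared.adj <- t(bin) %*% bin

raw.adj["1", "2"]     # 2, because term "a" occurs twice in tweet 1
shared.adj["1", "2"]  # 1, tweets 1 and 2 share only term "a"
```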

Now we are ready for plotting the network graphic.

Visualizing the Network
Again we use the igraph library in R, with the graph.adjacency() function (named graph_from_adjacency_matrix() in recent igraph releases) creating the network graph object. Recall that V() lets us manipulate the vertices and E() lets us format the edges. Below we set the labels, color, and size of the vertices.
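For readers new to igraph, here is a small sketch of building a graph from an adjacency matrix and setting attributes through V() and E(). The matrix is invented, and the sketch assumes an igraph version that provides graph_from_adjacency_matrix().

```r
library(igraph)

# Hypothetical 3x3 adjacency matrix; the diagonal is ignored via diag = FALSE.
adj <- matrix(c(2, 1, 0,
                1, 3, 2,
                0, 2, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

g <- graph_from_adjacency_matrix(adj, mode = "undirected",
                                 weighted = TRUE, diag = FALSE)

# vertex attributes are set through V()
V(g)$label <- V(g)$name
V(g)$size  <- 10

# edge attributes are set through E(); here width follows the weight
E(g)$width <- E(g)$weight

vcount(g)  # number of vertices
ecount(g)  # number of edges
```

Attributes such as label, size, and width are picked up automatically by plot() when the graph is drawn.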

Tweet Network Setup Code:
# configure plot
library(igraph)
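# build an undirected, weighted graph from the adjacency matrix
ukraine.g <- graph.adjacency(ukraine.tweetm, weighted=TRUE, mode="undirected")
# degree is stored before simplify(), so self-loops from the matrix diagonal are counted
V(ukraine.g)$degree <- degree(ukraine.g)
# remove self-loops and multiple edges
ukraine.g <- simplify(ukraine.g)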
# set labels of vertices to tweet IDs
V(ukraine.g)$label <- V(ukraine.g)$name
V(ukraine.g)$label.cex <- 1
V(ukraine.g)$color <- rgb(.4, 0, 0, .7)
V(ukraine.g)$size <- 2
V(ukraine.g)$frame.color <- NA
# barplot of connections
barplot(table(V(ukraine.g)$degree), main="Number of Adjacent Edges")


Barplot of Number of Connections

From the barplot, we see that there are over 60 tweets which do not share any edges with other tweets. The most connected tweet has 59 connections, and the median number of connections is 16.
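Because the setup code stores the degree before calling simplify(), a tweet whose only edge is a self-loop from the matrix diagonal still gets a nonzero stored degree. The toy sketch below (invented matrix, and assuming graph_from_adjacency_matrix() is available in your igraph version) compares the degree before and after simplify().

```r
library(igraph)

# Hypothetical adjacency matrix: tweet 3 only "connects" to itself.
adj <- matrix(c(1, 1, 0,
                1, 0, 0,
                0, 0, 2),
              nrow = 3, byrow = TRUE,
              dimnames = list(as.character(1:3), as.character(1:3)))

g <- graph_from_adjacency_matrix(adj, mode = "undirected", weighted = TRUE)

deg.before <- degree(g)            # self-loops contribute to the degree
deg.after  <- degree(simplify(g))  # self-loops removed

deg.before
deg.after
table(deg.after)  # the kind of table that feeds barplot()
```

If the goal is to count only connections to other tweets, computing the degree after simplify() is one way to get that.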

Next we modify the graph object further by accenting the vertices with zero degree, selected by index in the idx variable. In order to understand the content of those isolated tweets, we pull the first 20 characters of tweet text from the raw tweet data (you can specify how many you want).

Then we change the color and width of the edges according to their weights (the strength of the connections): each edge's log-weight is scaled by the maximum log-weight, so we can discern the size of each weight relative to the maximum. Then we plot the tweet network graphic.
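The edge scaling can be checked on a handful of made-up weights. This is a minimal sketch with invented numbers; it assumes all weights are at least 1, so that the logarithm is never negative.

```r
# Hypothetical edge weights
w <- c(1, 2, 4, 8)

# log-scale the weights, shift by 0.2 so weight 1 is still visible,
# then divide by the maximum so the largest weight maps to 1
egam <- (log(w) + 0.2) / max(log(w) + 0.2)
egam

# use the scaled values as the alpha (opacity) channel of the edge color
edge.col <- rgb(0.5, 0.5, 0, egam)
edge.col
```

The heaviest edge gets full opacity and width 1, while an edge of weight 1 stays faint instead of disappearing, which is what the 0.2 offset is for.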

Plotting Code:
# set vertex colors based on degree
idx <- V(ukraine.g)$degree == 0
V(ukraine.g)$label.color[idx] <- rgb(0,0,.3,.7)
# load raw twitter text
library(twitteR)
load("ukraine.raw.RData")
# convert tweets to data.frame
ukraine.df <- do.call("rbind", lapply(ukraine, as.data.frame))
# set labels to the IDs and the first 20 characters of tweets
V(ukraine.g)$label[idx] <- paste(V(ukraine.g)$name[idx],
                                 substr(ukraine.df$text[idx], 1, 20),
                                 sep=": ")
# scale edge color opacity and width by log-weight relative to the maximum
egam <- (log(E(ukraine.g)$weight)+.2) / max(log(E(ukraine.g)$weight)+.2)
E(ukraine.g)$color <- rgb(.5, .5, 0, egam)
E(ukraine.g)$width <- egam
# compute the layout once and pass it by name
layout2 <- layout.fruchterman.reingold(ukraine.g)
plot(ukraine.g, layout=layout2)


Initial Tweet Network Graphic

The zero-degree tweets, labeled in blue with their first 20 characters, surround the network of interconnected tweets. Looking at this cumbersome graphic, I would like to eliminate the zero-degree tweets so we can look at the connected tweets.

Revised Plotting Code:
# delete the zero-degree vertices (the crescent around the network)
# remove from graph using delete.vertices()
ukraine.g2 <- delete.vertices(ukraine.g,
                              V(ukraine.g)[degree(ukraine.g)==0])
plot(ukraine.g2, layout=layout.fruchterman.reingold)


Tweet Network Graphic: Removed Unconnected Vertices

Now with the degree-less tweets removed, we can get a better view of the tweet network. Additionally, we can delete the edges with low weights to accentuate the connections with heavier weights.

Revised Again Plotting Code:

# remove edges with low weights
ukraine.g3 <- delete.edges(ukraine.g, E(ukraine.g)[E(ukraine.g)$weight <= 1])
# remove vertices left without any edges
ukraine.g3 <- delete.vertices(ukraine.g3, V(ukraine.g3)[degree(ukraine.g3)==0])
plot(ukraine.g3, layout=layout.fruchterman.reingold)


Tweet Network Graphic: Removed Low Degree Tweets

The new tweet network graphic is much more manageable than the first two graphics, which included the zero-degree tweets and the edges with low weight. We can observe at least six close tweet clusters.
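The edge-thresholding step behind this graphic can be tried on a toy graph. The sketch below uses an invented matrix and the newer igraph names delete_edges() and delete_vertices(), assuming a version of igraph that provides them.

```r
library(igraph)

# Hypothetical weighted adjacency matrix for four tweets
adj <- matrix(c(0, 1, 3, 0,
                1, 0, 0, 0,
                3, 0, 0, 2,
                0, 0, 2, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(as.character(1:4), as.character(1:4)))

g <- graph_from_adjacency_matrix(adj, mode = "undirected", weighted = TRUE)

# drop edges whose weight is 1 or less
g.strong <- delete_edges(g, E(g)[E(g)$weight <= 1])

# drop vertices that are left without any edges
g.strong <- delete_vertices(g.strong, V(g.strong)[degree(g.strong) == 0])

V(g.strong)$name    # tweets that keep at least one strong connection
E(g.strong)$weight  # remaining edge weights
```

Raising the threshold above 1 thins the graph further, at the cost of hiding weaker but possibly meaningful links.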

Tweet Clusters
Now that we have a visual of the tweets and can see how they cluster together with various weights, we would like to read the tweets. For example, let us explore the cluster in the very top right of the graphic, consisting of tweet numbers 105, 177, 145, 152, 68, 89, 88, 55, 104, 174, and 196.

Code:

# check tweet cluster texts
ukraine.df$text[c(105,177,145,152,68,89,88,55,104,174,196)]


[1] "@ericmargolis Is Russia or the US respecting the sovereignty and territorial integrity of #Ukraine as per the 1994 Budapest Memorandum????"
[2] "Troops on the Ground: U.S. and NATO Plan PSYOPS Teams in #Ukraine - http://t.co/pXP3TR0uwi #LNYHBT #TEAPARTY #WAAR #REDNATION #CCOT #TCOT"
[3] "US condemns a
