Original poster: Lisrelchen

Text Mining: Ukraine Tweet Network Analysis in R

Lisrelchen posted on 2015-1-22 07:15:39



#Ukraine Tweets as a Network
There were certain key terms in the tweets that connected the #Ukraine tweets together. Removing them would improve our ability to see underlying connections (besides the obvious), and simplify the network graph. So here I chose to remove "ukraine", "prorussian", and "russia".

You might remember that last time, to create an adjacency matrix for the terms, we multiplied the term-document matrix by its transpose. Here we perform the same matrix multiplication in the opposite order to create an adjacency matrix for the tweets (documents). This time we need the transpose of the term-document matrix multiplied by the term-document matrix, so that the tweets (docs) are multiplied together.
Tweet Adjacency Matrix Code:

# Tweet Network Analysis ####
# assumes ukraine.tdm is a tm TermDocumentMatrix, so tm is loaded for as.matrix()
library(tm)
load("ukraine.tdm.RData")

# remove common terms to simplify graph and find
# relationships between tweets beyond keywords
ukraine.m <- as.matrix(ukraine.tdm)
idx <- which(dimnames(ukraine.m)$Terms %in% c("ukraine", "prorussian", "russia"))
ukraine.tweetm <- ukraine.m[-idx,]

# build tweet-tweet adjacency matrix
ukraine.tweetm <- t(ukraine.tweetm) %*% ukraine.tweetm
ukraine.tweetm[5:10,5:10]

    Docs
Docs 5 6 7 8 9 10
  5  0 0 0 0 0  0
  6  0 2 0 0 1  0
  7  0 0 1 0 0  0
  8  0 0 0 0 0  0
  9  0 1 0 0 4  0
  10 0 0 0 0 0  0

The tweet adjacency matrix shows how many terms two documents have in common. For example, tweet 9 has 1 term in common with tweet 6. The matrix is symmetric, so the number is the same whether you start at tweet 9 or tweet 6 and compare it with the other.
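To make the order of multiplication concrete, here is a minimal sketch on a made-up 3-term by 3-tweet matrix. The object toy.tdm, its terms, and its counts are invented for illustration and are not part of the #Ukraine data. Entry (i, j) of t(M) %*% M is the sum over terms of the count in tweet i times the count in tweet j, so the diagonal holds each tweet's sum of squared term counts rather than a shared-term count.

```r
# Toy term-document matrix: rows are terms, columns are tweets (made-up counts)
toy.tdm <- matrix(c(1, 0, 1,
                    0, 2, 0,
                    1, 0, 0),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(Terms = c("crimea", "kiev", "nato"),
                                  Docs  = c("1", "2", "3")))

# term-term adjacency: which terms appear in the same tweets
term.adj <- toy.tdm %*% t(toy.tdm)

# tweet-tweet adjacency: which tweets share terms
tweet.adj <- t(toy.tdm) %*% toy.tdm

print(tweet.adj)
```

In this toy matrix, tweets 1 and 3 share the term "crimea", so tweet.adj["1", "3"] and tweet.adj["3", "1"] are both 1, while tweet 2 shares nothing with the others.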

Now we are ready for plotting the network graphic.



Visualizing the Network
Again we will use the igraph library in R, and use the graph.adjacency() function to create the network graph object. Recall that V() allows us to manipulate the vertices and E() allows us to format the edges. Below we set the labels, color, and size for the vertices.
Tweet Network Setup Code:

# configure plot
library(igraph)
ukraine.g <- graph.adjacency(ukraine.tweetm, weighted=TRUE, mode="undirected")
# note: degree is stored before simplify(), so any self-loops
# created from the matrix diagonal are still counted here
V(ukraine.g)$degree <- degree(ukraine.g)
# simplify() removes self-loops and multiple edges
ukraine.g <- simplify(ukraine.g)

# set labels of vertices to tweet IDs
V(ukraine.g)$label <- V(ukraine.g)$name
V(ukraine.g)$label.cex <- 1
V(ukraine.g)$color <- rgb(.4, 0, 0, .7)
V(ukraine.g)$size <- 2
V(ukraine.g)$frame.color <- NA

# barplot of connections
barplot(table(V(ukraine.g)$degree), main="Number of Adjacent Edges")


Barplot of Number of Connections

From the barplot, we see that there are over 60 tweets which do not share any edges with other tweets. At the other extreme, one tweet has 59 connections. The median number of connections is 16.
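As a rough cross-check of this kind of summary, the number of connections per tweet can also be counted directly from an adjacency matrix without igraph. The sketch below uses a made-up 4-tweet matrix (toy.adj is hypothetical) and ignores the diagonal, that is, a tweet's overlap with itself. Ignoring the diagonal is an assumption made here for illustration, and it may not match how degree() treated self-loops in the setup code above.

```r
# Made-up tweet-tweet adjacency matrix; not the #Ukraine data
toy.adj <- matrix(c(2, 1, 0, 0,
                    1, 3, 2, 0,
                    0, 2, 1, 0,
                    0, 0, 0, 0),
                  nrow = 4, byrow = TRUE)

# ignore each tweet's overlap with itself
diag(toy.adj) <- 0

# a connection is any other tweet with a nonzero shared-term weight
connections <- rowSums(toy.adj > 0)

# distribution and median of the connection counts
print(table(connections))
print(median(connections))
```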

Next we modify the graph object further by accenting the vertices with zero degree, selected by index in the idx variable. To understand the content of those isolated tweets, we pull the first 20 characters of tweet text from the raw tweet data (you can specify how many you want).

Then we change the color and width of the edges to reflect a scale of the minimum and maximum weights (width/strength of the connections). This way we can discern the size of the weight relative to the maximum weight. Then we plot the tweet network graphic.

Plotting Code:

# set vertex colors based on degree
idx <- V(ukraine.g)$degree == 0
V(ukraine.g)$label.color[idx] <- rgb(0,0,.3,.7)
# load raw twitter text
library(twitteR)
load("ukraine.raw.RData")
# convert tweets to data.frame
ukraine.df <- do.call("rbind", lapply(ukraine, as.data.frame))
# set labels to the IDs and the first 20 characters of tweets
V(ukraine.g)$label[idx] <- paste(V(ukraine.g)$name[idx],
                                 substr(ukraine.df$text[idx], 1, 20),
                                 sep=": ")
# scale edge color and width by log weight relative to the maximum
egam <- (log(E(ukraine.g)$weight)+.2) / max(log(E(ukraine.g)$weight)+.2)
E(ukraine.g)$color <- rgb(.5, .5, 0, egam)
E(ukraine.g)$width <- egam
layout2 <- layout.fruchterman.reingold(ukraine.g)
# pass the layout by name so it is not matched to another plot argument
plot(ukraine.g, layout=layout2)
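The edge scaling in the plotting code can be tried on its own. The following is a minimal sketch with invented weights (the vector w is hypothetical and stands in for E(ukraine.g)$weight). It shows why the 0.2 shift matters: log(1) is 0, so without the shift an edge of weight 1 would be fully transparent.

```r
# Hypothetical edge weights, standing in for E(ukraine.g)$weight
w <- c(1, 2, 4, 8)

# log-scale the weights, shift by 0.2 so weight-1 edges stay visible,
# then divide by the maximum so every value lands in (0, 1]
egam <- (log(w) + 0.2) / max(log(w) + 0.2)

print(round(egam, 3))

# the scaled values are valid alpha (transparency) levels for rgb()
edge.colors <- rgb(0.5, 0.5, 0, egam)
print(edge.colors)
```

The heaviest edge always gets the value 1, and lighter edges fade in proportion to their log weight.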



Initial Tweet Network Graphic

The first 20 characters of tweets with no degrees in blue surround the network of interconnected tweets. Looking at this cumbersome graphic, I would like to eliminate the zero degree tweets so we can look at the connected tweets.


Revised Plotting Code:

# delete vertices in crescent with no degrees
# remove from graph using delete.vertices()
ukraine.g2 <- delete.vertices(ukraine.g,
                              V(ukraine.g)[degree(ukraine.g)==0])
plot(ukraine.g2, layout=layout.fruchterman.reingold)


Tweet Network Graphic- Removed Unconnected Vertices

Now with the degree-less tweets removed, we can get a better view of the tweet network. Additionally, we can delete the edges with low weights to accentuate the connections with heavier weights.


Revised Again Plotting Code:

# remove edges with low weights (the edge attribute is named "weight")
ukraine.g3 <- delete.edges(ukraine.g, E(ukraine.g)[E(ukraine.g)$weight <= 1])
# then drop the vertices left without any edges
ukraine.g3 <- delete.vertices(ukraine.g3, V(ukraine.g3)[degree(ukraine.g3)==0])
plot(ukraine.g3, layout=layout.fruchterman.reingold)
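The effect of this pruning step can be checked on a small made-up graph. This is only a sketch: toy.adj and its weights are invented, and it uses the underscore function names (graph_from_adjacency_matrix, delete_edges, delete_vertices), which are the newer igraph spellings of the dotted functions used in this post.

```r
library(igraph)

# Made-up symmetric tweet-tweet matrix for 4 tweets; not the #Ukraine data
toy.adj <- matrix(c(0, 3, 1, 0,
                    3, 0, 0, 0,
                    1, 0, 0, 1,
                    0, 0, 1, 0),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(Docs = as.character(1:4),
                                  Docs = as.character(1:4)))

toy.g <- graph_from_adjacency_matrix(toy.adj, weighted = TRUE,
                                     mode = "undirected")

# drop edges whose weight is 1 or less
toy.g2 <- delete_edges(toy.g, E(toy.g)[E(toy.g)$weight <= 1])

# drop the vertices that are left without any edge
toy.g2 <- delete_vertices(toy.g2, V(toy.g2)[degree(toy.g2) == 0])

print(V(toy.g2)$name)
print(E(toy.g2)$weight)
```

Only the edge between tweets 1 and 2 has a weight above 1, so tweets 3 and 4 end up isolated and are removed along with their edges.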



Tweet Network Graphic- Removed Low Degree Tweets

The new tweet network graphic is much more manageable than the first two graphics, which included the zero degree tweets, and edges with low weight. We can observe a few close tweet clusters- at least six.



Tweet Clusters
Since we now have our visual of tweets, and see how they cluster together with various weights, we would like to read the tweets. For example, let us explore the cluster in the very top right of the graphic, consisting of text numbers 105, 177, 145, 152, 68, 89, 88, 55, 104, 174, and 196.


Code:

# check tweet cluster texts
ukraine.df$text[c(105,177,145,152,68,89,88,55,104,174,196)]

[1] "@ericmargolis Is Russia or the US respecting the sovereignty and territorial integrity of #Ukraine as per the 1994 Budapest Memorandum????"      
[2] "Troops on the Ground: U.S. and NATO Plan PSYOPS Teams in #Ukraine - http://t.co/pXP3TR0uwi #LNYHBT #TEAPARTY #WAAR #REDNATION #CCOT #TCOT"      
[3] "US condemns a