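A while ago, I did a network analysis of the character relationships in A Song of Ice and Fire. In that analysis I found that House Stark (especially Ned and Sansa) and House Lannister (especially Tyrion) are the most important family connections in Game of Thrones. They link many of the storylines and sit at the center of the narrative.
In that earlier post, I used igraph to plot the network and compute its relationship matrix.
But there are now two better packages that cover the whole network analysis and follow tidyverse principles: tidygraph and ggraph.
So this post uses these two packages to build a character network for A Song of Ice and Fire / Game of Thrones. (The content is based on the books, not the TV show.)
What can social network analysis tell us?
Network analysis can uncover relationships in social or professional networks. We typically ask:
- How many connections (edges) does each person (node) in the network have?
- Who is the most connected (most important, most influential) person?
- Do tightly connected people form large clusters?
- Are there key people who play an important role between clusters?
The answers to these questions usually help us understand how people communicate and interact in a society.
So how do we find the most important characters in a network? Put simply, a character who has the most relationships, or the most people connected to them, is clearly important. Other properties also help to identify key characters, for example node centrality.
The A Song of Ice and Fire character network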

    library(readr)     # fast reading of csv files
    library(tidyverse) # tidy data analysis
    library(tidygraph) # tidy graph analysis
    library(ggraph)    # for plotting

Data
The data comes from Andrew Beveridge's GitHub repository:
Character Interaction Networks for George R. R. Martin’s “A Song of Ice and Fire” saga. These networks were created by connecting two characters whenever their names (or nicknames) appeared within 15 words of one another in one of the books in “A Song of Ice and Fire.” The edge weight corresponds to the number of interactions. You can use this data to explore the dynamics of the Seven Kingdoms using network science techniques. For example, community detection finds coherent plotlines. Centrality measures uncover the multiple ways in which characters play important roles in the saga.
Andrew has already analyzed the character network of A Song of Ice and Fire. If you are interested, you can read his conclusions on his website: https://networkofthrones.wordpress.com
Here I don't want to replicate his analysis but rather show how to use tidygraph and ggraph, so I will not use all of his data.

    path <- "/Users/shiringlander/Documents/Github/Data/asoiaf/data/"
    files <- list.files(path = path, full.names = TRUE)
    files

    ##  [1] "/Users/shiringlander/Documents/Github/Data/asoiaf/data//asoiaf-all-edges.csv"
    ##  [2] "/Users/shiringlander/Documents/Github/Data/asoiaf/data//asoiaf-all-nodes.csv"
    ##  [3] "/Users/shiringlander/Documents/Github/Data/asoiaf/data//asoiaf-book1-edges.csv"
    ##  [4] "/Users/shiringlander/Documents/Github/Data/asoiaf/data//asoiaf-book1-nodes.csv"
    ##  [5] "/Users/shiringlander/Documents/Github/Data/asoiaf/data//asoiaf-book2-edges.csv"
    ##  [6] "/Users/shiringlander/Documents/Github/Data/asoiaf/data//asoiaf-book2-nodes.csv"
    ##  [7] "/Users/shiringlander/Documents/Github/Data/asoiaf/data//asoiaf-book3-edges.csv"
    ##  [8] "/Users/shiringlander/Documents/Github/Data/asoiaf/data//asoiaf-book3-nodes.csv"
    ##  [9] "/Users/shiringlander/Documents/Github/Data/asoiaf/data//asoiaf-book4-edges.csv"
    ## [10] "/Users/shiringlander/Documents/Github/Data/asoiaf/data//asoiaf-book4-nodes.csv"
    ## [11] "/Users/shiringlander/Documents/Github/Data/asoiaf/data//asoiaf-book45-edges.csv"
    ## [12] "/Users/shiringlander/Documents/Github/Data/asoiaf/data//asoiaf-book45-nodes.csv"
    ## [13] "/Users/shiringlander/Documents/Github/Data/asoiaf/data//asoiaf-book5-edges.csv"
    ## [14] "/Users/shiringlander/Documents/Github/Data/asoiaf/data//asoiaf-book5-nodes.csv"

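Characters across all books
First I use the character interaction data across all books. I am not going to use the node data here, because I find the names in the edge table sufficient for labeling. If you want nicer name labels, you can use the node files.

    cooc_all_edges <- read_csv(files[1])
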
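If you do want nicer labels, one option is to look up each character's display name in the node table. The sketch below uses two small stand-in tibbles rather than the real files. The column names `Id` and `Label` are an assumption about how the node CSVs are laid out, the weights are made up, and `edges_labelled` is a hypothetical name.

```r
library(dplyr)

# Stand-in for an edge file (weights are made up for illustration)
edges <- tibble(
  Source = c("Jon-Snow", "Jon-Snow"),
  Target = c("Samwell-Tarly", "Aemon-Targaryen-(Maester-Aemon)"),
  weight = c(10, 4)
)

# Stand-in for a node file; `Id` and `Label` are assumed column names
nodes <- tibble(
  Id    = c("Jon-Snow", "Samwell-Tarly", "Aemon-Targaryen-(Maester-Aemon)"),
  Label = c("Jon Snow", "Samwell Tarly", "Aemon Targaryen (Maester Aemon)")
)

# Attach a display label to both endpoints of every edge
edges_labelled <- edges %>%
  left_join(nodes, by = c("Source" = "Id")) %>%
  rename(source_label = Label) %>%
  left_join(nodes, by = c("Target" = "Id")) %>%
  rename(target_label = Label)

edges_labelled
```

For plotting, the hyphenated ids in the edge table are usually readable enough, which is why the rest of this post skips this step.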
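Because the books contain so many characters, many of them minor, I keep only the 100 characters with the most interactions. The edges are undirected, so there are no redundant Source-Target combinations. That is why I gather Source and Target into one column before summing up the weights.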

    # total interaction weight per character
    main_ch <- cooc_all_edges %>%
      select(-Type) %>%
      # stack Source and Target so each edge counts for both of its endpoints
      gather(x, name, Source:Target) %>%
      group_by(name) %>%
      summarise(sum_weight = sum(weight)) %>%
      ungroup()

    # the 100 characters with the highest total weight
    main_ch_l <- main_ch %>%
      arrange(desc(sum_weight)) %>%
      top_n(100, sum_weight)
    main_ch_l

    ## # A tibble: 100 x 2
    ##    name               sum_weight
    ##
    ##  1 Tyrion-Lannister         2873
    ##  2 Jon-Snow                 2757
    ##  3 Cersei-Lannister         2232
    ##  4 Joffrey-Baratheon        1762
    ##  5 Eddard-Stark             1649
    ##  6 Daenerys-Targaryen       1608
    ##  7 Jaime-Lannister          1569
    ##  8 Sansa-Stark              1547
    ##  9 Bran-Stark               1508
    ## 10 Robert-Baratheon         1488
    ## # ... with 90 more rows

    # keep only edges whose two endpoints are both main characters
    cooc_all_f <- cooc_all_edges %>%
      filter(Source %in% main_ch_l$name & Target %in% main_ch_l$name)

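In current versions of tidyr and dplyr, `gather()` and `top_n()` are superseded by `pivot_longer()` and `slice_max()`. As a minimal sketch on a made-up three-edge network (`toy_edges` and `toy_strength` are hypothetical names), the same gather-then-sum idea could look like this. Setting `with_ties = FALSE` is my own choice here, so that the result is capped at exactly `n` rows.

```r
library(dplyr)
library(tidyr)

# Toy undirected edge list with made-up weights
toy_edges <- tibble(
  Source = c("A", "A", "B"),
  Target = c("B", "C", "C"),
  weight = c(5, 2, 1)
)

toy_strength <- toy_edges %>%
  # one row per (edge, endpoint), so every edge counts for both characters
  pivot_longer(c(Source, Target), names_to = "role", values_to = "name") %>%
  group_by(name) %>%
  summarise(sum_weight = sum(weight)) %>%
  # keep the two characters with the highest total weight
  slice_max(sum_weight, n = 2, with_ties = FALSE)

toy_strength
```

Here A takes part in the edges with weights 5 and 2, B in 5 and 1, and C in 2 and 1, so A and B are the two characters kept.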
tidygraph and ggraph
Both packages were written by Thomas Lin Pedersen:
With tidygraph I set out to make it easier to get your data into a graph and perform common transformations on it, but the aim has expanded since its inception. The goal of tidygraph is to empower the user to formulate complex questions regarding relational data as simple steps, thus enabling them to retrieve insights directly from the data itself. The central idea this all boils down to is this: you don’t have to plot a network to understand it. While I absolutely love the field of network visualisation, it is in many ways overused in data science — especially when it comes to extracting knowledge from a network. Just as you don’t need a plot to tell you which car in a dataset is the fastest, you don’t need a plot to tell you which pair of friends are the closest. What you do need, instead of a plot, is a tool that allow you to formulate your question into a logic sequence of operations. For many people in the world of rectangular data, this tool is increasingly dplyr (and friends), and I do hope that tidygraph can take on the same role in the world of relational data. https://www.data-imaginist.com/2018/tidygraph-1-1-a-tidy-hope/
First, I convert the edge table into a tbl_graph object. For this I use the as_tbl_graph() function from tidygraph, which accepts many kinds of input: data.frame, matrix, dendrogram, igraph, etc.
Underneath the hood of tidygraph lies the well-oiled machinery of igraph, ensuring efficient graph manipulation. Rather than keeping the node and edge data in a list and creating igraph objects on the fly when needed, tidygraph subclasses igraph with the tbl_graph class and simply exposes it in a tidy manner. This ensures that all your beloved algorithms that expects igraph objects still works with tbl_graph objects. Further, tidygraph is very careful not to override any of igraphs exports so the two packages can coexist quite happily. https://www.data-imaginist.com/2018/tidygraph-1-1-a-tidy-hope/

    as_tbl_graph(cooc_all_f, directed = FALSE) %>%
      activate(nodes) %>%
      mutate(n_rank_trv = node_rank_traveller()) %>%
      arrange(n_rank_trv)

    ## # A tbl_graph: 100 nodes and 798 edges
    ## #
    ## # An undirected simple graph with 1 component
    ## #
    ## # Node Data: 100 x 2 (active)
    ##   name                            n_rank_trv
    ##
    ## 1 Janos-Slynt                              1
    ## 2 Aemon-Targaryen-(Maester-Aemon)          2
    ## 3 Jeor-Mormont                             3
    ## 4 Samwell-Tarly                            4
    ## 5 Qhorin-Halfhand                          5
    ## 6 Ygritte                                  6
    ## # ... with 94 more rows
    ## #
    ## # Edge Data: 798 x 5
    ##    from    to Type          id weight
    ##
    ## 1     2    75 Undirected    43      7
    ## 2     2    76 Undirected    44      4
    ## 3     2    73 Undirected    52      3
    ## # ... with 795 more rows

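To make the point about igraph compatibility concrete, here is a small sketch on a made-up four-node network (`toy_edges`, `g`, `deg`, `node_tbl` and `edge_tbl` are hypothetical names). It builds a tbl_graph from a data frame, passes it straight to a few igraph functions, and then pulls out the node and edge tables through the tidy interface.

```r
library(tidygraph)

# Toy edge list with made-up weights; as_tbl_graph() reads `from` and `to`
toy_edges <- data.frame(
  from   = c("A", "A", "B", "C"),
  to     = c("B", "C", "C", "D"),
  weight = c(5, 2, 1, 3)
)

g <- as_tbl_graph(toy_edges, directed = FALSE)

# A tbl_graph is still an igraph object, so igraph functions accept it
igraph::is_igraph(g)
igraph::vcount(g)          # number of nodes
igraph::ecount(g)          # number of edges
deg <- igraph::degree(g)   # named vector of node degrees

# The tidy side: node and edge data as tibbles
node_tbl <- g %>% activate(nodes) %>% as_tibble()
edge_tbl <- g %>% activate(edges) %>% as_tibble()
```

Switching between `activate(nodes)` and `activate(edges)` decides which of the two tables the following dplyr verbs operate on.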
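Centrality
Centrality describes the number of edges going into or out of nodes. In a highly centralized network, only a few nodes have a large number of edges. In a network with low centralization, many nodes have a similar and relatively small number of edges. The centrality of a single node measures how important that node is in the network.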
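A quick way to see the difference is to compare two small synthetic graphs. This sketch is my own illustration, not part of the original analysis: it uses tidygraph's `create_star()` and `create_ring()` to build a star and a ring with six nodes each and computes the degree of every node with `centrality_degree()`. The names `star_deg` and `ring_deg` are hypothetical.

```r
library(tidygraph)

# Star: one hub connected to five leaves (highly centralized)
star_deg <- create_star(6) %>%
  mutate(degree = centrality_degree()) %>%
  as_tibble()

# Ring: every node has exactly two neighbours (not centralized)
ring_deg <- create_ring(6) %>%
  mutate(degree = centrality_degree()) %>%
  as_tibble()

sort(star_deg$degree, decreasing = TRUE)  # one node with 5 edges, five with 1
ring_deg$degree                           # all six nodes have 2 edges
```

In the star, a single node holds all the connections, while in the ring the edges are spread evenly, which is the contrast described above.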
This version adds 19(!) new ways to define the notion of centrality along with a manual version where you can mix and match different distance measures and summation strategies opening up the world to even more centrality scores. All of this wealth of centrality comes from the netrankr package that provides a framework for defining and calculating centrality scores. If you use centrality measures somewhere in your analysis I cannot recommend the vignettes provided by netrankr enough as they provide a fundamental intuition about the nature of such measures and how they can/should be used. https://www.data-imaginist.com/2018/tidygraph-1-1-a-tidy-hope/

    as_tbl_graph(cooc_all_f, directed = FALSE) %>%
      activate(nodes) %>%
      # degree centrality: the number of neighbours of each node
      mutate(neighbors = centrality_degree()) %>%
      arrange(desc(neighbors))

    ## # A tbl_graph: 100 nodes and 798 edges
    ## #
    ## # An undirected simple graph with 1 component
    ## #
    ## # Node Data: 100 x 2 (active)
    ##   name              neighbors
    ##
    ## 1 Tyrion-Lannister        54.
    ## 2 Cersei-Lannister        49.
    ## 3 Joffrey-Baratheon       49.
    ## 4 Robert-Baratheon        47.
    ## 5 Jaime-Lannister         45.
    ## 6 Sansa-Stark             44.
    ## # ... with 94 more rows
    ## #
    ## # Edge Data: 798 x 5
    ##    from    to Type          id weight
    ##
    ## 1    41    42 Undirected    43      7
    ## 2    41    60 Undirected    44      4
    ## 3    41    63 Undirected    52      3
    ## # ... with 795 more rows


