Original poster: Lisrelchen

[Rindra Ramamonjison] Apache Spark Graph Processing

Lisrelchen posted on 2017-3-6 05:40:01
In-degree and out-degree of the Enron email network

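For the Enron email network, we can confirm that there are roughly ten times more links than nodes by comparing the edge and vertex counts and computing the average in-degree and out-degree:

scala> emailGraph.numEdges

scala> emailGraph.numVertices

// Average in-degree: sum of all in-degrees divided by the number of vertices
scala> emailGraph.inDegrees.map(_._2).sum / emailGraph.numVertices

// Average out-degree, computed the same way
scala> emailGraph.outDegrees.map(_._2).sum / emailGraph.numVertices

To find the vertex with the highest out-degree, define a max function that compares two (vertex id, degree) pairs and reduce the out-degrees with it:

// Keep whichever pair has the larger degree
def max(a: (VertexId, Int), b: (VertexId, Int)): (VertexId, Int) = {
  if (a._2 > b._2) a else b
}

scala> emailGraph.outDegrees.reduce(max)

We can also count the vertices whose out-degree is at most one (outDegrees only lists vertices that have at least one outgoing edge):

scala> emailGraph.outDegrees.filter(_._2 <= 1).count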
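
The two averages above must agree for any directed graph: every edge adds one to some vertex's out-degree and one to some vertex's in-degree, so both sums equal the edge count and each average equals numEdges / numVertices. The following is a minimal sketch of that argument on a made-up edge list, using plain Scala collections instead of GraphX so that it runs without Spark. The object name DegreeSketch, the toy edges, and the value names are illustrative assumptions and are not taken from the Enron data.

```scala
object DegreeSketch {
  type VertexId = Long // GraphX defines VertexId as an alias for Long

  // Hypothetical directed edges (src, dst); not taken from the Enron data
  val toyEdges: Seq[(VertexId, VertexId)] =
    Seq((1L, 2L), (1L, 3L), (2L, 3L), (3L, 1L), (4L, 1L), (1L, 4L))

  val vertices: Set[VertexId] = toyEdges.flatMap { case (s, d) => Seq(s, d) }.toSet

  // Degree tables that, like outDegrees/inDegrees, only list vertices with degree > 0
  val outDegrees: Map[VertexId, Int] =
    toyEdges.groupBy(_._1).map { case (v, es) => (v, es.size) }
  val inDegrees: Map[VertexId, Int] =
    toyEdges.groupBy(_._2).map { case (v, es) => (v, es.size) }

  // Both sums equal the number of edges, so both averages are equal
  val avgOutDegree: Double = outDegrees.values.sum.toDouble / vertices.size
  val avgInDegree: Double = inDegrees.values.sum.toDouble / vertices.size

  // Same reducer as in the listing above: keep the pair with the larger degree
  def max(a: (VertexId, Int), b: (VertexId, Int)): (VertexId, Int) =
    if (a._2 > b._2) a else b

  val highestOutDegree: (VertexId, Int) = outDegrees.toSeq.reduce(max)
}
```

With six edges over four vertices, both averages come out to 1.5, and vertex 1 has the highest out-degree (3).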

Lisrelchen posted on 2017-3-6 05:44:10
Degrees in the bipartite food network
For the bipartite ingredient-compound graph, we can also explore which food has the largest number of compounds, and which compound is the most prevalent in our list of ingredients:

// Vertex with the highest out-degree: the ingredient with the most compounds
scala> foodNetwork.outDegrees.reduce(max)
res: (org.apache.spark.graphx.VertexId, Int) = (908,239)

// Look up the vertex with id 908
scala> foodNetwork.vertices.filter(_._1 == 908).collect()
res: Array[(org.apache.spark.graphx.VertexId, FNNode)] = Array((908,Ingredient(black_tea,plant derivative)))

// Vertex with the highest in-degree: the compound found in the most ingredients
scala> foodNetwork.inDegrees.reduce(max)
res: (org.apache.spark.graphx.VertexId, Int) = (10292,299)

// Look up the vertex with id 10292
scala> foodNetwork.vertices.filter(_._1 == 10292).collect()
res: Array[(org.apache.spark.graphx.VertexId, FNNode)] = Array((10292,Compound(1-octanol,111-87-5)))
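
The output above suggests that edges point from ingredients to compounds, so an ingredient's out-degree counts its compounds and a compound's in-degree counts the ingredients that contain it. A minimal sketch of that reading follows, again with plain Scala collections rather than GraphX. The object name BipartiteSketch and the placeholder ingredient and compound names are hypothetical and do not come from the food network dataset.

```scala
object BipartiteSketch {
  // Hypothetical ingredient -> compound edges; all names are placeholders
  val contains: Seq[(String, String)] = Seq(
    ("ingredientA", "compoundX"),
    ("ingredientA", "compoundY"),
    ("ingredientA", "compoundZ"),
    ("ingredientB", "compoundX"),
    ("ingredientC", "compoundX"),
    ("ingredientC", "compoundY")
  )

  // Out-degree of an ingredient = number of compounds it contains
  val compoundsPerIngredient: Map[String, Int] =
    contains.groupBy(_._1).map { case (i, es) => (i, es.size) }

  // In-degree of a compound = number of ingredients that contain it
  val ingredientsPerCompound: Map[String, Int] =
    contains.groupBy(_._2).map { case (c, es) => (c, es.size) }

  val richestIngredient: (String, Int) = compoundsPerIngredient.maxBy(_._2)
  val mostPrevalentCompound: (String, Int) = ingredientsPerCompound.maxBy(_._2)
}
```

On this toy data, ingredientA has the most compounds (3) and compoundX appears in the most ingredients (3).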

Lisrelchen posted on 2017-3-6 05:45:01
Degree histogram of the social ego networks
Similarly, we can compute the degrees of the connections in the ego network. Let's look at the maximum and minimum degrees in the network (min is not shown in this thread; it is assumed to be defined like max, but keeping the pair with the smaller degree):

scala> egoNetwork.degrees.reduce(max)
res91: (org.apache.spark.graphx.VertexId, Int) = (1643293729,1084)

scala> egoNetwork.degrees.reduce(min)
res92: (org.apache.spark.graphx.VertexId, Int) = (550756674,1)
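
The min reducer used above is not defined anywhere in this thread. One plausible definition, mirroring max, is sketched below on a few made-up (vertex id, degree) pairs. The object name MinMaxSketch, the body of min, and the sample pairs are assumptions for illustration only.

```scala
object MinMaxSketch {
  type VertexId = Long // GraphX defines VertexId as an alias for Long

  // Keep the pair with the larger degree
  def max(a: (VertexId, Int), b: (VertexId, Int)): (VertexId, Int) =
    if (a._2 > b._2) a else b

  // Assumed counterpart of max: keep the pair with the smaller degree
  def min(a: (VertexId, Int), b: (VertexId, Int)): (VertexId, Int) =
    if (a._2 <= b._2) a else b

  // Hypothetical (vertex id, degree) pairs
  val degrees: Seq[(VertexId, Int)] = Seq((10L, 4), (11L, 1), (12L, 7), (13L, 2))

  val highest: (VertexId, Int) = degrees.reduce(max)
  val lowest: (VertexId, Int) = degrees.reduce(min)
}
```

When several vertices share the extreme degree, which one a reduce returns depends on the order in which the pairs are combined, so the reported vertex id should be read as one example rather than the only one.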

Lisrelchen posted on 2017-3-6 05:49:40
Visualizing the graph data
1. Import the GraphStream classes with the following:

scala> import org.graphstream.graph.{Graph => GraphStream}
scala> import org.graphstream.graph.implementations._

2. Load the social ego network data using Graph.fromEdges, as we did in the previous chapter. After that, create a SingleGraph object:

// Create a SingleGraph object for GraphStream visualization
val graph: SingleGraph = new SingleGraph("EgoSocial")

The visual style of the nodes and edges is defined in a GraphStream CSS stylesheet (apparently the style/stylesheet file referenced in the next step):

node {
    fill-color: #a1d99b;
    size: 20px;
    text-size: 12;
    text-alignment: at-right;
    text-padding: 2;
    text-background-color: #fff7bc;
}
edge {
    shape: cubic-curve;
    fill-color: #dd1c77;
    z-index: 0;
    text-background-mode: rounded-box;
    text-background-color: #fff7bc;
    text-alignment: above;
    text-padding: 2;
}

3. Connect the stylesheet to the SingleGraph object graph:

// Set up the visual attributes for graph visualization
graph.addAttribute("ui.stylesheet","url(file:.//style/stylesheet)")
graph.addAttribute("ui.quality")
graph.addAttribute("ui.antialias")

The last two lines tell the rendering engine to favor quality over speed. Next, we reload the graph that we built in the previous chapter; to avoid repetition, the graph-building code is omitted. We then load the VertexRDD and EdgeRDD of the social network into the GraphStream graph object with the following code:

// Given the egoNetwork, load the GraphX vertices into GraphStream
for ((id, _) <- egoNetwork.vertices.collect()) {
  val node = graph.addNode(id.toString).asInstanceOf[SingleNode]
}
// Load the GraphX edges into GraphStream edges
for (Edge(x, y, _) <- egoNetwork.edges.collect()) {
  val edge = graph.addEdge(x.toString ++ y.toString, x.toString, y.toString, true).
    asInstanceOf[AbstractEdge]
}
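
One detail in the edge-loading loop is worth a closer look: the edge id is the plain concatenation of the two endpoint ids, so two different edges can end up with the same id string. GraphStream expects edge ids to be unique within a graph, so such a clash may cause addEdge to fail. Whether this actually happens depends on the ids in the ego network data. The sketch below only demonstrates the string collision and one possible workaround; the object name EdgeIdSketch, the helper names, and the "->" separator are hypothetical choices, not part of the original code.

```scala
object EdgeIdSketch {
  // Id built as in the listing above: plain concatenation of the endpoint ids
  def concatId(src: Long, dst: Long): String = src.toString ++ dst.toString

  // Hypothetical alternative: a separator keeps the two endpoints distinguishable
  def separatedId(src: Long, dst: Long): String = s"$src->$dst"

  // Edges (1, 23) and (12, 3) both concatenate to "123"
  val collide: Boolean = concatId(1L, 23L) == concatId(12L, 3L)

  // With a separator the ids are "1->23" and "12->3"
  val distinct: Boolean = separatedId(1L, 23L) != separatedId(12L, 3L)
}
```

If the concatenated ids ever clash in practice, passing something like separatedId(x, y) as the first argument to addEdge would be one way to avoid it.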

Lisrelchen posted on 2017-3-6 05:54:04
Plotting the degree distribution
1. Import some classes from JFreeChart and Breeze:

import org.jfree.chart.axis.ValueAxis
import breeze.linalg._
import breeze.plot._

2. Define the degreeHistogram function, which returns (degree, number of nodes with that degree) pairs sorted by degree:

def degreeHistogram(net: Graph[Int, Int]): Array[(Int, Int)] =
  net.degrees.map(t => (t._2, t._1)).
    groupByKey.map(t => (t._1, t._2.size)).
    sortBy(_._1).collect()

3. Normalize the node degrees by the total number of nodes, so that the degree probabilities add up to one:

val nn = egoNetwork.numVertices
val egoDegreeDistribution = degreeHistogram(egoNetwork).map({case (d,n) => (d,n.toDouble/nn)})

4. Display the degree distribution in the first subplot and a histogram of the node degrees in the second:

val f = Figure()
val p1 = f.subplot(2,1,0)
val x = new DenseVector(egoDegreeDistribution map (_._1.toDouble))
val y = new DenseVector(egoDegreeDistribution map (_._2))
p1.xlabel = "Degrees"
p1.ylabel = "Distribution"
p1 += plot(x, y)
p1.title = "Degree distribution of social ego network"
val p2 = f.subplot(2,1,1)
val egoDegrees = egoNetwork.degrees.map(_._2).collect()

// Label the second subplot (the original listing set these on p1 by mistake)
p2.xlabel = "Degrees"
p2.ylabel = "Histogram of node degrees"
p2 += hist(egoDegrees, 10)
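
The histogram and normalization steps can be checked on a small example without Spark or Breeze. The following minimal sketch applies the same group-count-sort-normalize logic to a made-up list of node degrees using plain Scala collections. The object name HistogramSketch and the sample degrees are illustrative assumptions.

```scala
object HistogramSketch {
  // Hypothetical node degrees, one entry per node
  val degrees: Seq[Int] = Seq(1, 1, 1, 2, 2, 3, 5, 5)

  // (degree, number of nodes with that degree), sorted by degree
  val histogram: Seq[(Int, Int)] =
    degrees.groupBy(identity).map { case (d, ns) => (d, ns.size) }.toSeq.sortBy(_._1)

  // Divide each count by the number of nodes to obtain probabilities
  val n: Int = degrees.size
  val distribution: Seq[(Int, Double)] =
    histogram.map { case (d, count) => (d, count.toDouble / n) }

  // The probabilities should add up to one
  val total: Double = distribution.map(_._2).sum
}
```

Note that the probabilities add up to one only when every node counted in the denominator also appears in the degree list. In GraphX, degrees omits vertices without edges, so the normalization by numVertices relies on each vertex having at least one edge, which should hold for a graph built from an edge list.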

kkkm_db posted on 2017-3-6 07:37:11
Thanks for sharing

MouJack007 posted on 2017-3-6 13:33:05
Thank you for sharing, OP!
