PageRank
A more complex pattern of data sharing occurs in PageRank [6]. The algorithm iteratively updates a rank for each document by adding up contributions from documents that link to it. On each iteration, each document sends a contribution of r/n to its neighbors, where r is its rank and n is its number of neighbors. It then updates its rank to α/N + (1 − α)∑cᵢ, where the sum is over the contributions it received and N is the total number of documents. We can write PageRank in Spark as follows:
// Load graph as an RDD of (URL, outlinks) pairs
val links = spark.textFile(...).map(...).persist()
var ranks = // RDD of (URL, rank) pairs
for (i <- 1 to ITERATIONS) {
  // Build an RDD of (targetURL, float) pairs
  // with the contributions sent by each page
  val contribs = links.join(ranks).flatMap {
    case (url, (links, rank)) =>
      links.map(dest => (dest, rank / links.size))
  }
  // Sum contributions by URL and get new ranks
  ranks = contribs.reduceByKey((x, y) => x + y)
                  .mapValues(sum => a/N + (1-a)*sum)
}
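To check the update rule itself, here is a minimal single-machine sketch in plain Python (no Spark; dicts stand in for the RDDs). The toy graph, the damping factor value, and the iteration count are illustrative assumptions, not from the paper.

```python
ALPHA = 0.85       # damping factor (the paper's alpha); value is an assumption
ITERATIONS = 20    # illustrative iteration count

# Hypothetical toy graph: URL -> list of outlinks
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}

N = len(links)
ranks = {url: 1.0 / N for url in links}  # start with uniform ranks

for _ in range(ITERATIONS):
    # Each page sends a contribution of r/n to each of its n neighbors...
    contribs = {url: 0.0 for url in links}
    for url, outlinks in links.items():
        for dest in outlinks:
            contribs[dest] += ranks[url] / len(outlinks)
    # ...and each page's new rank is alpha/N + (1 - alpha) * sum of contributions,
    # matching the formula quoted above.
    ranks = {url: ALPHA / N + (1 - ALPHA) * c for url, c in contribs.items()}
```

Because every page distributes its full rank each round, the ranks stay normalized: starting from a total of 1, each iteration maps the total to α + (1 − α)·1 = 1.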