[英文文献] Large-Scale Graph Biconnectivity in MapReduce-MapReduce中大规模图的双nectivity [推广有奖]

0关注
0粉丝

等待验证会员

学前班

还不是VIP/贵宾

威望: 0 级
论坛币: 0 个
通用积分: 0
学术水平: 0 点
热心指数: 0 点
信用等级: 0 点
经验: 10 点
帖子: 0
精华: 0
在线时间: 0 小时
注册时间: 2020-9-21
最后登录: 2020-9-21

楼主

狗狗币598 发表于 2005-6-16 19:33:28 |AI写论文

是否 +2 论坛币

k人参与回答

经管之家送您一份

应届毕业生专属福利!

求职就业群

赵安豆老师微信：zhaoandou666

经管之家联合CDA

送您一个全额奖学金名额~ !

立即领取

感谢您参与论坛问题回答

经管之家送您两个论坛币！

+2 论坛币

英文文献：Large-Scale Graph Biconnectivity in MapReduce-MapReduce中大规模图的双nectivity
英文文献作者：Giorgio Ausiello,Donatella Firmani,Luigi Laura,Emanuele Paracone
英文文献摘要：
The MapReduce framework, originally proposed by Google, and its open source implementation, Hadoop, are nowadays considered the standard frameworks, both in industry and academia, to deal with petabyte scale datasets. In this paper we describe a two-rounds MapReduce approach to biconnectivity in undirected graphs, that is the computation of the set of articulation points, the set of bridges and the set of biconnected components of a graph G. We recall that an articulation point (resp. a bridge) is a vertex (resp. an edge) whose removal increases the number of connected components. A biconnected component is a maximal biconnected subgraph, i.e., it does not include neither articulation points nor bridges. In order to minimize the communication cost, the algorithm is based on a summary of the input data set, that is a compact data structure from which queries on biconnectivity properties can be answered. This summary, called navigational sketch, was originally designed in the data streams framework and was implicitly proved to be incrementally maintainable.Here we define it in a different framework in order to prove that it is mergeable . Mergeability is the key property of summaries in distributed or parallel computation: in particular, it provides a way to split the computation of the summary across separate subsets of the original data set, and thus to exploit the parallelism of the MapReduce framework. We finally discuss a scenario in which it is assumed that the machines have limited memory, showing tradeoffs between the memory available and the number of rounds of the algorithm. We conclude the paper with an experimental analysis that, on the basis of different executions of an Hadoop implementation of the algorithm against large-scale real world graphs, confirms the effectiveness of our approach..

最初由谷歌提出的MapReduce框架和它的开源实现Hadoop，现在在工业和学术界都被认为是处理pb规模数据集的标准框架。本文描述了一种求解无向图双连通的两轮MapReduce方法，即图g中连接点集、桥集和双连通分量集的计算方法。桥)是一个顶点(resp。边)其去除增加连接的部件的数量。一个双连通部件是一个最大的双连通子图，即它既不包括连接点也不包括桥。为了使通信成本最小化，该算法基于输入数据集的汇总，这是一种紧凑的数据结构，可以从该数据结构中回答有关双nectivity属性的查询。这个摘要称为导航草图，最初是在数据流框架中设计的，并被隐式地证明具有递增性可维护性。这里我们用不同的框架来定义它，以证明它是可合并的。在分布式或并行计算中，可合并性是摘要的关键属性:特别是，它提供了一种将摘要的计算拆分到原始数据集的不同子集的方法，从而利用MapReduce框架的并行性。最后我们讨论了一个假设机器内存有限的场景，展示了可用内存和算法轮数之间的权衡。在对大规模真实世界图的Hadoop实现的不同执行情况的基础上，我们通过实验分析来总结本文，验证了我们的方法的有效性。