Original poster: Nicolle

Data Algorithms: Recipes for Scaling Up with Hadoop and Spark


Nicolle  posted on 2015-7-18 07:13:22
Notice: the author has been banned or deleted; the content is automatically hidden.


auirzxp  posted on 2015-7-18 07:28:30
1.1 What is a Secondary Sort Problem?

The “secondary sort” problem is the problem of sorting the values associated with a key in the reduce phase. The technique that solves it is sometimes called “value-to-key conversion.” It lets us sort the values passed to each reducer, in ascending or descending order.


The goal of this chapter is to implement the “secondary sort” design pattern in MapReduce/Hadoop and Spark. In software design and programming, a design pattern is a reusable solution to a commonly occurring problem. A design pattern is typically not tied to a specific programming language and can be implemented in many of them.

The MapReduce framework automatically sorts the keys generated by mappers. This means that, before the reducers start, all intermediate (key, value) pairs generated by the mappers are sorted by key (and not by value). The values passed to each reducer are not sorted; they can arrive in any order. What if we want the reducer’s values sorted as well? MapReduce/Hadoop and Spark do not sort values for a reducer. Yet for some applications (such as time series data), you want your reducer data to be sorted. The Secondary Sort design pattern enables us to sort the reducer’s values.
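To make the idea concrete, here is a minimal sketch of value-to-key conversion in plain Python. It is not Hadoop or Spark API code, only an in-memory illustration. The sensor records and the names `records` and `secondary_sort` are assumptions made up for this example. The part of the value we want ordered (a timestamp) is moved into a composite key, so sorting by key also orders the values, and grouping then looks at the natural key only.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical time-series records: (natural_key, timestamp, reading).
records = [
    ("sensor-b", 3, 0.7),
    ("sensor-a", 2, 1.5),
    ("sensor-a", 1, 1.2),
    ("sensor-b", 1, 0.4),
    ("sensor-a", 3, 1.9),
]

def secondary_sort(rows):
    """Group rows by natural key, with each group's values ordered by timestamp."""
    # Value-to-key conversion: move the timestamp into a composite key,
    # so that sorting by key also orders the values.
    composite = [((name, ts), reading) for name, ts, reading in rows]
    composite.sort(key=itemgetter(0))
    # Group on the natural key only (the first part of the composite key).
    result = {}
    for name, group in groupby(composite, key=lambda kv: kv[0][0]):
        result[name] = [(key[1], reading) for key, reading in group]
    return result

grouped = secondary_sort(records)
for name, values in grouped.items():
    print(name, values)
```

In Hadoop the same idea is usually realized with a composite key class, a custom partitioner, and a grouping comparator, so that the framework's own sort does the ordering instead of an in-memory sort like the one above.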

First we focus on the MapReduce/Hadoop solution. Let’s look at the MapReduce paradigm and then explain the concept of the Secondary Sort:

map(key1, value1) → list(key2, value2)

reduce(key2, list(value2)) → list(key3, value3)

First, the map() function receives a key-value pair, (key1, value1), as input and outputs any number of key-value pairs, (key2, value2). Second, the reduce() function receives as input a pair (key2, list(value2)), that is, a key together with the list of all values emitted for it, and outputs any number of (key3, value3) pairs.
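The two signatures above can be sketched as a tiny in-memory simulation in plain Python. This is an illustration under stated assumptions, not how Hadoop is implemented: `run_mapreduce` is a hypothetical helper, and the word-count `map_fn` and `reduce_fn` are example functions that do not come from the original text.

```python
from collections import defaultdict

def run_mapreduce(inputs, map_fn, reduce_fn):
    """Tiny in-memory stand-in for a MapReduce job (illustration only)."""
    # Map phase: each (key1, value1) yields zero or more (key2, value2) pairs.
    groups = defaultdict(list)
    for key1, value1 in inputs:
        for key2, value2 in map_fn(key1, value1):
            groups[key2].append(value2)
    # Shuffle phase: keys reach the reducers in sorted order; the values
    # within each key stay in whatever order they arrived.
    output = []
    for key2 in sorted(groups):
        # Reduce phase: each (key2, list(value2)) yields zero or more
        # (key3, value3) pairs.
        output.extend(reduce_fn(key2, groups[key2]))
    return output

def map_fn(line_number, line):
    # Hypothetical mapper: emit (word, 1) for every word in the line.
    for word in line.split():
        yield word, 1

def reduce_fn(word, counts):
    # Hypothetical reducer: emit a single (word, total) pair.
    yield word, sum(counts)

lines = [(0, "b a"), (1, "a c a")]
result = run_mapreduce(lines, map_fn, reduce_fn)
print(result)
```

Note that only the keys are sorted before reduce_fn is called; nothing in this sketch orders the list of values for a key, which is the gap the Secondary Sort pattern addresses.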

Now consider the following key-value pair (key2, list(value2)) as an input for a reducer:

list(value2) = (V1, V2, ..., Vn)


