[Data Science] Pig Design Patterns

Original post

Lisrelchen posted on 2015-4-6 19:57:40

Author: Pradeep Pasupuleti
ISBN: 978-1-78328-555-6
Year: 2014
Pages: 310
Language: English
File size: 1.9 MB
File format: PDF
Category: Big Data
Book Description:
Pig Design Patterns is a comprehensive guide that will enable readers to readily use design patterns that simplify the creation of complex data pipelines in various stages of data management. This book focuses on using Pig in an enterprise context, bridging the gap between theoretical understanding and practical implementation. Each chapter contains a set of design patterns that pose and then solve technical challenges that are relevant to the enterprise use cases.
The book covers the journey of Big Data from the time it enters the enterprise to its eventual use in analytics, in the form of a report or a predictive model. By the end of the book, readers will appreciate Pig’s real power in addressing each and every problem encountered when creating an analytics-based data product. Each design pattern comes with a suggested solution, an analysis of the trade-offs of implementing that solution in a different way, an explanation of how the code works, and the results.


#2

wyr629 posted on 2015-4-7 00:49:43

Thanks, I'll study this.

#3

meng山楂树 posted on 2015-4-7 07:14:36

#4 to #7

Nicolle posted on 2017-1-16, 06:38:39 to 06:40:13

Notice: content automatically hidden because the author was banned or the content was deleted.

#8

ReneeBK posted on 2017-1-16 06:45:23

The ingress code
The following code connects to MongoDB, loads the MongoDB native file, parses it, and retrieves only the fields named in the schema passed to the MongoLoader constructor, by mapping the fields of each MongoDB document to the fields specified in that schema. All of this is abstracted behind a single call to MongoLoader.
/*
Register the mongo jar files to be able to use MongoLoader UDF
*/
REGISTER '/home/cloudera/pdp/jars/mongo.jar';
REGISTER '/home/cloudera/pdp/jars/mongo-hadoop-pig.jar';
/*
Load the data using MongoLoader UDF, it connects to MongoDB, loads the native file and parses it to retrieve only the specified schema.
*/
stock_data = LOAD 'mongodb://slave1/nasdaqDB.store_stock'
USING com.mongodb.hadoop.pig.MongoLoader('exchange:chararray, stock_symbol:chararray, date:chararray, stock_price_open:float, stock_price_high:float, stock_price_low:float, stock_price_close:float, stock_volume:long, stock_price_adj_close:chararray')
AS (exchange, stock_symbol, date, stock_price_open, stock_price_high, stock_price_low, stock_price_close, stock_volume, stock_price_adj_close);
/*
* Some processing logic goes here which is deliberately left out to improve readability
*/
/*
Display the contents of the relation stock_data on the console
*/
DUMP stock_data;
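
The schema-driven retrieval described above can be pictured with a small Python model, used here only because it runs without a Hadoop cluster. It is a sketch, not the mongo-hadoop implementation: the `project` function, the shortened `SCHEMA` list, and the sample document are all hypothetical, and treating a missing field as `None` is an assumption.

```python
# Illustrative model only: this is NOT how mongo-hadoop is implemented.

# (field name, caster) pairs mirroring part of the schema string
# that the Pig script passes to MongoLoader.
SCHEMA = [
    ("exchange", str),
    ("stock_symbol", str),
    ("stock_price_open", float),
    ("stock_volume", int),
]


def project(document, schema=SCHEMA):
    """Return a tuple holding only the schema fields, in schema order.

    Fields absent from the document become None (assumed behaviour);
    document keys outside the schema, such as _id, are dropped.
    """
    row = []
    for name, cast in schema:
        value = document.get(name)
        row.append(None if value is None else cast(value))
    return tuple(row)


# Hypothetical document, shaped like one entry of nasdaqDB.store_stock.
doc = {
    "_id": "abc123",
    "exchange": "NASDAQ",
    "stock_symbol": "ABXA",
    "stock_price_open": "2.55",
    "stock_volume": 158500,
    "note": "not in the schema",
}

print(project(doc))
```

The point of the sketch is that the schema, not the document, decides which fields reach the relation and in what order, which is why the Pig script never has to parse the document itself.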


#9

ReneeBK posted on 2017-1-16 06:47:27

The ingress and egress patterns for MongoDB

The egress code
The following code depicts the writing of data existing in a stock_data Pig relation to a MongoDB document collection:
/*
Register the mongo jar files and piggybank jar to be able to use the UDFs
*/
REGISTER '/home/cloudera/pdp/jars/mongo.jar';
REGISTER '/home/cloudera/pdp/jars/mongo-hadoop-pig.jar';
REGISTER '/usr/share/pig/contrib/piggybank/java/piggybank.jar';
/*
Assign the alias MongoStorage to MongoStorage class
*/
DEFINE MongoStorage com.mongodb.hadoop.pig.MongoStorage();
/*
Load the contents of files starting with NASDAQ_daily_prices_ into a Pig relation stock_data
*/
stock_data = LOAD '/user/cloudera/pdp/datasets/mongo/NASDAQ_daily_prices/NASDAQ_daily_prices_*'
USING org.apache.pig.piggybank.storage.CSVLoader()
AS (exchange:chararray, stock_symbol:chararray, date:chararray, stock_price_open:chararray, stock_price_high:chararray, stock_price_low:chararray, stock_price_close:chararray, stock_volume:chararray, stock_price_adj_close:chararray);
/*
* Some processing logic goes here which is deliberately left out to improve readability
*/
/*
Store the data in MongoDB using the MongoStorage serializer. In the MongoDB URI, nasdaqDB is the database and store_stock is the document collection created to hold this data.
*/
STORE stock_data INTO 'mongodb://slave1/nasdaqDB.store_stock' USING MongoStorage();
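
As a rough picture of the egress step, the Python sketch below pairs each value of a tuple with its field name to form one document per tuple. It is a conceptual model under stated assumptions, not the MongoStorage code: `to_document`, the shortened `FIELDS` list, and the sample row are hypothetical, and the idea that field aliases become document keys is assumed rather than taken from the post.

```python
# Conceptual model only: NOT the MongoStorage implementation.

# A shortened, hypothetical version of the relation's field names.
FIELDS = ["exchange", "stock_symbol", "date", "stock_price_close"]


def to_document(row, fields=FIELDS):
    """Pair each tuple value with its field name to build one document."""
    if len(row) != len(fields):
        raise ValueError("row and schema must have the same number of fields")
    return dict(zip(fields, row))


# Hypothetical row; every value is a string because the Pig script
# above loads each CSV column as chararray.
row = ("NASDAQ", "ABXA", "2009-12-09", "2.55")

print(to_document(row))
```

Since the script declares every column as chararray, the values would presumably be stored as strings unless the omitted processing step casts them first.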


#10

ReneeBK posted on 2017-1-16 06:48:33

The HBase ingress and egress pattern

The ingress code
The following code snippet illustrates the ingestion of the HBase data into a Pig relation:
/*
Load data from the HBase table retail_transactions, which contains the column families transaction_details, customer_details and product_details.
The : operator is used to access columns in a column family.
The first parameter to HBaseStorage is the list of columns and the second parameter is the list of options.
The option -loadKey true specifies that the rowkey should be loaded as the first item in the tuple; -limit 500 caps the number of rows read from each region of the HBase table.
*/
transactions = LOAD 'hbase://retail_transactions'
USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
'transaction_details:transaction_date customer_details:customer_id customer_details:age customer_details:residence_area product_details:product_subclass product_details:product_id product_details:amount product_details:asset product_details:sales_price', '-loadKey true -limit 500')
AS (id: bytearray, transaction_date: chararray, customer_id: int, age: chararray, residence_area: chararray, product_subclass: int, product_id: long, amount: int, asset: int, sales_price: int);
/*
* Some processing logic goes here which is deliberately left out to improve readability
*/
-- Display the contents of the relation transactions on the console
DUMP transactions;
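
The way the column list and the -loadKey option shape each tuple can be modelled in a few lines of Python. This is a sketch under assumptions, not the HBaseStorage code: `parse_columns`, `row_to_tuple`, the shortened column list, and the sample row are hypothetical, and returning `None` for a missing cell is an assumption.

```python
# Conceptual model only: NOT the HBaseStorage implementation.


def parse_columns(spec):
    """Split 'family:qualifier family:qualifier ...' into (family, qualifier) pairs."""
    return [tuple(column.split(":", 1)) for column in spec.split()]


def row_to_tuple(row_key, cells, columns, load_key=True):
    """Build one tuple from an HBase-style row.

    cells maps (family, qualifier) to a value. Columns that were not
    requested are ignored, and a missing cell yields None (assumed).
    With load_key=True the row key becomes the first item.
    """
    values = [cells.get(column) for column in columns]
    return tuple([row_key] + values) if load_key else tuple(values)


# Shortened column list, using names from the script above.
spec = ("transaction_details:transaction_date "
        "customer_details:customer_id product_details:amount")

# Hypothetical row of retail_transactions.
cells = {
    ("transaction_details", "transaction_date"): "2001-01-01",
    ("customer_details", "customer_id"): 1104905,
    ("product_details", "amount"): 2,
    ("product_details", "asset"): 24,  # not requested, so not returned
}

print(row_to_tuple("row-0001", cells, parse_columns(spec)))
```

The order of the tuple follows the column list rather than the table layout, which is why the AS clause in the Pig script names id first and then the columns in the same order as the first HBaseStorage parameter.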
