Original poster: Lisrelchen

[Data Science] Pig Design Patterns

Original post
Lisrelchen, posted 2015-4-6 19:57:40

Author: Pradeep Pasupuleti
ISBN: 978-1-78328-555-6
Year: 2014
Pages: 310
Language: English
File size: 1.9 MB
File format: PDF
Category: Big Data

Book Description:

Pig Design Patterns is a comprehensive guide that will enable readers to readily use design patterns that simplify the creation of complex data pipelines in various stages of data management. This book focuses on using Pig in an enterprise context, bridging the gap between theoretical understanding and practical implementation. Each chapter contains a set of design patterns that pose and then solve technical challenges that are relevant to enterprise use cases.

The book covers the journey of Big Data from the time it enters the enterprise to its eventual use in analytics, in the form of a report or a predictive model. By the end of the book, readers will appreciate Pig's real power in addressing each and every problem encountered when creating an analytics-based data product. Each design pattern comes with a suggested solution, an analysis of the trade-offs of implementing the solution in a different way, an explanation of how the code works, and the results.

Attachment: Pig Design Patterns.pdf (2.08 MB, requires 5 forum coins)



2
wyr629, posted 2015-4-7 00:49:43
Here to learn.

3
meng山楂树, posted 2015-4-7 07:14:36

4-7
Nicolle, four posts from 2017-1-16 06:38:39 to 06:40:13
Notice: the author has been banned or deleted; the content is automatically hidden.

8
ReneeBK, posted 2017-1-16 06:45:23
The ingress code
The following code connects to MongoDB, loads the MongoDB native file, parses it, and retrieves only the fields named in the schema passed to the MongoLoader constructor, by mapping the fields of the MongoDB document to the fields of that schema. All of this is done by a single call to MongoLoader.

/*
Register the mongo jar files to be able to use the MongoLoader UDF
*/
REGISTER '/home/cloudera/pdp/jars/mongo.jar';
REGISTER '/home/cloudera/pdp/jars/mongo-hadoop-pig.jar';

/*
Load the data using the MongoLoader UDF, which connects to MongoDB, loads the native file, and parses it to retrieve only the specified schema.
*/
stock_data = LOAD 'mongodb://slave1/nasdaqDB.store_stock'
  USING com.mongodb.hadoop.pig.MongoLoader('exchange:chararray, stock_symbol:chararray, date:chararray, stock_price_open:float, stock_price_high:float, stock_price_low:float, stock_price_close:float, stock_volume:long, stock_price_adj_close:chararray')
  AS (exchange, stock_symbol, date, stock_price_open, stock_price_high, stock_price_low, stock_price_close, stock_volume, stock_price_adj_close);

/*
* Some processing logic goes here which is deliberately left out to improve readability
*/

/*
Display the contents of the relation stock_data on the console
*/
DUMP stock_data;

9
ReneeBK, posted 2017-1-16 06:47:27
The ingress and egress patterns for MongoDB
The egress code
The following code writes the data held in the stock_data Pig relation to a MongoDB document collection:

/*
Register the mongo jar files and the piggybank jar to be able to use the UDFs
*/
REGISTER '/home/cloudera/pdp/jars/mongo.jar';
REGISTER '/home/cloudera/pdp/jars/mongo_hadoop_pig.jar';
REGISTER '/usr/share/pig/contrib/piggybank/java/piggybank.jar';

/*
Assign the alias MongoStorage to the MongoStorage class
*/
DEFINE MongoStorage com.mongodb.hadoop.pig.MongoStorage();

/*
Load the contents of files starting with NASDAQ_daily_prices_ into a Pig relation stock_data
*/
stock_data = LOAD '/user/cloudera/pdp/datasets/mongo/NASDAQ_daily_prices/NASDAQ_daily_prices_*'
  USING org.apache.pig.piggybank.storage.CSVLoader()
  AS (exchange:chararray, stock_symbol:chararray, date:chararray, stock_price_open:chararray, stock_price_high:chararray, stock_price_low:chararray, stock_price_close:chararray, stock_volume:chararray, stock_price_adj_close:chararray);

/*
* Some processing logic goes here which is deliberately left out to improve readability
*/

/*
Store the data to MongoDB using the MongoStorage serializer. The nasdaqDB.store_stock part of the MongoDB URI names the document collection created to hold this data.
*/
STORE stock_data INTO 'mongodb://slave1/nasdaqDB.store_stock' USING MongoStorage();
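
The egress script above loads every column as chararray, so prices and volumes would presumably reach MongoDB as strings. A minimal sketch of one possible "processing logic" step follows. It casts the numeric columns before the STORE. This is not taken from the book: the relation name typed_stock_data and the target collection nasdaqDB.store_stock_typed are hypothetical, and the jar and data paths are simply copied from the egress script above.

```pig
-- Jar and data paths are copied from the egress script above; adjust them to your cluster.
REGISTER '/home/cloudera/pdp/jars/mongo.jar';
REGISTER '/home/cloudera/pdp/jars/mongo_hadoop_pig.jar';
REGISTER '/usr/share/pig/contrib/piggybank/java/piggybank.jar';

-- Same load as the egress script: every column arrives as chararray.
stock_data = LOAD '/user/cloudera/pdp/datasets/mongo/NASDAQ_daily_prices/NASDAQ_daily_prices_*'
  USING org.apache.pig.piggybank.storage.CSVLoader()
  AS (exchange:chararray, stock_symbol:chararray, date:chararray,
      stock_price_open:chararray, stock_price_high:chararray,
      stock_price_low:chararray, stock_price_close:chararray,
      stock_volume:chararray, stock_price_adj_close:chararray);

-- Hypothetical processing step: cast prices to float and volume to long.
-- Strings that cannot be parsed as numbers become null.
typed_stock_data = FOREACH stock_data GENERATE
  exchange,
  stock_symbol,
  date,
  (float)stock_price_open  AS stock_price_open,
  (float)stock_price_high  AS stock_price_high,
  (float)stock_price_low   AS stock_price_low,
  (float)stock_price_close AS stock_price_close,
  (long)stock_volume       AS stock_volume,
  stock_price_adj_close;

-- Hypothetical target collection, so the original store_stock collection is left untouched.
STORE typed_stock_data INTO 'mongodb://slave1/nasdaqDB.store_stock_typed'
  USING com.mongodb.hadoop.pig.MongoStorage();
```

The cast types mirror the schema that the ingress script in post 8 passes to MongoLoader (float prices, long volume), so data written this way should read back with matching types.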

10
ReneeBK, posted 2017-1-16 06:48:33
The HBase ingress and egress pattern
The ingress code
The following code snippet illustrates the ingestion of HBase data into a Pig relation:

/*
Load data from the HBase table retail_transactions, which contains the column families transaction_details, customer_details and product_details.
The : operator is used to access columns in a column family.
The first parameter to HBaseStorage is the list of columns and the second parameter is the list of options.
The option -loadKey true specifies that the row key should be loaded as the first item in the tuple; -limit 500 caps the number of rows read from each region of the HBase table.
*/
transactions = LOAD 'hbase://retail_transactions'
  USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
  'transaction_details:transaction_date customer_details:customer_id customer_details:age customer_details:residence_area product_details:product_subclass product_details:product_id product_details:amount product_details:asset product_details:sales_price', '-loadKey true -limit 500')
  AS (id: bytearray, transaction_date: chararray, customer_id: int, age: chararray, residence_area: chararray, product_subclass: int, product_id: long, amount: int, asset: int, sales_price: int);

/*
* Some processing logic goes here which is deliberately left out to improve readability
*/

-- Display the contents of the relation transactions on the console
DUMP transactions;
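
The post shows only the ingress half of the HBase pattern. As a rough sketch of what the egress half could look like (this is not the book's code), HBaseStorage can also be used with STORE. The target table retail_transactions_copy is a hypothetical name and is assumed to already exist with the same column families, since HBaseStorage does not create tables. Only three of the columns are used to keep the example short.

```pig
-- Read the row key plus three columns from the source table used in the post above.
transactions = LOAD 'hbase://retail_transactions'
  USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
    'transaction_details:transaction_date customer_details:customer_id product_details:sales_price',
    '-loadKey true')
  AS (id:bytearray, transaction_date:chararray, customer_id:int, sales_price:int);

-- On STORE, the first field of each tuple (id) is used as the row key;
-- the remaining fields map, in order, to the columns listed here.
-- 'retail_transactions_copy' is a hypothetical table that must be created beforehand.
STORE transactions INTO 'hbase://retail_transactions_copy'
  USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
    'transaction_details:transaction_date customer_details:customer_id product_details:sales_price');
```

Note that the column list passed to STORE has one entry fewer than the tuple has fields, because the row key is taken from the tuple and is not named as a column.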
