Thread starter: oneforall

[mapreduce] [Free book] Text Mining with MapReduce

oneforall posted on 2010-7-2 19:59:53
Data-Intensive Text Processing with MapReduce
Authors: Jimmy Lin and Chris Dyer
Abstract: Our world is being revolutionized by data-driven methods: access to large amounts of data has generated new insights and opened exciting new opportunities in commerce, science, and computing applications. Processing the enormous quantities of data necessary for these advances requires large clusters, making distributed computing paradigms more crucial than ever. MapReduce is a programming model for expressing distributed computations on massive datasets and an execution framework for large-scale data processing on clusters of commodity servers. The programming model provides an easy-to-understand abstraction for designing scalable algorithms, while the execution framework transparently handles many system-level details, ranging from scheduling to synchronization to fault tolerance. This book focuses on MapReduce algorithm design, with an emphasis on text processing algorithms common in natural language processing, information retrieval, and machine learning. We introduce the notion of MapReduce design patterns, which represent general reusable solutions to commonly occurring problems across a variety of problem domains. This book not only intends to help the reader "think in MapReduce", but also discusses limitations of the programming model as well.
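
For readers new to the programming model the abstract describes, here is a minimal single-machine sketch of the classic word-count example. It is not taken from the book and does not use Hadoop: the names `mapper`, `reducer`, and `run_mapreduce` and the sample documents are hypothetical, and the in-memory grouping step only stands in for the shuffle-and-sort that a real execution framework performs across a cluster.

```python
from collections import defaultdict


def mapper(doc_id, text):
    """Emit an intermediate (word, 1) pair for every token in one document."""
    for word in text.lower().split():
        yield word, 1


def reducer(word, counts):
    """Sum all partial counts that were grouped under the same word."""
    yield word, sum(counts)


def run_mapreduce(documents, mapper, reducer):
    """Toy driver (an assumption for illustration, not a real framework)."""
    # Map phase plus an in-memory stand-in for the shuffle: group values by key.
    groups = defaultdict(list)
    for doc_id, text in documents.items():
        for key, value in mapper(doc_id, text):
            groups[key].append(value)

    # Reduce phase: process keys in sorted order, as a framework typically would.
    results = {}
    for key in sorted(groups):
        for out_key, out_value in reducer(key, groups[key]):
            results[out_key] = out_value
    return results


# Hypothetical sample input: document id -> text.
docs = {
    "d1": "a rose is a rose",
    "d2": "a daisy is not a rose",
}
word_counts = run_mapreduce(docs, mapper, reducer)
```

The user writes only `mapper` and `reducer`; everything in `run_mapreduce` corresponds to work that the execution framework is described as handling transparently, such as grouping, scheduling, and fault tolerance.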

1 Introduction
1.1 Computing in the Clouds
1.2 Big Ideas
1.3 Why Is This Different?
1.4 What This Book Is Not

2 MapReduce Basics
2.1 Functional Programming Roots
2.2 Mappers and Reducers
2.3 The Execution Framework
2.4 Partitioners and Combiners
2.5 The Distributed File System
2.6 Hadoop Cluster Architecture
2.7 Summary

3 MapReduce Algorithm Design
3.1 Local Aggregation
3.1.1 Combiners and In-Mapper Combining
3.1.2 Algorithmic Correctness with Local Aggregation
3.2 Pairs and Stripes
3.3 Computing Relative Frequencies
3.4 Secondary Sorting
3.5 Relational Joins
3.5.1 Reduce-Side Join
3.5.2 Map-Side Join
3.5.3 Memory-Backed Join
3.6 Summary

4 Inverted Indexing for Text Retrieval
4.1 Web Crawling
4.2 Inverted Indexes
4.3 Inverted Indexing: Baseline Implementation
4.4 Inverted Indexing: Revised Implementation
4.5 Index Compression
4.5.1 Byte-Aligned and Word-Aligned Codes
4.5.2 Bit-Aligned Codes
4.5.3 Postings Compression
4.6 What About Retrieval?
4.7 Summary and Additional Readings

5 Graph Algorithms
5.1 Graph Representations
5.2 Parallel Breadth-First Search
5.3 PageRank
5.4 Issues with Graph Processing
5.5 Summary and Additional Readings

6 EM Algorithms for Text Processing
6.1 Expectation Maximization
6.1.1 Maximum Likelihood Estimation
6.1.2 A Latent Variable Marble Game
6.1.3 MLE with Latent Variables
6.1.4 Expectation Maximization
6.1.5 An EM Example
6.2 Hidden Markov Models
6.2.1 Three Questions for Hidden Markov Models
6.2.2 The Forward Algorithm
6.2.3 The Viterbi Algorithm
6.2.4 Parameter Estimation for HMMs
6.2.5 Forward-Backward Training: Summary
6.3 EM in MapReduce
6.3.1 HMM Training in MapReduce
6.4 Case Study: Word Alignment for Statistical Machine Translation
6.4.1 Statistical Phrase-Based Translation
6.4.2 Brief Digression: Language Modeling with MapReduce
6.4.3 Word Alignment
6.4.4 Experiments
6.5 EM-Like Algorithms
6.5.1 Gradient-Based Optimization and Log-Linear Models
6.6 Summary and Additional Readings

7 Closing Remarks
7.1 Limitations of MapReduce
7.2 Alternative Computing Paradigms
7.3 MapReduce and Beyond

Keywords: MapReduce, text mining, free book

Data-Intensive Text Processing with MapReduce.pdf

1.71 MB

googya posted on 2010-7-2 20:11:45
That was quick!

oneforall posted on 2010-7-2 20:13:50
I saw that some people here share an interest in cloud computing and text mining, so I thought I would share this with everyone.

chenfanwen posted on 2010-7-3 09:26:40
Good book: data mining based on cloud computing.

charlesy posted on 2011-6-3 11:44:37
Good book. Thanks to the thread starter for sharing.

tianwk posted on 2019-7-28 15:55:13
thanks for sharing

wangyong8935 posted on 2019-11-16 15:29:16
Thanks for sharing.
