Data-Intensive Text Processing with MapReduce.pdf - 经管之家 (Jingguan Zhijia) Resource Downloads - 人大经济论坛 (Renda Economics Forum)

Attachment Download


Topic: [Free Book] Text Mining with MapReduce
File name: Data-Intensive Text Processing with MapReduce.pdf
Download link: https://bbs.pinggu.org/a-681210.html
Attachment size: 1.71 MB
Data-Intensive Text Processing with MapReduce
Authors: Jimmy Lin and Chris Dyer

Abstract: Our world is being revolutionized by data-driven methods: access to large amounts of data has generated new insights and opened exciting new opportunities in commerce, science, and computing applications. Processing the enormous quantities of data necessary for these advances requires large clusters, making distributed computing paradigms more crucial than ever. MapReduce is a programming model for expressing distributed computations on massive datasets and an execution framework for large-scale data processing on clusters of commodity servers. The programming model provides an easy-to-understand abstraction for designing scalable algorithms, while the execution framework transparently handles many system-level details, ranging from scheduling to synchronization to fault tolerance. This book focuses on MapReduce algorithm design, with an emphasis on text processing algorithms common in natural language processing, information retrieval, and machine learning. We introduce the notion of MapReduce design patterns, which represent general reusable solutions to commonly occurring problems across a variety of problem domains. This book not only intends to help the reader "think in MapReduce", but also discusses limitations of the programming model as well.

1 Introduction
  1.1 Computing in the Clouds
  1.2 Big Ideas
  1.3 Why Is This Different?
  1.4 What This Book Is Not
2 MapReduce Basics
  2.1 Functional Programming Roots
  2.2 Mappers and Reducers
  2.3 The Execution Framework
  2.4 Partitioners and Combiners
  2.5 The Distributed File System
  2.6 Hadoop Cluster Architecture
  2.7 Summary
3 MapReduce Algorithm Design
  3.1 Local Aggregation
    3.1.1 Combiners and In-Mapper Combining
    3.1.2 Algorithmic Correctness with Local Aggregation
  3.2 Pairs and Stripes
  3.3 Computing Relative Frequencies
  3.4 Secondary Sorting
  3.5 Relational Joins
    3.5.1 Reduce-Side Join
    3.5.2 Map-Side Join
    3.5.3 Memory-Backed Join
  3.6 Summary
4 Inverted Indexing for Text Retrieval
  4.1 Web Crawling
  4.2 Inverted Indexes
  4.3 Inverted Indexing: Baseline Implementation
  4.4 Inverted Indexing: Revised Implementation
  4.5 Index Compression
    4.5.1 Byte-Aligned and Word-Aligned Codes
    4.5.2 Bit-Aligned Codes
    4.5.3 Postings Compression
  4.6 What About Retrieval?
  4.7 Summary and Additional Readings
5 Graph Algorithms
  5.1 Graph Representations
  5.2 Parallel Breadth-First Search
  5.3 PageRank
  5.4 Issues with Graph Processing
  5.5 Summary and Additional Readings
6 EM Algorithms for Text Processing
  6.1 Expectation Maximization
    6.1.1 Maximum Likelihood Estimation
    6.1.2 A Latent Variable Marble Game
    6.1.3 MLE with Latent Variables
    6.1.4 Expectation Maximization
    6.1.5 An EM Example
  6.2 Hidden Markov Models
    6.2.1 Three Questions for Hidden Markov Models
    6.2.2 The Forward Algorithm
    6.2.3 The Viterbi Algorithm
    6.2.4 Parameter Estimation for HMMs
    6.2.5 Forward-Backward Training: Summary
  6.3 EM in MapReduce
    6.3.1 HMM Training in MapReduce
  6.4 Case Study: Word Alignment for Statistical Machine Translation
    6.4.1 Statistical Phrase-Based Translation
    6.4.2 Brief Digression: Language Modeling with MapReduce
    6.4.3 Word Alignment
    6.4.4 Experiments
  6.5 EM-Like Algorithms
    6.5.1 Gradient-Based Optimization and Log-Linear Models
  6.6 Summary and Additional Readings
7 Closing Remarks
  7.1 Limitations of MapReduce
  7.2 Alternative Computing Paradigms
  7.3 MapReduce and Beyond
