Renmin University Economics Forum (人大经济论坛): Attachment Download
File name: Approximate Algorithms in Apache Spark.pdf
Download link: https://bbs.pinggu.org/a-2260658.html
Attachment size: 247.22 KB
Introduction

Apache Spark is fast, but some applications, such as preliminary data exploration, need results even faster and can trade some accuracy for speed. Since version 1.6, Spark has implemented approximate algorithms for several common tasks: counting the number of distinct elements in a set, testing whether an element belongs to a set, and computing basic statistics for a large set of numbers. Eugene Zhulenev, from Collective, has already blogged in these pages about the use of approximate counting in the advertising business.

The following algorithms have been implemented against DataFrames and Datasets and committed to Apache Spark's branch-2.0, so they will be available in Apache Spark 2.0 for Python, R, and Scala:

- approxCountDistinct: returns an estimate of the number of distinct elements
- approxQuantile: returns approximate percentiles of numerical data
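
To give a feel for what an approximate distinct count involves, here is a minimal pure-Python sketch of a HyperLogLog-style estimator, the family of algorithms commonly used for this task. It is an illustration only, not Spark's code: the class name `HyperLogLogSketch`, the `precision` parameter, and the use of SHA-1 as the hash are all assumptions made for this example. The bias constant and the small-range correction follow the original HyperLogLog description.

```python
import hashlib
import math


class HyperLogLogSketch:
    """Hypothetical minimal HyperLogLog estimator (illustration only)."""

    def __init__(self, precision=12):
        self.p = precision                  # number of bits used to pick a register
        self.m = 1 << precision             # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant from the HyperLogLog paper (valid for m >= 128).
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # Deterministic 64-bit hash: no random numbers are involved.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = h >> (64 - self.p)                  # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # 1-based position of the leftmost 1-bit within the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def estimate(self):
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros > 0:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


sketch = HyperLogLogSketch(precision=12)
for i in range(50_000):
    sketch.add(f"user-{i}")
    sketch.add(f"user-{i}")  # duplicates leave the registers unchanged

estimate = sketch.estimate()
relative_error = abs(estimate - 50_000) / 50_000
print(f"estimate={estimate:.0f}, relative error={relative_error:.3%}")
```

With 4096 registers the expected standard error is roughly 1.04 / sqrt(4096), or about 1.6%, and the sketch uses a fixed amount of memory no matter how many items are added.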
Researchers have studied such algorithms for a long time. Spark strives to implement approximate algorithms that are deterministic (they do not depend on random numbers to work) and that have proven theoretical error bounds: for each algorithm, the user can specify a target error bound, and the result is guaranteed to be within this bound, either exactly (deterministic error bounds) or with very high confidence (probabilistic error bounds). It is also important that these algorithms work well for the wide range of use cases seen in the Spark community.
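
As a concrete reading of a deterministic error bound, the sketch below builds a small quantile summary and checks a rank guarantee of the form floor((p - eps) * n) <= rank <= ceil((p + eps) * n). This is an assumption-laden toy, not Spark's algorithm: the helper names `build_summary`, `approx_quantile`, and `rank_within_bound` are hypothetical, and the summary is built by sorting all the data for brevity, whereas a production implementation would build its summary without a full sort.

```python
import math


def build_summary(values, eps):
    """Keep evenly spaced order statistics so every rank is within eps * n of a kept one."""
    ordered = sorted(values)
    n = len(ordered)
    step = max(1, math.floor(2 * eps * n))
    # (rank, value) pairs with 1-based ranks.
    kept = [(r, ordered[r - 1]) for r in range(1, n + 1, step)]
    if kept[-1][0] != n:
        kept.append((n, ordered[-1]))  # always keep the maximum
    return n, kept


def approx_quantile(summary, p):
    """Return the kept value whose rank is closest to p * n."""
    n, kept = summary
    target = p * n
    return min(kept, key=lambda rank_value: abs(rank_value[0] - target))[1]


def rank_within_bound(rank, p, eps, n):
    """Check the deterministic rank guarantee for a returned element."""
    return math.floor((p - eps) * n) <= rank <= math.ceil((p + eps) * n)


# Distinct values 1..10000 in reverse order, so the rank of a value equals the value.
n = 10_000
eps = 0.01
summary = build_summary(range(n, 0, -1), eps)

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    value = approx_quantile(summary, p)
    print(p, value, rank_within_bound(value, p, eps, n))
```

The summary keeps about 1 / (2 * eps) entries, so a tighter error target costs more memory. No random numbers are involved, so repeated runs on the same data give the same answer.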

In this blog, we present details on the implementation of the approxCountDistinct and approxQuantile algorithms and showcase their use in a Databricks notebook.