| File name: 使用Spark和R语言进行探索性数据科学.pdf | |
| Download link: https://bbs.pinggu.org/a-2323210.html | |
|
Overseas data science course videos & PDFs
Enabling Exploratory Data Science with Spark and R
Speaker: Hossein Falaki

About the speaker: Hossein Falaki is a software engineer at Databricks working on the next big thing. Prior to that he was a data scientist at Apple's personal assistant, Siri. He graduated with a Ph.D. in Computer Science from UCLA, where he was a member of the Center for Embedded Networked Sensing (CENS).

Academics (in the speaker's own words): My Ph.D. research was focused on making mobile phones smarter networked devices when they were used in health applications. As a Master's student at the University of Waterloo, I was a member of the Tetherless Computing Lab, where I worked on the KioskNet Project with Prof. S. Keshav. I also studied scanning strategies for opportunistic communication over Wi-Fi on mobile devices.

Abstract: R is a favorite language of many data scientists. In addition to a language and runtime, R is a rich ecosystem of libraries for a wide range of use cases, from statistical inference to data visualization. However, handling large datasets with R is challenging, especially when data scientists use R with frameworks or tools written in other languages. In this mode most of the friction is at the interface between R and the other systems. For example, when data is sampled by a big data platform, the results need to be transferred to R and imported as native data structures.

In this talk we show how SparkR solves these problems to enable a much smoother experience. We present an overview of the SparkR architecture, including how data and control are transferred between R and the JVM. This knowledge will help data scientists make better decisions when using SparkR. We will demo and explain some of the existing and supported use cases with real large datasets inside a notebook environment. The demonstrations will highlight how a Spark cluster, R, and interactive notebook environments such as Jupyter or Databricks facilitate exploratory analysis of large data.

Materials: video lecture with Chinese subtitles, and a PDF.
Translated and produced by the CDA数据分析研究院 (CDA Data Analysis Research Institute) team.
This talk is from Spark Summit Europe 2015. |
|