Original poster: ReneeBK

Subqueries in Apache Spark 2.0

SQL Subqueries in Apache Spark 2.0

Hands-on examples of scalar and predicate subqueries.

By Davies Liu and Herman van Hövell, posted in the Databricks Engineering Blog, June 17, 2016. Try this notebook in Databricks.

In the upcoming Apache Spark 2.0 release, we have substantially expanded the SQL standard capabilities. In this brief blog post, we will introduce subqueries in Apache Spark 2.0, including their limitations, potential pitfalls, and future expansions, and, through a notebook, we will explore both the scalar and predicate types of subqueries, with short examples that you can try yourself.

A subquery is a query that is nested inside another query. A subquery used as a source (inside a SQL FROM clause) is technically also a subquery, but it is beyond the scope of this post. There are two basic kinds of subqueries: scalar and predicate. Scalar subqueries can be either uncorrelated or correlated, and predicate subqueries can be nested.

For brevity, we will let you jump in and explore the notebook, which is more an interactive experience than an exposition here in the blog. Click the diagram below to view and explore the subquery notebook with the Apache Spark 2.0 preview on Databricks.



Keywords: Apache Spark, queries

Reply #1 — ReneeBK, 2017-5-28 03:04:33:
%scala
import org.apache.spark.sql.functions._
val employee = spark.range(0, 10).select($"id".as("employee_id"), (rand() * 3).cast("int").as("dep_id"), (rand() * 40 + 20).cast("int").as("age"))
val visit = spark.range(0, 100).select($"id".as("visit_id"), when(rand() < 0.95, ($"id" % 8)).as("employee_id"))
val appointment = spark.range(0, 100).select($"id".as("appointment_id"), when(rand() < 0.95, ($"id" % 7)).as("employee_id"))
employee.createOrReplaceTempView("employee")
visit.createOrReplaceTempView("visit")
appointment.createOrReplaceTempView("appointment")
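Note that the when(rand() < 0.95, …) expressions above leave roughly 5% of the employee_id values NULL in visit and appointment, which matters later for predicate subqueries. A quick sanity check (a sketch; the exact counts vary from run to run because the data is random):

```sql
%sql
SELECT COUNT(*)           AS total_visits,
       COUNT(employee_id) AS visits_with_employee
FROM   visit
```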

Reply #2 — ReneeBK, 2017-5-28 03:05:03:
Uncorrelated Scalar Subqueries

An uncorrelated subquery returns the same single value for every record in the query. Uncorrelated subqueries are executed by the Spark engine before the main query is executed. The SQL below shows an example of an uncorrelated scalar subquery; here we add the maximum age in the employee table to the select list.

%sql
SELECT  employee_id,
        age,
        (SELECT MAX(age) FROM employee) max_age
FROM    employee
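Because the subquery is uncorrelated, the engine can evaluate it once and reuse the result for every row. A rough DataFrame-API equivalent of the same query (a sketch, not code from the blog; it assumes the employee DataFrame and the functions import from the setup cell above):

```scala
%scala
// Evaluate the scalar subquery once up front...
val maxAge = employee.agg(max($"age")).first().getInt(0)
// ...then attach it to every row as a literal column
val withMaxAge = employee.select($"employee_id", $"age", lit(maxAge).as("max_age"))
```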

Reply #3 — ReneeBK, 2017-5-28 03:05:37:
Correlated Scalar Subqueries

Subqueries can also be correlated: the subquery contains references to the outer query. These outer references are typically used in filter clauses (the SQL WHERE clause), and Spark 2.0 currently supports only this case. The SQL below shows an example of a correlated scalar subquery; here we add the maximum age in each employee's department to the select list, using A.dep_id = B.dep_id as the correlating condition.

%sql
SELECT   A.dep_id,
         A.employee_id,
         A.age,
         (SELECT  MAX(age) FROM employee B WHERE A.dep_id = B.dep_id) max_age
FROM     employee A
ORDER BY 1,2
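The replies above show only scalar subqueries; Spark 2.0 also supports predicate subqueries (IN and EXISTS) in the WHERE clause. A sketch over the same tables, listing employees that have at least one visit (hypothetical queries, not from the blog):

```sql
%sql
SELECT employee_id, age
FROM   employee
WHERE  employee_id IN (SELECT employee_id FROM visit)
```

An equivalent formulation with EXISTS, which is a correlated predicate subquery:

```sql
%sql
SELECT E.employee_id, E.age
FROM   employee E
WHERE  EXISTS (SELECT 1 FROM visit V WHERE V.employee_id = E.employee_id)
```

One caveat: because some employee_id values in visit are NULL, the NOT IN form of the first query returns no rows at all under SQL's three-valued logic; prefer NOT EXISTS when the subquery column can contain NULLs.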
