Original poster: ReneeBK

[GitHub]Taming Big Data with Apache Spark and Python

11
ReneeBK posted on 2017-7-7 04:08:09
from pyspark import SparkConf, SparkContext

# Run Spark locally in a single process.
conf = SparkConf().setMaster("local").setAppName("SpendByCustomerSorted")
sc = SparkContext(conf=conf)

def extractCustomerPricePairs(line):
    # Column 0 is parsed as the customer ID, column 2 as the amount spent.
    fields = line.split(',')
    return (int(fields[0]), float(fields[2]))

# "lines" rather than "input", so the Python built-in is not shadowed.
lines = sc.textFile("file:///sparkcourse/customer-orders.csv")
mappedInput = lines.map(extractCustomerPricePairs)
# Sum the amounts for each customer ID.
totalByCustomer = mappedInput.reduceByKey(lambda x, y: x + y)

# Changed for Python 3 compatibility (tuple-unpacking lambdas were removed):
# flipped = totalByCustomer.map(lambda (x,y):(y,x))
flipped = totalByCustomer.map(lambda x: (x[1], x[0]))

# The total is now the key, so sortByKey orders by amount spent, ascending.
totalByCustomerSorted = flipped.sortByKey()

results = totalByCustomerSorted.collect()
for result in results:
    print(result)
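To check the logic without a Spark installation, here is a minimal pure-Python sketch of the same map / reduce-by-key / flip / sort steps. The sample rows and the helper name spend_by_customer_sorted are made up for illustration and are not from customer-orders.csv. The only thing assumed about the layout is what the script above uses: customer ID in column 0 and amount in column 2.

```python
from collections import defaultdict

# Hypothetical sample rows (not from the real customer-orders.csv).
# Assumed layout: customer ID in column 0, amount in column 2.
sample_lines = [
    "44,8602,37.19",
    "35,5368,65.89",
    "44,3391,40.64",
    "2,6661,5.25",
    "35,2659,10.00",
]

def extract_customer_price_pairs(line):
    # Same parsing as extractCustomerPricePairs in the Spark script.
    fields = line.split(',')
    return (int(fields[0]), float(fields[2]))

def spend_by_customer_sorted(lines):
    # Stand-in for reduceByKey: sum the amounts per customer ID.
    totals = defaultdict(float)
    for customer_id, amount in map(extract_customer_price_pairs, lines):
        totals[customer_id] += amount
    # Stand-in for the flip + sortByKey: (total, customer_id), ascending by total.
    flipped = [(total, customer_id) for customer_id, total in totals.items()]
    return sorted(flipped, key=lambda pair: pair[0])

results = spend_by_customer_sorted(sample_lines)
for total, customer_id in results:
    print(f"{customer_id}: {total:.2f}")
```

With these sample rows, the customer with the smallest total comes first and the biggest spender last, which is the same ordering the Spark version produces with sortByKey().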

12
snow_boy posted on 2017-7-7 10:54:01
ok
