from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("SpendByCustomerSorted")
sc = SparkContext(conf = conf)

# Parse one CSV line into a (customerID, amountSpent) pair.
def extractCustomerPricePairs(line):
    fields = line.split(',')
    return (int(fields[0]), float(fields[2]))

input = sc.textFile("file:///sparkcourse/customer-orders.csv")
mappedInput = input.map(extractCustomerPricePairs)

# Sum the amount spent per customer ID.
totalByCustomer = mappedInput.reduceByKey(lambda x, y: x + y)

# Changed for Python 3 compatibility (tuple unpacking in lambdas was removed):
# flipped = totalByCustomer.map(lambda (x, y): (y, x))
flipped = totalByCustomer.map(lambda x: (x[1], x[0]))

# Sort by total spend, then bring the results back to the driver.
totalByCustomerSorted = flipped.sortByKey()
results = totalByCustomerSorted.collect()

for result in results:
    print(result)
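
For reference, each line of customer-orders.csv is assumed to look like customerID,itemID,amountSpent (a hypothetical example: 44,8602,37.19), which is why the parser reads fields[0] and fields[2] and ignores the middle field. A minimal way to run the script, assuming it is saved as SpendByCustomerSorted.py (filename is an assumption) on a machine with Spark installed:

spark-submit SpendByCustomerSorted.py

Each printed result is a (totalSpent, customerID) tuple; sortByKey defaults to ascending order, so the biggest spenders appear last.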