Original poster: ReneeBK

[GitHub]Taming Big Data with Apache Spark and Python

Post #11

ReneeBK posted on 2017-7-7 04:08:09

from pyspark import SparkConf, SparkContext

# Run Spark locally on a single thread
conf = SparkConf().setMaster("local").setAppName("SpendByCustomerSorted")
sc = SparkContext(conf=conf)

def extractCustomerPricePairs(line):
    # Each CSV line: customer ID in column 0, amount spent in column 2
    fields = line.split(',')
    return (int(fields[0]), float(fields[2]))

lines = sc.textFile("file:///sparkcourse/customer-orders.csv")
mappedInput = lines.map(extractCustomerPricePairs)
# Sum the amounts spent by each customer ID
totalByCustomer = mappedInput.reduceByKey(lambda x, y: x + y)

#Changed for Python 3 compatibility (tuple-parameter unpacking in lambdas was removed):
#flipped = totalByCustomer.map(lambda (x,y):(y,x))
# Swap to (total, customerID) so the total becomes the key used for sorting
flipped = totalByCustomer.map(lambda x: (x[1], x[0]))
totalByCustomerSorted = flipped.sortByKey()

results = totalByCustomerSorted.collect()
for result in results:
    print(result)
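To see what the Spark job above computes without a Spark installation, here is a minimal pure-Python sketch of the same steps. It assumes the CSV layout implied by the PySpark code (customer ID in column 0, amount spent in column 2). The sample lines and the helper names `extract_customer_price_pair` and `spend_by_customer_sorted` are hypothetical, chosen for illustration only.

```python
from collections import defaultdict

def extract_customer_price_pair(line):
    # Assumed layout: customer_id,item_id,amount_spent (only columns 0 and 2 are used)
    fields = line.split(',')
    return int(fields[0]), float(fields[2])

def spend_by_customer_sorted(lines):
    # Plays the role of map + reduceByKey(lambda x, y: x + y): sum amounts per customer
    totals = defaultdict(float)
    for line in lines:
        customer, amount = extract_customer_price_pair(line)
        totals[customer] += amount
    # Plays the role of flipping to (total, customer) and sortByKey(): ascending by total
    flipped = [(total, customer) for customer, total in totals.items()]
    return sorted(flipped, key=lambda pair: pair[0])

# Hypothetical sample rows, not taken from customer-orders.csv
sample = [
    "44,8602,37.19",
    "35,5368,65.89",
    "44,3391,40.64",
    "2,1234,10.00",
]

for result in spend_by_customer_sorted(sample):
    print(result)
```

As with `sortByKey()`, the output is in ascending order of total spend. For the biggest spenders first, pass `reverse=True` to `sorted` here, or `ascending=False` to `sortByKey` in the Spark version.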

Post #12

snow_boy posted on 2017-7-7 10:54:01

ok
