OP: ReneeBK

Learning Apache Flink


#1 (OP) ReneeBK posted on 2017-5-11 10:48:01
  • Learning Apache Flink
  • By: Tanmay Deshpande

  • Publisher: Packt Publishing

  • Pub. Date: February 20, 2017

  • Web ISBN-13: 978-1-78646-726-3

  • Print ISBN-13: 978-1-78646-622-8

  • Pages in Print Edition: 280

  • Subscriber Rating: [0 Ratings]



Keywords: Learning Apache Flink


#2 ReneeBK posted on 2017-5-11 10:49:12

Map

This is one of the simplest transformations: the input is one data stream and the output is also one data stream.

In Java:

inputStream.map(new MapFunction<Integer, Integer>() {
    @Override
    public Integer map(Integer value) throws Exception {
        return 5 * value;
    }
});

In Scala:

inputStream.map { x => x * 5 }
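The semantics of map can be sketched in plain Python, treating the stream as a list. This is an analogy only, not the Flink API: one output record per input record.

```python
# Plain-Python analogy of Flink's map: apply a function to every element,
# producing exactly one output record per input record.
stream = [1, 2, 3, 4]
mapped = [5 * x for x in stream]
print(mapped)  # [5, 10, 15, 20]
```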

#3 ReneeBK posted on 2017-5-11 10:49:46

FlatMap

FlatMap takes one record and outputs zero, one, or more records.

In Java:

inputStream.flatMap(new FlatMapFunction<String, String>() {
    @Override
    public void flatMap(String value, Collector<String> out)
            throws Exception {
        for (String word : value.split(" ")) {
            out.collect(word);
        }
    }
});

In Scala:

inputStream.flatMap { str => str.split(" ") }
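The zero-one-or-many behavior can be sketched in plain Python (again an analogy, not the Flink API), where each input record yields an iterable of output records:

```python
# Plain-Python analogy of flatMap: each input record may yield 0..n output records.
def flat_map(stream, fn):
    out = []
    for record in stream:
        out.extend(fn(record))  # fn returns an iterable of output records
    return out

words = flat_map(["hello world", "apache flink"], lambda s: s.split(" "))
print(words)  # ['hello', 'world', 'apache', 'flink']
```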

#4 ReneeBK posted on 2017-5-11 10:50:07

Filter

Filter evaluates a condition for each record and emits the record only when the condition is true, so it can output zero records.

In Java:

inputStream.filter(new FilterFunction<Integer>() {
    @Override
    public boolean filter(Integer value) throws Exception {
        return value != 1;
    }
});

In Scala:

inputStream.filter { _ != 1 }
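As a plain-Python sketch of the same predicate (not the Flink API), records failing the condition are simply dropped:

```python
# Plain-Python analogy of filter: keep only records for which the predicate is true.
stream = [1, 2, 1, 3]
kept = [v for v in stream if v != 1]
print(kept)  # [2, 3]
```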

#5 ReneeBK posted on 2017-5-11 10:50:30

KeyBy

KeyBy logically partitions the stream based on the key. Internally it uses hash partitioning, and it returns a KeyedDataStream.

In Java:

inputStream.keyBy("someKey");

In Scala:

inputStream.keyBy("someKey")
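The hash-partitioning idea can be sketched in plain Python. This is a simplified assumption of what keyBy does internally (Flink hashes the key and routes by that hash); the invariant that matters is that all records with the same key land in the same partition:

```python
# Sketch of hash-based key partitioning (simplified; Flink's actual hashing differs).
def key_by(records, key_fn, parallelism):
    partitions = [[] for _ in range(parallelism)]
    for rec in records:
        # Same key -> same hash -> same partition index.
        partitions[hash(key_fn(rec)) % parallelism].append(rec)
    return partitions

parts = key_by([("a", 1), ("b", 2), ("a", 3)], lambda t: t[0], 4)
# All ("a", ...) records are guaranteed to be in the same partition.
```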

#6 ReneeBK posted on 2017-5-11 10:50:54

Reduce

Reduce rolls up the KeyedDataStream by combining the last reduced value with the current value. The following code computes a running sum over a KeyedDataStream.

In Java:

keyedInputStream.reduce(new ReduceFunction<Integer>() {
    @Override
    public Integer reduce(Integer value1, Integer value2)
            throws Exception {
        return value1 + value2;
    }
});

In Scala:

keyedInputStream.reduce { _ + _ }
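A streaming reduce differs from a batch reduce: it emits the running aggregate for the key every time a new record arrives. A plain-Python sketch of that per-key rolling behavior (not the Flink API):

```python
# Sketch of a keyed streaming reduce: for each record, combine it with the
# previous reduced value for its key and emit the updated aggregate.
def keyed_reduce(records, key_fn, reduce_fn):
    state = {}    # key -> last reduced value
    emitted = []
    for rec in records:
        k = key_fn(rec)
        state[k] = rec if k not in state else reduce_fn(state[k], rec)
        emitted.append((k, state[k]))
    return emitted

print(keyed_reduce([1, 2, 3], lambda x: "k", lambda a, b: a + b))
# [('k', 1), ('k', 3), ('k', 6)]
```

Note that intermediate sums (1, 3, 6) are emitted, not just the final total.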

#7 ReneeBK posted on 2017-5-11 10:51:42

Fold

Fold rolls up the KeyedDataStream by combining the last folded value with the current record, starting from an initial value, and emits a data stream back.

In Java:

keyedInputStream.fold("Start", new FoldFunction<Integer, String>() {
    @Override
    public String fold(String current, Integer value) {
        return current + "=" + value;
    }
});

In Scala:

keyedInputStream.fold("Start")((str, i) => { str + "=" + i })

The preceding function, when applied to a stream of (1,2,3,4,5), would emit a stream like this: Start=1=2=3=4=5
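The fold semantics can be sketched in plain Python (not the Flink API): like reduce, but seeded with an initial value, and the accumulator type may differ from the record type (a String built from Integers here):

```python
# Sketch of fold: start from an initial accumulator and combine each record
# into it, emitting the updated accumulator at every step.
def keyed_fold(records, initial, fold_fn):
    acc = initial
    emitted = []
    for rec in records:
        acc = fold_fn(acc, rec)
        emitted.append(acc)
    return emitted

result = keyed_fold([1, 2, 3, 4, 5], "Start", lambda s, i: s + "=" + str(i))
print(result[-1])  # Start=1=2=3=4=5
```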

#8 Lisrelchen posted on 2017-5-11 10:59:18

package com.demo.chapter06

import org.apache.flink.api.scala._
import org.apache.flink.ml.math.Vector
import org.apache.flink.ml.common.LabeledVector
import org.apache.flink.ml.classification.SVM
import org.apache.flink.ml.RichExecutionEnvironment

object MySVMApp {
  def main(args: Array[String]) {
    val pathToTrainingFile: String = "iris-train.txt"
    val pathToTestingFile: String = "iris-test.txt" // original read "iris-train.txt", which would test on the training data

    // set up the execution environment
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Read the training dataset from a LibSVM formatted file
    val trainingDS: DataSet[LabeledVector] =
      env.readLibSVM(pathToTrainingFile)

    // Create the SVM learner
    val svm = SVM()
      .setBlocks(10)

    // Learn the SVM model
    svm.fit(trainingDS)

    // Read the testing dataset
    val testingDS: DataSet[Vector] =
      env.readLibSVM(pathToTestingFile).map(_.vector)

    // Calculate the predictions for the testing dataset
    val predictionDS: DataSet[(Vector, Double)] =
      svm.predict(testingDS)
    predictionDS.writeAsText("out")

    env.execute("Flink SVM App")
  }
}

#9 Lisrelchen posted on 2017-5-11 11:01:16

#!/usr/bin/env python

"""
Convert a CSV file to LibSVM format. Works only with numeric variables.
Pass -1 as the label index (argv[3]) if there are no labels in your file.
Expects no headers; if present, they can be skipped with argv[4] == 1.
"""

import csv
import sys

def construct_line(label, line):
    new_line = []
    if float(label) == 0.0:
        label = "0"
    new_line.append(label)

    for i, item in enumerate(line):
        if item == '' or float(item) == 0.0:
            continue  # LibSVM omits zero-valued features
        new_line.append("%s:%s" % (i + 1, item))
    return " ".join(new_line) + "\n"

# ---

input_file = sys.argv[1]
output_file = sys.argv[2]

try:
    label_index = int(sys.argv[3])
except IndexError:
    label_index = 0

try:
    # The original compared argv[4] as a string, so any value skipped headers.
    skip_headers = int(sys.argv[4])
except IndexError:
    skip_headers = 0

with open(input_file, newline='') as i, open(output_file, 'w') as o:
    reader = csv.reader(i)

    if skip_headers:
        next(reader)  # reader.next() is Python 2 only

    for line in reader:
        if label_index == -1:
            label = '1'
        else:
            label = line.pop(label_index)

        o.write(construct_line(label, line))
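The script emits the standard LibSVM sparse line format, `label index:value ...`, with 1-based feature indices and zero-valued features dropped. A minimal self-contained sketch of that formatting rule (the function name and sample values here are illustrative, not from the script):

```python
# Sketch of the LibSVM line format: label, then 1-based index:value pairs,
# skipping empty and zero-valued features.
def libsvm_line(label, fields):
    parts = ["0" if float(label) == 0.0 else str(label)]
    for i, item in enumerate(fields):
        if item == '' or float(item) == 0.0:
            continue
        parts.append("%d:%s" % (i + 1, item))
    return " ".join(parts)

print(libsvm_line("1", ["5.1", "0", "1.4", ""]))  # 1 1:5.1 3:1.4
```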
