签到
- 苹果/安卓/wp
- 苹果/安卓/wp
客户端
0.0

0.00

人大经济论坛 › 论坛 › 计量经济学与统计论坛五区 › 计量经济学与统计软件 › winbugs及其他软件专版 › 【Apache Flint】Flink DataStream API Programming Gui ...

CDA数据分析研究院

商业数据分析与大数据领航教育品牌



经管云课堂

经管/金融/财会/社科/名师公开课



学术培训

Stata 空间计量 SSCI Python

贵宾：通行论坛特权+数据库权限
+案例库+下载特权 VIP：论坛特权+更多下载次数
+ccerdata数据库+更高阅读权限+……

提升主题| 本版置顶| 关闭主题| 变更主题颜色| 抢沙发| 顶贴| 显身卡| 道具中心

楼主: ReneeBK

1381 7

【Apache Flint】Flink DataStream API Programming Guide [推广有奖]

1关注
62粉丝

学术权威

14%

还不是VIP/贵宾

-

TA的文库 其他...

Panel Data Analysis

Experimental Design

0%

威望: 1 级
论坛币: 49492 个
通用积分: 53.3854
学术水平: 370 点
热心指数: 273 点
信用等级: 335 点
经验: 57815 点
帖子: 4006
精华: 21
在线时间: 582 小时
注册时间: 2005-5-8
最后登录: 2023-11-26

楼主

ReneeBK 发表于 2017-3-10 09:45:45 |只看作者 |坛友微信交流群|倒序 |AI写论文

是否 +2 论坛币

k人参与回答

经管之家送您一份

应届毕业生专属福利!

求职就业群

赵安豆老师微信：zhaoandou666

经管之家联合CDA

送您一个全额奖学金名额~ !

感谢您参与论坛问题回答

经管之家送您两个论坛币！

+2 论坛币

本帖隐藏的内容

https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/datastream_api.html

Example Program
The following program is a complete, working example of streaming window word count application, that counts the words coming from a web socket in 5 second windows. You can copy & paste the code to run it locally.
Java
Scala
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
object WindowWordCount {
def main(args: Array[String]) {
val env = StreamExecutionEnvironment.getExecutionEnvironment
val text = env.socketTextStream("localhost", 9999)
val counts = text.flatMap { _.toLowerCase.split("\\W+") filter { _.nonEmpty } }
.map { (_, 1) }
.keyBy(0)
.timeWindow(Time.seconds(5))
.sum(1)
counts.print
env.execute("Window Stream WordCount")
}
}

复制代码

二维码

扫码加我拉你入群

请注明：姓名-公司-职位

以便审核进群资格，未注明则拒绝

分享0 收藏0 回帖

关键词：Programming datastream Program apache Stream

相关帖子

本帖被以下文库推荐

· Data Science NewOccidental|主题: 1233, 订阅: 120

回复

使用道具举报

沙发

franky_sas 发表于 2017-3-10 09:54:36 |只看作者 |坛友微信交流群

Thanks.

已有 1 人评分	论坛币	收起理由
Nicolle	+ 20	鼓励积极发帖讨论

总评分: 论坛币 + 20 查看全部评分

回复

使用道具举报

藤椅

ReneeBK 发表于 2017-3-10 09:54:55 |只看作者 |坛友微信交流群

Running an example
In order to run a Flink example, we assume you have a running Flink instance available. The “Quickstart” and “Setup” tabs in the navigation describe various ways of starting Flink.
The easiest way is running the ./bin/start-local.sh script, which will start a JobManager locally.
Each binary release of Flink contains an examples directory with jar files for each of the examples on this page.
To run the WordCount example, issue the following command:
./bin/flink run ./examples/batch/WordCount.jar
The other examples can be started in a similar way.
Note that many examples run without passing any arguments for them, by using build-in data. To run WordCount with real data, you have to pass the path to the data:
./bin/flink run ./examples/batch/WordCount.jar --input /path/to/some/text/data --output /path/to/result
Note that non-local file systems require a schema prefix, such as hdfs://.

复制代码

回复

使用道具举报

板凳

ReneeBK 发表于 2017-3-10 09:55:24 |只看作者 |坛友微信交流群

Word Count
WordCount is the “Hello World” of Big Data processing systems. It computes the frequency of words in a text collection. The algorithm works in two steps: First, the texts are splits the text to individual words. Second, the words are grouped and counted.
Java
Scala
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<String> text = env.readTextFile("/path/to/file");
DataSet<Tuple2<String, Integer>> counts =
// split up the lines in pairs (2-tuples) containing: (word,1)
text.flatMap(new Tokenizer())
// group by the tuple field "0" and sum up tuple field "1"
.groupBy(0)
.sum(1);
counts.writeAsCsv(outputPath, "\n", " ");
// User-defined functions
public static class Tokenizer implements FlatMapFunction<String, Tuple2<String, Integer>> {
@Override
public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
// normalize and split the line
String[] tokens = value.toLowerCase().split("\\W+");
// emit the pairs
for (String token : tokens) {
if (token.length() > 0) {
out.collect(new Tuple2<String, Integer>(token, 1));
}
}
}
}
The WordCount example implements the above described algorithm with input parameters: --input <path> --output <path>. As test data, any text file will do.

复制代码

回复

使用道具举报

报纸

ReneeBK 发表于 2017-3-10 09:56:09 |只看作者 |坛友微信交流群

Page Rank
The PageRank algorithm computes the “importance” of pages in a graph defined by links, which point from one pages to another page. It is an iterative graph algorithm, which means that it repeatedly applies the same computation. In each iteration, each page distributes its current rank over all its neighbors, and compute its new rank as a taxed sum of the ranks it received from its neighbors. The PageRank algorithm was popularized by the Google search engine which uses the importance of webpages to rank the results of search queries.
In this simple example, PageRank is implemented with a bulk iteration and a fixed number of iterations.
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
// read the pages and initial ranks by parsing a CSV file
DataSet<Tuple2<Long, Double>> pagesWithRanks = env.readCsvFile(pagesInputPath)
.types(Long.class, Double.class)
// the links are encoded as an adjacency list: (page-id, Array(neighbor-ids))
DataSet<Tuple2<Long, Long[]>> pageLinkLists = getLinksDataSet(env);
// set iterative data set
IterativeDataSet<Tuple2<Long, Double>> iteration = pagesWithRanks.iterate(maxIterations);
DataSet<Tuple2<Long, Double>> newRanks = iteration
// join pages with outgoing edges and distribute rank
.join(pageLinkLists).where(0).equalTo(0).flatMap(new JoinVertexWithEdgesMatch())
// collect and sum ranks
.groupBy(0).sum(1)
// apply dampening factor
.map(new Dampener(DAMPENING_FACTOR, numPages));
DataSet<Tuple2<Long, Double>> finalPageRanks = iteration.closeWith(
newRanks,
newRanks.join(iteration).where(0).equalTo(0)
// termination condition
.filter(new EpsilonFilter()));
finalPageRanks.writeAsCsv(outputPath, "\n", " ");
// User-defined functions
public static final class JoinVertexWithEdgesMatch
implements FlatJoinFunction<Tuple2<Long, Double>, Tuple2<Long, Long[]>,
Tuple2<Long, Double>> {
@Override
public void join(<Tuple2<Long, Double> page, Tuple2<Long, Long[]> adj,
Collector<Tuple2<Long, Double>> out) {
Long[] neighbors = adj.f1;
double rank = page.f1;
double rankToDistribute = rank / ((double) neigbors.length);
for (int i = 0; i < neighbors.length; i++) {
out.collect(new Tuple2<Long, Double>(neighbors[i], rankToDistribute));
}
}
}
public static final class Dampener implements MapFunction<Tuple2<Long,Double>, Tuple2<Long,Double>> {
private final double dampening, randomJump;
public Dampener(double dampening, double numVertices) {
this.dampening = dampening;
this.randomJump = (1 - dampening) / numVertices;
}
@Override
public Tuple2<Long, Double> map(Tuple2<Long, Double> value) {
value.f1 = (value.f1 * dampening) + randomJump;
return value;
}
}
public static final class EpsilonFilter
implements FilterFunction<Tuple2<Tuple2<Long, Double>, Tuple2<Long, Double>>> {
@Override
public boolean filter(Tuple2<Tuple2<Long, Double>, Tuple2<Long, Double>> value) {
return Math.abs(value.f0.f1 - value.f1.f1) > EPSILON;
}
}

复制代码

回复

使用道具举报

地板

学生认证

发表于 2017-3-10 09:56:15 |只看作者 |坛友微信交流群

感谢分享

已有 1 人评分	论坛币	收起理由
Nicolle	+ 20	鼓励积极发帖讨论

总评分: 论坛币 + 20 查看全部评分

回复

使用道具举报

7楼

ReneeBK 发表于 2017-3-10 10:00:41 |只看作者 |坛友微信交流群

Connected Components
The Connected Components algorithm identifies parts of a larger graph which are connected by assigning all vertices in the same connected part the same component ID. Similar to PageRank, Connected Components is an iterative algorithm. In each step, each vertex propagates its current component ID to all its neighbors. A vertex accepts the component ID from a neighbor, if it is smaller than its own component ID.
This implementation uses a delta iteration: Vertices that have not changed their component ID do not participate in the next step. This yields much better performance, because the later iterations typically deal only with a few outlier vertices.
Java
Scala
// read vertex and edge data
DataSet<Long> vertices = getVertexDataSet(env);
DataSet<Tuple2<Long, Long>> edges = getEdgeDataSet(env).flatMap(new UndirectEdge());
// assign the initial component IDs (equal to the vertex ID)
DataSet<Tuple2<Long, Long>> verticesWithInitialId = vertices.map(new DuplicateValue<Long>());
// open a delta iteration
DeltaIteration<Tuple2<Long, Long>, Tuple2<Long, Long>> iteration =
verticesWithInitialId.iterateDelta(verticesWithInitialId, maxIterations, 0);
// apply the step logic:
DataSet<Tuple2<Long, Long>> changes = iteration.getWorkset()
// join with the edges
.join(edges).where(0).equalTo(0).with(new NeighborWithComponentIDJoin())
// select the minimum neighbor component ID
.groupBy(0).aggregate(Aggregations.MIN, 1)
// update if the component ID of the candidate is smaller
.join(iteration.getSolutionSet()).where(0).equalTo(0)
.flatMap(new ComponentIdFilter());
// close the delta iteration (delta and new workset are identical)
DataSet<Tuple2<Long, Long>> result = iteration.closeWith(changes, changes);
// emit result
result.writeAsCsv(outputPath, "\n", " ");
// User-defined functions
public static final class DuplicateValue<T> implements MapFunction<T, Tuple2<T, T>> {
@Override
public Tuple2<T, T> map(T vertex) {
return new Tuple2<T, T>(vertex, vertex);
}
}
public static final class UndirectEdge
implements FlatMapFunction<Tuple2<Long, Long>, Tuple2<Long, Long>> {
Tuple2<Long, Long> invertedEdge = new Tuple2<Long, Long>();
@Override
public void flatMap(Tuple2<Long, Long> edge, Collector<Tuple2<Long, Long>> out) {
invertedEdge.f0 = edge.f1;
invertedEdge.f1 = edge.f0;
out.collect(edge);
out.collect(invertedEdge);
}
}
public static final class NeighborWithComponentIDJoin
implements JoinFunction<Tuple2<Long, Long>, Tuple2<Long, Long>, Tuple2<Long, Long>> {
@Override
public Tuple2<Long, Long> join(Tuple2<Long, Long> vertexWithComponent, Tuple2<Long, Long> edge) {
return new Tuple2<Long, Long>(edge.f1, vertexWithComponent.f1);
}
}
public static final class ComponentIdFilter
implements FlatMapFunction<Tuple2<Tuple2<Long, Long>, Tuple2<Long, Long>>,
Tuple2<Long, Long>> {
@Override
public void flatMap(Tuple2<Tuple2<Long, Long>, Tuple2<Long, Long>> value,
Collector<Tuple2<Long, Long>> out) {
if (value.f0.f1 < value.f1.f1) {
out.collect(value.f0);
}
}
}

复制代码

回复

使用道具举报

8楼

钱学森64 发表于 2017-3-10 13:53:56 |只看作者 |坛友微信交流群

谢谢分享

已有 1 人评分	论坛币	收起理由
Nicolle	+ 20	鼓励积极发帖讨论

总评分: 论坛币 + 20 查看全部评分

回复

使用道具举报

发帖

本版微信群

加好友,备注jltj
拉您入交流群

如有投资本站、合作意向或投放广告，请联系：13661292478（刘老师）

联系客服

邮箱：service@pinggu.org 投诉或不良信息处理：（010-68466864）

京ICP备16021002-2号京B2-20170662号京公网安备 11010802022788号论坛法律顾问：王进律师知识产权保护声明免责及隐私声明