Original poster: Lisrelchen

Calculating Movies Ratings Distribution With Apache Flink

If you've been following recent news in the Big Data world, you've probably heard about Apache Flink. This platform for batch and stream processing is built on a few significant technical innovations. It could become a real game changer, and it is starting to compete with established products such as Apache Spark.

In this post, I would like to show how to implement a simple batch processing algorithm using Apache Flink. We will work with a dataset of movie ratings and produce a distribution of user ratings, that is, how many ratings fall into each rating value. Along the way, I'll show a few tricks you can use to improve the performance of your Flink applications.
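To make the target concrete before bringing Flink in, here is a minimal plain-Java sketch (no Flink involved) of the same computation on a single machine. It assumes a MovieLens-style ratings.csv with a header row and the columns userId,movieId,rating,timestamp, and it truncates each rating to an integer the way the Flink code later in this thread does. The class name RatingHistogram and the sample rows are made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: a plain-Java (non-Flink) reference version of the
// ratings-distribution computation, useful for sanity-checking results.
class RatingHistogram {

    // Counts how many ratings fall into each integer bucket.
    // Assumes lines of the form userId,movieId,rating,timestamp and a header
    // line whose third column is the literal "rating".
    static Map<Integer, Long> distribution(List<String> lines) {
        Map<Integer, Long> counts = new TreeMap<>(); // sorted by rating
        for (String line : lines) {
            String[] fields = line.split(",");
            String ratingStr = fields[2];
            if (ratingStr.equals("rating")) {
                continue; // skip the CSV header
            }
            // Truncate half-star ratings (e.g. 3.5 -> 3), as the Flink code does
            int rating = (int) Double.parseDouble(ratingStr);
            counts.merge(rating, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up rows in the assumed ratings.csv layout
        List<String> sample = List.of(
            "userId,movieId,rating,timestamp",
            "1,10,2.5,1000000000",
            "1,20,3.0,1000000001",
            "2,10,3.0,1000000002",
            "2,30,4.0,1000000003");
        System.out.println(distribution(sample)); // prints {2=1, 3=2, 4=1}
    }
}
```

A local reference like this can also be handy for checking a Flink job's output on a small sample of the real file.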
Attachment: Calculating Movies Ratings Distribution With Apache Flink.pdf (319.59 KB)




#2 Lisrelchen posted on 2017-5-17 10:47:23
Create the Project

Creating an Apache Flink project is straightforward. The Apache Flink developers provide a project template, so all we need to do is run the Maven archetype:generate goal:

mvn archetype:generate \
    -DarchetypeGroupId=org.apache.flink \
    -DarchetypeArtifactId=flink-quickstart-java \
    -DarchetypeCatalog=https://repository.apache.org/content/repositories/snapshots/ \
    -DarchetypeVersion=1.1.3

This generates a pom.xml file and several example Flink applications. To write our own, we create a new Java class with a main method. The same program can be run locally during development (for example, from the IDE) or submitted to a Flink cluster.
#3 Lisrelchen posted on 2017-5-17 10:49:41
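import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.operators.DataSource;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.types.IntValue;
import org.apache.flink.util.Collector;

// Inside main(): read the MovieLens ratings file line by line
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSource<String> file = env.readTextFile("ml-latest-small/ratings.csv");

// Turn every line into a (rating, 1) pair
DataSet<Tuple2<IntValue, Integer>> ratings = file.flatMap(new ExtractRating());

private static class ExtractRating implements FlatMapFunction<String, Tuple2<IntValue, Integer>> {

  // Mutable int field, reused for every record to reduce GC pressure
  private final IntValue ratingValue = new IntValue();

  // The result tuple is reused too; only its rating field changes between records
  private final Tuple2<IntValue, Integer> result = new Tuple2<>(ratingValue, 1);

  @Override
  public void flatMap(String s, Collector<Tuple2<IntValue, Integer>> collector) throws Exception {
    // Every line contains comma-separated values: userId,movieId,rating,timestamp
    String[] split = s.split(",");
    String ratingStr = split[2];

    // Ignore the CSV header line
    if (!ratingStr.equals("rating")) {
      // Truncates half-star ratings, e.g. 3.5 becomes 3
      int rating = (int) Double.parseDouble(ratingStr);
      ratingValue.setValue(rating);
      collector.collect(result);
    }
  }
}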
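The reuse trick in ExtractRating works because each emitted record is consumed (in Flink's case, typically serialized or copied) before the function mutates the shared object again; Flink's object-reuse documentation for the DataSet API spells out what is permitted in each mode. The plain-Java sketch below is not Flink code: MutableInt is a hypothetical stand-in for IntValue, and a java.util.function.Consumer stands in for the Collector. It shows that a consumer which copies the value right away sees correct results, while one that keeps references ends up with every entry pointing at the last value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java illustration (not Flink code) of the object-reuse pattern used
// by ExtractRating: one mutable holder is updated and re-emitted per record.
class ReuseDemo {

    // Hypothetical stand-in for Flink's IntValue: a mutable int box.
    static final class MutableInt {
        int value;
    }

    // Parses each rating into the SAME MutableInt instance and emits it.
    static void extract(List<String> ratings, Consumer<MutableInt> out) {
        MutableInt reused = new MutableInt(); // allocated once, not per record
        for (String r : ratings) {
            reused.value = (int) Double.parseDouble(r);
            out.accept(reused);
        }
    }

    public static void main(String[] args) {
        List<String> ratings = List.of("4.0", "2.5", "5.0");

        // Safe: the consumer copies the value out before the next record,
        // much like a framework serializing each emitted record.
        List<Integer> copied = new ArrayList<>();
        extract(ratings, m -> copied.add(m.value));
        System.out.println(copied); // prints [4, 2, 5]

        // Unsafe: keeping references means every entry aliases one object.
        List<MutableInt> aliased = new ArrayList<>();
        extract(ratings, aliased::add);
        aliased.forEach(m -> System.out.print(m.value + " ")); // prints 5 5 5
        System.out.println();
    }
}
```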
#4 Lisrelchen posted on 2017-5-17 10:52:32
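import org.apache.flink.api.common.functions.GroupReduceFunction;

// Group the (rating, 1) pairs by rating (field 0) and sum the counts per group
DataSet<Tuple2<IntValue, Integer>> distribution = file.flatMap(new ExtractRating())
    .groupBy(0)
    .reduceGroup(new SumRatingCount());

// print() triggers execution of the DataSet program and prints the result
distribution.print();

private static class SumRatingCount implements GroupReduceFunction<Tuple2<IntValue, Integer>, Tuple2<IntValue, Integer>> {
  @Override
  public void reduce(Iterable<Tuple2<IntValue, Integer>> iterable, Collector<Tuple2<IntValue, Integer>> collector) throws Exception {
    IntValue rating = null;
    int ratingsCount = 0;
    // All tuples in a group share the same rating; add up their counts
    for (Tuple2<IntValue, Integer> tuple : iterable) {
      rating = tuple.f0;
      ratingsCount += tuple.f1;
    }
    collector.collect(new Tuple2<>(rating, ratingsCount));
  }
}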
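A GroupReduceFunction like SumRatingCount receives every (rating, 1) tuple of its group, so all of those tuples are shuffled before being summed. Combinable operations in Flink's DataSet API, such as a grouped reduce with a ReduceFunction or the sum aggregation (e.g. groupBy(0).sum(1)), can pre-aggregate within each parallel partition first. This is offered as an alternative, not as what the original article does. The plain-Java sketch below (no Flink; CombineDemo and its method names are hypothetical) simulates the idea: each partition builds local counts, and only the small partial maps are merged, which gives the same totals as counting everything in one place.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation (not Flink code) of combine-then-reduce counting.
class CombineDemo {

    // "Combiner": counts the ratings of one partition locally.
    static Map<Integer, Integer> combine(List<Integer> partition) {
        Map<Integer, Integer> partial = new TreeMap<>();
        for (int rating : partition) {
            partial.merge(rating, 1, Integer::sum);
        }
        return partial;
    }

    // "Reducer": merges partial counts; only these small maps would cross
    // the network, instead of one record per rating.
    static Map<Integer, Integer> reduce(List<Map<Integer, Integer>> partials) {
        Map<Integer, Integer> total = new TreeMap<>();
        for (Map<Integer, Integer> partial : partials) {
            partial.forEach((rating, count) -> total.merge(rating, count, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        // Two made-up partitions of already-extracted integer ratings
        List<List<Integer>> partitions = List.of(
            List.of(4, 4, 3, 5),
            List.of(3, 3, 4));
        List<Map<Integer, Integer>> partials = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            partials.add(combine(partition));
        }
        System.out.println(reduce(partials)); // prints {3=3, 4=3, 5=1}
    }
}
```

The benefit grows with the data: with few distinct rating values, each partition ships only a handful of partial counts regardless of how many ratings it holds.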
#5 neuroexplorer posted on 2017-5-17 11:19:54
Thanks!

#6 MouJack007 posted on 2017-5-17 11:35:13
Thanks to the original poster for sharing!

#7 MouJack007 posted on 2017-5-17 11:36:42
