
[Case Study] Term Frequency-Inverse Document Frequency using Java

Posted by Lisrelchen on 2015-11-16 02:22:47

package org.apache.spark.examples.ml;

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.ml.feature.HashingTF;
import org.apache.spark.ml.feature.IDF;
import org.apache.spark.ml.feature.IDFModel;
import org.apache.spark.ml.feature.Tokenizer;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class JavaTfIdfExample {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("JavaTfIdfExample");
    JavaSparkContext jsc = new JavaSparkContext(conf);
    SQLContext sqlContext = new SQLContext(jsc);

    // Three labelled sentences serve as the toy corpus. Labels are doubles so they
    // match the DoubleType column declared in the schema below.
    JavaRDD<Row> jrdd = jsc.parallelize(Arrays.asList(
      RowFactory.create(0.0, "Hi I heard about Spark"),
      RowFactory.create(0.0, "I wish Java could use case classes"),
      RowFactory.create(1.0, "Logistic regression models are neat")
    ));
    StructType schema = new StructType(new StructField[]{
      new StructField("label", DataTypes.DoubleType, false, Metadata.empty()),
      new StructField("sentence", DataTypes.StringType, false, Metadata.empty())
    });
    DataFrame sentenceData = sqlContext.createDataFrame(jrdd, schema);

    // Split each sentence into lowercase words.
    Tokenizer tokenizer = new Tokenizer().setInputCol("sentence").setOutputCol("words");
    DataFrame wordsData = tokenizer.transform(sentenceData);

    // Hash the words into a fixed-size vector of raw term frequencies.
    int numFeatures = 20;
    HashingTF hashingTF = new HashingTF()
      .setInputCol("words")
      .setOutputCol("rawFeatures")
      .setNumFeatures(numFeatures);
    DataFrame featurizedData = hashingTF.transform(wordsData);

    // Fit the IDF weights on the corpus and rescale the raw counts to TF-IDF.
    IDF idf = new IDF().setInputCol("rawFeatures").setOutputCol("features");
    IDFModel idfModel = idf.fit(featurizedData);
    DataFrame rescaledData = idfModel.transform(featurizedData);

    // Print the TF-IDF vector and label for each of the three documents.
    for (Row r : rescaledData.select("features", "label").take(3)) {
      Vector features = r.getAs(0);
      Double label = r.getDouble(1);
      System.out.println(features);
      System.out.println(label);
    }

    jsc.stop();
  }
}
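A quick note on what those numbers mean: HashingTF only produces raw term counts hashed into a 20-dimensional vector, and the IDF stage then down-weights terms that occur in many documents, using the smoothed formula log((|D| + 1) / (df(t) + 1)) that Spark MLlib documents for IDF. The plain-Java sketch below is not part of the original post and does not use Spark at all; it just walks through that arithmetic on the same three sentences, keyed by word instead of by hashed index, so you can see roughly which values the pipeline above produces.

import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Illustrative only: TF-IDF by hand for the three example sentences,
// with the smoothed IDF log((m + 1) / (df + 1)).
public class TfIdfByHand {
  public static void main(String[] args) {
    List<String[]> docs = Arrays.asList(
      "Hi I heard about Spark".toLowerCase().split(" "),
      "I wish Java could use case classes".toLowerCase().split(" "),
      "Logistic regression models are neat".toLowerCase().split(" "));

    // Document frequency: in how many documents does each term appear?
    Map<String, Integer> df = new HashMap<>();
    for (String[] doc : docs) {
      for (String term : new HashSet<>(Arrays.asList(doc))) {
        df.merge(term, 1, Integer::sum);
      }
    }

    int m = docs.size();
    for (String[] doc : docs) {
      // Raw term frequency within this document.
      Map<String, Integer> tf = new HashMap<>();
      for (String term : doc) {
        tf.merge(term, 1, Integer::sum);
      }
      // TF-IDF = raw count * smoothed inverse document frequency.
      Map<String, Double> tfidf = new HashMap<>();
      for (Map.Entry<String, Integer> e : tf.entrySet()) {
        double idf = Math.log((m + 1.0) / (df.get(e.getKey()) + 1.0));
        tfidf.put(e.getKey(), e.getValue() * idf);
      }
      System.out.println(tfidf);
    }
  }
}

One practical caveat: with setNumFeatures(20) the hashed space is tiny, so unrelated words can collide on the same index; for anything beyond a toy corpus the feature dimension is normally set much larger (a power of two well above the vocabulary size).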
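The listing above targets the Spark 1.x API (SQLContext, DataFrame, mllib.linalg.Vector). If you are on Spark 2.x or later, roughly the same example can be written against SparkSession and Dataset<Row>. The adaptation below is a sketch for orientation only (the class name JavaTfIdfExample2x is mine, not from the original post), so check it against the Spark version you actually run.

import java.util.Arrays;
import java.util.List;

import org.apache.spark.ml.feature.HashingTF;
import org.apache.spark.ml.feature.IDF;
import org.apache.spark.ml.feature.IDFModel;
import org.apache.spark.ml.feature.Tokenizer;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class JavaTfIdfExample2x {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("JavaTfIdfExample2x").getOrCreate();

    // Same toy corpus, built as a local List<Row> instead of a JavaRDD.
    List<Row> data = Arrays.asList(
      RowFactory.create(0.0, "Hi I heard about Spark"),
      RowFactory.create(0.0, "I wish Java could use case classes"),
      RowFactory.create(1.0, "Logistic regression models are neat"));
    StructType schema = new StructType(new StructField[]{
      new StructField("label", DataTypes.DoubleType, false, Metadata.empty()),
      new StructField("sentence", DataTypes.StringType, false, Metadata.empty())
    });
    Dataset<Row> sentenceData = spark.createDataFrame(data, schema);

    // Tokenize, hash to raw term frequencies, then rescale with IDF.
    Dataset<Row> wordsData = new Tokenizer().setInputCol("sentence").setOutputCol("words")
        .transform(sentenceData);
    Dataset<Row> featurizedData = new HashingTF().setInputCol("words").setOutputCol("rawFeatures")
        .setNumFeatures(20).transform(wordsData);
    IDFModel idfModel = new IDF().setInputCol("rawFeatures").setOutputCol("features")
        .fit(featurizedData);
    Dataset<Row> rescaledData = idfModel.transform(featurizedData);

    rescaledData.select("features", "label").show(false);

    spark.stop();
  }
}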


