
[Case Study] Simple Text Classification using Scala

Original poster: ReneeBK, posted 2015-11-16 00:47:18

+2 论坛币
k人 参与回答

经管之家送您一份

应届毕业生专属福利!

求职就业群
赵安豆老师微信:zhaoandou666

经管之家联合CDA

送您一个全额奖学金名额~ !

感谢您参与论坛问题回答

经管之家送您两个论坛币!

+2 论坛币
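This post shares the SimpleTextClassificationPipeline example from the Spark source tree (package org.apache.spark.examples.ml). It wires three stages into a spark.ml Pipeline: a Tokenizer that splits each document's text into words, a HashingTF transformer that hashes those words into a 1000-dimensional term-frequency feature vector, and a LogisticRegression estimator that learns to flag documents containing the token "spark". The fitted PipelineModel is then applied to four unlabeled test documents.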
// scalastyle:off println
package org.apache.spark.examples.ml

import scala.beans.BeanInfo

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.sql.{Row, SQLContext}

@BeanInfo
case class LabeledDocument(id: Long, text: String, label: Double)

@BeanInfo
case class Document(id: Long, text: String)

/**
 * A simple text classification pipeline that recognizes "spark" from input text. This is to show
 * how to create and configure an ML pipeline. Run with
 * {{{
 * bin/run-example ml.SimpleTextClassificationPipeline
 * }}}
 */
object SimpleTextClassificationPipeline {

  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("SimpleTextClassificationPipeline")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Prepare training documents, which are labeled.
    val training = sc.parallelize(Seq(
      LabeledDocument(0L, "a b c d e spark", 1.0),
      LabeledDocument(1L, "b d", 0.0),
      LabeledDocument(2L, "spark f g h", 1.0),
      LabeledDocument(3L, "hadoop mapreduce", 0.0)))

    // Configure an ML pipeline, which consists of three stages: tokenizer, hashingTF, and lr.
    val tokenizer = new Tokenizer()
      .setInputCol("text")
      .setOutputCol("words")
    val hashingTF = new HashingTF()
      .setNumFeatures(1000)
      .setInputCol(tokenizer.getOutputCol)
      .setOutputCol("features")
    val lr = new LogisticRegression()
      .setMaxIter(10)
      .setRegParam(0.001)
    val pipeline = new Pipeline()
      .setStages(Array(tokenizer, hashingTF, lr))

    // Fit the pipeline to training documents.
    val model = pipeline.fit(training.toDF())

    // Prepare test documents, which are unlabeled.
    val test = sc.parallelize(Seq(
      Document(4L, "spark i j k"),
      Document(5L, "l m n"),
      Document(6L, "spark hadoop spark"),
      Document(7L, "apache hadoop")))

    // Make predictions on test documents.
    model.transform(test.toDF())
      .select("id", "text", "probability", "prediction")
      .collect()
      .foreach { case Row(id: Long, text: String, prob: Vector, prediction: Double) =>
        println(s"($id, $text) --> prob=$prob, prediction=$prediction")
      }

    sc.stop()
  }
}
// scalastyle:on println
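As the doc comment indicates, the example ships with the Spark distribution and can be launched via bin/run-example ml.SimpleTextClassificationPipeline. Each test document then prints one line of the form (id, text) --> prob=..., prediction=...; since the training set labels every document containing "spark" as 1.0, the model is expected to predict 1.0 for test documents 4 and 6 and 0.0 for the others.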
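Note that the code above targets the Spark 1.x API: it bootstraps through SparkConf/SparkContext/SQLContext and imports Vector from the older mllib package. Below is a minimal, untested sketch of how the same pipeline might look on Spark 2.x or later, where SparkSession is the single entry point and the DataFrame-based API uses org.apache.spark.ml.linalg.Vector; the object name SimpleTextClassificationPipeline2 is mine, not part of Spark.

package example

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.{Row, SparkSession}

// Sketch only: the same "recognize the word 'spark'" pipeline on Spark 2.x+.
object SimpleTextClassificationPipeline2 {

  def main(args: Array[String]): Unit = {
    // SparkSession replaces SparkContext + SQLContext as the entry point.
    val spark = SparkSession.builder()
      .appName("SimpleTextClassificationPipeline2")
      .getOrCreate()
    import spark.implicits._

    // Labeled training documents; label 1.0 marks texts containing "spark".
    val training = Seq(
      (0L, "a b c d e spark", 1.0),
      (1L, "b d", 0.0),
      (2L, "spark f g h", 1.0),
      (3L, "hadoop mapreduce", 0.0)).toDF("id", "text", "label")

    // Same three stages as the original: tokenize, hash to term frequencies, classify.
    val tokenizer = new Tokenizer()
      .setInputCol("text")
      .setOutputCol("words")
    val hashingTF = new HashingTF()
      .setNumFeatures(1000)
      .setInputCol(tokenizer.getOutputCol)
      .setOutputCol("features")
    val lr = new LogisticRegression()
      .setMaxIter(10)
      .setRegParam(0.001)
    val model = new Pipeline()
      .setStages(Array(tokenizer, hashingTF, lr))
      .fit(training)

    // Unlabeled test documents.
    val test = Seq(
      (4L, "spark i j k"),
      (5L, "l m n"),
      (6L, "spark hadoop spark"),
      (7L, "apache hadoop")).toDF("id", "text")

    // Print the predicted probability and class for each test document.
    model.transform(test)
      .select("id", "text", "probability", "prediction")
      .collect()
      .foreach { case Row(id: Long, text: String, prob: Vector, prediction: Double) =>
        println(s"($id, $text) --> prob=$prob, prediction=$prediction")
      }

    spark.stop()
  }
}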

