
[Case Study] Logistic Regression Model using Scala


ReneeBK posted on 2015-11-16 00:38:17

// scalastyle:off println
package org.apache.spark.examples.ml

import scala.collection.mutable
import scala.language.reflectiveCalls

import scopt.OptionParser

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.examples.mllib.AbstractParams
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel}
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.sql.DataFrame

/**
 * An example runner for logistic regression with elastic-net (mixing L1/L2) regularization.
 * Run with
 * {{{
 * bin/run-example ml.LogisticRegressionExample [options]
 * }}}
 * A synthetic dataset can be found at `data/mllib/sample_libsvm_data.txt` which can be
 * trained by
 * {{{
 * bin/run-example ml.LogisticRegressionExample --regParam 0.3 --elasticNetParam 0.8 \
 *   data/mllib/sample_libsvm_data.txt
 * }}}
 * If you use it as a template to create your own app, please use `spark-submit` to submit your app.
 */
object LogisticRegressionExample {

  // Command-line parameters and their defaults.
  case class Params(
      input: String = null,
      testInput: String = "",
      dataFormat: String = "libsvm",
      regParam: Double = 0.0,
      elasticNetParam: Double = 0.0,
      maxIter: Int = 100,
      fitIntercept: Boolean = true,
      tol: Double = 1E-6,
      fracTest: Double = 0.2) extends AbstractParams[Params]

  def main(args: Array[String]): Unit = {
    val defaultParams = Params()

    // Declare the command-line options; each action copies the parsed value into Params.
    val parser = new OptionParser[Params]("LogisticRegressionExample") {
      head("LogisticRegressionExample: an example Logistic Regression with Elastic-Net app.")
      opt[Double]("regParam")
        .text(s"regularization parameter, default: ${defaultParams.regParam}")
        .action((x, c) => c.copy(regParam = x))
      opt[Double]("elasticNetParam")
        .text(s"ElasticNet mixing parameter. For alpha = 0, the penalty is an L2 penalty. " +
        s"For alpha = 1, it is an L1 penalty. For 0 < alpha < 1, the penalty is a combination of " +
        s"L1 and L2, default: ${defaultParams.elasticNetParam}")
        .action((x, c) => c.copy(elasticNetParam = x))
      opt[Int]("maxIter")
        .text(s"maximum number of iterations, default: ${defaultParams.maxIter}")
        .action((x, c) => c.copy(maxIter = x))
      opt[Boolean]("fitIntercept")
        .text(s"whether to fit an intercept term, default: ${defaultParams.fitIntercept}")
        .action((x, c) => c.copy(fitIntercept = x))
      opt[Double]("tol")
        .text(s"the convergence tolerance of iterations. Smaller values lead " +
        s"to higher accuracy at the cost of more iterations, default: ${defaultParams.tol}")
        .action((x, c) => c.copy(tol = x))
      opt[Double]("fracTest")
        .text(s"fraction of data to hold out for testing.  If given option testInput, " +
        s"this option is ignored. default: ${defaultParams.fracTest}")
        .action((x, c) => c.copy(fracTest = x))
      opt[String]("testInput")
        .text(s"input path to test dataset.  If given, option fracTest is ignored." +
        s" default: ${defaultParams.testInput}")
        .action((x, c) => c.copy(testInput = x))
      opt[String]("dataFormat")
        .text("data format: libsvm (default), dense (deprecated in Spark v1.1)")
        .action((x, c) => c.copy(dataFormat = x))
      arg[String]("<input>")
        .text("input path to labeled examples")
        .required()
        .action((x, c) => c.copy(input = x))
      // Reject a hold-out fraction outside [0, 1).
      checkConfig { params =>
        if (params.fracTest < 0 || params.fracTest >= 1) {
          failure(s"fracTest ${params.fracTest} value incorrect; should be in [0,1).")
        } else {
          success
        }
      }
    }

    // Run on successfully parsed arguments; exit with a non-zero status otherwise.
    parser.parse(args, defaultParams) match {
      case Some(params) => run(params)
      case None => sys.exit(1)
    }
  }

  def run(params: Params): Unit = {
    val conf = new SparkConf().setAppName(s"LogisticRegressionExample with $params")
    val sc = new SparkContext(conf)

    println(s"LogisticRegressionExample with parameters:\n$params")

    // Load training and test data and cache it.
    val (training: DataFrame, test: DataFrame) = DecisionTreeExample.loadDatasets(sc, params.input,
      params.dataFormat, params.testInput, "classification", params.fracTest)

    // Set up Pipeline
    val stages = new mutable.ArrayBuffer[PipelineStage]()

    // Map the raw labels to indices in a new "indexedLabel" column.
    val labelIndexer = new StringIndexer()
      .setInputCol("label")
      .setOutputCol("indexedLabel")
    stages += labelIndexer

    val lor = new LogisticRegression()
      .setFeaturesCol("features")
      .setLabelCol("indexedLabel")
      .setRegParam(params.regParam)
      .setElasticNetParam(params.elasticNetParam)
      .setMaxIter(params.maxIter)
      .setTol(params.tol)
      .setFitIntercept(params.fitIntercept)

    stages += lor
    val pipeline = new Pipeline().setStages(stages.toArray)

    // Fit the Pipeline
    val startTime = System.nanoTime()
    val pipelineModel = pipeline.fit(training)
    val elapsedTime = (System.nanoTime() - startTime) / 1e9
    println(s"Training time: $elapsedTime seconds")

    // The logistic regression model is the last stage of the fitted pipeline.
    val lorModel = pipelineModel.stages.last.asInstanceOf[LogisticRegressionModel]
    // Print the weights and intercept for logistic regression.
    println(s"Weights: ${lorModel.coefficients} Intercept: ${lorModel.intercept}")

    println("Training data results:")
    DecisionTreeExample.evaluateClassificationModel(pipelineModel, training, "indexedLabel")
    println("Test data results:")
    DecisionTreeExample.evaluateClassificationModel(pipelineModel, test, "indexedLabel")

    sc.stop()
  }
}
// scalastyle:on println
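
The help text for `--elasticNetParam` describes how alpha mixes the L1 and L2 penalties. As a rough illustration, the plain-Scala sketch below computes the penalty term in the form Spark's documentation gives for elastic-net regularization, regParam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2). The names `ElasticNetPenalty` and `penalty` are made up for this illustration and are not part of the Spark API; the code only evaluates the formula and does no training.

```scala
// Hypothetical helper (not a Spark class): evaluates the elastic-net penalty only.
object ElasticNetPenalty {

  /** regParam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2) */
  def penalty(weights: Seq[Double], regParam: Double, alpha: Double): Double = {
    require(alpha >= 0.0 && alpha <= 1.0, s"alpha must be in [0, 1], got $alpha")
    val l1 = weights.map(w => math.abs(w)).sum   // L1 norm
    val l2Squared = weights.map(w => w * w).sum  // squared L2 norm
    regParam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2Squared)
  }

  def main(args: Array[String]): Unit = {
    val w = Seq(1.0, -2.0, 0.5)
    // alpha = 1 leaves only the L1 term, alpha = 0 only the L2 term.
    println(s"pure L1: ${penalty(w, regParam = 1.0, alpha = 1.0)}")
    println(s"pure L2: ${penalty(w, regParam = 1.0, alpha = 0.0)}")
    // Same settings as the sample command in the Scaladoc above.
    println(s"mixed:   ${penalty(w, regParam = 0.3, alpha = 0.8)}")
  }
}
```

With `--regParam 0.3 --elasticNetParam 0.8`, as in the sample command, most of the penalty weight falls on the L1 term, which tends to push some coefficients to exactly zero.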
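
The `fracTest` option holds out a fraction of the input for testing when no `testInput` is given, and `checkConfig` restricts it to [0, 1). The post does not show how `DecisionTreeExample.loadDatasets` performs the split, so the sketch below is only a plain-Scala illustration of a seeded random hold-out split, not Spark's implementation. `HoldOutSplit`, `split`, and the `seed` parameter are hypothetical names.

```scala
import scala.util.Random

// Hypothetical helper (not part of Spark): seeded random hold-out split.
object HoldOutSplit {

  /**
   * Assigns each element to the test set with probability fracTest and
   * returns (training, test). Intended for strict collections such as
   * List or Vector, where the predicate runs once per element in order.
   */
  def split[A](data: Seq[A], fracTest: Double, seed: Long): (Seq[A], Seq[A]) = {
    require(fracTest >= 0.0 && fracTest < 1.0,
      s"fracTest $fracTest value incorrect; should be in [0,1).")
    val rng = new Random(seed)
    // partition returns (elements for which the predicate holds, the rest).
    data.partition(_ => rng.nextDouble() >= fracTest)
  }

  def main(args: Array[String]): Unit = {
    val (training, test) = split((1 to 100).toVector, fracTest = 0.2, seed = 42L)
    println(s"training: ${training.size}, test: ${test.size}")
  }
}
```

Because the assignment is random per element, the test set holds roughly, not exactly, a `fracTest` share of the data; fixing the seed makes the split reproducible.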

