Original poster: ReneeBK

[Case Study] Decision Tree Classification using Scala

```scala
package org.apache.spark.examples.ml

// $example on$
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.{DecisionTreeClassificationModel, DecisionTreeClassifier}
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.feature.{IndexToString, StringIndexer, VectorIndexer}
// $example off$
import org.apache.spark.sql.SparkSession

object DecisionTreeClassificationExample {
  def main(args: Array[String]): Unit = {
    // Single entry point for the DataFrame and ML APIs (Spark 2.0+).
    val spark = SparkSession
      .builder
      .appName("DecisionTreeClassificationExample")
      .getOrCreate()

    // $example on$
    // Load the data stored in LIBSVM format as a DataFrame.
    val data = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")

    // Index labels, adding metadata to the label column.
    // Fit on the whole dataset to include all labels in the index.
    val labelIndexer = new StringIndexer()
      .setInputCol("label")
      .setOutputCol("indexedLabel")
      .fit(data)
    // Automatically identify categorical features and index them.
    val featureIndexer = new VectorIndexer()
      .setInputCol("features")
      .setOutputCol("indexedFeatures")
      .setMaxCategories(4) // features with > 4 distinct values are treated as continuous
      .fit(data)

    // Split the data into training and test sets (30% held out for testing).
    val Array(trainingData, testData) = data.randomSplit(Array(0.7, 0.3))

    // Train a DecisionTree model.
    val dt = new DecisionTreeClassifier()
      .setLabelCol("indexedLabel")
      .setFeaturesCol("indexedFeatures")

    // Convert indexed labels back to original labels.
    val labelConverter = new IndexToString()
      .setInputCol("prediction")
      .setOutputCol("predictedLabel")
      .setLabels(labelIndexer.labels)

    // Chain the indexers and the tree in a Pipeline.
    val pipeline = new Pipeline()
      .setStages(Array(labelIndexer, featureIndexer, dt, labelConverter))

    // Train the model. This also runs the indexers.
    val model = pipeline.fit(trainingData)

    // Make predictions.
    val predictions = model.transform(testData)

    // Select example rows to display.
    predictions.select("predictedLabel", "label", "features").show(5)

    // Select (prediction, true label) and compute the test error.
    val evaluator = new MulticlassClassificationEvaluator()
      .setLabelCol("indexedLabel")
      .setPredictionCol("prediction")
      .setMetricName("accuracy") // fraction of correctly classified rows
    val accuracy = evaluator.evaluate(predictions)
    println(s"Test Error = ${1.0 - accuracy}")

    // Stage 2 of the fitted pipeline is the trained tree.
    val treeModel = model.stages(2).asInstanceOf[DecisionTreeClassificationModel]
    println(s"Learned classification tree model:\n${treeModel.toDebugString}")
    // $example off$

    spark.stop()
  }
}
```
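The label handling in the example is easy to misread: `StringIndexer` assigns indices by how often each label occurs, not by its numeric value, and `IndexToString` simply reverses that lookup. Below is a minimal plain-Scala sketch (no Spark dependency) of that round trip, assuming the default `frequencyDesc` ordering; the alphabetical tie-break follows recent Spark documentation and may differ in older releases. `LabelIndexSketch` and its methods are hypothetical names for illustration only.

```scala
// Hypothetical plain-Scala sketch (not Spark's API) of StringIndexer + IndexToString.
object LabelIndexSketch {
  // "Fit": order distinct labels by descending frequency, ties alphabetically
  // (assumed tie-break), so the most frequent label receives index 0.0.
  def fitLabels(values: Seq[String]): Array[String] =
    values
      .groupBy(identity)
      .toSeq
      .sortBy { case (label, occurrences) => (-occurrences.size, label) }
      .map(_._1)
      .toArray

  // "Transform": label -> index, like the indexedLabel column.
  def toIndex(labels: Array[String], value: String): Double =
    labels.indexOf(value) match {
      case -1 => throw new NoSuchElementException(s"Unseen label: $value")
      case i  => i.toDouble
    }

  // IndexToString: index -> original label, like the predictedLabel column.
  def toLabel(labels: Array[String], index: Double): String = labels(index.toInt)

  def main(args: Array[String]): Unit = {
    val labels = fitLabels(Seq("1.0", "0.0", "1.0", "1.0", "0.0"))
    println(labels.mkString(", ")) // 1.0, 0.0
    println(toIndex(labels, "0.0")) // 1.0
    println(toLabel(labels, 0.0))   // 1.0
  }
}
```

This is also why the program fits `labelIndexer` on the whole dataset: a label that appears only in the test split would otherwise have no index.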


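`setMaxCategories(4)` makes a per-feature decision: a feature with at most 4 distinct values is treated as categorical and re-coded to category indices, and anything else stays continuous. The sketch below illustrates that rule on a single column in plain Scala. Spark documents only that the value 0.0, if present, maps to index 0 (to keep vectors sparse); sorting the remaining values in ascending order is an assumption of this sketch, and `FeatureIndexSketch` is a hypothetical name.

```scala
// Hypothetical plain-Scala sketch (not Spark's API) of VectorIndexer's rule for one feature.
object FeatureIndexSketch {
  // Some(value -> category index) when the column is treated as categorical,
  // None when it has too many distinct values and stays continuous.
  def categoryMap(column: Seq[Double], maxCategories: Int): Option[Map[Double, Int]] = {
    val distinct = column.distinct
    if (distinct.size > maxCategories) None
    else {
      // 0.0 goes first when present (keeps sparse vectors sparse); the
      // ascending order of the remaining values is this sketch's own choice.
      val (zeros, others) = distinct.sortWith(_ < _).partition(_ == 0.0)
      Some((zeros ++ others).zipWithIndex.toMap)
    }
  }

  def main(args: Array[String]): Unit = {
    val maxCategories = 4 // same value as setMaxCategories(4) in the example
    println(categoryMap(Seq(0.0, 255.0, 0.0, 128.0), maxCategories)) // categorical
    println(categoryMap(Seq(1.0, 2.0, 3.0, 4.0, 5.0), maxCategories)) // None: continuous
  }
}
```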

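`randomSplit(Array(0.7, 0.3))` does not cut the data at exactly 70%: conceptually, the weights are normalized into cumulative ranges and each row is assigned independently by a uniform random draw, so the split sizes are only approximately 70/30 and change from run to run. The following plain-Scala sketch shows that idea on an ordinary collection; it is an illustration under that assumption, not Spark's actual sampler, and `RandomSplitSketch` is a hypothetical name.

```scala
import scala.util.Random

// Hypothetical plain-Scala sketch of weighted random splitting.
object RandomSplitSketch {
  def randomSplit[A](rows: Seq[A], weights: Array[Double], seed: Long): Array[Seq[A]] = {
    require(weights.nonEmpty && weights.forall(_ >= 0.0) && weights.sum > 0.0)
    // Normalized cumulative upper bounds, e.g. Array(0.7, 1.0) for weights (0.7, 0.3).
    val total = weights.sum
    val upper = weights.scanLeft(0.0)(_ + _).tail.map(_ / total)
    val rng = new Random(seed)
    // Each row draws u in [0, 1) and lands in the first bucket whose bound exceeds u.
    val assigned = rows.map { row =>
      val u = rng.nextDouble()
      val i = upper.indexWhere(u < _)
      (if (i >= 0) i else upper.length - 1, row) // guard against rounding at the top end
    }
    Array.tabulate(weights.length)(b => assigned.filter(_._1 == b).map(_._2))
  }

  def main(args: Array[String]): Unit = {
    val Array(training, test) = randomSplit(1 to 1000, Array(0.7, 0.3), seed = 1234L)
    println(s"training = ${training.size}, test = ${test.size}") // roughly 700 / 300
  }
}
```

In the Spark program itself, the `randomSplit(weights, seed)` overload, e.g. `data.randomSplit(Array(0.7, 0.3), seed = 1234L)`, should make the split, and therefore the reported test error, repeatable as long as the input data and its partitioning stay the same.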

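`DecisionTreeClassifier` is used with its defaults here, which means splits are scored with Gini impurity, $\text{gini}(S) = 1 - \sum_k p_k^2$, and the tree prefers the split with the largest impurity decrease. The sketch below scores thresholds on one continuous feature in plain Scala. One simplification is stated up front: it tries every midpoint between neighbouring distinct values, whereas Spark only evaluates a limited set of binned candidate thresholds (controlled by `maxBins`). `GiniSplitSketch` is a hypothetical name.

```scala
// Hypothetical plain-Scala sketch of Gini-based threshold selection (exhaustive midpoints).
object GiniSplitSketch {
  // gini(S) = 1 - sum over classes k of p_k^2
  def gini(labels: Seq[Double]): Double =
    if (labels.isEmpty) 0.0
    else {
      val n = labels.size.toDouble
      1.0 - labels.groupBy(identity).values.map(g => math.pow(g.size / n, 2)).sum
    }

  // Impurity decrease when rows (featureValue, label) are split on feature <= threshold.
  def gain(rows: Seq[(Double, Double)], threshold: Double): Double = {
    val (left, right) = rows.partition(_._1 <= threshold)
    val n = rows.size.toDouble
    gini(rows.map(_._2)) -
      (left.size / n) * gini(left.map(_._2)) -
      (right.size / n) * gini(right.map(_._2))
  }

  // Tries the midpoint between each pair of neighbouring distinct values and
  // returns (bestThreshold, bestGain).
  def bestThreshold(rows: Seq[(Double, Double)]): (Double, Double) = {
    val values = rows.map(_._1).distinct.sortWith(_ < _)
    require(values.size >= 2, "need at least two distinct feature values")
    val candidates = values.zip(values.tail).map { case (a, b) => (a + b) / 2 }
    candidates.map(t => t -> gain(rows, t)).maxBy(_._2)
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq((1.0, 0.0), (2.0, 0.0), (3.0, 1.0), (4.0, 1.0))
    println(gini(rows.map(_._2))) // 0.5
    println(bestThreshold(rows))  // (2.5,0.5)
  }
}
```

Entropy can be selected instead with `setImpurity("entropy")` on the classifier; the same "largest impurity decrease" logic applies.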

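`pipeline.fit(trainingData)` walks the stages in order: an estimator is fitted on the data it receives and replaced by the model it produces, while a stage that is already a transformer (such as the pre-fitted `labelIndexer` and `featureIndexer`) is kept as-is; either way its output feeds the next stage. The sketch below models only that control flow. The trait and object names echo Spark's vocabulary but are simplified, hypothetical stand-ins, not Spark's real classes.

```scala
// Hypothetical plain-Scala sketch of Pipeline.fit semantics (not Spark's API).
object PipelineSketch {
  type Data = Seq[Double]

  sealed trait Stage
  trait Transformer extends Stage { def transform(d: Data): Data }
  trait Estimator extends Stage { def fit(d: Data): Transformer }

  // A fitted pipeline keeps one Transformer per input stage, in the same order.
  final case class PipelineModel(stages: Seq[Transformer]) {
    def transform(d: Data): Data = stages.foldLeft(d)((acc, t) => t.transform(acc))
  }

  def fit(stages: Seq[Stage], data: Data): PipelineModel = {
    val (fitted, _) = stages.foldLeft((Vector.empty[Transformer], data)) {
      case ((done, current), stage) =>
        val t = stage match {
          case e: Estimator    => e.fit(current) // learns from the data seen so far
          case tr: Transformer => tr             // already fitted, used as-is
        }
        (done :+ t, t.transform(current))        // output feeds the next stage
    }
    PipelineModel(fitted)
  }

  // Hypothetical stages for illustration.
  object Doubler extends Transformer {
    def transform(d: Data): Data = d.map(_ * 2)
  }
  object Center extends Estimator {
    def fit(d: Data): Transformer = {
      val mean = d.sum / d.size
      new Transformer { def transform(x: Data): Data = x.map(_ - mean) }
    }
  }

  def main(args: Array[String]): Unit = {
    val model = fit(Seq(Doubler, Center), Seq(1.0, 2.0, 3.0))
    // Center was fitted on the doubled data, so it learned mean 4.0.
    println(model.transform(Seq(1.0, 2.0, 3.0))) // List(-2.0, 0.0, 2.0)
  }
}
```

The positional correspondence is what lets the program read the trained tree back with `model.stages(2)`: `dt` was the third stage, so the third fitted stage is its `DecisionTreeClassificationModel`.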

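The evaluator's `"accuracy"` metric is the fraction of rows whose prediction equals the indexed label, and the program prints `1 - accuracy` as the test error. (Older Spark 1.x releases exposed the same overall figure under the metric name `"precision"`, which newer releases reject.) A minimal plain-Scala sketch of the arithmetic, with `AccuracySketch` as a hypothetical name:

```scala
// Hypothetical plain-Scala sketch of unweighted accuracy and test error.
object AccuracySketch {
  def accuracy(predictionAndLabel: Seq[(Double, Double)]): Double = {
    require(predictionAndLabel.nonEmpty, "need at least one prediction")
    predictionAndLabel.count { case (p, l) => p == l }.toDouble / predictionAndLabel.size
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq((0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 0.0))
    val acc = accuracy(rows)
    println(s"accuracy = $acc, test error = ${1.0 - acc}") // accuracy = 0.75, test error = 0.25
  }
}
```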

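`treeModel.toDebugString` prints the learned tree as nested `If (feature i <= t)` / `Else (feature i > t)` branches ending in `Predict: p` leaves. The sketch below shows how such a tree turns a feature vector into a prediction, plus a printer in roughly that layout. The tree used here is made up for illustration (not learned from the sample data), the exact indentation of Spark's output may differ, and `TreeSketch` with its case classes are hypothetical names.

```scala
// Hypothetical plain-Scala sketch of a binary decision tree and its traversal.
object TreeSketch {
  sealed trait Node
  final case class Leaf(prediction: Double) extends Node
  final case class Split(feature: Int, threshold: Double, left: Node, right: Node) extends Node

  // Go left when features(feature) <= threshold, otherwise right, until a leaf.
  @annotation.tailrec
  def predict(node: Node, features: Array[Double]): Double = node match {
    case Leaf(p) => p
    case Split(f, t, left, right) =>
      predict(if (features(f) <= t) left else right, features)
  }

  // Prints the tree in roughly the If/Else/Predict layout of toDebugString.
  def debugString(node: Node, indent: String = " "): String = node match {
    case Leaf(p) => s"${indent}Predict: $p\n"
    case Split(f, t, l, r) =>
      s"${indent}If (feature $f <= $t)\n" + debugString(l, indent + " ") +
        s"${indent}Else (feature $f > $t)\n" + debugString(r, indent + " ")
  }

  def main(args: Array[String]): Unit = {
    val tree = Split(0, 0.5, Leaf(1.0), Split(1, 10.0, Leaf(0.0), Leaf(1.0)))
    print(debugString(tree))
    println(predict(tree, Array(0.9, 3.0))) // 0.0
  }
}
```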
