Original poster: ReneeBK

[Case Study] Decision Tree Classification using Python

ReneeBK posted on 2015-11-16 06:55:49

from __future__ import print_function

from pyspark import SparkContext, SQLContext
from pyspark.ml import Pipeline
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.feature import StringIndexer, VectorIndexer
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

if __name__ == "__main__":
    sc = SparkContext(appName="decision_tree_classification_example")
    sqlContext = SQLContext(sc)

    # Load the data stored in LIBSVM format as a DataFrame.
    data = sqlContext.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")

    # Index labels, adding metadata to the label column.
    # Fit on whole dataset to include all labels in index.
    labelIndexer = StringIndexer(inputCol="label", outputCol="indexedLabel").fit(data)

    # Automatically identify categorical features, and index them.
    # We specify maxCategories so features with > 4 distinct values are treated as continuous.
    featureIndexer = VectorIndexer(
        inputCol="features", outputCol="indexedFeatures", maxCategories=4).fit(data)

    # Split the data into training and test sets (30% held out for testing).
    (trainingData, testData) = data.randomSplit([0.7, 0.3])

    # Configure a DecisionTree classifier that reads the indexed columns.
    dt = DecisionTreeClassifier(labelCol="indexedLabel", featuresCol="indexedFeatures")

    # Chain indexers and tree in a Pipeline.
    pipeline = Pipeline(stages=[labelIndexer, featureIndexer, dt])

    # Train model. This also runs the indexers.
    model = pipeline.fit(trainingData)

    # Make predictions.
    predictions = model.transform(testData)

    # Select example rows to display.
    predictions.select("prediction", "indexedLabel", "features").show(5)

    # Select (prediction, true label) and compute test error.
    # "accuracy" is the metric name in Spark 2.0+; Spark 1.x called it "precision".
    evaluator = MulticlassClassificationEvaluator(
        labelCol="indexedLabel", predictionCol="prediction", metricName="accuracy")
    accuracy = evaluator.evaluate(predictions)
    print("Test Error = %g " % (1.0 - accuracy))

    # The third pipeline stage is the fitted tree; printing it gives a summary only.
    treeModel = model.stages[2]
    print(treeModel)

    sc.stop()
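
In case the label-indexing step in the code above looks opaque: StringIndexer assigns indices by label frequency, with the most frequent label getting index 0. Below is a rough pure-Python sketch of that idea, not Spark's actual implementation. The function name `fit_label_index`, the sample labels, and the tie-breaking rule are all assumptions made up for illustration.

```python
from collections import Counter


def fit_label_index(labels):
    """Map each distinct label to a float index, most frequent label first."""
    counts = Counter(labels)
    # Ties are broken by label value here; that is a choice made for this sketch.
    ordered = sorted(counts, key=lambda lab: (-counts[lab], lab))
    return {lab: float(i) for i, lab in enumerate(ordered)}


# Hypothetical labels, not taken from sample_libsvm_data.txt.
sample_labels = ["1.0", "0.0", "1.0", "1.0", "0.0", "2.0"]
label_index = fit_label_index(sample_labels)
indexed = [label_index[lab] for lab in sample_labels]

print(label_index)
print(indexed)
```

This also shows why the code comment says to fit on the whole dataset: a label that only appears in the test split would have no entry in the mapping if the index were built from the training split alone.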
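
For the last step of the code above, the "Test Error" printed is just one minus the fraction of rows where the prediction matches the indexed label. Here is a minimal sketch of that arithmetic in plain Python, assuming the rows are available as (prediction, label) pairs. The helper name `accuracy_of` and the sample rows are hypothetical and not output from the posted job.

```python
def accuracy_of(rows):
    """Return the fraction of (prediction, label) pairs that agree."""
    rows = list(rows)
    if not rows:
        raise ValueError("need at least one (prediction, label) pair")
    correct = sum(1 for prediction, label in rows if prediction == label)
    return correct / float(len(rows))


# Hypothetical (prediction, indexedLabel) pairs for illustration only.
sample_rows = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 0.0)]
test_error = 1.0 - accuracy_of(sample_rows)

print("Test Error = %g" % test_error)
```

With three of the four sample rows correct, the sketch reports a test error of 0.25.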

