```python
from __future__ import print_function

from pyspark import SparkContext, SQLContext
from pyspark.ml import Pipeline
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.feature import StringIndexer, VectorIndexer
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

if __name__ == "__main__":
    sc = SparkContext(appName="decision_tree_classification_example")
    sqlContext = SQLContext(sc)

    # Load the data stored in LIBSVM format as a DataFrame.
    data = sqlContext.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")

    # Index labels, adding metadata to the label column.
    # Fit on the whole dataset so that every label is included in the index.
    labelIndexer = StringIndexer(inputCol="label", outputCol="indexedLabel").fit(data)

    # Automatically identify categorical features and index them.
    # maxCategories=4 means features with > 4 distinct values are treated as continuous.
    featureIndexer = VectorIndexer(
        inputCol="features", outputCol="indexedFeatures", maxCategories=4).fit(data)

    # Split the data into training and test sets (30% held out for testing).
    (trainingData, testData) = data.randomSplit([0.7, 0.3])

    # Configure a DecisionTree classifier on the indexed columns.
    dt = DecisionTreeClassifier(labelCol="indexedLabel", featuresCol="indexedFeatures")

    # Chain the indexers and the tree in a Pipeline.
    pipeline = Pipeline(stages=[labelIndexer, featureIndexer, dt])

    # Train the model. This also runs the indexers.
    model = pipeline.fit(trainingData)

    # Make predictions on the held-out test set.
    predictions = model.transform(testData)

    # Select example rows to display.
    predictions.select("prediction", "indexedLabel", "features").show(5)

    # Compare (prediction, true label) pairs and compute the test error.
    # Note: "precision" is the Spark 1.x metric name; Spark 2.0+ removed it
    # in favor of metricName="accuracy".
    evaluator = MulticlassClassificationEvaluator(
        labelCol="indexedLabel", predictionCol="prediction", metricName="precision")
    accuracy = evaluator.evaluate(predictions)
    print("Test Error = %g " % (1.0 - accuracy))

    # The fitted tree is the third pipeline stage; printing it shows a summary only.
    treeModel = model.stages[2]
    print(treeModel)
```
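The `maxCategories=4` rule in the script above can be hard to picture, so here is a small pure-Python sketch of the idea. It is not Spark's implementation: the function name `index_features`, the toy rows, and the choice to number categories in sorted order are all assumptions made for illustration. It only shows the decision "at most `max_categories` distinct values means categorical, otherwise continuous".

```python
def index_features(rows, max_categories=4):
    """Illustrative stand-in for the VectorIndexer decision rule (hypothetical helper).

    A feature with at most `max_categories` distinct values is treated as
    categorical and its values are replaced by category indices; any other
    feature is treated as continuous and left unchanged.
    """
    n_features = len(rows[0])
    category_maps = {}
    for j in range(n_features):
        distinct = sorted({row[j] for row in rows})
        if len(distinct) <= max_categories:
            # Assumption: categories are numbered in sorted value order.
            category_maps[j] = {value: idx for idx, value in enumerate(distinct)}

    indexed = [
        [float(category_maps[j][v]) if j in category_maps else v
         for j, v in enumerate(row)]
        for row in rows
    ]
    return category_maps, indexed


# Toy data: feature 0 has 3 distinct values, feature 1 has 5.
rows = [
    [0.0, 10.5],
    [2.0, 11.0],
    [5.0, 12.5],
    [2.0, 13.0],
    [0.0, 14.5],
]

category_maps, indexed = index_features(rows, max_categories=4)
print(category_maps)
print(indexed)
```

With these toy rows, only feature 0 gets a category map; feature 1 has more than four distinct values, so it passes through as a continuous feature, which is the behavior the comment in the script describes.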