Original poster: 孙悟充

[Distributed Systems Architecture] Error when running logistic regression in PySpark


孙悟充 (verified student), posted on 2018-12-28 14:06:30
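from pyspark.mllib.classification import LogisticRegressionWithLBFGS
from pyspark.mllib.random import RandomRDDs
from pyspark.mllib.regression import LabeledPoint
import random as rd  # assumption: the post does not show what rd refers to

# sc is the SparkContext provided by the pyspark shell
# 100000 random vectors with 10 standard-normal features each
data = RandomRDDs.normalVectorRDD(sc, 100000, 10, seed=2)

def tologisticregressiondata(x):
    # Attach a random 0/1 label to each feature vector
    return LabeledPoint(rd.randint(0, 1), x)

dataforlogisticregression = data.map(tologisticregressiondata)

# 80/20 train/test split
ctrain, ctest = dataforlogisticregression.randomSplit([0.8, 0.2])

cmodel = LogisticRegressionWithLBFGS.train(ctrain)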
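The labels in the snippet above come from `rd.randint(0, 1)`, and the post does not say which module `rd` refers to. That matters for the labels: Python's `random.randint` includes both endpoints, while `numpy.random.randint` excludes the upper one, so with NumPy every label would be 0. The sketch below shows the two conventions using only the standard library; `random.randrange` stands in for the NumPy convention, and the variable names are made up for illustration. It is not claimed to be the cause of the warnings shown below.

```python
import random

random.seed(2)

# random.randint(a, b) includes both a and b, so 0 and 1 both occur.
inclusive_labels = {random.randint(0, 1) for _ in range(1000)}

# random.randrange(a, b) excludes b, the same convention as
# numpy.random.randint(low, high); only 0 can ever be drawn here.
exclusive_labels = {random.randrange(0, 1) for _ in range(1000)}

print(sorted(inclusive_labels))
print(sorted(exclusive_labels))
```

If `rd` turns out to be `numpy.random`, calling `rd.randint(0, 2)` would give both classes.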
-------------------------------------------------------------------------------------------------------------------------------------------
18/12/28 13:57:29 WARN HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist
18/12/28 13:57:29 WARN metastore: Failed to connect to the MetaStore Server...
18/12/28 13:57:30 WARN metastore: Failed to connect to the MetaStore Server...
18/12/28 13:57:31 WARN metastore: Failed to connect to the MetaStore Server...
18/12/28 13:57:32 WARN Hive: Failed to access metastore. This class should not accessed in runtime.
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
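Why does this happen? Other methods such as linear regression, SVM, and random forest all run without problems; only logistic regression produces this error.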

孙悟充 (verified student), posted on 2018-12-28 14:38:19
Training with stochastic gradient descent instead works without any problem.

孙悟充 (verified student), posted on 2018-12-28 15:02:43
>>> multi_class_data = [
...     LabeledPoint(0.0, [0.0, 1.0, 0.0]),
...     LabeledPoint(1.0, [1.0, 0.0, 0.0]),
...     LabeledPoint(2.0, [0.0, 0.0, 1.0])
... ]
>>> data = sc.parallelize(multi_class_data)
>>> mcm = LogisticRegressionWithLBFGS.train(data, iterations=10, numClasses=3)

=================================

This raises the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/spark-current/python/pyspark/mllib/classification.py", line 398, in train
    return _regression_train_wrapper(train, LogisticRegressionModel, data, initialWeights)
  File "/usr/lib/spark-current/python/pyspark/mllib/regression.py", line 216, in _regression_train_wrapper
    return modelClass(weights, intercept, numFeatures, numClasses)
  File "/usr/lib/spark-current/python/pyspark/mllib/classification.py", line 176, in __init__
    self._dataWithBiasSize)
TypeError: 'float' object cannot be interpreted as an integer
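
The last frame of the traceback passes `self._dataWithBiasSize` as an argument, and the message says a float was given where an integer is required. One plausible explanation, assuming the shell runs Python 3, is that this size was computed with true division (`/`), which always yields a float there. The sketch below reproduces that pattern in plain Python; `coeff`, `as_dimension`, and the other names are hypothetical and are not taken from the PySpark source.

```python
import operator

num_classes = 3
num_features = 3

# Hypothetical flat weight vector: (num_classes - 1) rows of num_features values.
coeff = [0.0] * ((num_classes - 1) * num_features)

# In Python 3, "/" always returns a float, even when the result is a whole number.
size_true_div = len(coeff) / (num_classes - 1)
# "//" keeps the result an int when both operands are ints.
size_floor_div = len(coeff) // (num_classes - 1)


def as_dimension(n):
    """Accept only true integers, as array libraries do for shape entries."""
    return operator.index(n)


try:
    as_dimension(size_true_div)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

row_length = as_dimension(size_floor_div)

print(type(size_true_div).__name__, error_message)
print(type(size_floor_div).__name__, row_length)
```

If this is indeed the cause, it lies in the installed PySpark library rather than in the calling code, so checking whether a newer Spark release behaves differently would be a reasonable next step.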