OP: Nicolle

Artificial Intelligence for Big Data


Nicolle (student verified) posted on 2018-6-13 04:12:48
Price: 1 forum coin

Hidden content in this post:

Artificial-Intelligence-for-Big-Data-master.zip (26.97 KB)





Nicolle (student verified) posted on 2018-6-13 04:14:10
import org.apache.spark.ml.feature.LabeledPoint
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.LinearRegression

// Load the sample data: one "label,feature" pair per line
val linearRegressionSampleData = sc.textFile("aibd/linear_regression_sample.txt")

// Parse each line into a LabeledPoint and convert to a DataFrame
// (in spark-shell the implicits needed by .toDF are already imported)
val labeledData = linearRegressionSampleData.map { line =>
  val parts = line.split(',')
  LabeledPoint(parts(0).toDouble, Vectors.dense(parts(1).toDouble))
}.cache().toDF

val lr = new LinearRegression()

// Fit the model
val model = lr.fit(labeledData)

// Print the coefficient of determination
val summary = model.summary
println("R-squared = " + summary.r2)
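
For reference, a minimal sketch of what the parser above expects and how the fitted model could be applied back to the data. The sample rows are hypothetical (the real file is in the hidden attachment); transform() is the standard Spark ML way to add a "prediction" column:

// Hypothetical rows of aibd/linear_regression_sample.txt ("label,feature"):
//   1.0,2.0
//   2.1,4.0
//   3.9,8.0

// Apply the fitted model to the training frame
val predictions = model.transform(labeledData)
predictions.select("features", "label", "prediction").show(5)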

Nicolle (student verified) posted on 2018-6-13 04:14:57
import org.apache.spark.ml.classification.LogisticRegression

// Load training data
val training = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")

val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)

// Fit the model
val lrModel = lr.fit(training)

// Print the coefficients and intercept for logistic regression
println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}")

// We can also use the multinomial family for binary classification
val mlr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)
  .setFamily("multinomial")

val mlrModel = mlr.fit(training)

// Print the coefficients and intercepts for logistic regression with multinomial family
println(s"Multinomial coefficients: ${mlrModel.coefficientMatrix}")
println(s"Multinomial intercepts: ${mlrModel.interceptVector}")
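
As a follow-up, a minimal sketch of inspecting the training summary of the binomial model, following the standard Spark 2.x ML API (loss per iteration, then binary metrics such as area under the ROC curve):

import org.apache.spark.ml.classification.BinaryLogisticRegressionSummary

// Training summary for the binomial model
val trainingSummary = lrModel.summary

// Loss at each iteration of the optimizer
trainingSummary.objectiveHistory.foreach(loss => println(loss))

// Binary-classification metrics
val binarySummary = trainingSummary.asInstanceOf[BinaryLogisticRegressionSummary]
println(s"areaUnderROC: ${binarySummary.areaUnderROC}")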

Nicolle (student verified) posted on 2018-6-13 04:15:26
import org.apache.spark.ml.feature.LabeledPoint
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.clustering.KMeans

// Load the sample data: one "label,x,y" triple per line
val kmeansSampleData = sc.textFile("aibd/k-means-sample.txt")

// Parse each line into a LabeledPoint and convert to a DataFrame
val labeledData = kmeansSampleData.map { line =>
  val parts = line.split(',')
  LabeledPoint(parts(0).toDouble, Vectors.dense(parts(1).toDouble, parts(2).toDouble))
}.cache().toDF

val kmeans = new KMeans()
  .setK(2)                        // default value is 2
  .setFeaturesCol("features")
  .setMaxIter(3)                  // default max iterations is 20
  .setPredictionCol("prediction")
  .setSeed(1L)

val model = kmeans.fit(labeledData)

// Show the cluster assignment for each point, then the cluster centers
model.summary.predictions.show
model.clusterCenters.foreach(println)
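
To gauge how tight the clusters are, a minimal sketch using the model's cost, assuming Spark 2.x where KMeansModel.computeCost is available (it was deprecated in 3.0 in favor of ClusteringEvaluator):

// Within-set sum of squared distances of points to their nearest center (WSSSE)
val wssse = model.computeCost(labeledData)
println(s"Within Set Sum of Squared Errors = $wssse")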

Nicolle (student verified) posted on 2018-6-13 04:16:11
import org.apache.spark.mllib.linalg.Matrix
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

val data = Array(
  Vectors.dense(2.0, 1.0, 75.0, 18.0, 1.0, 2),
  Vectors.dense(0.0, 1.0, 21.0, 28.0, 2.0, 4),
  Vectors.dense(0.0, 1.0, 32.0, 61.0, 5.0, 10),
  Vectors.dense(0.0, 1.0, 56.0, 39.0, 2.0, 4),
  Vectors.dense(1.0, 1.0, 73.0, 81.0, 3.0, 6),
  Vectors.dense(0.0, 1.0, 97.0, 59.0, 7.0, 14))

val rows = sc.parallelize(data)

val mat: RowMatrix = new RowMatrix(rows)

// Principal components are stored in a local dense matrix.
val pc: Matrix = mat.computePrincipalComponents(2)

// Project the rows to the linear space spanned by the top 2 principal components.
val projected: RowMatrix = mat.multiply(pc)

projected.rows.foreach(println)
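
To see how much of the total variance the two components actually capture, a minimal sketch using the RowMatrix variant that also returns the explained variance (available since Spark 1.6):

// Principal components plus the fraction of variance each one explains
val (components, explainedVariance) =
  mat.computePrincipalComponentsAndExplainedVariance(2)
println(s"Explained variance per component: $explainedVariance")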

Nicolle (student verified) posted on 2018-6-13 04:16:48
import org.apache.spark.mllib.linalg.Matrix
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.linalg.SingularValueDecomposition

val data = Array(
  Vectors.dense(2.0, 1.0, 75.0, 18.0, 1.0, 2),
  Vectors.dense(0.0, 1.0, 21.0, 28.0, 2.0, 4),
  Vectors.dense(0.0, 1.0, 32.0, 61.0, 5.0, 10),
  Vectors.dense(0.0, 1.0, 56.0, 39.0, 2.0, 4),
  Vectors.dense(1.0, 1.0, 73.0, 81.0, 3.0, 6),
  Vectors.dense(0.0, 1.0, 97.0, 59.0, 7.0, 14))

val rows = sc.parallelize(data)

val mat: RowMatrix = new RowMatrix(rows)

// Compute the top 3 singular values and the corresponding singular vectors
val svd: SingularValueDecomposition[RowMatrix, Matrix] = mat.computeSVD(3, computeU = true)

val U: RowMatrix = svd.U // The U factor is stored as a distributed row matrix
val s: Vector = svd.s    // The singular values are stored in a local dense vector
val V: Matrix = svd.V    // The V factor is stored as a local dense matrix
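
As a sanity check, a minimal sketch reconstructing the rank-3 approximation U * diag(s) * V^T from the three factors; Matrices.diag builds the diagonal matrix from the singular values:

import org.apache.spark.mllib.linalg.Matrices

// Rank-3 approximation of mat; its rows should be close to the original rows
val approx: RowMatrix = U.multiply(Matrices.diag(s)).multiply(V.transpose)
approx.rows.foreach(println)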

Nicolle (student verified) posted on 2018-6-13 04:17:27
// Imports assume the DL4J/DataVec packaging of the book's era;
// older DL4J releases used org.canova instead of org.datavec
import org.apache.spark.{SparkConf, SparkContext}
import org.datavec.api.records.reader.RecordReader
import org.datavec.api.records.reader.impl.csv.CSVRecordReader
import org.deeplearning4j.nn.api.OptimizationAlgorithm
import org.deeplearning4j.nn.conf.{MultiLayerConfiguration, NeuralNetConfiguration}
import org.deeplearning4j.nn.conf.layers.{DenseLayer, OutputLayer}
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
import org.deeplearning4j.nn.weights.WeightInit
import org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer
import org.nd4j.linalg.lossfunctions.LossFunctions
import scala.collection.mutable.ListBuffer

object FeedForwardNetworkWithSpark {

  def main(args: Array[String]): Unit = {
    // CSV reader for the Iris data: skip 0 header lines, comma-delimited
    val recordReader: RecordReader = new CSVRecordReader(0, ",")
    val conf = new SparkConf()
      .setMaster("spark://master:7077")
      .setAppName("FeedForwardNetwork-Iris")
    val sc = new SparkContext(conf)

    val numInputs: Int = 4 // four Iris features
    val outputNum = 3      // three Iris classes
    val iterations = 1

    val multiLayerConfig: MultiLayerConfiguration = new NeuralNetConfiguration.Builder()
      .seed(12345)
      .iterations(iterations)
      .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
      .learningRate(1e-1)
      .l1(0.01).regularization(true).l2(1e-3)
      .list(3)
      .layer(0, new DenseLayer.Builder().nIn(numInputs).nOut(3)
        .activation("tanh")
        .weightInit(WeightInit.XAVIER)
        .build())
      .layer(1, new DenseLayer.Builder().nIn(3).nOut(2)
        .activation("tanh")
        .weightInit(WeightInit.XAVIER)
        .build())
      .layer(2, new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
        .weightInit(WeightInit.XAVIER)
        .activation("softmax")
        .nIn(2).nOut(outputNum).build())
      .backprop(true).pretrain(false)
      .build

    val network: MultiLayerNetwork = new MultiLayerNetwork(multiLayerConfig)
    network.init
    network.setUpdater(null)
    val sparkNetwork: SparkDl4jMultiLayer = new SparkDl4jMultiLayer(sc, network)

    val nEpochs: Int = 6
    val listBuffer = new ListBuffer[Array[Float]]()
    (0 until nEpochs).foreach { i =>
      // Fit on the CSV file; the second argument is assumed to be the label column index
      val net: MultiLayerNetwork = sparkNetwork.fit(
        "file:///<path>/iris_shuffled_normalized_csv.txt", 4, recordReader)
      // Snapshot the flattened parameter vector after each epoch
      listBuffer += net.params.data.asFloat().clone()
    }
    println("Parameters vs. iteration Output: ")
    (0 until listBuffer.size).foreach { i =>
      println(i + "\t" + listBuffer(i).mkString)
    }
  }
}
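
A note on the input file: iris_shuffled_normalized_csv.txt is assumed to be the Iris dataset, shuffled and min-max normalized, one example per line as four features followed by an integer class label in column index 4 (which is what the second argument to sparkNetwork.fit points at). The rows below are hypothetical, purely to show the shape:

// Hypothetical rows of iris_shuffled_normalized_csv.txt:
//   0.64,0.32,0.59,0.50,1
//   0.22,0.62,0.07,0.04,0
//   0.75,0.50,0.63,0.54,2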

Nicolle (student verified) posted on 2018-6-13 04:18:19
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.clustering.FuzzyCMeans
import org.apache.spark.mllib.clustering.FuzzyCMeans._
import org.apache.spark.mllib.clustering.FuzzyCMeansModel

// Two well-separated groups of three points each
val points = Seq(
  Vectors.dense(0.0, 0.0),
  Vectors.dense(0.0, 0.1),
  Vectors.dense(0.1, 0.0),
  Vectors.dense(9.0, 0.0),
  Vectors.dense(9.0, 0.2),
  Vectors.dense(9.2, 0.0)
)
val rdd = sc.parallelize(points, 3).cache()

for (initMode <- Seq(KMeans.RANDOM, KMeans.K_MEANS_PARALLEL)) {

  // Try increasingly large fuzzifiers m; larger m gives softer memberships
  (1 to 10).map(_ * 2) foreach { fuzzifier =>

    val model = FuzzyCMeans.train(rdd, k = 2, maxIterations = 10, runs = 10,
      initMode, seed = 26031979L, m = fuzzifier)

    val fuzzyPredicts = model.fuzzyPredict(rdd).collect()

    rdd.collect() zip fuzzyPredicts foreach { fuzzyPredict =>
      println(s" Point ${fuzzyPredict._1}")
      fuzzyPredict._2 foreach { clusterAndProbability =>
        println(s"Probability to belong to cluster ${clusterAndProbability._1} " +
          s"is ${"%.2f".format(clusterAndProbability._2)}")
      }
    }
  }
}
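
One caveat worth flagging: FuzzyCMeans is not part of stock Spark MLlib; this API comes from a proposed patch (the SPARK-2344 pull request) and needs a Spark build that includes it. For reference, a minimal self-contained sketch of the membership computation fuzzy c-means performs, assuming Euclidean distance (the helper name fuzzyMemberships is ours, not a library function):

import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Membership of a point in each cluster: u_i = 1 / sum_k (d_i / d_k)^(2/(m-1)),
// where d_i is the distance to center i and m > 1 is the fuzzifier
def fuzzyMemberships(point: Vector, centers: Seq[Vector], m: Double): Seq[Double] = {
  // Clamp distances away from zero to avoid division by zero at a center
  val dists = centers.map(c => math.max(math.sqrt(Vectors.sqdist(point, c)), 1e-12))
  dists.map { di =>
    1.0 / dists.map(dk => math.pow(di / dk, 2.0 / (m - 1.0))).sum
  }
}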

edmcheng posted on 2018-6-13 05:51:44
Thanks

jgw1213 posted on 2018-6-13 06:13:52
Thanks
