Original poster: lion003

Scala for Machine Learning

21
Lisrelchen posted on 2016-4-20 09:07:21
Initializing clusters

The initialization of the cluster centroids is important to ensure fast convergence of K-means. Solutions range from simple random generation of centroids to the application of genetic algorithms that evaluate the fitness of centroid candidates. We selected an efficient and fast initialization algorithm developed by M. Agha and W. Ashour [4:4].

The steps of the initialization are as follows:

  1. Compute the standard deviation of the observations for each dimension.
  2. Select the dimension k {xk,0, xk,1 … xk,n} with the maximum standard deviation.
  3. Rank the observations by increasing value along dimension k.
  4. Divide the ranked observations equally into K sets {Sm}.
  5. Find the median of each set Sm, that is, the observation at index size(Sm)/2.
  6. Use the corresponding observations as centroids.

The initialization algorithm is implemented by the private initialize method:

    def initialize(xt: XTSeries[Array[T]]): List[Cluster[T]] = {
      // Statistics (including the standard deviation) for each dimension
      val stats = statistics(xt)
      // Index of the dimension with the maximum standard deviation
      val maxSDevDim = Range(0, stats.size).maxBy(stats(_).stdDev)
      // Rank the observations by increasing value along that dimension,
      // keeping the original index of each observation
      val rankedObs = xt.zipWithIndex
                        .map(x => (x._1(maxSDevDim), x._2))
                        .sortWith(_._1 < _._1)
      // Half the size of one of the K segments (integer division already rounds down)
      val halfSegSize = (rankedObs.size >> 1) / K
      // Keep the observations selected by isContained and use them as centroids
      val centroids = rankedObs.filter(isContained(_, halfSegSize, rankedObs.size))
                               .map(n => xt(n._2))
      // Create one cluster per centroid
      Range(0, K).foldLeft(List[Cluster[T]]())((xs, i) => Cluster[T](centroids(i)) :: xs)
    }
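
For readers without the book's XTSeries and Cluster classes at hand, the initialization steps can be tried out with plain Scala collections. This is only a rough sketch under my own assumptions: the names KMeansInitSketch, stdDev and initialCentroids are made up for illustration, observations are plain Array[Double], and each centroid is taken as the middle element of its segment, which may not match the book's isContained helper exactly.

```scala
object KMeansInitSketch {
  type Obs = Array[Double]

  // Population standard deviation of the values of one dimension
  def stdDev(values: Seq[Double]): Double = {
    val mean = values.sum / values.size
    math.sqrt(values.map(v => (v - mean) * (v - mean)).sum / values.size)
  }

  // Picks k initial centroids among the observations themselves
  def initialCentroids(obs: Vector[Obs], k: Int): Vector[Obs] = {
    require(k > 0 && obs.size >= k, "need at least k observations")
    // Steps 1-2: dimension with the largest standard deviation
    val maxDim = obs.head.indices.maxBy(d => stdDev(obs.map(o => o(d))))
    // Step 3: rank the observations by their value along that dimension
    val ranked = obs.sortBy(o => o(maxDim))
    // Steps 4-6: split into k contiguous segments, take the middle of each
    val segSize = ranked.size / k
    Vector.tabulate(k)(m => ranked(m * segSize + segSize / 2))
  }
}
```

With six one-dimensional-like observations 1, 2, 3, 10, 11, 12 and k = 2, the two segments are (1, 2, 3) and (10, 11, 12), so the sketch picks 2 and 11 as starting centroids.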

22
Lisrelchen posted on 2016-4-20 09:09:11
Step 2 – cluster assignment

The second step in the K-means algorithm assigns the observations to the clusters whose centroids were initialized in step 1. This is done by the private assignToClusters method:

    def assignToClusters(xt: XTSeries[Array[T]], clusters: List[Cluster[T]], membership: Array[Int]): Int = {
      xt.toArray
        .zipWithIndex
        .filter(x => { //1
          val nearestCluster = getNearestCluster(clusters, x._1) //2
          val reassigned = nearestCluster != membership(x._2)
          clusters(nearestCluster) += x._2 //3
          membership(x._2) = nearestCluster //4
          reassigned
        }).size
    }

The core of the assignment of observations to each cluster is the filter on the time series (comment 1). The filter computes the index of the closest cluster and checks whether the observation has to be reassigned (comment 2). The observation at index x._2 is added to the nearest cluster, clusters(nearestCluster) (comment 3). The membership of the observation is then updated (comment 4). The method returns the number of observations that were reassigned.

The cluster closest to an observation is computed by the getNearestCluster method as follows:

    def getNearestCluster(clusters: List[Cluster[T]], x: Array[T]): Int = {
      clusters.zipWithIndex.foldLeft((Double.MaxValue, 0))((p, c) => {
        val measure = distance(c._1.center, x)
        if (measure < p._1) (measure, c._2) else p
      })._2
    }
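
The assignment step is easy to experiment with outside the library. The following is a hedged sketch, not the book's code: KMeansAssignSketch, sqDist, nearest and assign are names I made up, centroids are plain arrays instead of Cluster instances, and the squared Euclidean distance is assumed where the original takes a configurable distance function.

```scala
object KMeansAssignSketch {
  type Obs = Array[Double]

  // Squared Euclidean distance (an assumption; the original uses a `distance` parameter)
  def sqDist(a: Obs, b: Obs): Double =
    a.zip(b).map { case (u, v) => (u - v) * (u - v) }.sum

  // Index of the centroid closest to x
  def nearest(centroids: Vector[Obs], x: Obs): Int =
    centroids.indices.minBy(i => sqDist(centroids(i), x))

  // Updates membership in place and returns how many observations changed cluster
  def assign(obs: Vector[Obs], centroids: Vector[Obs], membership: Array[Int]): Int =
    obs.indices.count { i =>
      val c = nearest(centroids, obs(i))
      val moved = c != membership(i)
      membership(i) = c
      moved
    }
}
```

As in assignToClusters, the return value is the number of reassigned observations, so a second call with the same centroids returns 0.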

23
Lisrelchen posted on 2016-4-20 09:10:17
Step 3 – iterative reconstruction

The final step is to implement the iterative computation of the reconstruction error. In this implementation, the iteration terminates when no more observations are reassigned to a different cluster. As with other data processing units, the extraction of K-means clusters is encapsulated by the pipe operator |>, so that clustering can be integrated into a workflow using the dependency injection described in the Dependency Injection section in Chapter 2, Hello World!

The generation of the K clusters is executed by the data transformation |>:

    def |> : PartialFunction[XTSeries[Array[T]], List[Cluster[T]]] = {
      case xt: XTSeries[Array[T]] if (xt.size > 2 && xt(0).size > 0) => {
        // Initialize the centroids of the K clusters
        val clusters = initialize(xt)

        if (clusters.isEmpty) List.empty
        else {
          // Initial assignment of the observations to the clusters
          val membership = Array.fill(xt.size)(0)
          val reassigned = assignToClusters(xt, clusters, membership)
          var newClusters: List[Cluster[T]] = List.empty
          // Iterate until no observation is reassigned or maxIters is reached
          Range(0, maxIters).find(_ => {
            // Move the center of each non-empty cluster; replace an empty cluster
            // with the non-empty cluster that has the largest standard deviation
            newClusters = clusters.map(c => {
              if (c.size > 0) c.moveCenter(xt, dimension(xt))
              else clusters.filter(_.size > 0)
                           .maxBy(_.stdDev(xt, distance))
            })
            assignToClusters(xt, newClusters, membership) == 0
          }) match {
            case Some(index) => newClusters
            case None => { … }
          }
        }
      }
    }
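
To see the stopping rule in isolation, here is a small self-contained sketch of the loop: recompute the centroids, reassign, and stop once the membership no longer changes or maxIters passes have run. It is an illustration under my own assumptions, not the book's implementation: KMeansLoopSketch, fit, mean and sqDist are hypothetical names, the distance is squared Euclidean, and an empty cluster simply keeps its previous centroid instead of being replaced as in the code above.

```scala
import scala.annotation.tailrec

object KMeansLoopSketch {
  type Obs = Array[Double]

  def sqDist(a: Obs, b: Obs): Double =
    a.zip(b).map { case (u, v) => (u - v) * (u - v) }.sum

  // Per-dimension mean of a non-empty group of observations
  def mean(group: Seq[Obs]): Obs =
    group.head.indices.map(d => group.map(o => o(d)).sum / group.size).toArray

  // Returns (centroids, membership) once no observation changes cluster,
  // or after maxIters passes, whichever comes first
  def fit(obs: Vector[Obs], init: Vector[Obs], maxIters: Int): (Vector[Obs], Vector[Int]) = {
    def assign(cs: Vector[Obs]): Vector[Int] =
      obs.map(x => cs.indices.minBy(i => sqDist(cs(i), x)))

    @tailrec
    def loop(cs: Vector[Obs], members: Vector[Int], iter: Int): (Vector[Obs], Vector[Int]) = {
      // Move each centroid to the mean of its members (simplification:
      // an empty cluster keeps its previous centroid)
      val moved = cs.indices.toVector.map { i =>
        val group = obs.indices.filter(j => members(j) == i).map(obs)
        if (group.isEmpty) cs(i) else mean(group)
      }
      val next = assign(moved)
      if (next == members || iter + 1 >= maxIters) (moved, next)
      else loop(moved, next, iter + 1)
    }

    loop(init, assign(init), 0)
  }
}
```

Starting from the deliberately poor centroids 0 and 1 on the observations 0, 1, 9, 10, the sketch settles after two passes on the centroids 0.5 and 9.5.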

24
Lisrelchen posted on 2016-4-20 09:17:49
  1. /**
  2. * Copyright (c) 2013-2015  Patrick Nicolas - Scala for Machine Learning - All rights reserved
  3. *
  4. * The source code in this file is provided by the author for the sole purpose of illustrating the
  5. * concepts and algorithms presented in "Scala for Machine Learning". It should not be used to
  6. * build commercial applications.
  7. * ISBN: 978-1-783355-874-2 Packt Publishing.
  8. * Unless required by applicable law or agreed to in writing, software is distributed on an "AS IS" BASIS,
  9. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  10. *
  11. * Version 0.98
  12. */
  13. package org.scalaml.supervised.bayes

  14. // Scala classes
  15. import scala.collection.mutable.ArraySeq
  16. import scala.annotation.implicitNotFound
  17. import scala.util.{Try, Success, Failure}

  18. import org.apache.log4j.Logger

  19. import org.scalaml.core.XTSeries
  20. import org.scalaml.core.Types.ScalaMl._
  21. import org.scalaml.stats.{Validation, ClassValidation, Stats}
  22. import org.scalaml.core.Design.PipeOperator
  23. import org.scalaml.supervised.Supervised
  24. import org.scalaml.util.DisplayUtils
  25. import NaiveBayesModel._, XTSeries._, Stats._

  26.    
  27.                 /**
  28.                  * <p>Generic Binomial Naive Bayes classification class. The class is used for both training
  29.                  * and run-time classification. The training of the model is executed during the instantiation
  30.                  * of the class to avoid having an uninitialized model. A conversion from a parameterized
  31.                  * array, Array[T] to an array of double, DblVector, has to be implicitly defined for
  32.                  * training the model.<br>
  33.                  * As a classifier, the class implements the generic data transformation PipeOperator and the
  34.                  * Supervised interface.<br>
  35.                  * <pre><span style="font-size:9pt;color: #351c75;font-family: &quot;Helvetica Neue&quot;
  36.                  * ,Arial,Helvetica,sans-serif;">
  37.                  *  Naive Bayes formula: p(C|x) = p(x|C).p(C)/p(x) => p(C|x) = p(x1|C).p(x2|C) ... p(xn|C).p(C)</span></pre></p>
  38.                  * @constructor Instantiate a parameterized NaiveBayes model
  39.                  * @param smoothing Laplace or Lidstone smoothing factor
  40.                  * @param xt  Input labeled time series used for training
  41.                  * @param density Density function used to compute the discriminant
  42.                  *
  43.                  * @throws IllegalArgumentException if one of the class parameters is undefined
  44.                  * @author Patrick Nicolas
  45.                  * @since February 13, 2014
  46.                  * @note Scala for Machine learning Chapter 5 Naive Bayes Models / Naive Bayes classifiers
  47.                  */
  48. final class NaiveBayes[T <% Double](
  49.                 smoothing: Double,
  50.                 xt: XTSeries[(Array[T], Int)],
  51.                 density: Density)        extends PipeOperator[XTSeries[Array[T]], Array[Int]] with Supervised[T] {

  52.         import NaiveBayes._
  53.         check(smoothing, xt)
  54.        
  55.         private val logger = Logger.getLogger("NaiveBayes")
  56.        
  57.                 // The model is instantiated during training for both
  58.                 // classes if the training is successful. It is None otherwise
  59.         private[this] val model: Option[BinNaiveBayesModel[T]] =
  60.                         Try(BinNaiveBayesModel[T](train(1), train(0), density)) match {
  61.                 case Success(nb) => Some(nb)
  62.                 case Failure(e) => DisplayUtils.none("NaiveBayes.model", logger, e)
  63.         }
  64.                
  65.                 /**
  66.                  * <p>Run-time classification of a time series using the Naive Bayes model. The method invokes
  67.                  * the actual classification method in one of the NaiveBayes models.</p>
  68.                  * @throws MatchError if the input time series is undefined or has no elements or the
  69.                  * model was not properly trained
  70.                  * @return PartialFunction of time series of elements of type T as input to the Naive Bayes
  71.                  * and array of class indices as output
  72.                  */
  73.         override def |> : PartialFunction[XTSeries[Array[T]], Array[Int]] = {
  74.                 case xt: XTSeries[Array[T]] if( !xt.isEmpty && model != None) =>
  75.                         xt.toArray.map( model.get.classify( _))
  76.         }
  77.        
  78.                 /**
  79.                  * <p>Compute the F1 statistics for the Naive Bayes.</p>
  80.                  * @param xt Time series of features of type Array[T], and class indices as labels
  81.                  * @param index of the class, the time series or observation should belong to
  82.                  * @return F1 measure if the model has been properly trained (!= None), None otherwise
  83.                  */
  84.         override def validate(xt: XTSeries[(Array[T], Int)], index: Int): Option[Double] = model match {
  85.                 case Some(m) => Some(ClassValidation(xt.map(x =>(m.classify(x._1), x._2)) , index).f1)
  86.                 case None => DisplayUtils.none("NaiveBayes Model undefined", logger)
  87.         }


  88.                 /**
  89.                  * Textual representation of the Naive Bayes classifier with labels for features.
  90.                  * It returns "No Naive Bayes model" if no model exists
  91.                  * @return Stringized features with their label if model exists.
  92.                  */
  93.         def toString(labels: Array[String]): String =
  94.                 model.map(m => if( labels.isEmpty ) m.toString else m.toString(labels))
  95.                                 .getOrElse("No Naive Bayes model")

  96.                 /**
  97.                  * Default textual representation of the Naive Bayes classifier with labels for features.
  98.                  * It returns "No Naive Bayes model" if no model exists
  99.                  * @return Stringized features with their label if model exists.
  100.                  */
  101.         override def toString: String = toString(Array.empty)
  102.    
  103.                 /**
  104.                  * Train the Naive Bayes model on one of the two classes, positive (= 1) or negative (= 0)
  105.                  */
  106.         @implicitNotFound("NaiveBayes; Conversion from array[T] to DblVector is undefined")
  107.         private def train(label: Int)(implicit f: Array[T] => DblVector): Likelihood[T] = {
  108.                 val xi = xt.toArray
  109.                                 // Extract then filter each observation to be associated to a specific label.
  110.                                 // The implicit conversion from Array of type T to Array of type Double is invoked
  111.                 val values = xi.filter( _._2 == label).map(x => f(x._1))
  112.                 assert( !values.isEmpty, "NaiveBayes.train Filtered value is undefined")
  113.                
  114.                         // Gets the dimension of a feature
  115.                 val dim = xi(0)._1.size
  116.                 val vSeries = XTSeries[DblVector](values)
  117.        
  118.                         // Create a likelihood instance for this class 'label'. The
  119.                         // tuple (mean, standard deviation) (2nd argument) is computed
  120.                         // by invoking XTSeries.statistics then the Lidstone mean adjustment.
  121.                         // The last argument, class likelihood p(C) is computed as the ratio of the
  122.                         // number of observations associated to this class/label over total number of observations.
  123.                 Likelihood(label,
  124.                                 statistics(vSeries).map(stat => (stat.lidstoneMean(smoothing, dim), stat.stdDev) ),
  125.                                 values.size.toDouble/xi.size)
  126.         }
  127. }

  128.                 /**
  129.                  * Singleton that defines the constructors for the NaiveBayes classifier and
  130.                  * validates its parameters
  131.                  * @author Patrick Nicolas
  132.                  * @since February 13, 2014
  133.                  * @note Scala for Machine learning Chapter 5 Naive Bayes Model
  134.                  */
  135. object NaiveBayes {       
  136.                 /**
  137.                  * Default constructor for the NaiveBayes class
  138.                  * @param smoothing Laplace or Lidstone smoothing factor
  139.                  * @param xt Input labeled time series used for training
  140.                  * @param density Density function used to compute the discriminant
  141.                  */
  142.         def apply[T <% Double](
  143.                         smoothing: Double,
  144.                         xt: XTSeries[(Array[T], Int)],
  145.                         density: Density): NaiveBayes[T] = new NaiveBayes[T](smoothing, xt, density)
  146.                
  147.                 /**
  148.                  * Constructor for the NaiveBayes class with a Laplace smoothing function and
  149.                  * a Gaussian density function.
  150.                  * @param xt  Input labeled time series used for training
  151.                  */
  152.         def apply[T <% Double](xt: XTSeries[(Array[T], Int)]): NaiveBayes[T] =
  153.                         new NaiveBayes[T](1.0, xt, gauss)
  154.                
  155.         /*
  156.         def |>[T <% Double](): PartialFunction[XTSeries[Array[T]], Array[Int]] = {
  157.                 case xt: XTSeries[Array[T]] if(!xt.isEmpty && {
  158.                         model =
  159.                   model != None }) =>
  160.                         xt.toArray.map( model.get.classify( _))
  161.         }
  162.         *
  163.         */
  164.         def |>[T <% Double](model: Option[BinNaiveBayesModel[T]]): PartialFunction[XTSeries[Array[T]], Array[Int]] = {
  165.                 case xt: XTSeries[Array[T]] if( !xt.isEmpty && model != None) =>
  166.                         xt.toArray.map( model.get.classify( _))
  167.         }
  168.        
  169.        
  170.        
  171.         private def check[T <% Double](smoothing: Double, xt: XTSeries[(Array[T], Int)]): Unit = {
  172.                 require(smoothing > 0.0 && smoothing <= 1.0,
  173.                           s"NaiveBayes: Laplace or Lidstone smoothing factor $smoothing is out of range")
  174.                 require( !xt.isEmpty,
  175.                                 "NaiveBayes: Time series input for training Naive Bayes is undefined")
  176.         }
  177. }


  178. // ------------------------------  EOF --------------------------------------------
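
The train method above boils down to three things per class: filter the observations carrying the label, compute a (mean, standard deviation) pair per feature, and take the class prior as the share of observations with that label. A minimal sketch of just that part, under my own assumptions: NaiveBayesTrainSketch and ClassStats are made-up names, features are already Double, and the Lidstone mean adjustment of the original is left out.

```scala
object NaiveBayesTrainSketch {
  // Per-class parameters: (mean, standard deviation) for each feature, plus the prior p(C)
  final case class ClassStats(label: Int, muSigma: Vector[(Double, Double)], prior: Double)

  def train(data: Vector[(Array[Double], Int)], label: Int): ClassStats = {
    // Keep only the observations associated with this label
    val values = data.collect { case (x, y) if y == label => x }
    require(values.nonEmpty, s"no observation for label $label")
    val n = values.size.toDouble

    // Mean and (population) standard deviation of each feature
    val muSigma = values.head.indices.toVector.map { d =>
      val col = values.map(v => v(d))
      val mean = col.sum / n
      val variance = col.map(v => (v - mean) * (v - mean)).sum / n
      (mean, math.sqrt(variance))
    }
    // Prior = observations with this label / all observations
    ClassStats(label, muSigma, n / data.size)
  }
}
```

A feature that is constant within a class ends up with a standard deviation of 0, which a Gaussian density cannot handle; smoothing such as the Lidstone adjustment in the original is one way to soften that kind of degenerate estimate.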

25
Lisrelchen posted on 2016-4-20 09:18:49
  1. /**
  2. * Copyright (c) 2013-2015  Patrick Nicolas - Scala for Machine Learning - All rights reserved
  3. *
  4. * The source code in this file is provided by the author for the sole purpose of illustrating the
  5. * concepts and algorithms presented in "Scala for Machine Learning". It should not be used to
  6. * build commercial applications.
  7. * ISBN: 978-1-783355-874-2 Packt Publishing.
  8. * Unless required by applicable law or agreed to in writing, software is distributed on an "AS IS" BASIS,
  9. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  10. *
  11. * Version 0.98
  12. */
  13. package org.scalaml.supervised.bayes

  14. import org.scalaml.stats.Stats
  15. import org.scalaml.util.{FormatUtils, DisplayUtils}
  16. import org.scalaml.core.Types.ScalaMl.XYTSeries
  17. import NaiveBayesModel._

  18.                 /**
  19.                  * <p>Class that represents a likelihood for each feature for Naive Bayes classifier.<br>
  20.                  * The prior consists of a label (index), the mean of the prior of each dimension of the model,
  21.                  * the standard deviation of the prior of each dimension of the model and the class likeliHood.<br>
  22.                  * Naive Bayes assumes that the dimensions of the model are independent, making the log of
  23.                  * the prior additive.</p>
  24.                  * @constructor Create a likelihood for a specific class.
  25.                  * @throws IllegalArgumentException if the array of mean and standard deviation of the
  26.                  * likelihood is undefined or if the class likelihood (prior) is out of range ]0,1]
  27.                  * @param label  Name or label of the class or prior for which the likelihood is computed.
  28.                  * @param muSigma Array of tuples (mean, standard deviation) of the prior observations for the model
  29.                  * @param prior  Probability of occurrence for the class specified by the label.
  30.                  *
  31.                  * @author Patrick Nicolas
  32.                  * @since March 11, 2014
  33.                  * @note Scala for Machine Learning Chapter 5 Naive Bayes Models
  34.                  */
  35. protected class Likelihood[T <% Double](val label: Int, val muSigma: XYTSeries, val prior: Double) {
  36.         import Stats._, Likelihood._
  37.   
  38.         check(muSigma, prior)
  39.   
  40.                 /**
  41.                  * <p>Compute log p(C|x), the log of the conditional probability of the class given an
  42.                  * observation, obs, and a probability density function.<br>
  43.                  * The default probability density function is Normal(0, 1)</p>
  44.                  * @param obs parameterized observation
  45.                  * @param density probability density function (default Gauss)
  46.                  * @throws IllegalArgumentException if the observations are undefined
  47.                  * @return log of the conditional probability p(C|x)
  48.                  */
  49.         final def score(obs: Array[T], density: Density): Double = {
  50.                 require( !obs.isEmpty, "Likelihood.score Undefined observations")
  51.                
  52.                         // Compute the sum of the log likelihood and the log of the class prior probability
  53.                         // The log likelihood is computed by adding the log of the density for each dimension.
  54.                         // Sum {log p(xi|C) }
  55.                 (obs, muSigma).zipped.foldLeft(0.0)((post, xms) => {
  56.                         val mean = xms._2._1                // mean
  57.                         val stdDev = xms._2._2        // standard deviation
  58.                         val _obs = xms._1
  59.                         val logLikelihood = density(mean, stdDev, _obs)
  60.                                
  61.                                 // Keep the argument of the log positive by flooring the density at MINLOGARG
  62.                         post + Math.log(if(logLikelihood <  MINLOGARG) MINLOGARG else logLikelihood)
  63.                 }) + Math.log(prior) // Add the class likelihood p(C)
  64.         }
  65.        
  66.                 /**
  67.                  * <p>Display the content of this Likelihood class with associated labels.</p>
  68.                  * @param labels Label of variables used to display content
  69.                  */
  70.         def toString(labels: Array[String]): String = {
  71.                 import org.scalaml.core.Types.ScalaMl
  72.                
  73.                 val muSigmaStr = muSigma.map(musig => (musig._1, if(musig._2 > 0.0) musig._2 else -1.0))
  74.                         // Format the tuple muSigma= (mean, standard deviation) and the Prior
  75.                 FormatUtils.format(muSigmaStr, "Label\tMeans", "Standard Deviation", FormatUtils.MediumFormat, labels) +
  76.                 FormatUtils.format(prior, "Class likelihood", FormatUtils.MediumFormat)
  77.         }
  78.        
  79.         override def toString: String = toString(Array.empty)
  80. }


  81.                 /**
  82.                  * <p>Companion object for the Naive Bayes Likelihood class. The singleton
  83.                  * is used to define the constructor apply for the class.</p>
  84.                  * @author Patrick Nicolas
  85.                  * @since March 11, 2014
  86.                  * @note Scala for Machine Learning Chapter 5 Naive Bayes Models
  87.                  */
  88. object Likelihood {
  89.         private val MINLOGARG = 1e-32
  90.         private val MINLOGVALUE = -MINLOGARG

  91.                 /**
  92.                  * Default constructor for the class Likelihood.
  93.                  * @param label  Name or label of the class or prior for which the likelihood is computed.
  94.                  * @param muSigma Array of tuples (mean, standard deviation) of the prior observations
  95.                  * for the model
  96.                  * @param prior  Probability of occurrence for the class specified by the label.
  97.                  */
  98.         def apply[T <% Double](label: Int, muSigma: XYTSeries, prior: Double): Likelihood[T] =
  99.                 new Likelihood[T](label, muSigma, prior)
  100.    
  101.         private def check(muSigma: XYTSeries, prior: Double): Unit =  {
  102.                 require( !muSigma.isEmpty,
  103.                                 "Likelihood.check Historical mean and standard deviation is undefined")
  104.                 require(prior > 0.0  && prior <= 1.0,
  105.                                 s"Likelihood.check Prior for the NB prior $prior is out of range")
  106.         }
  107. }


  108. // --------------------------------  EOF --------------------------------------------------------------
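
The score method adds the log of the class prior to the sum of the per-feature log densities, and floors each density so that the log never receives zero. A small sketch of that computation with a Gaussian density, under my own assumptions: LogScoreSketch, gauss and MinDensity are illustrative names (the original takes the density as a parameter), features are plain Double, and every standard deviation is assumed to be strictly positive.

```scala
object LogScoreSketch {
  // Gaussian density, one possible choice for the density function
  def gauss(mean: Double, stdDev: Double, x: Double): Double = {
    val z = (x - mean) / stdDev
    math.exp(-0.5 * z * z) / (stdDev * math.sqrt(2.0 * math.Pi))
  }

  // Floor that keeps the argument of the log strictly positive
  val MinDensity = 1e-32

  // log p(C) + sum over features of log p(x_i | C)
  def score(obs: Array[Double], muSigma: Array[(Double, Double)], prior: Double): Double =
    obs.zip(muSigma).foldLeft(math.log(prior)) { case (acc, (x, (mu, sigma))) =>
      acc + math.log(math.max(gauss(mu, sigma, x), MinDensity))
    }
}
```

Comparing the scores of two classes for the same observation is then all a binary classifier has to do: the class with the higher score wins.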

26
Lisrelchen posted on 2016-4-20 09:27:04
  1. package org.scalaml.supervised.bayes

  2. import org.scalaml.stats.Stats
  3. import org.scalaml.core.Design.Model
  4. import NaiveBayesModel._

  5. abstract class NaiveBayesModel[T <% Double](val density: Density) extends Model {

  6.         def classify(x: Array[T]): Int
  7.         def toString(labels: Array[String]): String
  8. }

  9. object NaiveBayesModel {

  10.         type Density= (Double*) => Double
  11. }


  12. protected class BinNaiveBayesModel[T <% Double](
  13.                 positives: Likelihood[T],
  14.                 negatives: Likelihood[T],
  15.                 density: Density) extends NaiveBayesModel[T](density) {

  16.         override def classify(x: Array[T]): Int = {
  17.                 require( !x.isEmpty,
  18.                                 "BinNaiveBayesModel.classify Undefined observations")
  19.                
  20.                 // Simply select one of the two classes with the highest log posterior probability
  21.                 if (positives.score(x, density) > negatives.score(x, density)) 1 else 0
  22.         }

  23.                        
  24.                        
  25.        
  26.         override def toString(labels: Array[String]): String = {
  27.                 require( !labels.isEmpty, "BinNaiveBayesModel.toString Undefined labels")
  28.                 s"\nPositive class\n${positives.toString(labels)}\nNegative class\n${negatives.toString(labels)}"
  29.         }
  30.        
  31.         override def toString: String =
  32.                         s"\nPositive\n${positives.toString}\nNegative\n${negatives.toString}"
  33. }


  34. object BinNaiveBayesModel {

  35.         def apply[T <% Double](
  36.                         positives: Likelihood[T],
  37.                         negatives: Likelihood[T],
  38.                         density: Density): BinNaiveBayesModel[T] =
  39.                                         new BinNaiveBayesModel(positives, negatives, density)
  40. }

  41. protected class MultiNaiveBayesModel[T <% Double](
  42.                 likelihoodSet: List[Likelihood[T]],
  43.                 density: Density) extends NaiveBayesModel[T](density) {
  44.   
  45.         require(!likelihoodSet.isEmpty,
  46.                         "MultiNaiveBayesModel Cannot classify using Multi-NB with undefined classes")
  47.   
  48.         override def classify(x: Array[T]): Int = {
  49.                 require( !x.isEmpty, "MultiNaiveBayesModel.classify Vector input is undefined")
  50.                
  51.                         // The classification selects the class with the highest log
  52.                         // posterior probability. maxBy scans the list once instead of
  53.                         // sorting it.
  54.                 likelihoodSet.maxBy( _.score(x, density)).label
  55.         }

  56.        
  57.        
  58.         override def toString(labels: Array[String]): String = {
  59.                 require( !labels.isEmpty, "MultiNaiveBayesModel.toString Vector input is undefined")
  60.                          
  61.                 val buf = new StringBuilder
  62.                 likelihoodSet.zipWithIndex.foreach(l => {
  63.                         buf.append(s"\nclass ${l._2}: ${l._1.toString(labels)}")
  64.                 })
  65.                 buf.toString
  66.         }
  67. }


  68. object MultiNaiveBayesModel {

  69.         def apply[T <% Double](
  70.                         likelihoodSet: List[Likelihood[T]],
  71.                         density: Density): MultiNaiveBayesModel[T] =
  72.                                         new MultiNaiveBayesModel[T](likelihoodSet, density)
  73. }


  74. // --------------------------------  EOF --------------------------------------------------------------

27
Lisrelchen posted on 2016-4-20 09:31:56
  1. package org.scalaml.supervised.crf
  2.        
  3.         // Scala standard library
  4. import scala.util.{Try, Success, Failure}
  5.         // 3rd party frameworks or libraries
  6. import iitb.CRF.{CRF, CrfParams, DataSequence, DataIter, FeatureGenerator}
  7. import iitb.Model.{FeatureGenImpl, CompleteModel}
  8. import org.apache.log4j.Logger
  9.         // ScalaMl classes
  10. import org.scalaml.core.XTSeries
  11. import org.scalaml.core.Types.ScalaMl._
  12. import org.scalaml.core.Design.{PipeOperator, Model}
  13. import org.scalaml.supervised.Supervised
  14. import org.scalaml.util.DisplayUtils
  15. import org.scalaml.workflow.data.DataSource
  16. import CrfConfig._


  17. final protected class CrfModel(val weights: DblVector) extends Model {
  18.         require(!weights.isEmpty, "CrfModel Cannot create a model with undefined weights")

  19.         def this(className: String) =
  20.                         this({ Model.read(className).map( _.split(",").map(_.toDouble)).getOrElse(Array.empty) })
  21.        

  22.         override def >> : Boolean = write(weights.mkString(","))
  23. }


  24. final class Crf(
  25.                 nLabels: Int,
  26.                 config: CrfConfig,
  27.                 delims: CrfSeqDelimiter,
  28.                 taggedObs: String) extends PipeOperator[String, Double] {
  29.         import Crf._
  30.         check(nLabels)
  31.   
  32.         private val logger = Logger.getLogger("Crf")

  33.         class TaggingGenerator(nLabels: Int)        
  34.                         extends FeatureGenImpl(new CompleteModel(nLabels), nLabels, true)


  35.         private[this] val features = new TaggingGenerator(nLabels)
  36.         private[this] val crf = new CRF(nLabels, features, config.params)


  37.         private val model: Option[CrfModel] = train match {
  38.                 case Success(model) => Some(model)
  39.                 case Failure(e) => DisplayUtils.none("Crf.model could not be created", logger, e)
  40.         }
  41.   

  42.         override def |> : PartialFunction[String, Double] = {
  43.                 case obs: String if( !obs.isEmpty && model != None) => {
  44.                         val dataSeq =  new CrfTrainingSet(nLabels, obs, delims.obsDelim)
  45.                         crf.apply(dataSeq)
  46.                 }
  47.         }
  48.        

  49.         final def weights: Option[DblVector] = model.map( _.weights)


  50.         private def train: Try[CrfModel] = {
  51.                 val seqIter = CrfSeqIter(nLabels, taggedObs, delims)
  52.                 Try {
  53.                         features.train(seqIter)
  54.                         new CrfModel(crf.train(seqIter))
  55.                 }
  56.         }
  57. }


  58. object Crf {
  59.         final val NUM_LABELS_LIMITS = (1, 512)
  60.        

  61.         def apply(nLabels: Int, state: CrfConfig, delims: CrfSeqDelimiter, taggedObs: String): Crf =
  62.                 new Crf(nLabels, state, delims, taggedObs)
  63.   
  64.   
  65.         private def check(nLabels: Int): Unit = {
  66.                 require(nLabels > NUM_LABELS_LIMITS._1 && nLabels < NUM_LABELS_LIMITS._2,
  67.                                 s"Number of labels for generating tags for CRF $nLabels is out of range")
  68.         }
  69. }


  70. // ---------------------------- EOF ------------------------------------------------------

28
Lisrelchen posted on 2016-4-20 09:34:39
package org.scalaml.supervised.crf

import iitb.CRF.{CRF, CrfParams, DataSequence, DataIter, FeatureGenerator}
import iitb.Model.{FeatureGenImpl, CompleteModel}
import org.scalaml.core.XTSeries
import org.scalaml.workflow.data.DataSource
import org.scalaml.core.Design.{PipeOperator, Config}
import java.io.IOException
import org.scalaml.core.Types.ScalaMl._

// Configuration of the CRF training: initial weight w0, maximum number of
// iterations, L2 penalty factor lambda and convergence criterion eps.
protected class CrfConfig(w0: Double, maxIters: Int, lambda: Double, eps: Double) extends Config {
  import CrfConfig._
  check(w0, maxIters, lambda, eps)

  // Space-separated key/value parameter string
  val params = s"initValue $w0 maxIters $maxIters lambda $lambda scale true eps $eps"
}

object CrfConfig {
  // (lower, upper) bounds for each configuration parameter
  private val INIT_WEIGHTS_LIMITS = (0.1, 2.5)
  private val MAX_ITERS_LIMITS = (10, 250)
  private val LAMBDA_LIMITS = (1e-15, 1.5)
  private val EPS_LIMITS = (1e-5, 0.2)

  def apply(w0: Double, maxIters: Int, lambda: Double, eps: Double): CrfConfig =
    new CrfConfig(w0, maxIters, lambda, eps)

  // Throws IllegalArgumentException if a parameter is outside its bounds
  private def check(w0: Double, maxIters: Int, lambda: Double, eps: Double): Unit = {
    require(w0 >= INIT_WEIGHTS_LIMITS._1 && w0 <= INIT_WEIGHTS_LIMITS._2,
      s"Initialization of the CRF weights $w0 is out of range")
    require(maxIters >= MAX_ITERS_LIMITS._1 && maxIters <= MAX_ITERS_LIMITS._2,
      s"Maximum number of iterations for CRF training $maxIters is out of range")
    require(lambda >= LAMBDA_LIMITS._1 && lambda <= LAMBDA_LIMITS._2,
      s"The factor for the L2 penalty for CRF $lambda is out of range")
    require(eps > EPS_LIMITS._1 && eps <= EPS_LIMITS._2,
      s"The convergence criterion for the CRF training $eps is out of range")
  }
}

// ---------------------------- EOF ------------------------------------------------------

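For anyone who wants to try the configuration logic from post 28 without the org.scalaml and iitb jars on the classpath, here is a minimal sketch. CrfConfigSketch and CrfConfigDemo are hypothetical names, not part of the library; the bounds and the layout of the parameter string are copied from the listing above, and the Config trait is simply left out.

```scala
// Hypothetical stand-in for CrfConfig (not part of org.scalaml):
// same bounds and parameter-string layout, no Config trait.
final class CrfConfigSketch private (w0: Double, maxIters: Int, lambda: Double, eps: Double) {
  // Space-separated key/value pairs, same layout as CrfConfig.params
  val params: String =
    s"initValue $w0 maxIters $maxIters lambda $lambda scale true eps $eps"
}

object CrfConfigSketch {
  // (lower, upper) bounds copied from the CrfConfig companion object
  private val InitWeightsLimits = (0.1, 2.5)
  private val MaxItersLimits = (10, 250)
  private val LambdaLimits = (1e-15, 1.5)
  private val EpsLimits = (1e-5, 0.2)

  // Validates every parameter before building the configuration;
  // require throws IllegalArgumentException on the first violation.
  def apply(w0: Double, maxIters: Int, lambda: Double, eps: Double): CrfConfigSketch = {
    require(w0 >= InitWeightsLimits._1 && w0 <= InitWeightsLimits._2,
      s"Initial weight $w0 is out of range")
    require(maxIters >= MaxItersLimits._1 && maxIters <= MaxItersLimits._2,
      s"Maximum number of iterations $maxIters is out of range")
    require(lambda >= LambdaLimits._1 && lambda <= LambdaLimits._2,
      s"L2 penalty factor $lambda is out of range")
    require(eps > EpsLimits._1 && eps <= EpsLimits._2,
      s"Convergence criterion $eps is out of range")
    new CrfConfigSketch(w0, maxIters, lambda, eps)
  }
}

object CrfConfigDemo {
  def main(args: Array[String]): Unit = {
    // Valid configuration: prints the parameter string
    println(CrfConfigSketch(0.5, 100, 0.1, 1e-3).params)
    // w0 = 5.0 exceeds the upper bound of 2.5, so construction fails
    println(scala.util.Try(CrfConfigSketch(5.0, 100, 0.1, 1e-3)).isFailure)
  }
}
```

Putting the checks inside apply with a private constructor means an out-of-range configuration can never be instantiated, which is the same effect the listing gets by calling check in the class body.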
29
Lisrelchen posted on 2016-4-20 09:38:51
package org.scalaml.supervised.crf

// IITB library classes
import iitb.CRF.{CRF, CrfParams, DataSequence, DataIter}
import iitb.Model.FeatureImpl
import iitb.Segment.{DataCruncher, LabelMap}
// ScalaMl classes
import org.scalaml.core.Types

// Delimiters used to parse the CRF training data: between observations,
// between labels, and between training sequences.
class CrfSeqDelimiter(val obsDelim: String, val labelsDelim: String, val trainingDelim: String) {
  require(obsDelim != Types.nullString,
    "Delimiter for observations in CRF training sequence is undefined")
  require(labelsDelim != Types.nullString,
    "Delimiter for labels in CRF training sequence is undefined")
  require(trainingDelim != Types.nullString,
    "Delimiter for training sequences in CRF training sequence is undefined")
}

// Iterator over the tagged training sequences, delegating to the IITB DataCruncher
class CrfSeqIter(val nLabels: Int, val input: String, val delim: CrfSeqDelimiter) extends DataIter {
  import CrfSeqIter._
  check(nLabels, input, delim)

  // Loaded lazily, on the first call to hasNext, next or startScan
  lazy val trainData = DataCruncher.readTagged(nLabels, input, input, delim.obsDelim,
    delim.labelsDelim, delim.trainingDelim, new LabelMap)

  override def hasNext: Boolean = trainData.hasNext

  override def next: DataSequence = trainData.next

  override def startScan: Unit = trainData.startScan
}

object CrfSeqIter {
  private val MAX_NUM_LABELS = 1000
  private val DEFAULT_SEQ_DELIMITER = new CrfSeqDelimiter(",\t/ -():.;'?#`&_", "//", "\n")

  def apply(nLabels: Int, input: String, delim: CrfSeqDelimiter): CrfSeqIter =
    new CrfSeqIter(nLabels, input, delim)

  // Uses the default delimiters
  def apply(nLabels: Int, input: String): CrfSeqIter =
    new CrfSeqIter(nLabels, input, DEFAULT_SEQ_DELIMITER)

  // Note: delim is not validated here; CrfSeqDelimiter checks its own fields
  private def check(nLabels: Int, input: String, delim: CrfSeqDelimiter): Unit = {
    require(nLabels > 0 && nLabels < MAX_NUM_LABELS,
      s"CrfSeqIter.check Number of labels for the CRF model $nLabels is out of range")
    require(input != Types.nullString,
      "CrfSeqIter.check input for the CRF training files is undefined")
  }
}

// ---------------------------- EOF ------------------------------------------------------

30
Lisrelchen posted on 2016-4-20 09:40:36
package org.scalaml.supervised.crf

import iitb.CRF.{CRF, CrfParams, DataSequence, DataIter}
import java.util.Properties
import iitb.Model.FeatureImpl
import iitb.Segment.{DataCruncher, LabelMap}

// A single training sequence: the entry is split into words (observations),
// each of which can be assigned an integer label.
class CrfTrainingSet(val nLabels: Int, val entry: String, val delim: String) extends DataSequence {
  import CrfTrainingSet._

  check(nLabels, entry, delim)

  // String.split interprets delim as a regular expression
  private[this] val words: Array[String] = entry.split(delim)
  // Label storage; it is sized by nLabels, so the index k must stay below nLabels
  private[this] val map: Array[Int] = new Array[Int](nLabels)

  override def set_y(k: Int, label: Int): Unit = map(k) = label
  override def y(k: Int): Int = map(k)
  override def length: Int = words.size
  override def x(k: Int): Object = words(k)
}

object CrfTrainingSet {
  import Crf._

  private def check(nLabels: Int, entry: String, delim: String): Unit = {
    require(nLabels >= NUM_LABELS_LIMITS._1 && nLabels < NUM_LABELS_LIMITS._2,
      s"CrfTrainingSet.check Number of labels $nLabels is out of range")
  }
}

// ---------------------------- EOF ------------------------------------------------------

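The sequence class in post 30 can be tried without the iitb jar using the sketch below. LabeledSequence is a hypothetical stand-in trait that mirrors the four methods the listing overrides (length, x, y, set_y), and TokenSequence is likewise an illustrative name. One deliberate difference, which is my assumption rather than the original design: the label array here is sized by the number of tokens, not by nLabels, so every position in the sequence can hold a label.

```scala
// Hypothetical stand-in for iitb.CRF.DataSequence so the sketch compiles on its own
trait LabeledSequence {
  def length: Int
  def x(k: Int): AnyRef
  def y(k: Int): Int
  def set_y(k: Int, label: Int): Unit
}

// Splits an entry into tokens and stores one integer label per token
final class TokenSequence(entry: String, delim: String) extends LabeledSequence {
  require(entry.nonEmpty, "TokenSequence: entry is empty")
  require(delim.nonEmpty, "TokenSequence: delimiter is empty")

  // String.split treats delim as a regular expression
  private val words: Array[String] = entry.split(delim)
  // Assumption: one slot per token (the listing sizes this array with nLabels)
  private val labels: Array[Int] = new Array[Int](words.length)

  override def length: Int = words.length
  override def x(k: Int): AnyRef = words(k)
  override def y(k: Int): Int = labels(k)
  override def set_y(k: Int, label: Int): Unit = labels(k) = label
}

object TokenSequenceDemo {
  def main(args: Array[String]): Unit = {
    val seq = new TokenSequence("the quick brown fox", " ")
    seq.set_y(3, 2) // tag the last token with label 2
    println(s"${seq.length} tokens, token 3 = ${seq.x(3)}, label = ${seq.y(3)}")
  }
}
```

Unlabeled positions default to 0, since a new Array[Int] is zero-initialized.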