Scala for Machine Learning - Page 3

Post #21

Lisrelchen (verified trading user) posted on 2016-4-20 09:07:21

Initializing clusters
The initialization of the cluster centroids is important to ensure fast convergence of K-means. Solutions range from the simple random generation of centroids to the application of genetic algorithms to evaluate the fitness of centroid candidates. We selected an efficient and fast initialization algorithm developed by M. Agha and W. Ashour [4:4].
The steps of the initialization are as follows:
1. Compute the standard deviation of the observations along each dimension.
2. Select the dimension k {xk,0, xk,1 … xk,n} with the maximum standard deviation.
3. Rank the observations by increasing value along dimension k.
4. Divide the ranked observations equally into K sets {Sm}.
5. Find the median of each set Sm, that is, the observation at rank size(Sm)/2.
6. Use the corresponding observations as centroids.
The initialization algorithm is implemented by the private initialize method:
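def initialize(xt: XTSeries[Array[T]]): List[Cluster[T]] = {
  // Statistics (including the standard deviation) for each dimension
  val stats = statistics(xt)
  // Index of the dimension with the maximum standard deviation
  val maxSDevDim = Range(0, stats.size).maxBy(stats(_).stdDev)
  // Pair each value along that dimension with its observation index, then rank
  val rankedObs = xt.zipWithIndex
    .map(x => (x._1(maxSDevDim), x._2))
    .sortWith(_._1 < _._1)
  // Half the size of one of the K segments (integer division already truncates)
  val halfSegSize = (rankedObs.size >> 1) / K
  // Keep the observations selected by isContained, then fetch them by index
  val centroids = rankedObs
    .filter(isContained(_, halfSegSize, rankedObs.size))
    .map(n => xt(n._2))
  // Create one cluster per centroid
  Range(0, K).foldLeft(List[Cluster[T]]())((xs, i) => Cluster[T](centroids(i)) :: xs)
}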
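The six steps above can also be sketched outside the ScalaMl types. This is a minimal sketch under stated assumptions, not the book's implementation: observations are plain Array[Double] values, the last segment absorbs any remainder, and the names KMeansInitSketch, stdDev and initialCentroids are hypothetical.

```scala
object KMeansInitSketch {
  // Population standard deviation of one column of values.
  def stdDev(values: Seq[Double]): Double = {
    val mean = values.sum / values.size
    math.sqrt(values.map(v => (v - mean) * (v - mean)).sum / values.size)
  }

  // Returns K initial centroids picked among the observations themselves.
  def initialCentroids(obs: Vector[Array[Double]], k: Int): Vector[Array[Double]] = {
    require(k > 0 && obs.size >= k, "Need at least K observations")
    val dims = obs.head.length
    // Steps 1-2: dimension with the largest standard deviation.
    val maxDim = (0 until dims).maxBy(d => stdDev(obs.map(_(d))))
    // Step 3: rank the observations by their value along that dimension.
    val ranked = obs.sortBy(_(maxDim))
    // Steps 4-6: split into K segments and take the middle element of each.
    val segSize = ranked.size / k
    (0 until k).map { m =>
      val start = m * segSize
      val end = if (m == k - 1) ranked.size else start + segSize
      ranked(start + (end - start) / 2)
    }.toVector
  }
}
```

Because the centroids are actual observations taken from the middle of each ranked segment, the result is deterministic for a given data set, unlike random seeding.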


Post #22

Lisrelchen (verified trading user) posted on 2016-4-20 09:09:11

Step 2 – cluster assignment
The second step in the K-means algorithm is the assignment of the observations to the clusters for which the centroids have been initialized in step 1. This feat is accomplished by the private assignToClusters method:
def assignToClusters(xt: XTSeries[Array[T]], clusters: List[Cluster[T]], membership: Array[Int]): Int = {
  xt.toArray
    .zipWithIndex
    .filter(x => { //1
      val nearestCluster = getNearestCluster(clusters, x._1) //2
      val reassigned = nearestCluster != membership(x._2)
      clusters(nearestCluster) += x._2 //3
      membership(x._2) = nearestCluster //4
      reassigned
    }).size
}
The core of the assignment of observations to each cluster is the filter on the time series (line 1). The filter computes the index of the closest cluster and checks whether the observation is to be reassigned (line 2). The observation at the index x._2 is added to the nearest cluster, clusters(nearestCluster) (line 3). The current membership of the observations is then updated (line 4).
The cluster closest to an observation is computed by the getNearestCluster method as follows:
def getNearestCluster(clusters: List[Cluster[T]], x: Array[T]): Int = {
  clusters.zipWithIndex.foldLeft((Double.MaxValue, 0))((p, c) => {
    val measure = distance(c._1.center, x)
    if (measure < p._1) (measure, c._2) else p
  })._2
}
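The same fold can be tried on plain arrays. This is a minimal sketch that assumes a Euclidean distance, whereas the book's distance function is a parameter of the K-means class; NearestClusterSketch, euclidean and nearest are hypothetical names.

```scala
object NearestClusterSketch {
  // Euclidean distance between two vectors of equal length (assumed metric).
  def euclidean(a: Array[Double], b: Array[Double]): Double =
    math.sqrt(a.zip(b).map { case (u, v) => (u - v) * (u - v) }.sum)

  // Index of the centroid closest to x; ties resolve to the lowest index.
  def nearest(centroids: Seq[Array[Double]], x: Array[Double]): Int =
    centroids.zipWithIndex.foldLeft((Double.MaxValue, 0)) {
      case ((best, bestIdx), (c, idx)) =>
        val d = euclidean(c, x)
        if (d < best) (d, idx) else (best, bestIdx)
    }._2
}
```

The accumulator carries the best distance seen so far together with its index, so a single pass over the centroids is enough.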


Post #23

Lisrelchen (verified trading user) posted on 2016-4-20 09:10:17

Step 3 – iterative reconstruction
The final step is to implement the iterative computation of the reconstruction error. In this implementation, the iteration terminates when no more observations are reassigned to different clusters. As with other data processing units, the extraction of K-means clusters is encapsulated by the pipe operator |>, so that clustering can be integrated into a workflow using the dependency injection described in the Dependency Injection section of Chapter 2, Hello World!
The generation of the K clusters is executed by the data transformation |>:
def |> : PartialFunction[XTSeries[Array[T]], List[Cluster[T]]] = {
  case xt: XTSeries[Array[T]] if (xt.size > 2 && xt(0).size > 0) => {
    // Initialize the centroids
    val clusters = initialize(xt)
    if (clusters.isEmpty) List.empty
    else {
      val membership = Array.fill(xt.size)(0)
      // First assignment of the observations to the initial clusters
      val reassigned = assignToClusters(xt, clusters, membership)
      var newClusters: List[Cluster[T]] = List.empty
      // Iterate until no observation is reassigned or maxIters is reached
      Range(0, maxIters).find(_ => {
        // Recompute each center; an empty cluster is replaced by the
        // non-empty cluster with the largest standard deviation
        newClusters = clusters.map(c => {
          if (c.size > 0) c.moveCenter(xt, dimension(xt))
          else clusters.filter(_.size > 0)
            .maxBy(_.stdDev(xt, distance))
        })
        assignToClusters(xt, newClusters, membership) == 0
      }) match {
        case Some(index) => newClusters
        case None => { … }
      }
    }
  }
}
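The stopping rule, iterating until an assignment pass reassigns nothing, can be shown with a self-contained loop. This is a simplified sketch rather than the book's code: it uses squared Euclidean distance, leaves an empty cluster where it is instead of replacing it, and KMeansLoopSketch and run are hypothetical names.

```scala
object KMeansLoopSketch {
  type Point = Array[Double]

  private def dist2(a: Point, b: Point): Double =
    a.zip(b).map { case (u, v) => (u - v) * (u - v) }.sum

  // Mean of a non-empty group of points, computed per dimension.
  private def mean(points: Seq[Point]): Point =
    Array.tabulate(points.head.length)(d => points.map(_(d)).sum / points.size)

  // Returns (centroids, membership) once a pass reassigns no observation,
  // or after maxIters passes, whichever comes first.
  def run(obs: Vector[Point], init: Vector[Point], maxIters: Int): (Vector[Point], Vector[Int]) = {
    def assign(cs: Vector[Point]): Vector[Int] =
      obs.map(x => cs.indices.minBy(i => dist2(cs(i), x)))

    var centroids = init
    var membership = assign(centroids)
    var iter = 0
    var changed = true
    while (changed && iter < maxIters) {
      // Move each centroid to the mean of its members; keep empty clusters in place.
      centroids = centroids.indices.map { i =>
        val members = obs.zip(membership).collect { case (x, m) if m == i => x }
        if (members.isEmpty) centroids(i) else mean(members)
      }.toVector
      val next = assign(centroids)
      changed = next != membership
      membership = next
      iter += 1
    }
    (centroids, membership)
  }
}
```

Comparing the new membership vector with the previous one plays the role of the `assignToClusters(...) == 0` test in the post.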


Post #24

Lisrelchen (verified trading user) posted on 2016-4-20 09:17:49

/**
* Copyright (c) 2013-2015 Patrick Nicolas - Scala for Machine Learning - All rights reserved
*
* The source code in this file is provided by the author for the sole purpose of illustrating the
* concepts and algorithms presented in "Scala for Machine Learning". It should not be used to
* build commercial applications.
* ISBN: 978-1-783355-874-2 Packt Publishing.
* Unless required by applicable law or agreed to in writing, software is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
*
* Version 0.98
*/
package org.scalaml.supervised.bayes
// Scala classes
import scala.collection.mutable.ArraySeq
import scala.annotation.implicitNotFound
import scala.util.{Try, Success, Failure}
import org.apache.log4j.Logger
import org.scalaml.core.XTSeries
import org.scalaml.core.Types.ScalaMl._
import org.scalaml.stats.{Validation, ClassValidation, Stats}
import org.scalaml.core.Design.PipeOperator
import org.scalaml.supervised.Supervised
import org.scalaml.util.DisplayUtils
import NaiveBayesModel._, XTSeries._, Stats._
/**
* Generic Binomial Naive Bayes classification class. The class is used for both training
* and run-time classification. The training of the model is executed during the instantiation
* of the class to avoid having an uninitialized model. A conversion from a parameterized
* array, Array[T] to an array of double, DblVector, has to be implicitly defined for
* training the model.
* As a classifier, the class implements the generic data transformation PipeOperator and the
* Supervised interface.
* <pre>
* Naive Bayes formula: p(C|x) = p(x|C).p(C)/p(x) => p(C|x) proportional to p(x1|C).p(x2|C). .. p(xn|C).p(C)</pre>
* @constructor Instantiate a parameterized NaiveBayes model
* @param smoothing Laplace or Lidstone smoothing factor
* @param xt Input labeled time series used for training
* @param density Density function used to compute the discriminant
*
* @throws IllegalArgumentException if one of the class parameters is undefined
* @author Patrick Nicolas
* @since February 13, 2014
* @note Scala for Machine learning Chapter 5 Naive Bayes Models / Naive Bayes classifiers
*/
final class NaiveBayes[T <% Double](
smoothing: Double,
xt: XTSeries[(Array[T], Int)],
density: Density) extends PipeOperator[XTSeries[Array[T]], Array[Int]] with Supervised[T] {
import NaiveBayes._
check(smoothing, xt)
private val logger = Logger.getLogger("NaiveBayes")
// The model is instantiated during training for both
// classes if the training is successful. It is None otherwise
private[this] val model: Option[BinNaiveBayesModel[T]] =
Try(BinNaiveBayesModel[T](train(1), train(0), density)) match {
case Success(nb) => Some(nb)
case Failure(e) => DisplayUtils.none("NaiveBayes.model", logger, e)
}
/**
* Run-time classification of a time series using the Naive Bayes model. The method invokes
* the actual classification method in one of the NaiveBayes models.
* @throws MatchError if the input time series is undefined or has no elements or the
* model was not properly trained
* @return PartialFunction of time series of elements of type T as input to the Naive Bayes
* and array of class indices as output
*/
override def |> : PartialFunction[XTSeries[Array[T]], Array[Int]] = {
case xt: XTSeries[Array[T]] if( !xt.isEmpty && model != None) =>
xt.toArray.map( model.get.classify( _))
}
/**
* Compute the F1 statistics for the Naive Bayes.
* @param xt Time series of features of type Array[T], and class indices as labels
* @param index of the class, the time series or observation should belong to
* @return F1 measure if the model has been properly trained (!= None), None otherwise
*/
override def validate(xt: XTSeries[(Array[T], Int)], index: Int): Option[Double] = model match {
case Some(m) => Some(ClassValidation(xt.map(x =>(m.classify(x._1), x._2)) , index).f1)
case None => DisplayUtils.none("NaiveBayes Model undefined", logger)
}
/**
* Textual representation of the Naive Bayes classifier with labels for features.
* It returns "No Naive Bayes model" if no model exists
* @return Stringized features with their label if model exists.
*/
def toString(labels: Array[String]): String =
model.map(m => if( labels.isEmpty ) m.toString else m.toString(labels))
.getOrElse("No Naive Bayes model")
/**
* Default textual representation of the Naive Bayes classifier with labels for features.
* It returns "No Naive Bayes model" if no model exists
* @return Stringized features with their label if model exists.
*/
override def toString: String = toString(Array.empty)
/**
* Train the Naive Bayes model on one of the two classes: positive (= 1) or negative (= 0)
*/
@implicitNotFound("NaiveBayes; Conversion from array[T] to DblVector is undefined")
private def train(label: Int)(implicit f: Array[T] => DblVector): Likelihood[T] = {
val xi = xt.toArray
// Extract then filter each observation to be associated to a specific label.
// The implicit conversion from Array of type T to Array of type Double is invoked
val values = xi.filter( _._2 == label).map(x => f(x._1))
assert( !values.isEmpty, "NaiveBayes.train Filtered value is undefined")
// Gets the dimension of a feature
val dim = xi(0)._1.size
val vSeries = XTSeries[DblVector](values)
// Create a likelihood instance for this class 'label'. The
// tuple (mean, standard deviation) (2nd argument) is computed
// by invoking XTSeries.statistics then the Lidstone mean adjustment.
// The last argument, class likelihood p(C) is computed as the ratio of the
// number of observations associated to this class/label over total number of observations.
Likelihood(label,
statistics(vSeries).map(stat => (stat.lidstoneMean(smoothing, dim), stat.stdDev) ),
values.size.toDouble/xi.size)
}
}
/**
* Singleton that defines the constructors for the NaiveBayes classifier and
* validates its parameters
* @author Patrick Nicolas
* @since February 13, 2014
* @note Scala for Machine learning Chapter 5 Naive Bayes Model
*/
object NaiveBayes {
/**
* Default constructor for the NaiveBayes class
* @param smoothing Laplace or Lidstone smoothing factor
* @param xt Input labeled time series used for training
* @param density Density function used to compute the discriminant
*/
def apply[T <% Double](
smoothing: Double,
xt: XTSeries[(Array[T], Int)],
density: Density): NaiveBayes[T] = new NaiveBayes[T](smoothing, xt, density)
/**
* Constructor for the NaiveBayes class with a Laplace smoothing function and
* a Gaussian density function.
* @param xt Input labeled time series used for training
*/
def apply[T <% Double](xt: XTSeries[(Array[T], Int)]): NaiveBayes[T] =
new NaiveBayes[T](1.0, xt, gauss)
/*
def |>[T <% Double](): PartialFunction[XTSeries[Array[T]], Array[Int]] = {
case xt: XTSeries[Array[T]] if(!xt.isEmpty && {
model =
model != None }) =>
xt.toArray.map( model.get.classify( _))
}
*
*/
def |>[T <% Double](model: Option[BinNaiveBayesModel[T]]): PartialFunction[XTSeries[Array[T]], Array[Int]] = {
case xt: XTSeries[Array[T]] if( !xt.isEmpty && model != None) =>
xt.toArray.map( model.get.classify( _))
}
private def check[T <% Double](smoothing: Double, xt: XTSeries[(Array[T], Int)]): Unit = {
require(smoothing > 0.0 && smoothing <= 1.0,
s"NaiveBayes: Laplace or Lidstone smoothing factor $smoothing is out of range")
require( !xt.isEmpty,
"NaiveBayes: Time series input for training Naive Bayes is undefined")
}
}
// ------------------------------ EOF --------------------------------------------
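The training step in the file above reduces to per-class statistics: a (mean, standard deviation) pair per feature plus the class prior. The following is a minimal sketch of that computation on plain arrays. It deliberately omits the Lidstone adjustment of the mean, and ClassStatsSketch, ClassStats and classStats are hypothetical names.

```scala
object ClassStatsSketch {
  // (mean, standard deviation) for every feature, plus the class prior p(C).
  final case class ClassStats(muSigma: Array[(Double, Double)], prior: Double)

  def classStats(data: Seq[(Array[Double], Int)], label: Int): ClassStats = {
    // Keep only the observations labeled with the requested class.
    val rows = data.collect { case (x, y) if y == label => x }
    require(rows.nonEmpty, s"No observation for label $label")
    val dim = rows.head.length
    val muSigma = Array.tabulate(dim) { d =>
      val col = rows.map(_(d))
      val mu = col.sum / col.size
      val variance = col.map(v => (v - mu) * (v - mu)).sum / col.size
      (mu, math.sqrt(variance))
    }
    // Prior: share of the observations that belong to this class.
    ClassStats(muSigma, rows.size.toDouble / data.size)
  }
}
```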


Post #25

Lisrelchen (verified trading user) posted on 2016-4-20 09:18:49

/**
* Copyright (c) 2013-2015 Patrick Nicolas - Scala for Machine Learning - All rights reserved
*
* The source code in this file is provided by the author for the sole purpose of illustrating the
* concepts and algorithms presented in "Scala for Machine Learning". It should not be used to
* build commercial applications.
* ISBN: 978-1-783355-874-2 Packt Publishing.
* Unless required by applicable law or agreed to in writing, software is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
*
* Version 0.98
*/
package org.scalaml.supervised.bayes
import org.scalaml.stats.Stats
import org.scalaml.util.{FormatUtils, DisplayUtils}
import org.scalaml.core.Types.ScalaMl.XYTSeries
import NaiveBayesModel._
/**
* Class that represents a likelihood for each feature for Naive Bayes classifier.
* The prior consists of a label (index), the mean of the prior of each dimension of the model,
* the standard deviation of the prior of each dimension of the model and the class likelihood.
* Naive Bayes assumes that the dimensions of the model are independent, making the log of
* the likelihood additive.
* @constructor Create a likelihood for a specific class.
* @throws IllegalArgumentException if the array of mean and standard deviation of the
* likelihood is undefined or if the class likelihood (prior) is out of range ]0,1]
* @param label Name or label of the class or prior for which the likelihood is computed.
* @param muSigma Array of tuples (mean, standard deviation) of the prior observations for the model
* @param prior Probability of occurrence for the class specified by the label.
*
* @author Patrick Nicolas
* @since March 11, 2014
* @note Scala for Machine Learning Chapter 5 Naive Bayes Models
*/
protected class Likelihood[T <% Double](val label: Int, val muSigma: XYTSeries, val prior: Double) {
import Stats._, Likelihood._
check(muSigma, prior)
/**
* Compute log p(C|x), the log of the conditional probability of the class given an
* observation, obs, and a probability density distribution.
* The default density probability function is Normal(0, 1)
* @param obs parameterized observation
* @param density probability density function (default Gauss)
* @throws IllegalArgumentException if the observations are undefined
* @return log of the conditional probability p(C|x)
*/
final def score(obs: Array[T], density: Density): Double = {
require( !obs.isEmpty, "Likelihood.score Undefined observations")
// Compute the sum of the log likelihoods and the log of the class prior probability.
// The log likelihood is computed by adding the log of the density for each dimension:
// Sum {log p(xi|C)} + log p(C)
(obs, muSigma).zipped.foldLeft(0.0)((post, xms) => {
val mean = xms._2._1 // mean
val stdDev = xms._2._2 // standard deviation
val _obs = xms._1
val likelihood = density(mean, stdDev, _obs)
// Floor the density at MINLOGARG: the log of a zero or negative value is undefined,
// so substituting the negative MINLOGVALUE would produce NaN
post + Math.log(if(likelihood < MINLOGARG) MINLOGARG else likelihood)
}) + Math.log(prior) // Add the log of the class likelihood p(C)
}
/**
* Display the content of this Likelihood class with associated labels.
* @param labels Label of variables used to display content
*/
def toString(labels: Array[String]): String = {
import org.scalaml.core.Types.ScalaMl
val muSigmaStr = muSigma.map(musig => (musig._1, if(musig._2 > 0.0) musig._2 else -1.0))
// Format the tuple muSigma= (mean, standard deviation) and the Prior
FormatUtils.format(muSigmaStr, "Label\tMeans", "Standard Deviation", FormatUtils.MediumFormat, labels) +
FormatUtils.format(prior, "Class likelihood", FormatUtils.MediumFormat)
}
override def toString: String = toString(Array.empty)
}
/**
* Companion object for the Naive Bayes Likelihood class. The singleton
* is used to define the constructor apply for the class.
* @author Patrick Nicolas
* @since March 11, 2014
* @note Scala for Machine Learning Chapter 5 Naive Bayes Models
*/
object Likelihood {
private val MINLOGARG = 1e-32
private val MINLOGVALUE = -MINLOGARG
/**
* Default constructor for the class Likelihood.
* @param label Name or label of the class or prior for which the likelihood is computed.
* @param muSigma Array of tuples (mean, standard deviation) of the prior observations
* for the model
* @param prior Probability of occurrence for the class specified by the label.
*/
def apply[T <% Double](label: Int, muSigma: XYTSeries, prior: Double): Likelihood[T] =
new Likelihood[T](label, muSigma, prior)
private def check(muSigma: XYTSeries, prior: Double): Unit = {
require( !muSigma.isEmpty,
"Likelihood.check Historical mean and standard deviation is undefined")
require(prior > 0.0 && prior <= 1.0,
s"Likelihood.check Class prior $prior is out of range")
}
}
// -------------------------------- EOF --------------------------------------------------------------
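The scoring rule, log p(C) plus the sum of log p(xi|C), can be checked in isolation. This is a minimal sketch that assumes a Gaussian density and floors each density at a small positive constant before taking the log; LogPosteriorSketch, gauss, score and MinDensity are hypothetical names rather than ScalaMl APIs.

```scala
object LogPosteriorSketch {
  // Smallest density passed to the log, to avoid log(0) = -Infinity.
  private val MinDensity = 1e-32

  // Normal probability density with the given mean and standard deviation.
  def gauss(mean: Double, stdDev: Double, x: Double): Double = {
    val z = (x - mean) / stdDev
    math.exp(-0.5 * z * z) / (stdDev * math.sqrt(2.0 * math.Pi))
  }

  // log p(C) + sum_i log p(x_i | C), with each density floored at MinDensity.
  def score(obs: Array[Double], muSigma: Array[(Double, Double)], prior: Double): Double =
    obs.zip(muSigma).foldLeft(math.log(prior)) { case (acc, (x, (mu, sigma))) =>
      acc + math.log(math.max(gauss(mu, sigma, x), MinDensity))
    }
}
```

Working in log space turns the product of per-feature densities into a sum, which avoids underflow when the number of features is large.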


Post #26

Lisrelchen (verified trading user) posted on 2016-4-20 09:27:04

package org.scalaml.supervised.bayes
import org.scalaml.stats.Stats
import org.scalaml.core.Design.Model
import NaiveBayesModel._
abstract class NaiveBayesModel[T <% Double](val density: Density) extends Model {
def classify(x: Array[T]): Int
def toString(labels: Array[String]): String
}
object NaiveBayesModel {
type Density= (Double*) => Double
}
protected class BinNaiveBayesModel[T <% Double](
positives: Likelihood[T],
negatives: Likelihood[T],
density: Density) extends NaiveBayesModel[T](density) {
override def classify(x: Array[T]): Int = {
require( !x.isEmpty,
"BinNaiveBayesModel.classify Undefined observations")
// Simply select one of the two classes with the highest log posterior probability
if (positives.score(x, density) > negatives.score(x, density)) 1 else 0
}
override def toString(labels: Array[String]): String = {
require( !labels.isEmpty, "BinNaiveBayesModel.toString Undefined labels")
s"\nPositive class\n${positives.toString(labels)}\nNegative class\n${negatives.toString(labels)}"
}
override def toString: String =
s"\nPositive\n${positives.toString}\nNegative\n${negatives.toString}"
}
object BinNaiveBayesModel {
def apply[T <% Double](
positives: Likelihood[T],
negatives: Likelihood[T],
density: Density): BinNaiveBayesModel[T] =
new BinNaiveBayesModel(positives, negatives, density)
}
protected class MultiNaiveBayesModel[T <% Double](
likelihoodSet: List[Likelihood[T]],
density: Density) extends NaiveBayesModel[T](density) {
require(!likelihoodSet.isEmpty,
"MultiNaiveBayesModel Cannot classify using Multi-NB with undefined classes")
override def classify(x: Array[T]): Int = {
require( !x.isEmpty, "MultiNaiveBayesModel.classify Vector input is undefined")
// The classification selects the class with the highest log of the
// posterior probability; maxBy scores each class once and avoids a full sort
likelihoodSet.maxBy( _.score(x, density)).label
}
override def toString(labels: Array[String]): String = {
require( !labels.isEmpty, "MultiNaiveBayesModel.toString Undefined labels")
val buf = new StringBuilder
likelihoodSet.zipWithIndex.foreach(l => {
buf.append(s"\nclass ${l._2}: ${l._1.toString(labels)}")
})
buf.toString
}
}
object MultiNaiveBayesModel {
def apply[T <% Double](
likelihoodSet: List[Likelihood[T]],
density: Density): MultiNaiveBayesModel[T] =
new MultiNaiveBayesModel[T](likelihoodSet, density)
}
// -------------------------------- EOF --------------------------------------------------------------
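Both models above classify by picking the class with the highest score. A minimal sketch of that selection, independent of the Likelihood class: each class is represented here as a label paired with a scoring function, and ArgMaxClassifierSketch and classify are hypothetical names.

```scala
object ArgMaxClassifierSketch {
  // Each class is a label paired with a function that scores an observation.
  def classify(classes: Seq[(Int, Array[Double] => Double)], x: Array[Double]): Int = {
    require(classes.nonEmpty, "At least one class is required")
    // Select the label whose score is the highest for this observation.
    classes.maxBy { case (_, score) => score(x) }._1
  }
}
```

The binary model is the special case with two entries, labeled 1 and 0.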


Post #27

Lisrelchen (verified trading user) posted on 2016-4-20 09:31:56

package org.scalaml.supervised.crf
// Scala standard library
import scala.util.{Try, Success, Failure}
// 3rd party frameworks or libraries
import iitb.CRF.{CRF, CrfParams, DataSequence, DataIter, FeatureGenerator}
import iitb.Model.{FeatureGenImpl, CompleteModel}
import org.apache.log4j.Logger
// ScalaMl classes
import org.scalaml.core.XTSeries
import org.scalaml.core.Types.ScalaMl._
import org.scalaml.core.Design.{PipeOperator, Model}
import org.scalaml.supervised.Supervised
import org.scalaml.util.DisplayUtils
import org.scalaml.workflow.data.DataSource
import CrfConfig._
final protected class CrfModel(val weights: DblVector) extends Model {
require(!weights.isEmpty, "CrfModel Cannot create a model with undefined weights")
def this(className: String) =
this({ Model.read(className).map( _.split(",").map(_.toDouble)).getOrElse(Array.empty) })
override def >> : Boolean = write(weights.mkString(","))
}
final class Crf(
nLabels: Int,
config: CrfConfig,
delims: CrfSeqDelimiter,
taggedObs: String) extends PipeOperator[String, Double] {
import Crf._
check(nLabels)
private val logger = Logger.getLogger("Crf")
class TaggingGenerator(nLabels: Int)
extends FeatureGenImpl(new CompleteModel(nLabels), nLabels, true)
private[this] val features = new TaggingGenerator(nLabels)
private[this] val crf = new CRF(nLabels, features, config.params)
private val model: Option[CrfModel] = train match {
case Success(model) => Some(model)
case Failure(e) => DisplayUtils.none("Crf.model could not be created", logger, e)
}
override def |> : PartialFunction[String, Double] = {
case obs: String if( !obs.isEmpty && model != None) => {
val dataSeq = new CrfTrainingSet(nLabels, obs, delims.obsDelim)
crf.apply(dataSeq)
}
}
final def weights: Option[DblVector] = model.map( _.weights)
private def train: Try[CrfModel] = {
val seqIter = CrfSeqIter(nLabels, taggedObs, delims)
Try {
features.train(seqIter)
new CrfModel(crf.train(seqIter))
}
}
}
object Crf {
final val NUM_LABELS_LIMITS = (1, 512)
def apply(nLabels: Int, state: CrfConfig, delims: CrfSeqDelimiter, taggedObs: String): Crf =
new Crf(nLabels, state, delims, taggedObs)
private def check(nLabels: Int): Unit = {
require(nLabels > NUM_LABELS_LIMITS._1 && nLabels < NUM_LABELS_LIMITS._2,
s"Number of labels for generating tags for CRF $nLabels is out of range")
}
}
// ---------------------------- EOF ------------------------------------------------------
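CrfModel stores its weights as a single comma-separated line and parses them back on load. The round trip can be sketched on its own, without the Model.read and write helpers, which are not shown in the post. WeightsCodecSketch, encode and decode are hypothetical names, and returning None on malformed input is an assumption of this sketch.

```scala
object WeightsCodecSketch {
  // Serialize the weights as one comma-separated line.
  def encode(weights: Array[Double]): String = weights.mkString(",")

  // Parse the line back; None if it is empty or any field is not a number.
  def decode(line: String): Option[Array[Double]] =
    if (line.trim.isEmpty) None
    else scala.util.Try(line.split(",").map(_.trim.toDouble)).toOption
}
```

Wrapping the parse in Try keeps a corrupted file from throwing a NumberFormatException while the model is being constructed.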


Post #28

Lisrelchen (verified trading user) posted on 2016-4-20 09:34:39

package org.scalaml.supervised.crf
import iitb.CRF.{CRF, CrfParams, DataSequence, DataIter, FeatureGenerator}
import iitb.Model.{FeatureGenImpl, CompleteModel}
import org.scalaml.core.XTSeries
import org.scalaml.workflow.data.DataSource
import org.scalaml.core.Design.{PipeOperator, Config}
import java.io.IOException
import org.scalaml.core.Types.ScalaMl._
protected class CrfConfig(w0: Double, maxIters: Int, lambda: Double, eps: Double) extends Config {
import CrfConfig._
check(w0, maxIters, lambda, eps)
val params = s"initValue ${String.valueOf(w0)} maxIters ${String.valueOf(maxIters)} " +
s"lambda ${String.valueOf(lambda)} scale true eps $eps"
}
object CrfConfig {
private val INIT_WEIGHTS_LIMITS = (0.1, 2.5)
private val MAX_ITERS_LIMITS = (10, 250)
private val LAMBDA_LIMITS = (1e-15, 1.5)
private val EPS_LIMITS = (1e-5, 0.2)
def apply(w0: Double, maxIters: Int, lambda: Double, eps:Double): CrfConfig =
new CrfConfig(w0, maxIters, lambda, eps)
private def check(w0: Double, maxIters: Int, lambda: Double, eps: Double): Unit = {
require(w0 >= INIT_WEIGHTS_LIMITS._1 && w0 <= INIT_WEIGHTS_LIMITS._2,
s"Initialization of the CRF weights $w0 is out of range")
require( maxIters >= MAX_ITERS_LIMITS._1 && maxIters <= MAX_ITERS_LIMITS._2,
s"Maximum number of iterations for CRF training $maxIters is out of range")
require( lambda >= LAMBDA_LIMITS._1 && lambda <= LAMBDA_LIMITS._2,
s"The factor for the L2 penalty for CRF $lambda is out of range")
require( eps > EPS_LIMITS._1 && eps<= EPS_LIMITS._2,
s"The convergence criteria for the CRF training $eps is out of range")
}
}
// ---------------------------- EOF ------------------------------------------------------


Post #29

Lisrelchen (verified trading user) posted on 2016-4-20 09:38:51

package org.scalaml.supervised.crf
// IITB library classes
import iitb.CRF.{CRF, CrfParams, DataSequence, DataIter}
import iitb.Model.FeatureImpl
import iitb.Segment.{DataCruncher, LabelMap}
// ScalaMl classes
import org.scalaml.core.Types
class CrfSeqDelimiter(val obsDelim: String, val labelsDelim: String, val trainingDelim: String) {
require(obsDelim != Types.nullString,
"Delimiter for observations in CRF training sequence is undefined")
require(labelsDelim != Types.nullString,
"Delimiter for labels in CRF training sequence is undefined")
require(trainingDelim != Types.nullString,
"Delimiter for training sequences in CRF training sequence is undefined")
}
class CrfSeqIter(val nLabels: Int, val input: String, val delim: CrfSeqDelimiter) extends DataIter {
import CrfSeqIter._
check(nLabels, input, delim)
lazy val trainData = DataCruncher.readTagged(nLabels, input, input, delim.obsDelim,
delim.labelsDelim, delim.trainingDelim, new LabelMap)
override def hasNext: Boolean = trainData.hasNext
override def next: DataSequence = trainData.next
override def startScan: Unit = trainData.startScan
}
object CrfSeqIter {
private val MAX_NUM_LABELS = 1000
private val DEFAULT_SEQ_DELIMITER = new CrfSeqDelimiter(",\t/ -():.;'?#`&_", "//", "\n")
def apply(
nLabels: Int,
input: String,
delim: CrfSeqDelimiter): CrfSeqIter = new CrfSeqIter(nLabels, input, delim)
def apply(nLabels: Int, input: String): CrfSeqIter =
new CrfSeqIter(nLabels, input, DEFAULT_SEQ_DELIMITER)
private def check(nLabels: Int, input: String, delim: CrfSeqDelimiter): Unit = {
require(nLabels > 0 && nLabels < MAX_NUM_LABELS,
s"CrfSeqIter.check Number of labels for the CRF model $nLabels is out of range")
require(input != Types.nullString,
"CrfSeqIter.check input for the CRF training files is undefined")
}
}
// ---------------------------- EOF ------------------------------------------------------


Post #30

Lisrelchen (verified trading user) posted on 2016-4-20 09:40:36

package org.scalaml.supervised.crf
import iitb.CRF.{CRF, CrfParams, DataSequence, DataIter}
import java.util.Properties
import iitb.Model.FeatureImpl
import iitb.Segment.{DataCruncher, LabelMap}
class CrfTrainingSet(val nLabels: Int, val entry: String, val delim: String) extends DataSequence {
import CrfTrainingSet._
check(nLabels, entry, delim)
private[this] val words: Array[String] = entry.split(delim)
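// One label slot per word: y(k) and set_y(k, _) are indexed by position in the sequence,
// so sizing the array by nLabels would overflow on sequences longer than nLabels
private[this] val map: Array[Int] = new Array[Int](words.size)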
override def set_y(k: Int, label: Int): Unit = map(k) = label
override def y(k: Int): Int = map(k)
override def length: Int = words.size
override def x(k: Int): Object = words(k)
}
object CrfTrainingSet {
import Crf._
private def check(nLabels: Int, entry: String, delim: String): Unit = {
require(nLabels >= NUM_LABELS_LIMITS._1 && nLabels < NUM_LABELS_LIMITS._2)
}
}
// ---------------------------- EOF ------------------------------------------------------
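The accessors of CrfTrainingSet can be exercised without the IITB library. The following is a sketch under two assumptions: the delimiter string is meant as a set of separator characters, and one label slot is kept per word. TokenSequenceSketch, TokenSequence and setY are hypothetical names that only mirror the accessors used in the post; they are not the iitb DataSequence interface.

```scala
object TokenSequenceSketch {
  // Local stand-in for a labeled word sequence; not the iitb interface itself.
  final class TokenSequence(entry: String, separators: Array[Char]) {
    // Split on any of the separator characters and drop empty fragments.
    private val words: Array[String] = entry.split(separators).filter(_.nonEmpty)
    // One label per word, initialized to 0.
    private val labels: Array[Int] = new Array[Int](words.length)

    def length: Int = words.length
    def x(k: Int): String = words(k)
    def y(k: Int): Int = labels(k)
    def setY(k: Int, label: Int): Unit = labels(k) = label
  }
}
```

Splitting with an Array[Char] treats each character as a separator, whereas String.split(String) would interpret the whole delimiter string as one regular expression.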
