Thread starter: 中国文学

JSAT: Java Statistical Analysis Tool


Java Statistical Analysis Tool

JSAT is a library for quickly getting started with Machine Learning problems. It is developed in my free time and made available for use under the GPL 3. Part of the library is for self education; as such, all code is self contained. JSAT has no external dependencies and is pure Java. I also aim to make the library suitably fast for small to medium sized problems, so much of the code supports parallel execution.
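As a quick illustration of that parallel support, below is a minimal sketch of training with a thread pool. The only JSAT call it relies on is the trainC(ClassificationDataSet, ExecutorService) overload, which is visible in the MultinomialNaiveBayes source posted later in this thread; the class name and pool sizing are my own scaffolding, not from the original post.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import jsat.classifiers.ClassificationDataSet;
import jsat.classifiers.bayesian.MultinomialNaiveBayes;

public class ParallelTrainSketch
{
    // Train using a pool sized to the machine's cores, then shut the pool down.
    public static MultinomialNaiveBayes trainParallel(ClassificationDataSet data)
    {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        MultinomialNaiveBayes mnb = new MultinomialNaiveBayes();
        try
        {
            mnb.trainC(data, pool); // the parallel overload shown in the source below
        }
        finally
        {
            pool.shutdown();
        }
        return mnb;
    }
}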

Get JSAT

You can download JSAT from Maven Central; add the following to your pom file:

<dependencies>
  <dependency>
    <groupId>com.edwardraff</groupId>
    <artifactId>JSAT</artifactId>
    <version>0.0.5</version>
  </dependency>
</dependencies>

I will also host a snapshot directory; to access it, change "maven-repo" to "maven-snapshot-repo" in the "<url>" tag.

Why use JSAT?

For research and specialized needs, JSAT has one of the largest collections of algorithms available in any framework. See an incomplete list here.

Additionally, there are unfortunately not as many ML tools for Java as there are for other languages. Compared to Weka, JSAT is usually faster.

If you want to use JSAT and the GPL is not something that will work for you, let me know and we can discuss the issue.

See the wiki for more information as well as some examples on how to use JSAT.

Note

Updates to JSAT may be slowed as I begin a PhD program in Computer Science. The project isn’t abandoned! I just have limited free time, and will be balancing my PhD work with a full time job. If you discover more hours in the day, please let me know!

Hidden content in this post:

JSAT-master.zip (1.48 MB)



#1 · 中国文学 · posted 2016-6-27 02:01:37

Multinomial Naive Bayes Model using Java

package jsat.classifiers.bayesian;

import static java.lang.Math.exp;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import jsat.classifiers.*;
import jsat.exceptions.FailedToFitException;
import jsat.exceptions.UntrainedModelException;
import jsat.linear.IndexValue;
import jsat.linear.Vec;
import jsat.math.MathTricks;
import jsat.parameters.Parameter;
import jsat.parameters.Parameterized;

/**
 * An implementation of the Multinomial Naive Bayes model (MNB). In this model,
 * vectors are implicitly assumed to be sparse, so zero values can be skipped.
 * This model requires that all numeric features be non-negative; any negative
 * value will be treated as a zero. <br>
 * <br>Note: there is no reason to ever use more than one
 * {@link #setEpochs(int) epoch} for MNB<br>
 * <br>MNB requires taking the log of the probabilities to perform predictions,
 * which creates a trade off. Updating the classifier requires the non-log
 * form, but predictions require the log form, so classification takes
 * considerably longer because the logs of the probabilities must be computed
 * on every call. This cost can be reduced by
 * {@link #finalizeModel() finalizing} the model. Finalizing prevents the
 * model from being updated further, but reduces classification time. By
 * default, this will be done after a call to
 * {@link #trainC(jsat.classifiers.ClassificationDataSet) } but not after
 * {@link #update(jsat.classifiers.DataPoint, int) }
 *
 * @author Edward Raff
 */
public class MultinomialNaiveBayes extends BaseUpdateableClassifier implements Parameterized
{
    private static final long serialVersionUID = -469977945722725478L;
    private double[][][] apriori;
    private double[][] wordCounts;
    private double[] totalWords;

    private double priorSum = 0;
    private double[] priors;
    /**
     * Smoothing correction. Applied at classification and finalization time
     * instead of when the counts are updated.
     */
    private double smoothing;

    private boolean finalizeAfterTraining = true;
    /**
     * Once set, no more training is allowed
     */
    private boolean finalized;

    /**
     * Creates a new Multinomial model with Laplace smoothing
     */
    public MultinomialNaiveBayes()
    {
        this(1.0);
    }

    /**
     * Creates a new Multinomial model with the given amount of smoothing
     * @param smoothing the amount of smoothing to apply
     */
    public MultinomialNaiveBayes(double smoothing)
    {
        setSmoothing(smoothing);
        setEpochs(1);
    }

    /**
     * Copy constructor
     * @param other the one to copy
     */
    protected MultinomialNaiveBayes(MultinomialNaiveBayes other)
    {
        this(other.smoothing);
        if(other.apriori != null)
        {
            this.apriori = new double[other.apriori.length][][];
            this.wordCounts = new double[other.wordCounts.length][];
            this.totalWords = Arrays.copyOf(other.totalWords, other.totalWords.length);
            this.priors = Arrays.copyOf(other.priors, other.priors.length);
            this.priorSum = other.priorSum;

            for(int c = 0; c < other.apriori.length; c++)
            {
                this.apriori[c] = new double[other.apriori[c].length][];
                for(int j = 0; j < other.apriori[c].length; j++)
                    this.apriori[c][j] = Arrays.copyOf(other.apriori[c][j],
                            other.apriori[c][j].length);
                this.wordCounts[c] = Arrays.copyOf(other.wordCounts[c], other.wordCounts[c].length);
            }
        }
        this.finalizeAfterTraining = other.finalizeAfterTraining;
        this.finalized = other.finalized;
    }

    /**
     * Sets the amount of smoothing applied to the model. <br>
     * Using a value of 1.0 is equivalent to Laplace smoothing.
     * <br><br>
     * The smoothing can be changed after the model has already been trained
     * without needing to re-train the model for the change to take effect.
     *
     * @param smoothing the positive smoothing constant
     */
    public void setSmoothing(double smoothing)
    {
        if(Double.isNaN(smoothing) || Double.isInfinite(smoothing) || smoothing <= 0)
            throw new IllegalArgumentException("Smoothing constant must be in range (0,Inf), not " + smoothing);
        this.smoothing = smoothing;
    }

    /**
     *
     * @return the smoothing applied to categorical counts
     */
    public double getSmoothing()
    {
        return smoothing;
    }

    /**
     * If set {@code true}, the model will be finalized after a call to
     * {@link #trainC(jsat.classifiers.ClassificationDataSet) }. This prevents
     * the model from being updated in an online fashion, in exchange for a
     * reduction in classification time.
     *
     * @param finalizeAfterTraining {@code true} to finalize after a call to
     * train, {@code false} to keep the model updatable.
     */
    public void setFinalizeAfterTraining(boolean finalizeAfterTraining)
    {
        this.finalizeAfterTraining = finalizeAfterTraining;
    }

    /**
     * Returns {@code true} if the model will be finalized after batch training,
     * {@code false} if it will be left in an updatable state.
     * @return {@code true} if the model will be finalized after batch training.
     */
    public boolean isFinalizeAfterTraining()
    {
        return finalizeAfterTraining;
    }

    @Override
    public MultinomialNaiveBayes clone()
    {
        return new MultinomialNaiveBayes(this);
    }

    @Override
    public void trainC(ClassificationDataSet dataSet, ExecutorService threadPool)
    {
        super.trainC(dataSet, threadPool);
        if(finalizeAfterTraining)
            finalizeModel();
    }

    @Override
    public void trainC(ClassificationDataSet dataSet)
    {
        super.trainC(dataSet);
        if(finalizeAfterTraining)
            finalizeModel();
    }

    /**
     * Finalizes the current model. This prevents the model from being updated
     * further, causing {@link #update(jsat.classifiers.DataPoint, int) } to
     * throw an exception. This finalization reduces the cost of calling
     * {@link #classify(jsat.classifiers.DataPoint) }
     */
    public void finalizeModel()
    {
        if(finalized)
            return;
        final double priorSumSmooth = priorSum + priors.length * smoothing;

        for (int c = 0; c < priors.length; c++)
        {
            double logProb = Math.log((priors[c] + smoothing) / priorSumSmooth);
            priors[c] = logProb;

            double[] counts = wordCounts[c];
            double logTotalCounts = Math.log(totalWords[c] + smoothing * counts.length);

            for(int i = 0; i < counts.length; i++)
            {
                //store log(n/N) so classification can compute (n/N)^x as x*log(n/N)
                counts[i] = Math.log(counts[i] + smoothing) - logTotalCounts;
            }

            for (int j = 0; j < apriori[c].length; j++)
            {
                double sum = 0;
                for (int z = 0; z < apriori[c][j].length; z++)
                    sum += apriori[c][j][z] + smoothing;
                for (int z = 0; z < apriori[c][j].length; z++)
                    apriori[c][j][z] = Math.log((apriori[c][j][z] + smoothing) / sum);
            }
        }
        finalized = true;
    }

    @Override
    public void setUp(CategoricalData[] categoricalAttributes, int numericAttributes, CategoricalData predicting)
    {
        final int nCat = predicting.getNumOfCategories();
        apriori = new double[nCat][categoricalAttributes.length][];
        wordCounts = new double[nCat][numericAttributes];
        totalWords = new double[nCat];
        priors = new double[nCat];
        priorSum = 0.0;

        for (int i = 0; i < nCat; i++)
            for (int j = 0; j < categoricalAttributes.length; j++)
                apriori[i][j] = new double[categoricalAttributes[j].getNumOfCategories()];
        finalized = false;
    }

    @Override
    public void update(DataPoint dataPoint, int targetClass)
    {
        if(finalized)
            throw new FailedToFitException("Model has already been finalized, and can no longer be updated");
        final double weight = dataPoint.getWeight();
        final Vec x = dataPoint.getNumericalValues();

        //Categorical value updates
        int[] catValues = dataPoint.getCategoricalValues();
        for(int j = 0; j < apriori[targetClass].length; j++)
            apriori[targetClass][j][catValues[j]] += weight;
        double localCountsAdded = 0;
        for(IndexValue iv : x)
        {
            final double v = iv.getValue();
            if(v < 0)//negative values are treated as zero
                continue;
            wordCounts[targetClass][iv.getIndex()] += v*weight;
            localCountsAdded += v*weight;
        }
        totalWords[targetClass] += localCountsAdded;
        priors[targetClass] += weight;
        priorSum += weight;
    }

    @Override
    public CategoricalResults classify(DataPoint data)
    {
        if(apriori == null)
            throw new UntrainedModelException("Model has not been initialized");
        CategoricalResults results = new CategoricalResults(apriori.length);
        double[] logProbs = new double[apriori.length];
        double maxLogProb = Double.NEGATIVE_INFINITY;
        Vec numVals = data.getNumericalValues();
        if(finalized)
        {
            for(int c = 0; c < priors.length; c++)
            {
                double logProb = priors[c];

                double[] counts = wordCounts[c];

                for (IndexValue iv : numVals)
                {
                    //(n/N)^x in log form: x * log(n/N)
                    logProb += iv.getValue() * counts[iv.getIndex()];
                }

                for (int j = 0; j < apriori[c].length; j++)
                {
                    logProb += apriori[c][j][data.getCategoricalValue(j)];
                }

                logProbs[c] = logProb;
                maxLogProb = Math.max(maxLogProb, logProb);
            }
        }
        else
        {
            final double priorSumSmooth = priorSum + logProbs.length*smoothing;
            for(int c = 0; c < priors.length; c++)
            {
                double logProb = Math.log((priors[c] + smoothing) / priorSumSmooth);

                double[] counts = wordCounts[c];
                double logTotalCounts = Math.log(totalWords[c] + smoothing*counts.length);

                for (IndexValue iv : numVals)
                {
                    //(n/N)^x in log form: x * log(n/N)
                    logProb += iv.getValue() * (Math.log(counts[iv.getIndex()] + smoothing) - logTotalCounts);
                }

                for (int j = 0; j < apriori[c].length; j++)
                {
                    double sum = 0;
                    for (int z = 0; z < apriori[c][j].length; z++)
                        sum += apriori[c][j][z] + smoothing;
                    double p = apriori[c][j][data.getCategoricalValue(j)] + smoothing;
                    logProb += Math.log(p / sum);
                }

                logProbs[c] = logProb;
                maxLogProb = Math.max(maxLogProb, logProb);
            }
        }
        double denom = MathTricks.logSumExp(logProbs, maxLogProb);

        for (int i = 0; i < results.size(); i++)
            results.setProb(i, exp(logProbs[i] - denom));
        results.normalize();
        return results;
    }

    @Override
    public boolean supportsWeightedData()
    {
        return true;
    }

    @Override
    public List<Parameter> getParameters()
    {
        return Parameter.getParamsFromMethods(this);
    }

    @Override
    public Parameter getParameter(String paramName)
    {
        return Parameter.toParameterMap(getParameters()).get(paramName);
    }
}
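For anyone who wants to try the class above, here is a minimal usage sketch. The MultinomialNaiveBayes calls (the smoothing constructor, setFinalizeAfterTraining, trainC, update, and classify) come straight from the posted source; the data-set and data-point construction (CategoricalData, ClassificationDataSet, DenseVector, DataPoint) is my assumption about the surrounding JSAT API, so the exact constructors may need adjusting for your JSAT version.

import jsat.classifiers.*;
import jsat.classifiers.bayesian.MultinomialNaiveBayes;
import jsat.linear.DenseVector;

public class MNBUsageSketch
{
    public static void main(String[] args)
    {
        // Two numeric (word-count) features, no categorical features, and a
        // binary target class. NOTE: this data-set construction is an assumed
        // part of the JSAT API, not something shown in the post above.
        CategoricalData predicting = new CategoricalData(2);
        ClassificationDataSet data =
                new ClassificationDataSet(2, new CategoricalData[0], predicting);
        data.addDataPoint(new DenseVector(new double[]{3, 0}), new int[0], 0);
        data.addDataPoint(new DenseVector(new double[]{0, 4}), new int[0], 1);

        // Laplace smoothing (1.0); keep the model updatable so update() can
        // still be called after the batch training pass.
        MultinomialNaiveBayes mnb = new MultinomialNaiveBayes(1.0);
        mnb.setFinalizeAfterTraining(false);
        mnb.trainC(data);

        // One extra online update, then classify a new point.
        mnb.update(data.getDataPoint(0), 0);
        DataPoint query = new DataPoint(new DenseVector(new double[]{2, 1}),
                new int[0], new CategoricalData[0]);
        CategoricalResults result = mnb.classify(query);
        System.out.println("P(class 0 | query) = " + result.getProb(0));
    }
}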
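A note on the normalization at the end of classify: logProbs holds unnormalized log posteriors, roughly $\ell_c = \log P(c) + \sum_i x_i \log \theta_{c,i}$ plus the categorical terms, and MathTricks.logSumExp converts them into probabilities without underflow via the standard log-sum-exp identity (this derivation is mine, not spelled out in the post):

$$P(c \mid x) = \exp\Bigl(\ell_c - \bigl(m + \log \sum_{c'} e^{\ell_{c'} - m}\bigr)\Bigr), \qquad m = \max_{c'} \ell_{c'}$$

Subtracting the maximum $m$ keeps every exponent at most zero, so the sum cannot overflow; this is why the classification loops track the largest log probability as they go.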


#2 · Nicolle (verified student) · posted 2016-6-27 02:14:55
Note: the author has been banned or deleted; this content was automatically hidden.


#3 · bbslover · posted 2016-6-27 05:36:37
thanks for sharing


#4 · mike68097 · posted 2016-6-28 00:18:45
