Thread starter: 中国文学

JSAT: Java Statistical Analysis Tool


Java Statistical Analysis Tool

JSAT is a library for quickly getting started with Machine Learning problems. It is developed in my free time and made available for use under the GPL 3. Part of the library is for self education; as such, all code is self contained. JSAT has no external dependencies and is pure Java. I also aim to make the library suitably fast for small to medium sized problems, so much of the code supports parallel execution.
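As a quick illustration of that parallel support, below is a minimal sketch of training with a thread pool. The only JSAT call it relies on is the trainC(ClassificationDataSet, ExecutorService) overload, which is visible in the MultinomialNaiveBayes source posted later in this thread; the class name and pool sizing are my own scaffolding, not from the original post.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import jsat.classifiers.ClassificationDataSet;
import jsat.classifiers.bayesian.MultinomialNaiveBayes;

public class ParallelTrainSketch
{
    // Train using a pool sized to the machine's cores, then shut the pool down.
    public static MultinomialNaiveBayes trainParallel(ClassificationDataSet data)
    {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        MultinomialNaiveBayes mnb = new MultinomialNaiveBayes();
        try
        {
            mnb.trainC(data, pool); // the parallel overload shown in the source below
        }
        finally
        {
            pool.shutdown();
        }
        return mnb;
    }
}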

Get JSAT

You can download JSAT from Maven Central; add the following to your pom file:

<dependencies>
  <dependency>
    <groupId>com.edwardraff</groupId>
    <artifactId>JSAT</artifactId>
    <version>0.0.5</version>
  </dependency>
</dependencies>

I will also host a snapshot directory; to access it, change "maven-repo" to "maven-snapshot-repo" in the "<url>" tag.

Why use JSAT?

For research and specialized needs, JSAT has one of the largest collections of algorithms available in any framework. See an incomplete list here.

Additionally, there are unfortunately not as many ML tools for Java as there are for other languages. Compared to Weka, JSAT is usually faster.

If you want to use JSAT and the GPL is not something that will work for you, let me know and we can discuss the issue.

See the wiki for more information as well as some examples on how to use JSAT.

Note

Updates to JSAT may be slowed as I begin a PhD program in Computer Science. The project isn’t abandoned! I just have limited free time, and will be balancing my PhD work with a full time job. If you discover more hours in the day, please let me know!

Hidden content in this post:

JSAT-master.zip (1.48 MB)



#1 · 中国文学 · posted 2016-6-27 02:01:37

Multinomial Naive Bayes Model using Java

package jsat.classifiers.bayesian;

import static java.lang.Math.exp;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import jsat.classifiers.*;
import jsat.exceptions.FailedToFitException;
import jsat.exceptions.UntrainedModelException;
import jsat.linear.IndexValue;
import jsat.linear.Vec;
import jsat.math.MathTricks;
import jsat.parameters.Parameter;
import jsat.parameters.Parameterized;

/**
 * An implementation of the Multinomial Naive Bayes model (MNB). In this model,
 * vectors are implicitly assumed to be sparse, so zero values can be skipped.
 * This model requires that all numeric features be non-negative; any negative
 * value will be treated as a zero. <br>
 * <br>Note: there is no reason to ever use more than one
 * {@link #setEpochs(int) epoch} for MNB<br>
 * <br>MNB requires taking the log of the probabilities to perform predictions,
 * which creates a trade off. Updating the classifier requires the non-log
 * form, but predictions require the log form, so classification takes
 * considerably longer because the logs of the probabilities must be computed
 * on every call. This cost can be reduced by
 * {@link #finalizeModel() finalizing} the model. Finalizing prevents the
 * model from being updated further, but reduces classification time. By
 * default, this will be done after a call to
 * {@link #trainC(jsat.classifiers.ClassificationDataSet) } but not after
 * {@link #update(jsat.classifiers.DataPoint, int) }
 *
 * @author Edward Raff
 */
public class MultinomialNaiveBayes extends BaseUpdateableClassifier implements Parameterized
{
    private static final long serialVersionUID = -469977945722725478L;
    private double[][][] apriori;
    private double[][] wordCounts;
    private double[] totalWords;

    private double priorSum = 0;
    private double[] priors;
    /**
     * Smoothing correction. Applied at classification and finalization time
     * instead of when the counts are updated.
     */
    private double smoothing;

    private boolean finalizeAfterTraining = true;
    /**
     * Once set, no more training is allowed
     */
    private boolean finalized;

    /**
     * Creates a new Multinomial model with Laplace smoothing
     */
    public MultinomialNaiveBayes()
    {
        this(1.0);
    }

    /**
     * Creates a new Multinomial model with the given amount of smoothing
     * @param smoothing the amount of smoothing to apply
     */
    public MultinomialNaiveBayes(double smoothing)
    {
        setSmoothing(smoothing);
        setEpochs(1);
    }

    /**
     * Copy constructor
     * @param other the one to copy
     */
    protected MultinomialNaiveBayes(MultinomialNaiveBayes other)
    {
        this(other.smoothing);
        if(other.apriori != null)
        {
            this.apriori = new double[other.apriori.length][][];
            this.wordCounts = new double[other.wordCounts.length][];
            this.totalWords = Arrays.copyOf(other.totalWords, other.totalWords.length);
            this.priors = Arrays.copyOf(other.priors, other.priors.length);
            this.priorSum = other.priorSum;

            for(int c = 0; c < other.apriori.length; c++)
            {
                this.apriori[c] = new double[other.apriori[c].length][];
                for(int j = 0; j < other.apriori[c].length; j++)
                    this.apriori[c][j] = Arrays.copyOf(other.apriori[c][j],
                            other.apriori[c][j].length);
                this.wordCounts[c] = Arrays.copyOf(other.wordCounts[c], other.wordCounts[c].length);
            }
        }
        this.finalizeAfterTraining = other.finalizeAfterTraining;
        this.finalized = other.finalized;
    }

    /**
     * Sets the amount of smoothing applied to the model. <br>
     * Using a value of 1.0 is equivalent to Laplace smoothing.
     * <br><br>
     * The smoothing can be changed after the model has already been trained
     * without needing to re-train the model for the change to take effect.
     *
     * @param smoothing the positive smoothing constant
     */
    public void setSmoothing(double smoothing)
    {
        if(Double.isNaN(smoothing) || Double.isInfinite(smoothing) || smoothing <= 0)
            throw new IllegalArgumentException("Smoothing constant must be in range (0,Inf), not " + smoothing);
        this.smoothing = smoothing;
    }

    /**
     *
     * @return the smoothing applied to categorical counts
     */
    public double getSmoothing()
    {
        return smoothing;
    }

    /**
     * If set {@code true}, the model will be finalized after a call to
     * {@link #trainC(jsat.classifiers.ClassificationDataSet) }. This prevents
     * the model from being updated in an online fashion, in exchange for a
     * reduction in classification time.
     *
     * @param finalizeAfterTraining {@code true} to finalize after a call to
     * train, {@code false} to keep the model updatable.
     */
    public void setFinalizeAfterTraining(boolean finalizeAfterTraining)
    {
        this.finalizeAfterTraining = finalizeAfterTraining;
    }

    /**
     * Returns {@code true} if the model will be finalized after batch training,
     * {@code false} if it will be left in an updatable state.
     * @return {@code true} if the model will be finalized after batch training.
     */
    public boolean isFinalizeAfterTraining()
    {
        return finalizeAfterTraining;
    }

    @Override
    public MultinomialNaiveBayes clone()
    {
        return new MultinomialNaiveBayes(this);
    }

    @Override
    public void trainC(ClassificationDataSet dataSet, ExecutorService threadPool)
    {
        super.trainC(dataSet, threadPool);
        if(finalizeAfterTraining)
            finalizeModel();
    }

    @Override
    public void trainC(ClassificationDataSet dataSet)
    {
        super.trainC(dataSet);
        if(finalizeAfterTraining)
            finalizeModel();
    }

    /**
     * Finalizes the current model. This prevents the model from being updated
     * further, causing {@link #update(jsat.classifiers.DataPoint, int) } to
     * throw an exception. This finalization reduces the cost of calling
     * {@link #classify(jsat.classifiers.DataPoint) }
     */
    public void finalizeModel()
    {
        if(finalized)
            return;
        final double priorSumSmooth = priorSum + priors.length * smoothing;

        for (int c = 0; c < priors.length; c++)
        {
            double logProb = Math.log((priors[c] + smoothing) / priorSumSmooth);
            priors[c] = logProb;

            double[] counts = wordCounts[c];
            double logTotalCounts = Math.log(totalWords[c] + smoothing * counts.length);

            for(int i = 0; i < counts.length; i++)
            {
                //store log(n/N) so classification can compute (n/N)^x as x*log(n/N)
                counts[i] = Math.log(counts[i] + smoothing) - logTotalCounts;
            }

            for (int j = 0; j < apriori[c].length; j++)
            {
                double sum = 0;
                for (int z = 0; z < apriori[c][j].length; z++)
                    sum += apriori[c][j][z] + smoothing;
                for (int z = 0; z < apriori[c][j].length; z++)
                    apriori[c][j][z] = Math.log((apriori[c][j][z] + smoothing) / sum);
            }
        }
        finalized = true;
    }

    @Override
    public void setUp(CategoricalData[] categoricalAttributes, int numericAttributes, CategoricalData predicting)
    {
        final int nCat = predicting.getNumOfCategories();
        apriori = new double[nCat][categoricalAttributes.length][];
        wordCounts = new double[nCat][numericAttributes];
        totalWords = new double[nCat];
        priors = new double[nCat];
        priorSum = 0.0;

        for (int i = 0; i < nCat; i++)
            for (int j = 0; j < categoricalAttributes.length; j++)
                apriori[i][j] = new double[categoricalAttributes[j].getNumOfCategories()];
        finalized = false;
    }

    @Override
    public void update(DataPoint dataPoint, int targetClass)
    {
        if(finalized)
            throw new FailedToFitException("Model has already been finalized, and can no longer be updated");
        final double weight = dataPoint.getWeight();
        final Vec x = dataPoint.getNumericalValues();

        //Categorical value updates
        int[] catValues = dataPoint.getCategoricalValues();
        for(int j = 0; j < apriori[targetClass].length; j++)
            apriori[targetClass][j][catValues[j]] += weight;
        double localCountsAdded = 0;
        for(IndexValue iv : x)
        {
            final double v = iv.getValue();
            if(v < 0)//negative values are treated as zero
                continue;
            wordCounts[targetClass][iv.getIndex()] += v*weight;
            localCountsAdded += v*weight;
        }
        totalWords[targetClass] += localCountsAdded;
        priors[targetClass] += weight;
        priorSum += weight;
    }

    @Override
    public CategoricalResults classify(DataPoint data)
    {
        if(apriori == null)
            throw new UntrainedModelException("Model has not been initialized");
        CategoricalResults results = new CategoricalResults(apriori.length);
        double[] logProbs = new double[apriori.length];
        double maxLogProb = Double.NEGATIVE_INFINITY;
        Vec numVals = data.getNumericalValues();
        if(finalized)
        {
            for(int c = 0; c < priors.length; c++)
            {
                double logProb = priors[c];

                double[] counts = wordCounts[c];

                for (IndexValue iv : numVals)
                {
                    //(n/N)^x in log form: x * log(n/N)
                    logProb += iv.getValue() * counts[iv.getIndex()];
                }

                for (int j = 0; j < apriori[c].length; j++)
                {
                    logProb += apriori[c][j][data.getCategoricalValue(j)];
                }

                logProbs[c] = logProb;
                maxLogProb = Math.max(maxLogProb, logProb);
            }
        }
        else
        {
            final double priorSumSmooth = priorSum + logProbs.length*smoothing;
            for(int c = 0; c < priors.length; c++)
            {
                double logProb = Math.log((priors[c] + smoothing) / priorSumSmooth);

                double[] counts = wordCounts[c];
                double logTotalCounts = Math.log(totalWords[c] + smoothing*counts.length);

                for (IndexValue iv : numVals)
                {
                    //(n/N)^x in log form: x * log(n/N)
                    logProb += iv.getValue() * (Math.log(counts[iv.getIndex()] + smoothing) - logTotalCounts);
                }

                for (int j = 0; j < apriori[c].length; j++)
                {
                    double sum = 0;
                    for (int z = 0; z < apriori[c][j].length; z++)
                        sum += apriori[c][j][z] + smoothing;
                    double p = apriori[c][j][data.getCategoricalValue(j)] + smoothing;
                    logProb += Math.log(p / sum);
                }

                logProbs[c] = logProb;
                maxLogProb = Math.max(maxLogProb, logProb);
            }
        }
        double denom = MathTricks.logSumExp(logProbs, maxLogProb);

        for (int i = 0; i < results.size(); i++)
            results.setProb(i, exp(logProbs[i] - denom));
        results.normalize();
        return results;
    }

    @Override
    public boolean supportsWeightedData()
    {
        return true;
    }

    @Override
    public List<Parameter> getParameters()
    {
        return Parameter.getParamsFromMethods(this);
    }

    @Override
    public Parameter getParameter(String paramName)
    {
        return Parameter.toParameterMap(getParameters()).get(paramName);
    }
}
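For anyone who wants to try the class above, here is a minimal usage sketch. The MultinomialNaiveBayes calls (the smoothing constructor, setFinalizeAfterTraining, trainC, update, and classify) come straight from the posted source; the data-set and data-point construction (CategoricalData, ClassificationDataSet, DenseVector, DataPoint) is my assumption about the surrounding JSAT API, so the exact constructors may need adjusting for your JSAT version.

import jsat.classifiers.*;
import jsat.classifiers.bayesian.MultinomialNaiveBayes;
import jsat.linear.DenseVector;

public class MNBUsageSketch
{
    public static void main(String[] args)
    {
        // Two numeric (word-count) features, no categorical features, and a
        // binary target class. NOTE: this data-set construction is an assumed
        // part of the JSAT API, not something shown in the post above.
        CategoricalData predicting = new CategoricalData(2);
        ClassificationDataSet data =
                new ClassificationDataSet(2, new CategoricalData[0], predicting);
        data.addDataPoint(new DenseVector(new double[]{3, 0}), new int[0], 0);
        data.addDataPoint(new DenseVector(new double[]{0, 4}), new int[0], 1);

        // Laplace smoothing (1.0); keep the model updatable so update() can
        // still be called after the batch training pass.
        MultinomialNaiveBayes mnb = new MultinomialNaiveBayes(1.0);
        mnb.setFinalizeAfterTraining(false);
        mnb.trainC(data);

        // One extra online update, then classify a new point.
        mnb.update(data.getDataPoint(0), 0);
        DataPoint query = new DataPoint(new DenseVector(new double[]{2, 1}),
                new int[0], new CategoricalData[0]);
        CategoricalResults result = mnb.classify(query);
        System.out.println("P(class 0 | query) = " + result.getProb(0));
    }
}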
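A note on the normalization at the end of classify: logProbs holds unnormalized log posteriors, roughly $\ell_c = \log P(c) + \sum_i x_i \log \theta_{c,i}$ plus the categorical terms, and MathTricks.logSumExp converts them into probabilities without underflow via the standard log-sum-exp identity (this derivation is mine, not spelled out in the post):

$$P(c \mid x) = \exp\Bigl(\ell_c - \bigl(m + \log \sum_{c'} e^{\ell_{c'} - m}\bigr)\Bigr), \qquad m = \max_{c'} \ell_{c'}$$

Subtracting the maximum $m$ keeps every exponent at most zero, so the sum cannot overflow; this is why the classification loops track the largest log probability as they go.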


#2 · Nicolle (verified student) · posted 2016-6-27 02:14:55
Note: the author has been banned or deleted; this content was automatically hidden.


#3 · bbslover · posted 2016-6-27 05:36:37
thanks for sharing


#4 · mike68097 · posted 2016-6-28 00:18:45
