Original poster: ReneeBK

[GitHub] Mastering Java for Data Science

1 (original post)
ReneeBK posted on 2017-7-7 03:44:03

Mastering Java for Data Science

Hidden content of this post:

Mastering-Java-for-Data-Science-master.zip (1.48 MB)


This is the code repository for Mastering Java for Data Science, published by Packt. It contains all the supporting project files necessary to work through the book from start to finish.

This is the official code repository for the book. You can find the unofficial one at https://github.com/alexeygrigorev/mastering-java-data-science. The unofficial repository contains the latest code fixes and is maintained by the author.

About the Book

At the time of writing, Java is the most popular programming language according to the TIOBE index, and it is a typical choice for running production systems in many companies, both in the startup world and among large enterprises.

Not surprisingly, it is also a common choice for creating data science applications: it is fast and has a great set of data processing tools, both built-in and external. What is more, choosing Java for data science allows you to easily integrate solutions with existing software, and bring data science into production with less effort.

This book will teach you how to create data science applications with Java. First, we will review the most important considerations when starting a data science application, and then brush up on the basics of Java and machine learning before diving into more advanced topics. We start by going over the existing libraries for data processing and libraries with machine learning algorithms. After that, we cover topics such as classification and regression, dimensionality reduction and clustering, information retrieval and natural language processing, and deep learning and big data.

Finally, we finish the book by talking about the ways to deploy the model and evaluate it in production settings.

Instructions and Navigation

All of the code is organized into folders. Each folder is named after its chapter: the word Chapter followed by the chapter number. For example, Chapter02.

The code will look like the following:

List<String> list = new ArrayList<>();
list.add("alpha");
list.add("beta");
list.add("beta");
list.add("gamma");
System.out.println(list);

You need a reasonably recent system with at least 2 GB of RAM running Windows 7, Ubuntu 14.04, or Mac OS X. You will also need Java 1.8.0 or above and Maven 3.0.0 or above installed.
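
As a quick sanity check of the Java requirement above, the following sketch reads the standard `java.specification.version` system property and reports whether the running JVM is Java 8 or newer. The class name `JavaVersionCheck` and the helper `majorVersion` are made up for this illustration and are not part of the book's repository. Maven has to be checked separately, for example with `mvn -version` on the command line.

```java
// Hypothetical helper for illustration only; not part of the book's repository.
final class JavaVersionCheck {

    // Maps a specification version string to its major version:
    // the legacy scheme "1.8" becomes 8, the newer scheme "9" or "17" is used as-is.
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        if (parts[0].equals("1") && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return Integer.parseInt(parts[0]);
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        int major = majorVersion(spec);
        System.out.println("Java specification version: " + spec);
        System.out.println(major >= 8
                ? "OK: Java 8 or newer"
                : "Too old: Java 8 or newer is required");
    }
}
```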


2
ReneeBK posted on 2017-7-7 03:46:21
package chapter03;

import java.beans.BeanInfo;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import com.google.common.collect.Lists;
import com.google.common.collect.Maps;

import joinery.DataFrame;

// Converts a list of Java beans into a joinery DataFrame,
// with one column per readable bean property.
public class BeanToJoinery {

    public static <E> DataFrame<Object> convert(List<E> beans, Class<E> beanClass) {
        try {
            return doConvert(beans, beanClass);
        } catch (Exception e) {
            // Introspection and reflection throw checked exceptions; rethrow unchecked.
            throw new RuntimeException(e);
        }
    }

    private static <E> DataFrame<Object> doConvert(List<E> beans, Class<E> beanClass) throws Exception {
        int nrow = beans.size();
        BeanInfo info = Introspector.getBeanInfo(beanClass);
        PropertyDescriptor[] properties = info.getPropertyDescriptors();

        // A linked map keeps the columns in the order the properties are visited.
        Map<String, List<Object>> columns = Maps.newLinkedHashMap();

        for (PropertyDescriptor pd : properties) {
            String name = pd.getName();
            // Every object exposes getClass(); it is not a data column.
            if ("class".equals(name)) {
                continue;
            }

            // Skip write-only properties.
            Method getter = pd.getReadMethod();
            if (getter == null) {
                continue;
            }

            List<Object> column = Lists.newArrayListWithCapacity(nrow);
            for (E e : beans) {
                Object value = getter.invoke(e);
                column.add(value);
            }

            columns.put(name, column);
        }

        // Row index: 0, 1, ..., nrow - 1.
        List<Integer> index = IntStream.range(0, nrow)
                .mapToObj(Integer::valueOf)
                .collect(Collectors.toList());

        List<String> columnNames = Lists.newArrayList(columns.keySet());
        List<List<Object>> data = Lists.newArrayList(columns.values());
        return new DataFrame<>(index, columnNames, data);
    }

}
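
The class above depends on joinery and Guava. To try the introspection step on its own, here is a minimal sketch that uses only the JDK and stops at the column map instead of building a DataFrame. The `Person` bean and the `BeanColumnsSketch` class are hypothetical names invented for this example; they do not come from the book's repository.

```java
import java.beans.BeanInfo;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: same idea as BeanToJoinery, without joinery or Guava.
final class BeanColumnsSketch {

    // Hypothetical bean used only for this example.
    public static class Person {
        private final String name;
        private final int age;

        public Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        public String getName() { return name; }
        public int getAge() { return age; }
    }

    // Builds one list of values per readable bean property.
    static <E> Map<String, List<Object>> toColumns(List<E> beans, Class<E> beanClass) {
        try {
            BeanInfo info = Introspector.getBeanInfo(beanClass);
            Map<String, List<Object>> columns = new LinkedHashMap<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                Method getter = pd.getReadMethod();
                // Skip the synthetic "class" property and write-only properties.
                if ("class".equals(pd.getName()) || getter == null) {
                    continue;
                }
                List<Object> column = new ArrayList<>(beans.size());
                for (E bean : beans) {
                    column.add(getter.invoke(bean));
                }
                columns.put(pd.getName(), column);
            }
            return columns;
        } catch (Exception e) {
            // Mirror the original: wrap checked reflection errors.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(new Person("Ann", 31), new Person("Bob", 27));
        // Prints a map with an "age" column and a "name" column.
        System.out.println(toColumns(people, Person.class));
    }
}
```

Primitive properties such as `age` come back boxed (here as `Integer`), which is why the original class builds a `DataFrame<Object>`.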

3
ReneeBK posted on 2017-7-7 03:46:49
package chapter03;

import java.io.IOException;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.function.ToDoubleFunction;
import java.util.stream.Collectors;

import org.apache.commons.math3.stat.descriptive.DescriptiveStatistics;
import org.apache.commons.math3.stat.descriptive.SummaryStatistics;

import com.google.common.collect.Maps;

// Descriptive statistics of page body lengths with Apache Commons Math.
public class CommonsMathExample {

    public static void main(String[] args) throws IOException {
        List<RankedPage> data = Data.readRankedPages();
        summaryStats(data);
        descStats(data);
        proportion(data);
        groupByDescStats(data);
    }

    // Streaming summary (count, mean, standard deviation, min, max) without storing the values.
    private static void summaryStats(List<RankedPage> data) {
        SummaryStatistics summary = new SummaryStatistics();
        data.stream().mapToDouble(RankedPage::getBodyContentLength).forEach(summary::addValue);
        System.out.println(summary.getSummary());
    }

    // Percentiles need the full data set, hence DescriptiveStatistics.
    private static void descStats(List<RankedPage> data) {
        double[] dataArray = data.stream().mapToDouble(RankedPage::getBodyContentLength).toArray();
        DescriptiveStatistics desc = new DescriptiveStatistics(dataArray);
        System.out.printf("min: %9.1f%n", desc.getMin());
        System.out.printf("p05: %9.1f%n", desc.getPercentile(5));
        System.out.printf("p25: %9.1f%n", desc.getPercentile(25));
        System.out.printf("p50: %9.1f%n", desc.getPercentile(50));
        System.out.printf("p75: %9.1f%n", desc.getPercentile(75));
        System.out.printf("p95: %9.1f%n", desc.getPercentile(95));
        System.out.printf("max: %9.1f%n", desc.getMax());
    }

    // Share of pages whose body is empty: the mean of a 0/1 indicator.
    private static void proportion(List<RankedPage> data) {
        double proportion = data.stream()
                .mapToInt(p -> p.getBodyContentLength() == 0 ? 1 : 0)
                .average().getAsDouble();
        System.out.printf("proportion of zero content length: %.5f%n", proportion);
    }

    private static void groupByDescStats(List<RankedPage> data) {
        System.out.println();

        // A TreeMap keeps the pages sorted, so the header row below is
        // printed in the same order as the statistics columns.
        Map<Integer, List<RankedPage>> byPage = data.stream()
                .filter(p -> p.getBodyContentLength() != 0)
                .collect(Collectors.groupingBy(RankedPage::getPage, TreeMap::new, Collectors.toList()));

        List<DescriptiveStatistics> stats = byPage.entrySet().stream()
                .sorted(Map.Entry.comparingByKey())
                .map(e -> calculate(e.getValue(), RankedPage::getBodyContentLength))
                .collect(Collectors.toList());

        // Row label -> statistic to print for each page.
        Map<String, Function<DescriptiveStatistics, Double>> functions = Maps.newLinkedHashMap();
        functions.put("min", d -> d.getMin());
        functions.put("p05", d -> d.getPercentile(5));
        functions.put("p25", d -> d.getPercentile(25));
        functions.put("p50", d -> d.getPercentile(50));
        functions.put("p75", d -> d.getPercentile(75));
        functions.put("p95", d -> d.getPercentile(95));
        functions.put("max", d -> d.getMax());

        System.out.print("page");
        for (Integer page : byPage.keySet()) {
            System.out.printf("%9d ", page);
        }
        System.out.println();

        for (Entry<String, Function<DescriptiveStatistics, Double>> pair : functions.entrySet()) {
            System.out.print(pair.getKey());
            Function<DescriptiveStatistics, Double> function = pair.getValue();
            System.out.print(" ");
            for (DescriptiveStatistics ds : stats) {
                System.out.printf("%9.1f ", function.apply(ds));
            }
            System.out.println();
        }
    }

    private static DescriptiveStatistics calculate(List<RankedPage> data, ToDoubleFunction<RankedPage> getter) {
        double[] dataArray = data.stream().mapToDouble(getter).toArray();
        return new DescriptiveStatistics(dataArray);
    }

}
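
The class above relies on `Data.readRankedPages()` and `RankedPage`, neither of which is shown in this thread. The group-then-summarize pattern can still be tried in isolation with the JDK alone, as in the sketch below. It makes two assumptions: `Page` is a hypothetical stand-in for `RankedPage`, and `percentile` uses the simple nearest-rank rule, which is not the estimator behind Commons Math's `getPercentile`, so the numbers can differ slightly from the original program.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Illustration only: grouping and percentiles without Commons Math.
final class GroupedPercentilesSketch {

    // Hypothetical stand-in for the book's RankedPage bean.
    static final class Page {
        private final int page;
        private final int bodyContentLength;

        Page(int page, int bodyContentLength) {
            this.page = page;
            this.bodyContentLength = bodyContentLength;
        }

        int getPage() { return page; }
        int getBodyContentLength() { return bodyContentLength; }
    }

    // Nearest-rank percentile of a non-empty array (NOT the Commons Math estimator).
    static double percentile(double[] values, double p) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    // Drops empty pages, then collects body lengths per result page, sorted by page number.
    static Map<Integer, double[]> lengthsByPage(List<Page> data) {
        Map<Integer, List<Page>> byPage = data.stream()
                .filter(p -> p.getBodyContentLength() != 0)
                .collect(Collectors.groupingBy(Page::getPage, TreeMap::new, Collectors.toList()));

        Map<Integer, double[]> result = new TreeMap<>();
        byPage.forEach((page, pages) -> result.put(page,
                pages.stream().mapToDouble(Page::getBodyContentLength).toArray()));
        return result;
    }

    public static void main(String[] args) {
        List<Page> data = Arrays.asList(
                new Page(0, 300), new Page(0, 200), new Page(0, 0),
                new Page(1, 100), new Page(1, 50));

        lengthsByPage(data).forEach((page, lengths) ->
                System.out.printf("page %d: p50 = %.1f, max = %.1f%n",
                        page, percentile(lengths, 50), percentile(lengths, 100)));
    }
}
```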

4
ReneeBK posted on 2017-7-7 03:50:07
package chapter04.classification;

import java.io.PrintStream;
import java.util.List;
import java.util.function.Function;

import org.apache.commons.io.output.NullOutputStream;
import org.apache.commons.math3.stat.descriptive.DescriptiveStatistics;

import chapter04.cv.Dataset;
import chapter04.cv.Split;
import de.bwaldvogel.liblinear.Feature;
import de.bwaldvogel.liblinear.FeatureNode;
import de.bwaldvogel.liblinear.Linear;
import de.bwaldvogel.liblinear.Model;
import de.bwaldvogel.liblinear.Parameter;
import de.bwaldvogel.liblinear.Problem;

// Helper methods for training and evaluating LIBLINEAR models.
public class LibLinear {

    // Silences LIBLINEAR's training log by redirecting it to a null stream.
    public static void mute() {
        PrintStream devNull = new PrintStream(new NullOutputStream());
        Linear.setDebugOutput(devNull);
    }

    // Trains on each fold in parallel and collects the validation AUC of every fold.
    public static DescriptiveStatistics crossValidate(List<Split> folds, Function<Dataset, Model> trainer) {
        double[] aucs = folds.parallelStream().mapToDouble(fold -> {
            Dataset foldTrain = fold.getTrain();
            Dataset foldValidation = fold.getTest();
            Model model = trainer.apply(foldTrain);
            return auc(model, foldValidation);
        }).toArray();

        return new DescriptiveStatistics(aucs);
    }

    public static Model train(Dataset dataset, Parameter param) {
        Problem problem = wrapDataset(dataset);
        return Linear.train(problem, param);
    }

    // Converts the dense Dataset into LIBLINEAR's Problem representation.
    private static Problem wrapDataset(Dataset dataset) {
        double[][] X = dataset.getX();
        double[] y = dataset.getY();

        Problem problem = new Problem();
        problem.x = wrapMatrix(X);
        problem.y = y;
        problem.n = X[0].length + 1;
        problem.l = X.length;

        return problem;
    }

    public static double auc(Model model, Dataset dataset) {
        double[] scores;
        if (model.isProbabilityModel()) {
            scores = predictProba(model, dataset);
        } else {
            // No probability output available: squash raw decision values into (0, 1).
            scores = predictValues(model, dataset);
            scores = sigmoid(scores);
        }

        return Metrics.auc(dataset.getY(), scores);
    }

    public static double[] predictProba(Model model, Dataset dataset) {
        int n = dataset.length();

        double[][] X = dataset.getX();
        double[] results = new double[n];
        double[] probs = new double[2];

        for (int i = 0; i < n; i++) {
            Feature[] row = wrapRow(X[i]);
            Linear.predictProbability(model, row, probs);
            results[i] = probs[1];
        }

        return results;
    }

    public static double[] predictValues(Model model, Dataset dataset) {
        int n = dataset.length();

        double[][] X = dataset.getX();
        double[] results = new double[n];
        double[] values = new double[1];

        for (int i = 0; i < n; i++) {
            Feature[] row = wrapRow(X[i]);
            Linear.predictValues(model, row, values);
            results[i] = values[0];
        }

        return results;
    }

    public static double[] sigmoid(double[] scores) {
        double[] result = new double[scores.length];

        for (int i = 0; i < result.length; i++) {
            result[i] = 1 / (1 + Math.exp(-scores[i]));
        }

        return result;
    }

    private static Feature[][] wrapMatrix(double[][] X) {
        int n = X.length;
        Feature[][] matrix = new Feature[n][];
        for (int i = 0; i < n; i++) {
            matrix[i] = wrapRow(X[i]);
        }
        return matrix;
    }

    private static Feature[] wrapRow(double[] row) {
        int m = row.length;
        Feature[] result = new Feature[m];

        // LIBLINEAR feature indices are 1-based.
        for (int i = 0; i < m; i++) {
            result[i] = new FeatureNode(i + 1, row[i]);
        }

        return result;
    }
}
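
The `auc` method above delegates to `Metrics.auc`, which is not shown in this thread. One common definition of AUC is the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as half. The sketch below implements that definition directly. It is an assumption about what such a metric computes, not the repository's code: the class name `AucSketch` is invented, labels are assumed to be 1.0 for positives and 0.0 for negatives, and the quadratic loop favours clarity over speed.

```java
// Illustration only: a pairwise AUC, not the book's Metrics.auc.
final class AucSketch {

    static double sigmoid(double score) {
        return 1 / (1 + Math.exp(-score));
    }

    // Fraction of (positive, negative) pairs ranked correctly; ties count as half.
    // Assumes labels are exactly 1.0 (positive) or 0.0 (negative).
    static double auc(double[] labels, double[] scores) {
        if (labels.length != scores.length) {
            throw new IllegalArgumentException("labels and scores must have the same length");
        }
        double wins = 0;
        long pairs = 0;
        for (int i = 0; i < labels.length; i++) {
            if (labels[i] != 1.0) {
                continue;
            }
            for (int j = 0; j < labels.length; j++) {
                if (labels[j] != 0.0) {
                    continue;
                }
                pairs++;
                if (scores[i] > scores[j]) {
                    wins += 1.0;
                } else if (scores[i] == scores[j]) {
                    wins += 0.5;
                }
            }
        }
        if (pairs == 0) {
            throw new IllegalArgumentException("need at least one positive and one negative example");
        }
        return wins / pairs;
    }

    public static void main(String[] args) {
        double[] labels = {0, 0, 1, 1};
        double[] raw = {-2.0, 0.5, 0.1, 3.0};

        double[] squashed = new double[raw.length];
        for (int i = 0; i < raw.length; i++) {
            squashed[i] = sigmoid(raw[i]);
        }

        System.out.println("AUC on raw decision values: " + auc(labels, raw));
        System.out.println("AUC after sigmoid:          " + auc(labels, squashed));
    }
}
```

Both lines print the same value, because the sigmoid is monotonic and AUC depends only on how the scores are ordered. The sigmoid step in `LibLinear.auc` therefore maps decision values into the (0, 1) range without affecting the ranking.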

5
bocm posted on 2017-7-7 04:17:18
Thanks for sharing

6
dst1213 posted on 2017-7-7 06:04:29
Thanks for sharing

7
flywindbird posted on 2017-7-7 08:30:22
Taking a look; this seems pretty good.

8
franky_sas posted on 2017-7-7 09:11:28

9
bail posted on 2017-7-7 09:21:56
Thanks for sharing

10
maxine2001 posted on 2017-7-8 06:20:28
