Thread starter: cmwei333

Practical Data Analysis (2016, 2nd Edition)


OP: cmwei333, posted 2016-10-10 04:26:42

Practical Data Analysis - Second Edition

Hector Cuesta, Dr. Sampath Kumar


A practical guide to obtaining, transforming, exploring, and analyzing data using Python, MongoDB, and Apache Spark

Beyond buzzwords like Big Data or Data Science, there are great opportunities to innovate in many businesses by using data analysis to build data-driven products. Data analysis involves asking many questions about data in order to discover insights and generate value for a product or service.

This book explains the basic data algorithms without theoretical jargon, and you'll get hands-on experience turning data into insights using machine learning techniques. We will apply data-driven processing to several types of data, such as text, images, social network graphs, documents, and time series, and show you how to implement large-scale data processing with MongoDB and Apache Spark.

Table of Contents

1: GETTING STARTED
2: PREPROCESSING DATA
3: GETTING TO GRIPS WITH VISUALIZATION
4: TEXT CLASSIFICATION
5: SIMILARITY-BASED IMAGE RETRIEVAL
6: SIMULATION OF STOCK PRICES
7: PREDICTING GOLD PRICES
8: WORKING WITH SUPPORT VECTOR MACHINES
9: MODELING INFECTIOUS DISEASES WITH CELLULAR AUTOMATA
10: WORKING WITH SOCIAL GRAPHS
11: WORKING WITH TWITTER DATA
12: DATA PROCESSING AND AGGREGATION WITH MONGODB
13: WORKING WITH MAPREDUCE
14: ONLINE DATA ANALYSIS WITH JUPYTER AND WAKARI
15: UNDERSTANDING DATA PROCESSING USING APACHE SPARK

Practical Data Analysis (2nd Edition).pdf (24.73 MB, requires 20 forum coins)
Practical Data Analysis (2nd Edition).mobi (39.14 MB, requires 20 forum coins)
Practical Data Analysis (2nd Edition).epub (28.22 MB, requires 20 forum coins)



This post is recommended by the following collections:

bbs.pinggu.org/forum.php?mod=collection&action=view&ctid=3257
bbs.pinggu.org/forum.php?mod=collection&action=view&ctid=3258
bbs.pinggu.org/forum.php?mod=collection&action=view&ctid=3259

Reply #2: jjxm20060807 (verified buyer), posted 2016-10-10 07:04:52
Thanks for sharing.

Reply #3: 20115326 (verified buyer, student-verified), posted 2016-10-10 08:37:09
Learned a lot, haha.

Reply #4: 待琢璞玉 (unverified buyer, student-verified), posted 2016-10-19 22:50:21 (no text)

Reply #5: 铁齿铜牙纪晓岗 (unverified buyer), posted 2017-6-27 11:30:05, from mobile
Is there a Chinese edition?

Reply #6: ReneeBK (unverified buyer), posted 2017-9-6 09:27:00

Processing the image dataset

The image set used in this chapter is Caltech-256, obtained from the Computational Vision Lab at Caltech. We can download the collection of all 30,607 images in 256 categories from the following link:

http://www.vision.caltech.edu/Image_Datasets/Caltech256/

In order to implement DTW, we first need to extract a time series (a pixel sequence) from each image. The time series will have a length of 768 values: for each of the image's 256 rows, it takes the first pixel's value in each of the three RGB (Red, Green, Blue) channels. The following code opens the image with Image.open("Image.jpg"), casts it into an array, and then appends the three color vectors to the series:

  from PIL import Image
  from numpy import array  # needed for array(img) below

  img = Image.open("Image.jpg")
  arr = array(img)  # shape: (height, width, 3)
  series = []
  for n in arr: series.append(n[0][0])  # R value of each row's first pixel
  for n in arr: series.append(n[0][1])  # G
  for n in arr: series.append(n[0][2])  # B
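As a sanity check on the extraction scheme above, the same first-pixel-per-row sampling can be sketched in pure Python on a tiny synthetic "image" (the 4×2 array below is invented for illustration; a real 256-row image would yield the 768-value series):

```python
# Synthetic "image": nested lists shaped (height=4, width=2, channels=3),
# standing in for the numpy array returned by array(img).
height, width = 4, 2
img_arr = [[[r, r + 1, r + 2] for _ in range(width)] for r in range(height)]

# Same scheme as the snippet above: take the first pixel of every row,
# channel by channel, so the series length is 3 * height.
series = []
for channel in range(3):                # R, then G, then B
    for row in img_arr:
        series.append(row[0][channel])  # first pixel of the row

print(len(series))  # 3 * height = 12
```

With height = 256 this gives exactly the 768 values described in the book.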

Reply #7: ReneeBK (unverified buyer), posted 2017-9-6 09:27:59

Implementing DTW

In this example, we will look for similarity among 684 images from eight categories. We will use four imports: PIL, numpy, mlpy, and collections:

  from PIL import Image
  from numpy import array
  import mlpy
  from collections import OrderedDict

First, we obtain the time-series representation of each image and store it in a dictionary (data) keyed by the image number, as data[fn] = series:

  data = {}
  for fn in range(1, 685):
      img = Image.open("ImgFolder\\{0}.jpg".format(fn))
      arr = array(img)
      series = []
      for n in arr: series.append(n[0][0])  # R
      for n in arr: series.append(n[0][1])  # G
      for n in arr: series.append(n[0][2])  # B
      data[fn] = series

Tip: the running time of this step grows with the number of images processed, so beware of memory use with large datasets.

Then, we select a reference image, which will be compared with all the other images in the data dictionary:

  reference = data[31]

Now we apply the mlpy.dtw_std function to every element and store each distance in the result dictionary:

  result = {}
  for x, y in data.items():
      dist = mlpy.dtw_std(reference, y, dist_only=True)
      result[x] = dist

Finally, we sort the result with OrderedDict to find the closest elements, and then print the ordered result:

  sortedRes = OrderedDict(sorted(result.items(), key=lambda x: x[1]))
  for a, b in sortedRes.items():
      print("{0}-{1}".format(a, b))

In the output, the first element is the reference time series itself: it has a distance of 0.0 because it is exactly the same as the image we used as the reference.
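mlpy is an older package that can be hard to install on current Python versions. As a rough reference, the standard DTW distance that mlpy.dtw_std returns with dist_only=True can be sketched with the classic dynamic-programming recurrence. This is a minimal illustration, not the book's code, and it assumes the local cost is the absolute difference (which I believe is mlpy's default):

```python
def dtw_distance(s, t):
    """Classic dynamic-programming DTW distance between two sequences,
    using absolute difference as the local cost (assumed to match
    mlpy.dtw_std's default)."""
    n, m = len(s), len(t)
    INF = float("inf")
    # dp[i][j] = minimal cumulative cost of aligning s[:i] with t[:j]
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j],      # insertion
                                  dp[i][j - 1],      # deletion
                                  dp[i - 1][j - 1])  # match
    return dp[n][m]

# A sequence compared with itself has distance 0.0, which is why the
# reference image sorts first in the listing above.
print(dtw_distance([1, 2, 3], [1, 2, 3]))  # 0.0
print(dtw_distance([1, 2, 3], [2, 3, 4]))  # 2.0
```

Sorting all images by this distance to the reference, as the excerpt does, then ranks them from most to least similar.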

Reply #8: Nicolle (verified buyer, student-verified), posted 2017-9-6 09:30:34
Note: the author has been banned or deleted; the content was automatically hidden.

Reply #9: Nicolle (verified buyer, student-verified), posted 2017-9-6 09:34:11
Note: the author has been banned or deleted; the content was automatically hidden.

Reply #10: eeabcde (verified buyer), posted 2017-10-12 07:53:15
Thanks for sharing.
