Thread starter: cmwei333

Practical Data Analysis (2016, 2nd Edition)


OP: cmwei333, posted 2016-10-10 04:26:42

Practical Data Analysis - Second Edition

Hector Cuesta, Dr. Sampath Kumar


A practical guide to obtaining, transforming, exploring, and analyzing data using Python, MongoDB, and Apache Spark

Beyond buzzwords like Big Data or Data Science, there are great opportunities to innovate in many businesses by using data analysis to build data-driven products. Data analysis involves asking many questions about data in order to discover insights and generate value for a product or service.

This book explains the basic data algorithms without theoretical jargon, and you'll get hands-on experience turning data into insights using machine learning techniques. We will apply data-driven processing to several types of data, such as text, images, social network graphs, documents, and time series, and show you how to implement large-scale data processing with MongoDB and Apache Spark.

Table of Contents

1: GETTING STARTED
2: PREPROCESSING DATA
3: GETTING TO GRIPS WITH VISUALIZATION
4: TEXT CLASSIFICATION
5: SIMILARITY-BASED IMAGE RETRIEVAL
6: SIMULATION OF STOCK PRICES
7: PREDICTING GOLD PRICES
8: WORKING WITH SUPPORT VECTOR MACHINES
9: MODELING INFECTIOUS DISEASES WITH CELLULAR AUTOMATA
10: WORKING WITH SOCIAL GRAPHS
11: WORKING WITH TWITTER DATA
12: DATA PROCESSING AND AGGREGATION WITH MONGODB
13: WORKING WITH MAPREDUCE
14: ONLINE DATA ANALYSIS WITH JUPYTER AND WAKARI
15: UNDERSTANDING DATA PROCESSING USING APACHE SPARK

Practical Data Analysis (2nd Edition).pdf (24.73 MB, requires 20 forum coins)
Practical Data Analysis (2nd Edition).mobi (39.14 MB, requires 20 forum coins)
Practical Data Analysis (2nd Edition).epub (28.22 MB, requires 20 forum coins)



This post is recommended by the following collections:

bbs.pinggu.org/forum.php?mod=collection&action=view&ctid=3257
bbs.pinggu.org/forum.php?mod=collection&action=view&ctid=3258
bbs.pinggu.org/forum.php?mod=collection&action=view&ctid=3259

Reply #2: jjxm20060807 (verified buyer), posted 2016-10-10 07:04:52
Thanks for sharing.

Reply #3: 20115326 (verified buyer, student-verified), posted 2016-10-10 08:37:09
Learned a lot, haha.

Reply #4: 待琢璞玉 (unverified buyer, student-verified), posted 2016-10-19 22:50:21 (no text)

Reply #5: 铁齿铜牙纪晓岗 (unverified buyer), posted 2017-6-27 11:30:05, from mobile
Is there a Chinese edition?

Reply #6: ReneeBK (unverified buyer), posted 2017-9-6 09:27:00

Processing the image dataset

The image set used in this chapter is Caltech-256, obtained from the Computational Vision Lab at Caltech. We can download the collection of all 30,607 images in 256 categories from the following link:

http://www.vision.caltech.edu/Image_Datasets/Caltech256/

In order to implement DTW, we first need to extract a time series (a pixel sequence) from each image. The time series will have a length of 768 values: for each of the image's 256 rows, it takes the first pixel's value in each of the three RGB (Red, Green, Blue) channels. The following code opens the image with Image.open("Image.jpg"), casts it into an array, and then appends the three color vectors to the series:

  from PIL import Image
  from numpy import array  # needed for array(img) below

  img = Image.open("Image.jpg")
  arr = array(img)  # shape: (height, width, 3)
  series = []
  for n in arr: series.append(n[0][0])  # R value of each row's first pixel
  for n in arr: series.append(n[0][1])  # G
  for n in arr: series.append(n[0][2])  # B
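As a sanity check on the extraction scheme above, the same first-pixel-per-row sampling can be sketched in pure Python on a tiny synthetic "image" (the 4×2 array below is invented for illustration; a real 256-row image would yield the 768-value series):

```python
# Synthetic "image": nested lists shaped (height=4, width=2, channels=3),
# standing in for the numpy array returned by array(img).
height, width = 4, 2
img_arr = [[[r, r + 1, r + 2] for _ in range(width)] for r in range(height)]

# Same scheme as the snippet above: take the first pixel of every row,
# channel by channel, so the series length is 3 * height.
series = []
for channel in range(3):                # R, then G, then B
    for row in img_arr:
        series.append(row[0][channel])  # first pixel of the row

print(len(series))  # 3 * height = 12
```

With height = 256 this gives exactly the 768 values described in the book.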

Reply #7: ReneeBK (unverified buyer), posted 2017-9-6 09:27:59

Implementing DTW

In this example, we will look for similarity among 684 images from eight categories. We will use four imports: PIL, numpy, mlpy, and collections:

  from PIL import Image
  from numpy import array
  import mlpy
  from collections import OrderedDict

First, we obtain the time-series representation of each image and store it in a dictionary (data) keyed by the image number, as data[fn] = series:

  data = {}
  for fn in range(1, 685):
      img = Image.open("ImgFolder\\{0}.jpg".format(fn))
      arr = array(img)
      series = []
      for n in arr: series.append(n[0][0])  # R
      for n in arr: series.append(n[0][1])  # G
      for n in arr: series.append(n[0][2])  # B
      data[fn] = series

Tip: the running time of this step grows with the number of images processed, so beware of memory use with large datasets.

Then, we select a reference image, which will be compared with all the other images in the data dictionary:

  reference = data[31]

Now we apply the mlpy.dtw_std function to every element and store each distance in the result dictionary:

  result = {}
  for x, y in data.items():
      dist = mlpy.dtw_std(reference, y, dist_only=True)
      result[x] = dist

Finally, we sort the result with OrderedDict to find the closest elements, and then print the ordered result:

  sortedRes = OrderedDict(sorted(result.items(), key=lambda x: x[1]))
  for a, b in sortedRes.items():
      print("{0}-{1}".format(a, b))

In the output, the first element is the reference time series itself: it has a distance of 0.0 because it is exactly the same as the image we used as the reference.
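mlpy is an older package that can be hard to install on current Python versions. As a rough reference, the standard DTW distance that mlpy.dtw_std returns with dist_only=True can be sketched with the classic dynamic-programming recurrence. This is a minimal illustration, not the book's code, and it assumes the local cost is the absolute difference (which I believe is mlpy's default):

```python
def dtw_distance(s, t):
    """Classic dynamic-programming DTW distance between two sequences,
    using absolute difference as the local cost (assumed to match
    mlpy.dtw_std's default)."""
    n, m = len(s), len(t)
    INF = float("inf")
    # dp[i][j] = minimal cumulative cost of aligning s[:i] with t[:j]
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j],      # insertion
                                  dp[i][j - 1],      # deletion
                                  dp[i - 1][j - 1])  # match
    return dp[n][m]

# A sequence compared with itself has distance 0.0, which is why the
# reference image sorts first in the listing above.
print(dtw_distance([1, 2, 3], [1, 2, 3]))  # 0.0
print(dtw_distance([1, 2, 3], [2, 3, 4]))  # 2.0
```

Sorting all images by this distance to the reference, as the excerpt does, then ranks them from most to least similar.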

Reply #8: Nicolle (verified buyer, student-verified), posted 2017-9-6 09:30:34
Note: the author has been banned or deleted; the content was automatically hidden.

Reply #9: Nicolle (verified buyer, student-verified), posted 2017-9-6 09:34:11
Note: the author has been banned or deleted; the content was automatically hidden.

Reply #10: eeabcde (verified buyer), posted 2017-10-12 07:53:15
Thanks for sharing.
