Original poster: Lisrelchen


[GitHub] Data Science Algorithms in a Week


#1 (original post)

Lisrelchen posted on 2017-8-13 02:49:21


Data Science Algorithms in a Week


https://github.com/PacktPublishing/Data-Science-Algorithms-in-a-Week

This is the code repository for Data Science Algorithms in a Week, published by Packt. It contains all the supporting project files necessary to work through the book from start to finish.

About the Book

This book will address the problems related to accurate and efficient data classification and prediction. Over the course of 7 days, you will be introduced to seven algorithms, along with exercises that will help you learn different aspects of machine learning. You will see how to pre-cluster your data to optimize and classify it for large datasets. You will then find out how to predict data based on the existing trends in your datasets.



#2

Lisrelchen posted on 2017-8-13 02:50:24

# A program that reads the CSV file with the data and returns
# the Bayesian probability for the unknown value denoted by ? to
# belong to a certain class.
# An input CSV file should be of the following format:
# 1. items in a row should be separated by a comma ','
# 2. the first row should be a heading - should contain a name for each
#    column of the data.
# 3. the remaining rows should contain the data itself - rows with
#    complete and rows with the incomplete data.
# A row with complete data is the row that has a non-empty and
# non-question mark value for each column. A row with incomplete data is
# the row that has the last column with the value of a question mark ?.
# Please, run this file on the example chess.csv to understand this help
# better:
# $ python naive_bayes.py chess.csv
import sys
sys.path.append('../common')
import common  # noqa


# Calculates the Bayesian probability for the rows of incomplete data and
# returns them completed by the Bayesian probabilities. complete_data
# are the rows with the data that is complete and are used to calculate
# the conditional probabilities to complete the incomplete data.
def bayes_probability(heading, complete_data, incomplete_data,
                      enquired_column):
    conditional_counts = {}
    enquired_column_classes = {}
    # Count how often each class occurs, and how often each
    # (column name, value, class) combination occurs.
    for data_item in complete_data:
        common.dic_inc(enquired_column_classes,
                       data_item[enquired_column])
        for i in range(0, len(heading)):
            if i != enquired_column:
                common.dic_inc(
                    conditional_counts, (
                        heading[i], data_item[i],
                        data_item[enquired_column]))

    completed_items = []
    for incomplete_item in incomplete_data:
        partial_probs = {}
        complete_probs = {}
        probs_sum = 0
        for enquired_group in enquired_column_classes.items():
            # For each class of the enquired variable A calculate
            # the probability P(A)*P(B1|A)*P(B2|A)*...*P(Bn|A) where
            # B1,...,Bn are the remaining variables.
            probability = float(common.dic_key_count(
                enquired_column_classes,
                enquired_group[0])) / len(complete_data)
            for i in range(0, len(heading)):
                if i != enquired_column:
                    probability = probability * (float(
                        common.dic_key_count(
                            conditional_counts, (
                                heading[i], incomplete_item[i],
                                enquired_group[0]))) / (
                        common.dic_key_count(enquired_column_classes,
                                             enquired_group[0])))
            partial_probs[enquired_group[0]] = probability
            probs_sum += probability

        # Normalize the partial probabilities so that they sum to 1.
        for enquired_group in enquired_column_classes.items():
            complete_probs[enquired_group[0]] = (
                partial_probs[enquired_group[0]] / probs_sum)
        incomplete_item[enquired_column] = complete_probs
        completed_items.append(incomplete_item)
    return completed_items


# Program start
if len(sys.argv) < 2:
    sys.exit('Please, input as an argument the name of the CSV file.')

(heading, complete_data, incomplete_data,
 enquired_column) = common.csv_file_to_ordered_data(sys.argv[1])

# Calculate the Bayesian probability for the incomplete data
# and output it.
completed_data = bayes_probability(
    heading, complete_data, incomplete_data, enquired_column)
print(completed_data)
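
The script above imports a `common` module from `../common`, which is not included in the post. Below is a minimal sketch of the three helpers it calls, with behaviour inferred only from how the script uses them (`dic_inc`, `dic_key_count`, `csv_file_to_ordered_data`). These bodies are assumptions for illustration; the versions in the Packt repository may differ.

```python
import csv


def dic_inc(dic, key):
    """Increment the count stored under key, starting from zero."""
    dic[key] = dic.get(key, 0) + 1


def dic_key_count(dic, key):
    """Return the count stored under key, or 0 if the key was never seen."""
    return dic.get(key, 0)


def csv_file_to_ordered_data(csv_file_name):
    """Split a CSV file into its heading, complete rows and incomplete rows.

    Assumption: the enquired column is the last one, and a row is
    incomplete when that column holds a question mark.
    Returns (heading, complete_data, incomplete_data, enquired_column).
    """
    with open(csv_file_name, newline='') as f:
        rows = [row for row in csv.reader(f) if row]  # skip blank lines
    heading, data = rows[0], rows[1:]
    enquired_column = len(heading) - 1
    complete_data = [r for r in data if r[enquired_column] != '?']
    incomplete_data = [r for r in data if r[enquired_column] == '?']
    return heading, complete_data, incomplete_data, enquired_column
```

Saved as `../common/common.py` relative to the script, a file like this should let `naive_bayes.py` run under Python 3, provided the CSV follows the format described in the script's header comments.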

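
To see the calculation from the script's comments, P(A)*P(B1|A)*...*P(Bn|A) followed by normalization, without any files or helper modules, here is a self-contained sketch. The function name `naive_bayes_posteriors` and the toy rows are invented for illustration (they are not the book's chess.csv), and returning zeros when no class matches is one possible choice, not something taken from the original code.

```python
from collections import Counter


def naive_bayes_posteriors(complete_rows, query, class_index):
    """Return {class: probability} for one row whose class is unknown.

    complete_rows: rows (lists of strings) with a known class.
    query: a row of the same width; its entry at class_index is ignored.
    """
    class_counts = Counter(row[class_index] for row in complete_rows)
    # Count (column, value, class) triples to estimate P(B_i = value | A = class).
    cond_counts = Counter(
        (i, row[i], row[class_index])
        for row in complete_rows
        for i in range(len(row)) if i != class_index
    )
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / len(complete_rows)  # prior P(A)
        for i, value in enumerate(query):
            if i != class_index:
                score *= cond_counts[(i, value, cls)] / n_cls  # P(B_i | A)
        scores[cls] = score
    total = sum(scores.values())
    if total == 0:
        # Raw counts rule out every class; avoid dividing by zero.
        return {cls: 0.0 for cls in scores}
    return {cls: s / total for cls, s in scores.items()}


# Invented toy data: columns are wind, sunshine, play.
rows = [
    ["strong", "sunny", "no"],
    ["none", "sunny", "yes"],
    ["none", "cloudy", "yes"],
    ["strong", "cloudy", "no"],
    ["none", "sunny", "yes"],
    ["strong", "sunny", "yes"],
]
posteriors = naive_bayes_posteriors(rows, ["strong", "sunny", "?"], class_index=2)
print(posteriors)
```

For this toy data the unnormalized scores are 1/6 for "no" and 1/8 for "yes", so the normalized probabilities work out to 4/7 and 3/7.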

#3

军旗飞扬 posted on 2017-8-13 06:28:04

Thanks to the original poster for sharing!

#4

MouJack007 posted on 2017-8-13 07:29:02

Thanks to the original poster for sharing!

#5

MouJack007 posted on 2017-8-13 07:29:19

#6

albertwishedu posted on 2017-8-13 08:51:44

#7

ncme2011 posted on 2017-8-14 02:56:00

Many thanks!

#8

paulinokok posted on 2017-8-14 15:05:03

thank you

#9

lianqu posted on 2017-8-15 09:31:02

#10

igs816 posted on 2017-8-18 21:52:56

Is it in PDF format?
