Original poster: Lisrelchen

[GitHub] Data Science Algorithms in a Week

Original post
Lisrelchen posted on 2017-8-13 02:49:21


Data Science Algorithms in a Week




https://github.com/PacktPublishing/Data-Science-Algorithms-in-a-Week


This is the code repository for Data Science Algorithms in a Week, published by Packt. It contains all the supporting project files necessary to work through the book from start to finish.

About the Book

This book will address the problems related to accurate and efficient data classification and prediction. Over the course of 7 days, you will be introduced to seven algorithms, along with exercises that will help you learn different aspects of machine learning. You will see how to pre-cluster your data to optimize and classify it for large datasets. You will then find out how to predict data based on the existing trends in your datasets.





2
Lisrelchen posted on 2017-8-13 02:50:24
# A program that reads the CSV file with the data and returns
# the Bayesian probability for the unknown value denoted by ? to
# belong to a certain class.
# An input CSV file should be of the following format:
# 1. items in a row should be separated by a comma ','
# 2. the first row should be a heading - should contain a name for each
# column of the data.
# 3. the remaining rows should contain the data itself - rows with
# complete and rows with the incomplete data.
# A row with complete data is the row that has a non-empty and
# non-question mark value for each column. A row with incomplete data is
# the row that has the last column with the value of a question mark ?.
# Please, run this file on the example chess.csv to understand this help
# better:
# $ python naive_bayes.py chess.csv

import sys
# The helper module common is loaded from the sibling directory ../common.
sys.path.append('../common')
import common  # noqa


# Calculates the Bayesian probability for the rows of incomplete data and
# returns them completed by the Bayesian probabilities. complete_data
# are the rows with the data that is complete and are used to calculate
# the conditional probabilities to complete the incomplete data.
def bayes_probability(heading, complete_data, incomplete_data,
                      enquired_column):
    # Maps (column name, column value, class) to a count of complete rows.
    conditional_counts = {}
    # Maps each class of the enquired column to a count of complete rows.
    enquired_column_classes = {}
    for data_item in complete_data:
        common.dic_inc(enquired_column_classes,
                       data_item[enquired_column])
        for i in range(0, len(heading)):
            if i != enquired_column:
                common.dic_inc(
                    conditional_counts, (
                        heading[i], data_item[i],
                        data_item[enquired_column]))

    completed_items = []
    for incomplete_item in incomplete_data:
        partial_probs = {}
        complete_probs = {}
        probs_sum = 0
        for enquired_class in enquired_column_classes:
            # For each class A of the enquired variable calculate
            # the probability P(A)*P(B1|A)*P(B2|A)*...*P(Bn|A) where
            # B1,...,Bn are the remaining variables.
            class_count = common.dic_key_count(
                enquired_column_classes, enquired_class)
            probability = float(class_count) / len(complete_data)
            for i in range(0, len(heading)):
                if i != enquired_column:
                    probability = probability * (float(
                        common.dic_key_count(
                            conditional_counts, (
                                heading[i], incomplete_item[i],
                                enquired_class))) / class_count)
            partial_probs[enquired_class] = probability
            probs_sum += probability

        # Normalize so that the probabilities over all classes sum to 1.
        for enquired_class in enquired_column_classes:
            complete_probs[enquired_class] = (
                partial_probs[enquired_class] / probs_sum)
        incomplete_item[enquired_column] = complete_probs
        completed_items.append(incomplete_item)
    return completed_items


# Program start
if __name__ == '__main__':
    if len(sys.argv) < 2:
        sys.exit('Please, input as an argument the name of the CSV file.')

    (heading, complete_data, incomplete_data,
     enquired_column) = common.csv_file_to_ordered_data(sys.argv[1])

    # Calculate the Bayesian probability for the incomplete data
    # and output it.
    completed_data = bayes_probability(
        heading, complete_data, incomplete_data, enquired_column)
    print(completed_data)
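
The script above depends on the repository's common module, so it cannot be run on its own. For readers who want to try the same counting scheme without it, here is a minimal, self-contained sketch in Python 3. It is not the book's code: the function name naive_bayes_posteriors, the use of collections.Counter, the in-memory toy rows, and the guard for an all-zero score are assumptions made for illustration. For each class A it computes P(A)*P(B1|A)*...*P(Bn|A) from raw counts, with no smoothing, and then normalizes over the classes.

```python
from collections import Counter


def naive_bayes_posteriors(heading, complete_rows, query_row, target_col):
    """Return {class: P(class | features)} for one incomplete row.

    Hypothetical helper for illustration. The score of a class A is
    P(A) * P(B1|A) * ... * P(Bn|A), estimated from raw counts without
    smoothing and then normalized so that the scores sum to 1.
    """
    # Number of complete rows per class of the target column.
    class_counts = Counter(row[target_col] for row in complete_rows)
    # (column name, feature value, class) -> number of complete rows.
    cond_counts = Counter(
        (heading[i], row[i], row[target_col])
        for row in complete_rows
        for i in range(len(heading)) if i != target_col
    )

    scores = {}
    for cls, cls_count in class_counts.items():
        score = cls_count / len(complete_rows)  # prior P(A)
        for i in range(len(heading)):
            if i != target_col:
                # Likelihood P(Bi|A); Counter returns 0 for unseen keys.
                key = (heading[i], query_row[i], cls)
                score *= cond_counts[key] / cls_count
        scores[cls] = score

    total = sum(scores.values())
    if total == 0:
        # Assumption: report zeros when an unseen feature value rules
        # out every class, instead of dividing by zero.
        return {cls: 0.0 for cls in scores}
    return {cls: score / total for cls, score in scores.items()}


# Hypothetical toy data, not the chess.csv file from the repository.
heading = ["Weather", "Wind", "Play"]
complete_rows = [
    ["Sunny", "Weak", "Yes"],
    ["Sunny", "Strong", "No"],
    ["Rainy", "Weak", "Yes"],
    ["Rainy", "Strong", "No"],
    ["Sunny", "Weak", "Yes"],
    ["Rainy", "Strong", "Yes"],
]
posteriors = naive_bayes_posteriors(
    heading, complete_rows, ["Sunny", "Strong", "?"], target_col=2)
print(posteriors)
```

On this toy data the unnormalized scores are 1/12 for Yes and 1/6 for No, so the normalized result is roughly 1/3 for Yes and 2/3 for No. Because nothing is smoothed, a single feature value never seen with a class drives that class's score to zero; Laplace smoothing is a common remedy, but it is not part of the posted script.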

3
军旗飞扬 (employment verified) posted on 2017-8-13 06:28:04
Thanks for sharing, OP!

4
MouJack007 posted on 2017-8-13 07:29:02
Thanks for sharing, OP!

5
MouJack007 posted on 2017-8-13 07:29:19

6
albertwishedu posted on 2017-8-13 08:51:44

7
ncme2011 posted on 2017-8-14 02:56:00
Many thanks!

8
paulinokok posted on 2017-8-14 15:05:03
thank you

9
lianqu posted on 2017-8-15 09:31:02

10
igs816 (employment verified) posted on 2017-8-18 21:52:56
Is it in PDF format?
