Original poster: Lisrelchen

[Blog Picks] Implementing your own k-nearest neighbour algorithm using Python


Original post
Lisrelchen posted on 2016-8-19 01:33:29


In machine learning, you often want to build predictors that classify things into categories based on a set of associated values. For example, a patient can be given a diagnosis based on data from previous patients.

Classification can involve constructing highly non-linear boundaries between classes, as in the case of the red, green and blue classes below:

Hidden content of this post (attachment): 7.pdf (5.2 MB)




Reply 1
Lisrelchen posted on 2016-8-19 01:34:08
# Updated to Python 3 and to sklearn.model_selection, which replaced the
# removed sklearn.cross_validation module.
from collections import Counter
from operator import itemgetter
import math

from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split


# 1) given two data points, calculate the euclidean distance between them
def get_distance(data1, data2):
    squared_diffs = [(a - b) ** 2 for a, b in zip(data1, data2)]
    return math.sqrt(sum(squared_diffs))


# 2) given a training set and a test instance, use get_distance to calculate
#    the distance from the test instance to every training instance
def get_neighbours(training_set, test_instance, k):
    distances = [_get_tuple_distance(training_instance, test_instance)
                 for training_instance in training_set]
    # index 1 is the calculated distance between training_instance and test_instance
    sorted_distances = sorted(distances, key=itemgetter(1))
    # extract only the training instances, dropping the distances
    sorted_training_instances = [pair[0] for pair in sorted_distances]
    # select the first k elements, i.e. the k nearest neighbours
    return sorted_training_instances[:k]


def _get_tuple_distance(training_instance, test_instance):
    # training_instance is a (features, label) pair; index 0 holds the features
    return (training_instance, get_distance(test_instance, training_instance[0]))


# 3) given a list of nearest neighbours for a test case, tally up their classes
#    to vote on the test case class
def get_majority_vote(neighbours):
    # index 1 is the class
    classes = [neighbour[1] for neighbour in neighbours]
    count = Counter(classes)
    return count.most_common()[0][0]


# setting up main executable method
def main():
    # load the data and create the training and test sets
    # random_state=1 is just a seed to permit reproducibility of the train/test split
    iris = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.4, random_state=1)

    # reformat train/test datasets for convenience: lists of (features, label) pairs
    # (zip is lazy in Python 3, so it is materialised with list)
    train = list(zip(X_train, y_train))
    test = list(zip(X_test, y_test))

    # generate predictions
    predictions = []

    # arbitrarily set k to 5, meaning that each new instance is classified
    # from its 5 nearest neighbours
    k = 5

    # for each instance in the test set, get nearest neighbours and majority vote on predicted class
    for x in range(len(X_test)):
        print('Classifying test instance number ' + str(x) + ':', end=' ')
        neighbours = get_neighbours(training_set=train, test_instance=test[x][0], k=k)
        majority_vote = get_majority_vote(neighbours)
        predictions.append(majority_vote)
        print('Predicted label=' + str(majority_vote) + ', Actual label=' + str(test[x][1]))

    # summarize performance of the classification
    print('\nThe overall accuracy of the model is: ' + str(accuracy_score(y_test, predictions)) + '\n')
    report = classification_report(y_test, predictions, target_names=iris.target_names)
    print('A detailed classification report: \n\n' + report)


if __name__ == "__main__":
    main()
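To check the three steps by hand (distance, nearest neighbours, majority vote), it may help to run them on a tiny dataset first. The following is only a minimal sketch, not the poster's code: the name knn_predict and the toy points are made up for illustration, the (features, label) layout is assumed to match the training set in the post, and math.dist requires Python 3.8 or later.

```python
import math
from collections import Counter

# Hypothetical toy training set: (features, label) pairs,
# assumed to use the same layout as the training set in the post.
toy_train = [
    ((1.0, 1.0), "red"),
    ((1.5, 2.0), "red"),
    ((2.0, 1.0), "red"),
    ((6.0, 6.0), "blue"),
    ((7.0, 5.5), "blue"),
    ((6.5, 7.0), "blue"),
]


def knn_predict(training_set, query, k):
    """Return the majority label among the k training points closest to query."""
    # sort the (features, label) pairs by Euclidean distance to the query
    by_distance = sorted(training_set, key=lambda pair: math.dist(pair[0], query))
    # tally the labels of the k nearest pairs and take the most frequent one
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


near_red = knn_predict(toy_train, (1.2, 1.4), k=3)
near_blue = knn_predict(toy_train, (6.4, 6.1), k=3)
print(near_red, near_blue)
```

With an even k and two classes the vote can tie. In this sketch the label encountered first among the tied ones (that is, the one with the nearer neighbour) wins, since Counter keeps equal counts in first-seen order; an odd k avoids the issue for two classes.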

Reply 2
fengyg (verified enterprise account) posted on 2016-8-19 07:44:31
Taking a look.

Reply 3
ekscheng posted on 2016-8-19 09:07:51
