
[GitHub] Learning scikit-learn: Machine Learning in Python


Lisrelchen posted on 2017-4-22 08:36:53

Learning scikit-learn: Machine Learning in Python
IPython sources for each chapter of the book

This repository holds all the IPython sources and data for the "Learning scikit-learn: Machine Learning in Python" book by Raúl Garreta and Guillermo Moncecchi (http://www.packtpub.com/learning-scikit-learn-machine-in-python/book). For the planned 2nd edition, Diego Garat was added as a new author.
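The code posts below reference names defined earlier in the book's notebooks (X_train, images_train, n_digits, the print_digits helper). A minimal setup sketch, assuming the standard scikit-learn digits dataset; print_digits here is a hypothetical reconstruction of the notebook's helper, not the book's exact code:

# Minimal setup sketch for the clustering snippets below; the exact
# setup cell lives in the book's notebooks, so details here are assumptions.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
n_digits = len(np.unique(digits.target))  # 10 digit classes

X_train, X_test, images_train, images_test, y_train, y_test = train_test_split(
    digits.data, digits.images, digits.target, test_size=0.25, random_state=42)

def print_digits(images, labels, max_n=10):
    # Show the first max_n images with their (cluster) labels.
    fig, axes = plt.subplots(1, max_n, figsize=(max_n, 1.5))
    for ax, image, label in zip(axes, images[:max_n], labels[:max_n]):
        ax.imshow(image, cmap=plt.cm.gray_r)
        ax.set_title(str(label))
        ax.axis('off')
    plt.show()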



Lisrelchen posted on 2017-4-22 08:40:17

K-Means

from sklearn import cluster, decomposition, metrics
import numpy as np
import matplotlib.pyplot as plt

# Cluster the training digits into 10 groups.
clf = cluster.KMeans(init='k-means++', n_clusters=10, random_state=42)
clf.fit(X_train)
print(clf.labels_.shape)
print(clf.labels_[1:10])
print_digits(images_train, clf.labels_, max_n=10)

# Predict clusters on testing data.
y_pred = clf.predict(X_test)

def print_cluster(images, y_pred, cluster_number):
    images = images[y_pred == cluster_number]
    y_pred = y_pred[y_pred == cluster_number]
    print_digits(images, y_pred, max_n=10)

for i in range(10):
    print_cluster(images_test, y_pred, i)

# Compare the predicted clusters with the true digit labels.
print("Adjusted rand score: {:.2}".format(metrics.adjusted_rand_score(y_test, y_pred)))
print("Homogeneity score: {:.2}".format(metrics.homogeneity_score(y_test, y_pred)))
print("Completeness score: {:.2}".format(metrics.completeness_score(y_test, y_pred)))
print("Confusion matrix")
print(metrics.confusion_matrix(y_test, y_pred))

# Project the training data to two dimensions with PCA so the cluster
# assignments can be drawn on a plane.
pca = decomposition.PCA(n_components=2).fit(X_train)
reduced_X_train = pca.transform(X_train)

# Step size of the mesh. Decrease to increase the quality of the VQ.
h = .01  # point in the mesh [x_min, x_max] x [y_min, y_max].

# Plot the decision boundary. For that, we will assign a color to each
# point of a mesh covering the reduced data (range padded by 1 on each side).
x_min, x_max = reduced_X_train[:, 0].min() - 1, reduced_X_train[:, 0].max() + 1
y_min, y_max = reduced_X_train[:, 1].min() - 1, reduced_X_train[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

kmeans = cluster.KMeans(init='k-means++', n_clusters=n_digits, n_init=10)
kmeans.fit(reduced_X_train)
Z = kmeans.predict(np.c_[xx.ravel(), yy.ravel()])

# Put the result into a color plot.
Z = Z.reshape(xx.shape)
plt.figure(1)
plt.clf()
plt.imshow(Z, interpolation='nearest',
           extent=(xx.min(), xx.max(), yy.min(), yy.max()),
           cmap=plt.cm.Paired,
           aspect='auto', origin='lower')

plt.plot(reduced_X_train[:, 0], reduced_X_train[:, 1], 'k.', markersize=2)

# Plot the centroids as white dots.
centroids = kmeans.cluster_centers_
plt.scatter(centroids[:, 0], centroids[:, 1],
            marker='.', s=169, linewidths=3,
            color='w', zorder=10)

plt.title('K-means clustering on the digits dataset (PCA-reduced data)\n'
          'Centroids are marked with white dots')
plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
plt.xticks(())
plt.yticks(())
plt.show()
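The scores above compare clusters against the true digit labels. When no labels are available, an internal measure such as the silhouette coefficient can guide the choice of k instead. A small sketch, assuming the same X_train as above; the candidate values are only illustrative:

from sklearn import cluster, metrics

# Compare a few candidate cluster counts with the silhouette
# coefficient, which needs no ground-truth labels (higher is better).
for k in (8, 10, 12):
    km = cluster.KMeans(init='k-means++', n_clusters=k, random_state=42)
    labels = km.fit_predict(X_train)
    print("k={}: silhouette={:.3f}".format(
        k, metrics.silhouette_score(X_train, labels)))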


Lisrelchen posted on 2017-4-22 08:44:19

Affinity propagation

from sklearn import cluster

# Affinity propagation picks the number of clusters itself; each
# cluster is represented by an actual training sample (an exemplar).
aff = cluster.AffinityPropagation()
aff.fit(X_train)
print(aff.cluster_centers_indices_.shape)

print_digits(images_train[aff.cluster_centers_indices_],
             y_train[aff.cluster_centers_indices_],
             max_n=aff.cluster_centers_indices_.shape[0])

# MeanShift
ms = cluster.MeanShift()
ms.fit(X_train)
print(ms.cluster_centers_)
print(ms.cluster_centers_.shape)
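MeanShift's behavior depends heavily on its kernel bandwidth, which scikit-learn can estimate from the data itself. A small sketch, assuming the same X_train as above; the quantile value is only illustrative:

from sklearn import cluster

# Estimate a kernel bandwidth from the data; a smaller quantile gives
# a smaller bandwidth and usually more clusters.
bandwidth = cluster.estimate_bandwidth(X_train, quantile=0.2, random_state=42)
ms = cluster.MeanShift(bandwidth=bandwidth, bin_seeding=True)
ms.fit(X_train)
print("Estimated bandwidth: {:.2f}".format(bandwidth))
print("Number of clusters found:", ms.cluster_centers_.shape[0])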


Lisrelchen posted on 2017-4-22 08:47:40

Mixture of Gaussian Models

from sklearn import decomposition, metrics, mixture
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt

# Note: the book used mixture.GMM, which current scikit-learn has
# renamed to mixture.GaussianMixture.

# Define a held-out dataset to choose the covariance type.
X_train_heldout, X_test_heldout, y_train_heldout, y_test_heldout = train_test_split(
    X_train, y_train, test_size=0.25, random_state=42)
for covariance_type in ['spherical', 'tied', 'diag', 'full']:
    gm = mixture.GaussianMixture(n_components=n_digits,
                                 covariance_type=covariance_type,
                                 random_state=42, n_init=5)
    gm.fit(X_train_heldout)
    y_pred = gm.predict(X_test_heldout)
    print("Adjusted rand score for covariance={}: {:.2}".format(
        covariance_type, metrics.adjusted_rand_score(y_test_heldout, y_pred)))

gm = mixture.GaussianMixture(n_components=n_digits, covariance_type='tied',
                             random_state=42)
gm.fit(X_train)

# Print the test clustering and confusion matrix.
y_pred = gm.predict(X_test)
print("Adjusted rand score: {:.2}".format(metrics.adjusted_rand_score(y_test, y_pred)))
print("Homogeneity score: {:.2}".format(metrics.homogeneity_score(y_test, y_pred)))
print("Completeness score: {:.2}".format(metrics.completeness_score(y_test, y_pred)))
for i in range(10):
    print_cluster(images_test, y_pred, i)
print("Confusion matrix")
print(metrics.confusion_matrix(y_test, y_pred))

# Project the training data to two dimensions with PCA, as in the
# K-means plot above (adapted from the scikit-learn K-means digits example).
pca = decomposition.PCA(n_components=2).fit(X_train)
reduced_X_train = pca.transform(X_train)

# Step size of the mesh. Decrease to increase the quality of the VQ.
h = .01  # point in the mesh [x_min, x_max] x [y_min, y_max].

# Plot the decision boundary. For that, we will assign a color to each
# point of a mesh covering the reduced data (range padded by 1 on each side).
x_min, x_max = reduced_X_train[:, 0].min() - 1, reduced_X_train[:, 0].max() + 1
y_min, y_max = reduced_X_train[:, 1].min() - 1, reduced_X_train[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

gm.fit(reduced_X_train)
Z = gm.predict(np.c_[xx.ravel(), yy.ravel()])

# Put the result into a color plot.
Z = Z.reshape(xx.shape)
plt.figure(1)
plt.clf()
plt.imshow(Z, interpolation='nearest',
           extent=(xx.min(), xx.max(), yy.min(), yy.max()),
           cmap=plt.cm.Paired,
           aspect='auto', origin='lower')

plt.plot(reduced_X_train[:, 0], reduced_X_train[:, 1], 'k.', markersize=2)

# Plot the component means as white dots.
centroids = gm.means_
plt.scatter(centroids[:, 0], centroids[:, 1],
            marker='.', s=169, linewidths=3,
            color='w', zorder=10)

plt.title('Mixture of gaussian models on the digits dataset (PCA-reduced data)\n'
          'Means are marked with white dots')
plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
plt.xticks(())
plt.yticks(())
plt.show()
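The covariance type was chosen above using a held-out set and the true labels; it can also be chosen without labels using the Bayesian Information Criterion built into GaussianMixture. A minimal sketch, assuming the same X_train and modern scikit-learn:

from sklearn import mixture

# Pick the covariance type with the lowest BIC on the training data;
# no ground-truth labels are needed (lower BIC is better).
best = None
for covariance_type in ['spherical', 'tied', 'diag', 'full']:
    gm = mixture.GaussianMixture(n_components=10,
                                 covariance_type=covariance_type,
                                 random_state=42)
    gm.fit(X_train)
    bic = gm.bic(X_train)
    print("covariance={}: BIC={:.0f}".format(covariance_type, bic))
    if best is None or bic < best[1]:
        best = (covariance_type, bic)
print("Lowest BIC:", best[0])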


MouJack007 posted on 2017-4-22 09:33:30

Thanks for sharing!

