Original poster: Lisrelchen

[GitHub] Python Machine Learning Cookbook


1 (original post)
Lisrelchen posted on 2017-7-7 22:41:10

Python Machine Learning Cookbook by Packt Publishing

## Instructions and Navigation

This is the code repository for Python Machine Learning Cookbook, published by Packt. It contains all the supporting project files necessary to work through the book from start to finish. The code files are organized according to the chapters in the book. These code samples will work on any machine running Linux, Mac OS X, or Windows. Although they were written and tested on Python 2.7, you can run them on Python 3.x with minimal changes.

To run the code samples, you need to install scikit-learn, NumPy, SciPy, and matplotlib. For Chapter 6, you will need to install NLTK and gensim. To run the code in Chapter 7, you need to install hmmlearn and python_speech_features. For Chapter 8, you need to install pandas and PyStruct; Chapter 8 also makes use of hmmlearn. For Chapters 9 and 10, you need to install OpenCV. For Chapter 11, you need to install NeuroLab.
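One quick way to see which of these dependencies are already installed is to probe for them from Python 3. The sketch below is not part of the repository. The import names are the usual ones for these packages (for example `sklearn` for scikit-learn and `cv2` for OpenCV) and are stated here as assumptions; `REQUIRED` and `missing_packages` are hypothetical names.

```python
import importlib.util

# Import names for the libraries listed above. These are assumptions: the
# name used with "import" can differ from the name used to install a package
# (scikit-learn is imported as "sklearn", OpenCV as "cv2").
REQUIRED = {
    "all chapters": ["sklearn", "numpy", "scipy", "matplotlib"],
    "chapter 6": ["nltk", "gensim"],
    "chapter 7": ["hmmlearn", "python_speech_features"],
    "chapter 8": ["pandas", "pystruct", "hmmlearn"],
    "chapters 9-10": ["cv2"],
    "chapter 11": ["neurolab"],
}


def missing_packages(required=REQUIRED):
    """Return {chapter: [module names that cannot be found]}."""
    missing = {}
    for chapter, modules in required.items():
        # find_spec returns None when a top-level module is not installed
        absent = [m for m in modules if importlib.util.find_spec(m) is None]
        if absent:
            missing[chapter] = absent
    return missing


if __name__ == "__main__":
    for chapter, modules in missing_packages().items():
        print("{}: missing {}".format(chapter, ", ".join(modules)))
```

An empty result means every listed module can be located; it does not check versions.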

## Description

Machine learning is becoming increasingly pervasive in the modern data-driven world. It is used extensively across many fields, such as search engines, robotics, and self-driving cars. Over the course of this book, you will learn how to use Python to build a wide variety of machine learning applications to solve real-world problems. You will understand how to deal with different types of data, such as images, text, and audio.

We will explore various techniques in supervised and unsupervised learning. We will learn machine learning algorithms such as Support Vector Machines, Random Forests, Hidden Markov Models, Conditional Random Fields, Deep Neural Networks, and many more. We will discuss visualization techniques that can be used to interact with your data. Using these algorithms, we will discuss how to build recommendation engines, perform predictive modeling, build speech recognizers, perform sentiment analysis on text data, develop face recognition systems, and so on.

With the help of this recipe-based guide, you will understand which algorithms to use in a given context. You will learn how to make informed decisions about the type of algorithm you need and how to implement it to get the best possible results. Stuck while making sense of images, text, speech, or some other form of data? This guide on applying machine learning techniques to each of these will come to your rescue! The code is well commented, so you will be able to get it up and running easily. The book contains all the relevant explanations of the algorithms that are used to build these applications.

There is a lot of debate about Python 2.x versus Python 3.x. While we believe that the world is moving forward with better versions coming out, a lot of developers still enjoy using Python 2.x, and many operating systems have Python 2.x built in. Using Python 2.x also helps maintain compatibility with Python libraries that haven't been ported to Python 3.x. Keeping that in mind, the code in this book is oriented towards Python 2.x. We have tried to keep all the code as agnostic as possible to the Python version, so that Python 3.x users won't face too many issues. We are focused on utilizing the machine learning libraries in the best possible way in Python.
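As an illustration of what version-agnostic code can look like, the sketch below formats an accuracy message so that it behaves the same on Python 2.7 and 3.x. It is an assumed example, not code from the book; `format_accuracy` is a hypothetical helper.

```python
def format_accuracy(num_correct, num_total):
    """Return an accuracy message that is identical on Python 2.7 and 3.x."""
    # Multiplying by the float 100.0 first avoids Python 2's integer division
    accuracy = 100.0 * num_correct / num_total
    return "Accuracy of the classifier = {}%".format(round(accuracy, 2))


# A print call with a single parenthesised string works on both versions:
# Python 3 treats it as a function call, Python 2 as a print statement
# applied to a parenthesised expression.
print(format_accuracy(3, 4))
```

Building the whole message with `str.format` and passing one argument to `print` sidesteps the main syntax difference between the two versions.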

## Related Python/Machine Learning Products

- Python Machine Learning
- Mastering Python Machine Learning
- OpenCV with Python By Example

Attachment: Python-Machine-Learning-Cookbook-master.zip (13.68 MB)


2
Lisrelchen posted on 2017-7-7 22:42:37

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score, train_test_split

# A plot_classifier helper with this signature is posted in reply 7 of this thread
from logistic_regression import plot_classifier

input_file = 'data_multivar.txt'

# Each line holds comma-separated feature values followed by the class label
X = []
y = []
with open(input_file, 'r') as f:
    for line in f:
        data = [float(x) for x in line.split(',')]
        X.append(data[:-1])
        y.append(data[-1])

X = np.array(X)
y = np.array(y)

# Train on the full dataset and predict on the same data
classifier_gaussiannb = GaussianNB()
classifier_gaussiannb.fit(X, y)
y_pred = classifier_gaussiannb.predict(X)

# Compute the accuracy of the classifier on the training data
accuracy = 100.0 * (y == y_pred).sum() / X.shape[0]
print("Accuracy of the classifier = {}%".format(round(accuracy, 2)))

plot_classifier(classifier_gaussiannb, X, y)

###############################################
# Train test split (sklearn.cross_validation was replaced by sklearn.model_selection)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=5)
classifier_gaussiannb_new = GaussianNB()
classifier_gaussiannb_new.fit(X_train, y_train)
y_test_pred = classifier_gaussiannb_new.predict(X_test)

# Compute the accuracy of the classifier on the held-out test data
accuracy = 100.0 * (y_test == y_test_pred).sum() / X_test.shape[0]
print("Accuracy of the classifier = {}%".format(round(accuracy, 2)))

plot_classifier(classifier_gaussiannb_new, X_test, y_test)

###############################################
# Cross validation and scoring functions

num_validations = 5
accuracy = cross_val_score(classifier_gaussiannb,
        X, y, scoring='accuracy', cv=num_validations)
print("Accuracy: " + str(round(100*accuracy.mean(), 2)) + "%")

f1 = cross_val_score(classifier_gaussiannb,
        X, y, scoring='f1_weighted', cv=num_validations)
print("F1: " + str(round(100*f1.mean(), 2)) + "%")

precision = cross_val_score(classifier_gaussiannb,
        X, y, scoring='precision_weighted', cv=num_validations)
print("Precision: " + str(round(100*precision.mean(), 2)) + "%")

recall = cross_val_score(classifier_gaussiannb,
        X, y, scoring='recall_weighted', cv=num_validations)
print("Recall: " + str(round(100*recall.mean(), 2)) + "%")

3
Lisrelchen posted on 2017-7-7 22:44:49

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Helper module from the book's repository (not shown in this thread)
import utilities

# Load input data
input_file = 'data_multivar.txt'
X, y = utilities.load_data(input_file)

###############################################
# Separate the data into classes based on 'y'
class_0 = np.array([x for x, label in zip(X, y) if label == 0])
class_1 = np.array([x for x, label in zip(X, y) if label == 1])

# Plot the input data: filled squares for class 0, hollow squares for class 1
plt.figure()
plt.scatter(class_0[:,0], class_0[:,1], facecolors='black', edgecolors='black', marker='s')
plt.scatter(class_1[:,0], class_1[:,1], facecolors='None', edgecolors='black', marker='s')
plt.title('Input data')

###############################################
# Train test split and SVM training

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=5)

# Uncomment one of the alternatives to try a different kernel
params = {'kernel': 'linear'}
#params = {'kernel': 'poly', 'degree': 3}
#params = {'kernel': 'rbf'}
classifier = SVC(**params)
classifier.fit(X_train, y_train)
utilities.plot_classifier(classifier, X_train, y_train, 'Training dataset')

y_test_pred = classifier.predict(X_test)
utilities.plot_classifier(classifier, X_test, y_test, 'Test dataset')

###############################################
# Evaluate classifier performance

# Sort the labels so that target_names lines up with the report's label order
target_names = ['Class-' + str(int(i)) for i in sorted(set(y))]
print("\n" + "#"*30)
print("\nClassifier performance on training dataset\n")
print(classification_report(y_train, classifier.predict(X_train), target_names=target_names))
print("#"*30 + "\n")

print("#"*30)
print("\nClassification report on test dataset\n")
print(classification_report(y_test, y_test_pred, target_names=target_names))
print("#"*30 + "\n")

plt.show()
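The `utilities` module imported by this script is not posted in the thread. Below is a minimal sketch of what its `load_data` might look like, assuming the same file layout that the Naive Bayes script in the second post reads by hand (comma-separated features with the class label in the last column). The body is a guess, not the repository's code, and the temporary file only stands in for `data_multivar.txt`.

```python
import os
import tempfile

import numpy as np


def load_data(input_file):
    """Read comma-separated rows: all but the last value are features,
    the last value is the class label. (Assumed format.)"""
    X, y = [], []
    with open(input_file, 'r') as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            values = [float(v) for v in line.split(',')]
            X.append(values[:-1])
            y.append(values[-1])
    return np.array(X), np.array(y)


# Tiny demonstration with a temporary file in place of data_multivar.txt
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("1.0,2.0,0\n3.5,4.5,1\n")
X, y = load_data(tmp.name)
os.remove(tmp.name)
print(X.shape, y)
```

Returning NumPy arrays rather than lists lets the calling script slice columns with `X[:, 0]`, which the plotting code relies on.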

4
Lisrelchen posted on 2017-7-7 22:46:23

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Helper module from the book's repository (not shown in this thread);
# here load_data is expected to return a single array of 2-D points
import utilities

# Load data
data = utilities.load_data('data_multivar.txt')
num_clusters = 4

# Plot limits, with a margin of 1 around the data (reused for both figures)
x_min, x_max = min(data[:, 0]) - 1, max(data[:, 0]) + 1
y_min, y_max = min(data[:, 1]) - 1, max(data[:, 1]) + 1

# Plot data
plt.figure()
plt.scatter(data[:,0], data[:,1], marker='o',
        facecolors='none', edgecolors='k', s=30)
plt.title('Input data')
plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
plt.xticks(())
plt.yticks(())

# Train the model
kmeans = KMeans(init='k-means++', n_clusters=num_clusters, n_init=10)
kmeans.fit(data)

# Step size of the mesh
step_size = 0.01

# Build a mesh covering the plot area to draw the cluster boundaries
x_values, y_values = np.meshgrid(np.arange(x_min, x_max, step_size), np.arange(y_min, y_max, step_size))

# Predict labels for all points in the mesh
predicted_labels = kmeans.predict(np.c_[x_values.ravel(), y_values.ravel()])

# Plot the results
predicted_labels = predicted_labels.reshape(x_values.shape)
plt.figure()
plt.imshow(predicted_labels, interpolation='nearest',
           extent=(x_values.min(), x_values.max(), y_values.min(), y_values.max()),
           cmap=plt.cm.Paired,
           aspect='auto', origin='lower')

plt.scatter(data[:,0], data[:,1], marker='o',
        facecolors='none', edgecolors='k', s=30)

# Overlay the cluster centroids
centroids = kmeans.cluster_centers_
plt.scatter(centroids[:,0], centroids[:,1], marker='o', s=200, linewidths=3,
        color='k', zorder=10)
plt.title('Centroids and boundaries obtained using KMeans')
plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
plt.xticks(())
plt.yticks(())
plt.show()
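The script above fixes `num_clusters = 4` by hand. One common way to sanity-check such a choice, which is not part of the original post, is to compare silhouette scores for several cluster counts. The sketch below does this on synthetic blobs generated with `make_blobs`, used as a stand-in because `data_multivar.txt` is not included in the thread.

```python
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the input file (assumption: 2-D points in 4 groups)
data, _ = make_blobs(n_samples=200, centers=4, cluster_std=0.7, random_state=0)

# Fit KMeans for several cluster counts and record the silhouette score,
# which lies in [-1, 1]; higher values indicate better-separated clusters
scores = {}
for num_clusters in range(2, 7):
    kmeans = KMeans(init='k-means++', n_clusters=num_clusters,
                    n_init=10, random_state=0)
    labels = kmeans.fit_predict(data)
    scores[num_clusters] = metrics.silhouette_score(data, labels)

best = max(scores, key=scores.get)
for num_clusters, score in sorted(scores.items()):
    print("{} clusters: silhouette = {:.3f}".format(num_clusters, score))
print("Best number of clusters: {}".format(best))
```

The silhouette score is only one heuristic; on real data it is worth comparing it with a visual check like the boundary plot above.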

5
Lisrelchen posted on 2017-7-7 22:47:27

import csv

import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
import matplotlib.pyplot as plt

# Load data from input file: the first row is a header, and the first two
# columns of every row are skipped
input_file = 'wholesale.csv'
X = []
with open(input_file, 'r', newline='') as f:
    file_reader = csv.reader(f, delimiter=',')
    for count, row in enumerate(file_reader):
        if not count:
            names = row[2:]
            continue

        X.append([float(x) for x in row[2:]])

# Input data as numpy array
X = np.array(X)

# Estimating the bandwidth
bandwidth = estimate_bandwidth(X, quantile=0.8, n_samples=len(X))

# Compute clustering with MeanShift
meanshift_estimator = MeanShift(bandwidth=bandwidth, bin_seeding=True)
meanshift_estimator.fit(X)
labels = meanshift_estimator.labels_
centroids = meanshift_estimator.cluster_centers_
num_clusters = len(np.unique(labels))

print("\nNumber of clusters in input data = {}".format(num_clusters))

# Print the centroids under three-letter column headings
print("\nCentroids of clusters:")
print('\t'.join([name[:3] for name in names]))
for centroid in centroids:
    print('\t'.join([str(int(x)) for x in centroid]))

################
# Visualizing data

# Columns 1 and 2 of the loaded features (milk and groceries)
centroids_milk_groceries = centroids[:, 1:3]
x_coords = centroids_milk_groceries[:, 0]
y_coords = centroids_milk_groceries[:, 1]

# Plot the milk and groceries coordinates of the centroids
plt.figure()
plt.scatter(x_coords, y_coords, s=100, edgecolors='k', facecolors='none')

# Pad the axes by a fraction of each range (np.ptp is max minus min)
offset = 0.2
plt.xlim(x_coords.min() - offset * np.ptp(x_coords),
        x_coords.max() + offset * np.ptp(x_coords))
plt.ylim(y_coords.min() - offset * np.ptp(y_coords),
        y_coords.max() + offset * np.ptp(y_coords))

plt.title('Centroids of clusters for milk and groceries')
plt.show()
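The number of clusters that MeanShift finds depends on the bandwidth, which the script derives from `quantile=0.8`. The sketch below, an assumed example on synthetic blobs rather than on `wholesale.csv`, shows how one might compare a few quantile values; a larger quantile generally gives a larger bandwidth and therefore fewer clusters. `count_clusters` is a hypothetical helper.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Synthetic stand-in data (assumption: three well-separated 2-D groups)
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)


def count_clusters(data, quantile):
    """Cluster with MeanShift using a bandwidth estimated at the given quantile."""
    bandwidth = estimate_bandwidth(data, quantile=quantile, n_samples=len(data))
    model = MeanShift(bandwidth=bandwidth, bin_seeding=True).fit(data)
    return len(np.unique(model.labels_))


for quantile in (0.1, 0.3, 0.8):
    print("quantile={}: {} cluster(s)".format(quantile, count_clusters(X, quantile)))
```

Running a comparison like this on the real data can show whether the reported cluster count is stable or mostly an effect of the chosen quantile.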

6
MouJack007 posted on 2017-7-7 23:01:10

import numpy as np
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, learning_curve, validation_curve

input_file = 'car.data.txt'

# Read the data: one comma-separated record per line, class label last
X = []
with open(input_file, 'r') as f:
    for line in f:
        if line.strip():
            X.append(line.strip().split(','))

X = np.array(X)

# Convert string data to numerical data, using one LabelEncoder per column
label_encoder = []
X_encoded = np.empty(X.shape)
for i in range(X.shape[1]):
    label_encoder.append(preprocessing.LabelEncoder())
    X_encoded[:, i] = label_encoder[-1].fit_transform(X[:, i])

X = X_encoded[:, :-1].astype(int)
y = X_encoded[:, -1].astype(int)

# Build a Random Forest classifier
params = {'n_estimators': 200, 'max_depth': 8, 'random_state': 7}
classifier = RandomForestClassifier(**params)
classifier.fit(X, y)

# Cross validation
accuracy = cross_val_score(classifier,
        X, y, scoring='accuracy', cv=3)
print("Accuracy of the classifier: " + str(round(100*accuracy.mean(), 2)) + "%")

# Testing encoding on single data instance
input_data = ['vhigh', 'vhigh', '2', '2', 'small', 'low']
input_data_encoded = [-1] * len(input_data)
for i, item in enumerate(input_data):
    # transform expects an array-like, so wrap the single value in a list
    input_data_encoded[i] = int(label_encoder[i].transform([item])[0])

# predict expects a 2-D array with one row per sample
input_data_encoded = np.array(input_data_encoded).reshape(1, -1)

# Predict and print output for a particular datapoint
output_class = classifier.predict(input_data_encoded)
print("Output class: {}".format(label_encoder[-1].inverse_transform(output_class)[0]))

########################
# Validation curves

classifier = RandomForestClassifier(max_depth=4, random_state=7)

parameter_grid = np.linspace(25, 200, 8).astype(int)
train_scores, validation_scores = validation_curve(classifier, X, y,
        param_name="n_estimators", param_range=parameter_grid, cv=5)
print("\n##### VALIDATION CURVES #####")
print("\nParam: n_estimators\nTraining scores:\n{}".format(train_scores))
print("\nParam: n_estimators\nValidation scores:\n{}".format(validation_scores))

# Plot the curve
plt.figure()
plt.plot(parameter_grid, 100*np.average(train_scores, axis=1), color='black')
plt.title('Training curve')
plt.xlabel('Number of estimators')
plt.ylabel('Accuracy')
plt.show()

classifier = RandomForestClassifier(n_estimators=20, random_state=7)
parameter_grid = np.linspace(2, 10, 5).astype(int)
# Store the result in validation_scores so the print below reports this run
train_scores, validation_scores = validation_curve(classifier, X, y,
        param_name="max_depth", param_range=parameter_grid, cv=5)
print("\nParam: max_depth\nTraining scores:\n{}".format(train_scores))
print("\nParam: max_depth\nValidation scores:\n{}".format(validation_scores))

# Plot the curve
plt.figure()
plt.plot(parameter_grid, 100*np.average(train_scores, axis=1), color='black')
plt.title('Validation curve')
plt.xlabel('Maximum depth of the tree')
plt.ylabel('Accuracy')
plt.show()

########################
# Learning curves

classifier = RandomForestClassifier(random_state=7)

parameter_grid = np.array([200, 500, 800, 1100])
train_sizes, train_scores, validation_scores = learning_curve(classifier,
        X, y, train_sizes=parameter_grid, cv=5)
print("\n##### LEARNING CURVES #####")
print("\nTraining scores:\n{}".format(train_scores))
print("\nValidation scores:\n{}".format(validation_scores))

# Plot the curve
plt.figure()
plt.plot(train_sizes, 100*np.average(train_scores, axis=1), color='black')
plt.title('Learning curve')
plt.xlabel('Number of training samples')
plt.ylabel('Accuracy')
plt.show()
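The per-column encoding step in this script is easy to get wrong when encoding a single new record, so here is a small self-contained sketch of the round trip. The four-row table is made up for illustration (it is not taken from `car.data.txt`), and `encode_instance` is a hypothetical helper.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Made-up categorical table: two feature columns and a class label column
rows = np.array([
    ['vhigh', 'small', 'unacc'],
    ['high', 'med', 'acc'],
    ['low', 'big', 'good'],
    ['med', 'small', 'unacc'],
])

# Fit one encoder per column; LabelEncoder assigns integer codes in sorted order
encoders = []
encoded = np.empty(rows.shape, dtype=int)
for col in range(rows.shape[1]):
    encoder = LabelEncoder()
    encoded[:, col] = encoder.fit_transform(rows[:, col])
    encoders.append(encoder)


def encode_instance(values, feature_encoders):
    """Encode one record with already-fitted encoders and return a
    (1, n_features) array, the shape a classifier's predict() expects."""
    codes = [int(enc.transform([value])[0])
             for value, enc in zip(values, feature_encoders)]
    return np.array(codes).reshape(1, -1)


sample = encode_instance(['vhigh', 'small'], encoders[:-1])
print(sample)
# Decode a class code back into its original label
print(encoders[-1].inverse_transform([2])[0])
```

Reusing the fitted encoders, rather than fitting new ones on the single record, is what keeps the integer codes consistent with the training data.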

7
MouJack007 posted on 2017-7-7 23:02:24

import numpy as np
from sklearn import linear_model
import matplotlib.pyplot as plt

def plot_classifier(classifier, X, y):
    # define ranges to plot the figure, with a margin of 1.0 around the data
    x_min, x_max = X[:, 0].min() - 1.0, X[:, 0].max() + 1.0
    y_min, y_max = X[:, 1].min() - 1.0, X[:, 1].max() + 1.0

    # step size used in the mesh grid
    step_size = 0.01

    # define the mesh grid
    x_values, y_values = np.meshgrid(np.arange(x_min, x_max, step_size), np.arange(y_min, y_max, step_size))

    # compute the classifier output for every point of the grid
    mesh_output = classifier.predict(np.c_[x_values.ravel(), y_values.ravel()])

    # reshape the flat predictions back to the shape of the grid
    mesh_output = mesh_output.reshape(x_values.shape)

    # plot the output using a colored plot
    plt.figure()

    # choose a color scheme; all the options are listed here:
    # http://matplotlib.org/examples/color/colormaps_reference.html
    plt.pcolormesh(x_values, y_values, mesh_output, cmap=plt.cm.gray)

    # overlay the training points on the plot
    plt.scatter(X[:, 0], X[:, 1], c=y, s=80, edgecolors='black', linewidth=1, cmap=plt.cm.Paired)

    # specify the boundaries of the figure
    plt.xlim(x_values.min(), x_values.max())
    plt.ylim(y_values.min(), y_values.max())

    # specify the ticks on the X and Y axes
    plt.xticks(np.arange(int(X[:, 0].min() - 1), int(X[:, 0].max() + 1), 1.0))
    plt.yticks(np.arange(int(X[:, 1].min() - 1), int(X[:, 1].max() + 1), 1.0))

    plt.show()

if __name__ == '__main__':
    # input data: nine 2-D points in three classes
    X = np.array([[4, 7], [3.5, 8], [3.1, 6.2], [0.5, 1], [1, 2], [1.2, 1.9], [6, 2], [5.7, 1.5], [5.4, 2.2]])
    y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

    # initialize the logistic regression classifier
    classifier = linear_model.LogisticRegression(solver='liblinear', C=100)

    # train the classifier
    classifier.fit(X, y)

    # draw datapoints and boundaries
    plot_classifier(classifier, X, y)
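The mesh-grid step in `plot_classifier` can be hard to picture, so the sketch below runs the same `meshgrid`, `ravel`, `np.c_`, and `reshape` sequence on a tiny 3 by 2 grid. The `fake_predict` function is a made-up stand-in for `classifier.predict`, used only so the example needs nothing beyond NumPy.

```python
import numpy as np


def mesh_points(x_min, x_max, y_min, y_max, step_size):
    """Build the grid and the (n_points, 2) array of coordinates fed to predict()."""
    x_values, y_values = np.meshgrid(np.arange(x_min, x_max, step_size),
                                     np.arange(y_min, y_max, step_size))
    # ravel() flattens each grid row by row; np.c_ pairs them up as (x, y) columns
    points = np.c_[x_values.ravel(), y_values.ravel()]
    return x_values, y_values, points


def fake_predict(points):
    """Stand-in for classifier.predict: label 1 where x > y, else 0."""
    return (points[:, 0] > points[:, 1]).astype(int)


x_values, y_values, points = mesh_points(0.0, 3.0, 0.0, 2.0, 1.0)
grid_labels = fake_predict(points).reshape(x_values.shape)

print(points)       # six (x, y) pairs, one per grid node
print(grid_labels)  # labels arranged back into the 2 x 3 grid
```

Because the predictions come back in the same order as the flattened grid, reshaping them to `x_values.shape` puts each label at its grid position, which is what `pcolormesh` needs.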

8
zwzhai posted on 2017-7-7 23:08:23
(Reposts, unchanged, the Gaussian Naive Bayes script that Lisrelchen posted at 22:42:37.)

9
cloudoversea posted on 2017-7-7 23:09:30
Taking a look

10
钱学森64 posted on 2017-7-8 01:04:25
Thanks for sharing
