[GitHub]Python Machine Learning Cookbook

Original post

Lisrelchen, posted on 2017-7-7 22:41:10

Python Machine Learning Cookbook by Packt Publishing
##Instructions and Navigation
This is the code repository for Python Machine Learning Cookbook, published by Packt. It contains all the supporting project files necessary to work through the book from start to finish. The code files are organized according to the chapters in the book. These code samples will work on any machine running Linux, Mac OS X, or Windows. Even though they are written and tested on Python 2.7, you can easily run them on Python 3.x with minimal changes.
To run the code samples, you need to install scikit-learn, NumPy, SciPy, and matplotlib. For Chapter 6, you will need to install NLTK and gensim. To run the code in Chapter 7, you need to install hmmlearn and python_speech_features. For Chapter 8, you need to install Pandas and PyStruct; Chapter 8 also makes use of hmmlearn. For Chapters 9 and 10, you need to install OpenCV. For Chapter 11, you need to install NeuroLab.
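
One way to confirm these dependencies before running a chapter is a small import check. This is only a sketch: the import names below (for example `cv2` for OpenCV and `sklearn` for scikit-learn) are assumptions about how each package is usually imported, and `find_missing` is a hypothetical helper that is not part of the repository. It relies on `importlib.util`, so it targets Python 3.

```python
import importlib.util

# Display name -> import name. The import names are assumptions and are
# not taken from the repository.
REQUIRED = {
    "scikit-learn": "sklearn",
    "NumPy": "numpy",
    "SciPy": "scipy",
    "matplotlib": "matplotlib",
    "NLTK": "nltk",
    "gensim": "gensim",
    "hmmlearn": "hmmlearn",
    "python_speech_features": "python_speech_features",
    "Pandas": "pandas",
    "PyStruct": "pystruct",
    "OpenCV": "cv2",
    "NeuroLab": "neurolab",
}


def find_missing(packages):
    """Return the display names whose import name cannot be located."""
    return [name for name, module in packages.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```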
##Description
Machine learning is becoming increasingly pervasive in the modern data-driven world. It is used extensively across many fields such as search engines, robotics, and self-driving cars. During the course of this book, you will learn how to use Python to build a wide variety of machine learning applications to solve real-world problems. You will understand how to deal with different types of data such as images, text, and audio.
We will explore various techniques in supervised and unsupervised learning. We will learn machine learning algorithms like Support Vector Machines, Random Forests, Hidden Markov Models, Conditional Random Fields, Deep Neural Networks, and many more. We will discuss visualization techniques that can be used to interact with your data. Using these algorithms, we will discuss how to build recommendation engines, perform predictive modeling, build speech recognizers, perform sentiment analysis on text data, develop face recognition systems, and so on.
You will understand what algorithms to use in a given context with the help of this exciting recipe-based guide. You will learn how to make informed decisions about the type of algorithms you need to use and learn how to implement those algorithms to get the best possible results. Stuck while making sense of images, text, speech, or some other form of data? This guide on applying machine learning techniques to each of these will come to your rescue! The code is well commented, so you will be able to get it up and running easily. The book contains all the relevant explanations of the algorithms that are used to build these applications.
There is a lot of debate going on between Python 2.x and Python 3.x. While we believe that the world is moving forward with better versions coming out, a lot of developers still enjoy using Python 2.x. A lot of operating systems have Python 2.x built into them. It also helps in maintaining compatibility with Python libraries that haven't been ported to Python 3.x. Keeping that in mind, the code in this book is oriented towards Python 2.x. We have tried to keep all the code as agnostic as possible to the Python versions, so that Python 3.x users won't face too many issues. We are focused on utilizing the machine learning libraries in the best possible way in Python.
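
As an illustration of what version-agnostic code can look like, here is a minimal sketch that behaves the same on Python 2.x and 3.x. The function name `format_accuracy` is hypothetical and does not come from the book; the message text mirrors the accuracy printout used in the cookbook scripts.

```python
def format_accuracy(num_correct, num_total):
    """Build the accuracy message as a single string."""
    # Multiplying by 100.0 first forces floating-point division on 2.x too.
    accuracy = 100.0 * num_correct / num_total
    return "Accuracy of the classifier = {} %".format(round(accuracy, 2))


if __name__ == "__main__":
    # print with one parenthesized argument gives the same output on 2.x and 3.x.
    print(format_accuracy(3, 4))
```

Building the whole message first avoids the multi-argument `print` statement, which is the construct that differs most visibly between the two versions.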
##Related Python/Machine Learning Products:
Python Machine Learning
Mastering Python Machine Learning
OpenCV with Python By Example

Hidden content of this post:

Python-Machine-Learning-Cookbook-master.zip (13.68 MB)


#2

Lisrelchen, posted on 2017-7-7 22:42:37

import numpy as np
import matplotlib.pyplot as plt
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split, cross_val_score
from logistic_regression import plot_classifier
input_file = 'data_multivar.txt'
# Load the data: every column but the last is a feature, the last column is the label
X = []
y = []
with open(input_file, 'r') as f:
    for line in f.readlines():
        data = [float(x) for x in line.split(',')]
        X.append(data[:-1])
        y.append(data[-1])
X = np.array(X)
y = np.array(y)
# Train a Gaussian Naive Bayes classifier and predict on the training data
classifier_gaussiannb = GaussianNB()
classifier_gaussiannb.fit(X, y)
y_pred = classifier_gaussiannb.predict(X)
# compute accuracy of the classifier
accuracy = 100.0 * (y == y_pred).sum() / X.shape[0]
print("Accuracy of the classifier =", round(accuracy, 2), "%")
plot_classifier(classifier_gaussiannb, X, y)
###############################################
# Train test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=5)
classifier_gaussiannb_new = GaussianNB()
classifier_gaussiannb_new.fit(X_train, y_train)
y_test_pred = classifier_gaussiannb_new.predict(X_test)
# compute accuracy of the classifier
accuracy = 100.0 * (y_test == y_test_pred).sum() / X_test.shape[0]
print("Accuracy of the classifier =", round(accuracy, 2), "%")
plot_classifier(classifier_gaussiannb_new, X_test, y_test)
###############################################
# Cross validation and scoring functions
num_validations = 5
accuracy = cross_val_score(classifier_gaussiannb,
        X, y, scoring='accuracy', cv=num_validations)
print("Accuracy: " + str(round(100*accuracy.mean(), 2)) + "%")
f1 = cross_val_score(classifier_gaussiannb,
        X, y, scoring='f1_weighted', cv=num_validations)
print("F1: " + str(round(100*f1.mean(), 2)) + "%")
precision = cross_val_score(classifier_gaussiannb,
        X, y, scoring='precision_weighted', cv=num_validations)
print("Precision: " + str(round(100*precision.mean(), 2)) + "%")
recall = cross_val_score(classifier_gaussiannb,
        X, y, scoring='recall_weighted', cv=num_validations)
print("Recall: " + str(round(100*recall.mean(), 2)) + "%")
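
The `precision_weighted`, `recall_weighted`, and `f1_weighted` scorers used above average the per-class scores, weighting each class by its number of true samples. The following pure-Python sketch shows that averaging on a toy example; `weighted_scores` is a hypothetical helper written for illustration and is not scikit-learn's implementation.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return support-weighted (precision, recall, f1) over all true classes."""
    support = Counter(y_true)
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label, count in support.items():
        # True positives and number of predictions made for this class
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        class_precision = tp / predicted if predicted else 0.0
        class_recall = tp / count
        denom = class_precision + class_recall
        class_f1 = 2 * class_precision * class_recall / denom if denom else 0.0
        # Weight each class by its share of the true labels
        weight = count / total
        precision += weight * class_precision
        recall += weight * class_recall
        f1 += weight * class_f1
    return precision, recall, f1


if __name__ == "__main__":
    print(weighted_scores([0, 0, 1, 1], [0, 1, 1, 1]))
```

Note that the weighted recall always equals plain accuracy, since each class's recall is weighted by exactly the number of samples it covers.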

#3

Lisrelchen, posted on 2017-7-7 22:44:49

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report
import utilities
# Load input data
input_file = 'data_multivar.txt'
X, y = utilities.load_data(input_file)
###############################################
# Separate the data into classes based on 'y'
class_0 = np.array([X[i] for i in range(len(X)) if y[i]==0])
class_1 = np.array([X[i] for i in range(len(X)) if y[i]==1])
# Plot the input data
plt.figure()
plt.scatter(class_0[:,0], class_0[:,1], facecolors='black', edgecolors='black', marker='s')
plt.scatter(class_1[:,0], class_1[:,1], facecolors='None', edgecolors='black', marker='s')
plt.title('Input data')
###############################################
# Train test split and SVM training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=5)
params = {'kernel': 'linear'}
#params = {'kernel': 'poly', 'degree': 3}
#params = {'kernel': 'rbf'}
classifier = SVC(**params)
classifier.fit(X_train, y_train)
utilities.plot_classifier(classifier, X_train, y_train, 'Training dataset')
y_test_pred = classifier.predict(X_test)
utilities.plot_classifier(classifier, X_test, y_test, 'Test dataset')
###############################################
# Evaluate classifier performance
# Sort the labels so the names line up with the report's class order
target_names = ['Class-' + str(int(i)) for i in sorted(set(y))]
print("\n" + "#"*30)
print("\nClassifier performance on training dataset\n")
print(classification_report(y_train, classifier.predict(X_train), target_names=target_names))
print("#"*30 + "\n")
print("#"*30)
print("\nClassification report on test dataset\n")
print(classification_report(y_test, y_test_pred, target_names=target_names))
print("#"*30 + "\n")
plt.show()
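
The script above imports a `utilities` module that is not shown in this thread. A minimal sketch of what its `load_data` helper might look like follows, assuming the same comma-separated layout that the Naive Bayes post in #2 reads by hand (features first, label in the last column). The body is an assumption, not the repository's code.

```python
import numpy as np


def load_data(input_file):
    """Read comma-separated rows and return (X, y) as NumPy arrays.

    Assumed format: every column but the last is a feature and the
    last column is the class label.
    """
    X = []
    y = []
    with open(input_file, 'r') as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            data = [float(x) for x in line.split(',')]
            X.append(data[:-1])
            y.append(data[-1])
    return np.array(X), np.array(y)
```

With this in a file named `utilities.py`, the call `X, y = utilities.load_data(input_file)` would return a 2D feature array and a 1D label array.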

#4

Lisrelchen, posted on 2017-7-7 22:46:23

import numpy as np
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.cluster import KMeans
import utilities
# Load data
data = utilities.load_data('data_multivar.txt')
num_clusters = 4
# Plot data
plt.figure()
plt.scatter(data[:,0], data[:,1], marker='o',
        facecolors='none', edgecolors='k', s=30)
x_min, x_max = min(data[:, 0]) - 1, max(data[:, 0]) + 1
y_min, y_max = min(data[:, 1]) - 1, max(data[:, 1]) + 1
plt.title('Input data')
plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
plt.xticks(())
plt.yticks(())
# Train the model
kmeans = KMeans(init='k-means++', n_clusters=num_clusters, n_init=10)
kmeans.fit(data)
# Step size of the mesh
step_size = 0.01
# Plot the boundaries
x_min, x_max = min(data[:, 0]) - 1, max(data[:, 0]) + 1
y_min, y_max = min(data[:, 1]) - 1, max(data[:, 1]) + 1
x_values, y_values = np.meshgrid(np.arange(x_min, x_max, step_size), np.arange(y_min, y_max, step_size))
# Predict labels for all points in the mesh
predicted_labels = kmeans.predict(np.c_[x_values.ravel(), y_values.ravel()])
# Plot the results
predicted_labels = predicted_labels.reshape(x_values.shape)
plt.figure()
plt.clf()
plt.imshow(predicted_labels, interpolation='nearest',
        extent=(x_values.min(), x_values.max(), y_values.min(), y_values.max()),
        cmap=plt.cm.Paired,
        aspect='auto', origin='lower')
plt.scatter(data[:,0], data[:,1], marker='o',
        facecolors='none', edgecolors='k', s=30)
centroids = kmeans.cluster_centers_
plt.scatter(centroids[:,0], centroids[:,1], marker='o', s=200, linewidths=3,
        color='k', zorder=10, facecolors='black')
x_min, x_max = min(data[:, 0]) - 1, max(data[:, 0]) + 1
y_min, y_max = min(data[:, 1]) - 1, max(data[:, 1]) + 1
plt.title('Centroids and boundaries obtained using KMeans')
plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
plt.xticks(())
plt.yticks(())
plt.show()
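
The colored regions in the plot come from labeling every mesh point with its closest centroid. The sketch below shows that assignment step with plain NumPy on a toy example; `nearest_centroid_labels` is a hypothetical helper used only for illustration, and the script above relies on `kmeans.predict` instead.

```python
import numpy as np


def nearest_centroid_labels(points, centroids):
    """Return, for each point, the index of the closest centroid."""
    # Pairwise Euclidean distances, shape (n_points, n_centroids)
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)


if __name__ == "__main__":
    points = np.array([[0.0, 0.0], [10.0, 10.0], [0.4, 0.1]])
    centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
    print(nearest_centroid_labels(points, centroids))
```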

#5

Lisrelchen, posted on 2017-7-7 22:47:27

import csv
import numpy as np
from sklearn import cluster, covariance, manifold
from sklearn.cluster import MeanShift, estimate_bandwidth
import matplotlib.pyplot as plt
# Load data from input file
input_file = 'wholesale.csv'
X = []
with open(input_file, 'r', newline='') as f:
    file_reader = csv.reader(f, delimiter=',')
    for count, row in enumerate(file_reader):
        # The first row holds the column names; the first two columns are skipped
        if not count:
            names = row[2:]
            continue
        X.append([float(x) for x in row[2:]])
# Input data as numpy array
X = np.array(X)
# Estimating the bandwidth
bandwidth = estimate_bandwidth(X, quantile=0.8, n_samples=len(X))
# Compute clustering with MeanShift
meanshift_estimator = MeanShift(bandwidth=bandwidth, bin_seeding=True)
meanshift_estimator.fit(X)
labels = meanshift_estimator.labels_
centroids = meanshift_estimator.cluster_centers_
num_clusters = len(np.unique(labels))
print("\nNumber of clusters in input data =", num_clusters)
print("\nCentroids of clusters:")
print('\t'.join([name[:3] for name in names]))
for centroid in centroids:
    print('\t'.join([str(int(x)) for x in centroid]))
################
# Visualizing data
centroids_milk_groceries = centroids[:, 1:3]
x_coords = centroids_milk_groceries[:, 0]
y_coords = centroids_milk_groceries[:, 1]
# Plot the nodes using the coordinates of our centroids_milk_groceries
plt.figure()
plt.scatter(x_coords, y_coords,
        s=100, edgecolors='k', facecolors='none')
# Pad each axis by a fraction of its value range (np.ptp = max - min)
offset = 0.2
plt.xlim(x_coords.min() - offset * np.ptp(x_coords),
        x_coords.max() + offset * np.ptp(x_coords))
plt.ylim(y_coords.min() - offset * np.ptp(y_coords),
        y_coords.max() + offset * np.ptp(y_coords))
plt.title('Centroids of clusters for milk and groceries')
plt.show()

#6

MouJack007, posted on 2017-7-7 23:01:10

import numpy as np
from sklearn import preprocessing
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, validation_curve, learning_curve
import matplotlib.pyplot as plt
input_file = 'car.data.txt'
# Reading the data
X = []
with open(input_file, 'r') as f:
    for line in f.readlines():
        data = line[:-1].split(',')
        X.append(data)
X = np.array(X)
# Convert string data to numerical data
label_encoder = []
X_encoded = np.empty(X.shape)
for i, item in enumerate(X[0]):
    label_encoder.append(preprocessing.LabelEncoder())
    X_encoded[:, i] = label_encoder[-1].fit_transform(X[:, i])
X = X_encoded[:, :-1].astype(int)
y = X_encoded[:, -1].astype(int)
# Build a Random Forest classifier
params = {'n_estimators': 200, 'max_depth': 8, 'random_state': 7}
classifier = RandomForestClassifier(**params)
classifier.fit(X, y)
# Cross validation
accuracy = cross_val_score(classifier,
        X, y, scoring='accuracy', cv=3)
print("Accuracy of the classifier: " + str(round(100*accuracy.mean(), 2)) + "%")
# Testing encoding on single data instance
input_data = ['vhigh', 'vhigh', '2', '2', 'small', 'low']
input_data_encoded = [-1] * len(input_data)
for i, item in enumerate(input_data):
    # transform() expects a sequence, so wrap the single value in a list
    input_data_encoded[i] = int(label_encoder[i].transform([item])[0])
# predict() expects a 2D array with one row per sample
input_data_encoded = np.array(input_data_encoded).reshape(1, -1)
# Predict and print output for a particular datapoint
output_class = classifier.predict(input_data_encoded)
print("Output class:", label_encoder[-1].inverse_transform(output_class)[0])
########################
# Validation curves
classifier = RandomForestClassifier(max_depth=4, random_state=7)
parameter_grid = np.linspace(25, 200, 8).astype(int)
train_scores, validation_scores = validation_curve(classifier, X, y,
        param_name="n_estimators", param_range=parameter_grid, cv=5)
print("\n##### VALIDATION CURVES #####")
print("\nParam: n_estimators\nTraining scores:\n", train_scores)
print("\nParam: n_estimators\nValidation scores:\n", validation_scores)
# Plot the curve
plt.figure()
plt.plot(parameter_grid, 100*np.average(train_scores, axis=1), color='black')
plt.title('Training curve')
plt.xlabel('Number of estimators')
plt.ylabel('Accuracy')
plt.show()
classifier = RandomForestClassifier(n_estimators=20, random_state=7)
parameter_grid = np.linspace(2, 10, 5).astype(int)
# Store the result under the same name that is printed below
train_scores, validation_scores = validation_curve(classifier, X, y,
        param_name="max_depth", param_range=parameter_grid, cv=5)
print("\nParam: max_depth\nTraining scores:\n", train_scores)
print("\nParam: max_depth\nValidation scores:\n", validation_scores)
# Plot the curve
plt.figure()
plt.plot(parameter_grid, 100*np.average(train_scores, axis=1), color='black')
plt.title('Validation curve')
plt.xlabel('Maximum depth of the tree')
plt.ylabel('Accuracy')
plt.show()
########################
# Learning curves
classifier = RandomForestClassifier(random_state=7)
parameter_grid = np.array([200, 500, 800, 1100])
train_sizes, train_scores, validation_scores = learning_curve(classifier,
        X, y, train_sizes=parameter_grid, cv=5)
print("\n##### LEARNING CURVES #####")
print("\nTraining scores:\n", train_scores)
print("\nValidation scores:\n", validation_scores)
# Plot the curve
plt.figure()
plt.plot(parameter_grid, 100*np.average(train_scores, axis=1), color='black')
plt.title('Learning curve')
plt.xlabel('Number of training samples')
plt.ylabel('Accuracy')
plt.show()
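
The string-to-number conversion in this script relies on `LabelEncoder`, which assigns each distinct value in a column its position among the sorted unique values. A plain-Python sketch of that idea follows; `fit_label_map` and `encode` are hypothetical names chosen for illustration, and the sample column values are made up rather than read from `car.data.txt`.

```python
def fit_label_map(values):
    """Map each distinct value to its index among the sorted unique values."""
    return {label: index for index, label in enumerate(sorted(set(values)))}


def encode(values, label_map):
    """Replace every value by its integer code; unseen values raise KeyError."""
    return [label_map[value] for value in values]


if __name__ == "__main__":
    column = ['vhigh', 'high', 'med', 'low', 'high']  # made-up sample column
    label_map = fit_label_map(column)
    print(label_map)
    print(encode(['low', 'vhigh'], label_map))
```

Because one mapping is fitted per column, a new data point has to be encoded column by column with the matching mapping, which is what the loop over `input_data` in the script does.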

#7

MouJack007, posted on 2017-7-7 23:02:24

import numpy as np
from sklearn import linear_model
import matplotlib.pyplot as plt
def plot_classifier(classifier, X, y):
    # define ranges to plot the figure
    x_min, x_max = min(X[:, 0]) - 1.0, max(X[:, 0]) + 1.0
    y_min, y_max = min(X[:, 1]) - 1.0, max(X[:, 1]) + 1.0
    # denotes the step size that will be used in the mesh grid
    step_size = 0.01
    # define the mesh grid
    x_values, y_values = np.meshgrid(np.arange(x_min, x_max, step_size), np.arange(y_min, y_max, step_size))
    # compute the classifier output
    mesh_output = classifier.predict(np.c_[x_values.ravel(), y_values.ravel()])
    # reshape the array
    mesh_output = mesh_output.reshape(x_values.shape)
    # Plot the output using a colored plot
    plt.figure()
    # choose a color scheme you can find all the options
    # here: http://matplotlib.org/examples/color/colormaps_reference.html
    plt.pcolormesh(x_values, y_values, mesh_output, cmap=plt.cm.gray)
    # Overlay the training points on the plot
    plt.scatter(X[:, 0], X[:, 1], c=y, s=80, edgecolors='black', linewidth=1, cmap=plt.cm.Paired)
    # specify the boundaries of the figure
    plt.xlim(x_values.min(), x_values.max())
    plt.ylim(y_values.min(), y_values.max())
    # specify the ticks on the X and Y axes
    plt.xticks((np.arange(int(min(X[:, 0])-1), int(max(X[:, 0])+1), 1.0)))
    plt.yticks((np.arange(int(min(X[:, 1])-1), int(max(X[:, 1])+1), 1.0)))
    plt.show()
if __name__=='__main__':
    # input data
    X = np.array([[4, 7], [3.5, 8], [3.1, 6.2], [0.5, 1], [1, 2], [1.2, 1.9], [6, 2], [5.7, 1.5], [5.4, 2.2]])
    y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
    # initialize the logistic regression classifier
    classifier = linear_model.LogisticRegression(solver='liblinear', C=100)
    # train the classifier
    classifier.fit(X, y)
    # draw datapoints and boundaries
    plot_classifier(classifier, X, y)
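
The mesh-grid step in `plot_classifier` can be hard to picture, so here is a small sketch of the same `np.meshgrid` and `np.c_` pattern on a coarse grid. The helper name `mesh_points` and the grid bounds are made up for illustration.

```python
import numpy as np


def mesh_points(x_min, x_max, y_min, y_max, step_size):
    """Return the grid coordinate arrays and the flattened (x, y) point list."""
    x_values, y_values = np.meshgrid(np.arange(x_min, x_max, step_size),
                                     np.arange(y_min, y_max, step_size))
    # Each row of grid_points is one (x, y) location to classify
    grid_points = np.c_[x_values.ravel(), y_values.ravel()]
    return x_values, y_values, grid_points


if __name__ == "__main__":
    x_values, y_values, grid_points = mesh_points(0.0, 1.0, 0.0, 1.5, 0.5)
    print(x_values.shape)   # one row per y value, one column per x value
    print(grid_points)
```

The classifier is evaluated on `grid_points`, and its predictions are reshaped back to `x_values.shape` so that each prediction lines up with its grid cell.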

#8

zwzhai, posted on 2017-7-7 23:08:23

import numpy as np
import matplotlib.pyplot as plt
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split, cross_val_score
from logistic_regression import plot_classifier
input_file = 'data_multivar.txt'
# Load the data: every column but the last is a feature, the last column is the label
X = []
y = []
with open(input_file, 'r') as f:
    for line in f.readlines():
        data = [float(x) for x in line.split(',')]
        X.append(data[:-1])
        y.append(data[-1])
X = np.array(X)
y = np.array(y)
# Train a Gaussian Naive Bayes classifier and predict on the training data
classifier_gaussiannb = GaussianNB()
classifier_gaussiannb.fit(X, y)
y_pred = classifier_gaussiannb.predict(X)
# compute accuracy of the classifier
accuracy = 100.0 * (y == y_pred).sum() / X.shape[0]
print("Accuracy of the classifier =", round(accuracy, 2), "%")
plot_classifier(classifier_gaussiannb, X, y)
###############################################
# Train test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=5)
classifier_gaussiannb_new = GaussianNB()
classifier_gaussiannb_new.fit(X_train, y_train)
y_test_pred = classifier_gaussiannb_new.predict(X_test)
# compute accuracy of the classifier
accuracy = 100.0 * (y_test == y_test_pred).sum() / X_test.shape[0]
print("Accuracy of the classifier =", round(accuracy, 2), "%")
plot_classifier(classifier_gaussiannb_new, X_test, y_test)
###############################################
# Cross validation and scoring functions
num_validations = 5
accuracy = cross_val_score(classifier_gaussiannb,
        X, y, scoring='accuracy', cv=num_validations)
print("Accuracy: " + str(round(100*accuracy.mean(), 2)) + "%")
f1 = cross_val_score(classifier_gaussiannb,
        X, y, scoring='f1_weighted', cv=num_validations)
print("F1: " + str(round(100*f1.mean(), 2)) + "%")
precision = cross_val_score(classifier_gaussiannb,
        X, y, scoring='precision_weighted', cv=num_validations)
print("Precision: " + str(round(100*precision.mean(), 2)) + "%")
recall = cross_val_score(classifier_gaussiannb,
        X, y, scoring='recall_weighted', cv=num_validations)
print("Recall: " + str(round(100*recall.mean(), 2)) + "%")

#9

cloudoversea, posted on 2017-7-7 23:09:30

Taking a look.

#10

钱学森64, posted on 2017-7-8 01:04:25

Thanks for sharing.
