Thread starter: ReneeBK

[GitHub] scikit-learn: Python Module for Machine Learning


#1 (OP)
ReneeBK posted on 2016-10-10 07:43:59

scikit-learn

scikit-learn is a Python module for machine learning built on top of SciPy and distributed under the 3-Clause BSD license.

The project was started in 2007 by David Cournapeau as a Google Summer of Code project, and since then many volunteers have contributed. See the AUTHORS.rst file for a complete list of contributors.

It is currently maintained by a team of volunteers.

Website: http://scikit-learn.org

Installation

Dependencies

Scikit-learn requires:

- Python (>= 2.6 or >= 3.3),
- NumPy (>= 1.6.1),
- SciPy (>= 0.9).

scikit-learn also uses CBLAS, the C interface to the Basic Linear Algebra Subprograms library. scikit-learn comes with a reference implementation, but the system CBLAS will be detected by the build system and used if present. CBLAS exists in many implementations; see Linear algebra libraries for known issues.

User installation

If you already have a working installation of NumPy and SciPy, the easiest way to install scikit-learn is with pip:

    pip install -U scikit-learn

or with conda:

    conda install scikit-learn

The documentation includes more detailed installation instructions.
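
After installing, a quick sanity check (a minimal sketch added here, not part of the README above) is to import the package and print the versions of scikit-learn and its dependencies:

    import sklearn
    import numpy
    import scipy

    # Confirm the import works and show which versions are installed
    print("scikit-learn:", sklearn.__version__)
    print("NumPy:", numpy.__version__)
    print("SciPy:", scipy.__version__)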

Development

We welcome new contributors of all experience levels. The scikit-learn community's goals are to be helpful, welcoming, and effective. The Contributor's Guide has detailed information about contributing code, documentation, tests, and more. We've included some basic information in this README.

Important links

- Official source code repo: https://github.com/scikit-learn/scikit-learn
- Download releases: http://sourceforge.net/projects/scikit-learn/files/
- Issue tracker: https://github.com/scikit-learn/scikit-learn/issues

Source code

You can check the latest sources with the command:

    git clone https://github.com/scikit-learn/scikit-learn.git

Setting up a development environment

Quick tutorial on how to go about setting up your environment to contribute to scikit-learn: https://github.com/scikit-learn/ ... ter/CONTRIBUTING.md

Hidden content in this post

scikit-learn-master.zip (5.86 MB, requires 5 forum coins)




#2
ReneeBK posted on 2016-10-10 07:44:38
  1. """
  2. ================================================================
  3. Biclustering documents with the Spectral Co-clustering algorithm
  4. ================================================================

  5. This example demonstrates the Spectral Co-clustering algorithm on the
  6. twenty newsgroups dataset. The 'comp.os.ms-windows.misc' category is
  7. excluded because it contains many posts containing nothing but data.

  8. The TF-IDF vectorized posts form a word frequency matrix, which is
  9. then biclustered using Dhillon's Spectral Co-Clustering algorithm. The
  10. resulting document-word biclusters indicate subsets words used more
  11. often in those subsets documents.

  12. For a few of the best biclusters, its most common document categories
  13. and its ten most important words get printed. The best biclusters are
  14. determined by their normalized cut. The best words are determined by
  15. comparing their sums inside and outside the bicluster.

  16. For comparison, the documents are also clustered using
  17. MiniBatchKMeans. The document clusters derived from the biclusters
  18. achieve a better V-measure than clusters found by MiniBatchKMeans.

  19. Output::

  20.     Vectorizing...
  21.     Coclustering...
  22.     Done in 9.53s. V-measure: 0.4455
  23.     MiniBatchKMeans...
  24.     Done in 12.00s. V-measure: 0.3309

  25.     Best biclusters:
  26.     ----------------
  27.     bicluster 0 : 1951 documents, 4373 words
  28.     categories   : 23% talk.politics.guns, 19% talk.politics.misc, 14% sci.med
  29.     words        : gun, guns, geb, banks, firearms, drugs, gordon, clinton, cdt, amendment

  30.     bicluster 1 : 1165 documents, 3304 words
  31.     categories   : 29% talk.politics.mideast, 26% soc.religion.christian, 25% alt.atheism
  32.     words        : god, jesus, christians, atheists, kent, sin, morality, belief, resurrection, marriage

  33.     bicluster 2 : 2219 documents, 2830 words
  34.     categories   : 18% comp.sys.mac.hardware, 16% comp.sys.ibm.pc.hardware, 16% comp.graphics
  35.     words        : voltage, dsp, board, receiver, circuit, shipping, packages, stereo, compression, package

  36.     bicluster 3 : 1860 documents, 2745 words
  37.     categories   : 26% rec.motorcycles, 23% rec.autos, 13% misc.forsale
  38.     words        : bike, car, dod, engine, motorcycle, ride, honda, cars, bmw, bikes

  39.     bicluster 4 : 12 documents, 155 words
  40.     categories   : 100% rec.sport.hockey
  41.     words        : scorer, unassisted, reichel, semak, sweeney, kovalenko, ricci, audette, momesso, nedved

  42. """
from __future__ import print_function

print(__doc__)

from collections import defaultdict
import operator
import re
from time import time

import numpy as np

from sklearn.cluster.bicluster import SpectralCoclustering
from sklearn.cluster import MiniBatchKMeans
from sklearn.externals.six import iteritems
from sklearn.datasets.twenty_newsgroups import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.cluster import v_measure_score


def number_aware_tokenizer(doc):
    """ Tokenizer that maps all numeric tokens to a placeholder.

    For many applications, tokens that begin with a number are not directly
    useful, but the fact that such a token exists can be relevant.  By applying
    this form of dimensionality reduction, some methods may perform better.
    """
    token_pattern = re.compile(u'(?u)\\b\\w\\w+\\b')
    tokens = token_pattern.findall(doc)
    tokens = ["#NUMBER" if token[0] in "0123456789_" else token
              for token in tokens]
    return tokens

# exclude 'comp.os.ms-windows.misc'
categories = ['alt.atheism', 'comp.graphics',
              'comp.sys.ibm.pc.hardware', 'comp.sys.mac.hardware',
              'comp.windows.x', 'misc.forsale', 'rec.autos',
              'rec.motorcycles', 'rec.sport.baseball',
              'rec.sport.hockey', 'sci.crypt', 'sci.electronics',
              'sci.med', 'sci.space', 'soc.religion.christian',
              'talk.politics.guns', 'talk.politics.mideast',
              'talk.politics.misc', 'talk.religion.misc']
newsgroups = fetch_20newsgroups(categories=categories)
y_true = newsgroups.target

vectorizer = TfidfVectorizer(stop_words='english', min_df=5,
                             tokenizer=number_aware_tokenizer)
cocluster = SpectralCoclustering(n_clusters=len(categories),
                                 svd_method='arpack', random_state=0)
kmeans = MiniBatchKMeans(n_clusters=len(categories), batch_size=20000,
                         random_state=0)

print("Vectorizing...")
X = vectorizer.fit_transform(newsgroups.data)

print("Coclustering...")
start_time = time()
cocluster.fit(X)
y_cocluster = cocluster.row_labels_
print("Done in {:.2f}s. V-measure: {:.4f}".format(
    time() - start_time,
    v_measure_score(y_cocluster, y_true)))

print("MiniBatchKMeans...")
start_time = time()
y_kmeans = kmeans.fit_predict(X)
print("Done in {:.2f}s. V-measure: {:.4f}".format(
    time() - start_time,
    v_measure_score(y_kmeans, y_true)))

feature_names = vectorizer.get_feature_names()
document_names = list(newsgroups.target_names[i] for i in newsgroups.target)


def bicluster_ncut(i):
    rows, cols = cocluster.get_indices(i)
    if not (np.any(rows) and np.any(cols)):
        import sys
        return sys.float_info.max
    row_complement = np.nonzero(np.logical_not(cocluster.rows_[i]))[0]
    col_complement = np.nonzero(np.logical_not(cocluster.columns_[i]))[0]
    # Note: the following is identical to X[rows[:, np.newaxis], cols].sum() but
    # much faster in scipy <= 0.16
    weight = X[rows][:, cols].sum()
    cut = (X[row_complement][:, cols].sum() +
           X[rows][:, col_complement].sum())
    return cut / weight


def most_common(d):
    """Items of a defaultdict(int) with the highest values.

    Like Counter.most_common in Python >=2.7.
    """
    return sorted(iteritems(d), key=operator.itemgetter(1), reverse=True)


bicluster_ncuts = list(bicluster_ncut(i)
                       for i in range(len(newsgroups.target_names)))
best_idx = np.argsort(bicluster_ncuts)[:5]

print()
print("Best biclusters:")
print("----------------")
for idx, cluster in enumerate(best_idx):
    n_rows, n_cols = cocluster.get_shape(cluster)
    cluster_docs, cluster_words = cocluster.get_indices(cluster)
    if not len(cluster_docs) or not len(cluster_words):
        continue

    # categories
    counter = defaultdict(int)
    for i in cluster_docs:
        counter[document_names[i]] += 1
    cat_string = ", ".join("{:.0f}% {}".format(float(c) / n_rows * 100, name)
                           for name, c in most_common(counter)[:3])

    # words
    out_of_cluster_docs = cocluster.row_labels_ != cluster
    out_of_cluster_docs = np.where(out_of_cluster_docs)[0]
    word_col = X[:, cluster_words]
    word_scores = np.array(word_col[cluster_docs, :].sum(axis=0) -
                           word_col[out_of_cluster_docs, :].sum(axis=0))
    word_scores = word_scores.ravel()
    important_words = list(feature_names[cluster_words[i]]
                           for i in word_scores.argsort()[:-11:-1])

    print("bicluster {} : {} documents, {} words".format(
        idx, n_rows, n_cols))
    print("categories   : {}".format(cat_string))
    print("words        : {}\n".format(', '.join(important_words)))

#3
ReneeBK posted on 2016-10-10 07:45:46
  1. """
  2. ==================
  3. Pipeline Anova SVM
  4. ==================

  5. Simple usage of Pipeline that runs successively a univariate
  6. feature selection with anova and then a C-SVM of the selected features.
  7. """
  8. print(__doc__)

  9. from sklearn import svm
  10. from sklearn.datasets import samples_generator
  11. from sklearn.feature_selection import SelectKBest, f_regression
  12. from sklearn.pipeline import make_pipeline

  13. # import some data to play with
  14. X, y = samples_generator.make_classification(
  15.     n_features=20, n_informative=3, n_redundant=0, n_classes=4,
  16.     n_clusters_per_class=2)

  17. # ANOVA SVM-C
  18. # 1) anova filter, take 3 best ranked features
  19. anova_filter = SelectKBest(f_regression, k=3)
  20. # 2) svm
  21. clf = svm.SVC(kernel='linear')

  22. anova_svm = make_pipeline(anova_filter, clf)
  23. anova_svm.fit(X, y)
  24. anova_svm.predict(X)
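A small follow-on sketch (not part of the original example) for inspecting the fitted pipeline above. It assumes anova_svm has already been fit as shown; make_pipeline names its steps after the lowercased class names, so the ANOVA step is reachable as 'selectkbest':

    # Boolean mask over the 20 input features marking the 3 kept by SelectKBest
    support_mask = anova_svm.named_steps['selectkbest'].get_support()
    print("Selected feature indices:", support_mask.nonzero()[0])

    # Mean accuracy of the whole pipeline on the training data
    print("Training accuracy: %.3f" % anova_svm.score(X, y))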

#4
ReneeBK posted on 2016-10-10 07:47:09
  1. """
  2. ==============================
  3. Lasso on dense and sparse data
  4. ==============================

  5. We show that linear_model.Lasso provides the same results for dense and sparse
  6. data and that in the case of sparse data the speed is improved.

  7. """
  8. print(__doc__)

  9. from time import time
  10. from scipy import sparse
  11. from scipy import linalg

  12. from sklearn.datasets.samples_generator import make_regression
  13. from sklearn.linear_model import Lasso


  14. ###############################################################################
  15. # The two Lasso implementations on Dense data
  16. print("--- Dense matrices")

  17. X, y = make_regression(n_samples=200, n_features=5000, random_state=0)
  18. X_sp = sparse.coo_matrix(X)

  19. alpha = 1
  20. sparse_lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=1000)
  21. dense_lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=1000)

  22. t0 = time()
  23. sparse_lasso.fit(X_sp, y)
  24. print("Sparse Lasso done in %fs" % (time() - t0))

  25. t0 = time()
  26. dense_lasso.fit(X, y)
  27. print("Dense Lasso done in %fs" % (time() - t0))

  28. print("Distance between coefficients : %s"
  29.       % linalg.norm(sparse_lasso.coef_ - dense_lasso.coef_))

  30. ###############################################################################
  31. # The two Lasso implementations on Sparse data
  32. print("--- Sparse matrices")

  33. Xs = X.copy()
  34. Xs[Xs < 2.5] = 0.0
  35. Xs = sparse.coo_matrix(Xs)
  36. Xs = Xs.tocsc()

  37. print("Matrix density : %s %%" % (Xs.nnz / float(X.size) * 100))

  38. alpha = 0.1
  39. sparse_lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
  40. dense_lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)

  41. t0 = time()
  42. sparse_lasso.fit(Xs, y)
  43. print("Sparse Lasso done in %fs" % (time() - t0))

  44. t0 = time()
  45. dense_lasso.fit(Xs.toarray(), y)
  46. print("Dense Lasso done in %fs" % (time() - t0))

  47. print("Distance between coefficients : %s"
  48.       % linalg.norm(sparse_lasso.coef_ - dense_lasso.coef_))

#5
ReneeBK posted on 2016-10-10 07:48:12
  1. """
  2. ============================================================
  3. Parameter estimation using grid search with cross-validation
  4. ============================================================

  5. This examples shows how a classifier is optimized by cross-validation,
  6. which is done using the :class:`sklearn.model_selection.GridSearchCV` object
  7. on a development set that comprises only half of the available labeled data.

  8. The performance of the selected hyper-parameters and trained model is
  9. then measured on a dedicated evaluation set that was not used during
  10. the model selection step.

  11. More details on tools available for model selection can be found in the
  12. sections on :ref:`cross_validation` and :ref:`grid_search`.

  13. """

  14. from __future__ import print_function

  15. from sklearn import datasets
  16. from sklearn.model_selection import train_test_split
  17. from sklearn.model_selection import GridSearchCV
  18. from sklearn.metrics import classification_report
  19. from sklearn.svm import SVC

  20. print(__doc__)

  21. # Loading the Digits dataset
  22. digits = datasets.load_digits()

  23. # To apply an classifier on this data, we need to flatten the image, to
  24. # turn the data in a (samples, feature) matrix:
  25. n_samples = len(digits.images)
  26. X = digits.images.reshape((n_samples, -1))
  27. y = digits.target

  28. # Split the dataset in two equal parts
  29. X_train, X_test, y_train, y_test = train_test_split(
  30.     X, y, test_size=0.5, random_state=0)

  31. # Set the parameters by cross-validation
  32. tuned_parameters = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4],
  33.                      'C': [1, 10, 100, 1000]},
  34.                     {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]

  35. scores = ['precision', 'recall']

  36. for score in scores:
  37.     print("# Tuning hyper-parameters for %s" % score)
  38.     print()

  39.     clf = GridSearchCV(SVC(C=1), tuned_parameters, cv=5,
  40.                        scoring='%s_macro' % score)
  41.     clf.fit(X_train, y_train)

  42.     print("Best parameters set found on development set:")
  43.     print()
  44.     print(clf.best_params_)
  45.     print()
  46.     print("Grid scores on development set:")
  47.     print()
  48.     means = clf.cv_results_['mean_test_score']
  49.     stds = clf.cv_results_['std_test_score']
  50.     for mean, std, params in zip(means, stds, clf.cv_results_['params']):
  51.         print("%0.3f (+/-%0.03f) for %r"
  52.               % (mean, std * 2, params))
  53.     print()

  54.     print("Detailed classification report:")
  55.     print()
  56.     print("The model is trained on the full development set.")
  57.     print("The scores are computed on the full evaluation set.")
  58.     print()
  59.     y_true, y_pred = y_test, clf.predict(X_test)
  60.     print(classification_report(y_true, y_pred))
  61.     print()

  62. # Note the problem is too easy: the hyperparameter plateau is too flat and the
  63. # output model is the same for precision and recall with ties in quality.
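As a brief follow-on (not in the original example), the fitted GridSearchCV object also exposes the best mean cross-validated score and, because refit=True by default, the estimator refit on the whole development set; after the loop above, clf holds the search for the last scoring metric:

    # Best mean cross-validated score found during the last search
    print("Best CV score: %0.3f" % clf.best_score_)

    # Estimator refit on the full development set with the best parameters
    best_model = clf.best_estimator_
    print("Evaluation-set accuracy of the refitted model: %0.3f"
          % best_model.score(X_test, y_test))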

#6
ReneeBK posted on 2016-10-10 07:50:00
  1. """
  2. =================================================
  3. Hyper-parameters of Approximate Nearest Neighbors
  4. =================================================

  5. This example demonstrates the behaviour of the
  6. accuracy of the nearest neighbor queries of Locality Sensitive Hashing
  7. Forest as the number of candidates and the number of estimators (trees)
  8. vary.

  9. In the first plot, accuracy is measured with the number of candidates. Here,
  10. the term "number of candidates" refers to maximum bound for the number of
  11. distinct points retrieved from each tree to calculate the distances. Nearest
  12. neighbors are selected from this pool of candidates. Number of estimators is
  13. maintained at three fixed levels (1, 5, 10).

  14. In the second plot, the number of candidates is fixed at 50. Number of trees
  15. is varied and the accuracy is plotted against those values. To measure the
  16. accuracy, the true nearest neighbors are required, therefore
  17. :class:`sklearn.neighbors.NearestNeighbors` is used to compute the exact
  18. neighbors.
  19. """
  20. from __future__ import division
  21. print(__doc__)

  22. # Author: Maheshakya Wijewardena <maheshakya.10@cse.mrt.ac.lk>
  23. #
  24. # License: BSD 3 clause


  25. ###############################################################################
  26. import numpy as np
  27. from sklearn.datasets.samples_generator import make_blobs
  28. from sklearn.neighbors import LSHForest
  29. from sklearn.neighbors import NearestNeighbors
  30. import matplotlib.pyplot as plt


  31. # Initialize size of the database, iterations and required neighbors.
  32. n_samples = 10000
  33. n_features = 100
  34. n_queries = 30
  35. rng = np.random.RandomState(42)

  36. # Generate sample data
  37. X, _ = make_blobs(n_samples=n_samples + n_queries,
  38.                   n_features=n_features, centers=10,
  39.                   random_state=0)
  40. X_index = X[:n_samples]
  41. X_query = X[n_samples:]
  42. # Get exact neighbors
  43. nbrs = NearestNeighbors(n_neighbors=1, algorithm='brute',
  44.                         metric='cosine').fit(X_index)
  45. neighbors_exact = nbrs.kneighbors(X_query, return_distance=False)

  46. # Set `n_candidate` values
  47. n_candidates_values = np.linspace(10, 500, 5).astype(np.int)
  48. n_estimators_for_candidate_value = [1, 5, 10]
  49. n_iter = 10
  50. stds_accuracies = np.zeros((len(n_estimators_for_candidate_value),
  51.                             n_candidates_values.shape[0]),
  52.                            dtype=float)
  53. accuracies_c = np.zeros((len(n_estimators_for_candidate_value),
  54.                          n_candidates_values.shape[0]), dtype=float)

  55. # LSH Forest is a stochastic index: perform several iteration to estimate
  56. # expected accuracy and standard deviation displayed as error bars in
  57. # the plots
  58. for j, value in enumerate(n_estimators_for_candidate_value):
  59.     for i, n_candidates in enumerate(n_candidates_values):
  60.         accuracy_c = []
  61.         for seed in range(n_iter):
  62.             lshf = LSHForest(n_estimators=value,
  63.                              n_candidates=n_candidates, n_neighbors=1,
  64.                              random_state=seed)
  65.             # Build the LSH Forest index
  66.             lshf.fit(X_index)
  67.             # Get neighbors
  68.             neighbors_approx = lshf.kneighbors(X_query,
  69.                                                return_distance=False)
  70.             accuracy_c.append(np.sum(np.equal(neighbors_approx,
  71.                                               neighbors_exact)) /
  72.                               n_queries)

  73.         stds_accuracies[j, i] = np.std(accuracy_c)
  74.         accuracies_c[j, i] = np.mean(accuracy_c)

  75. # Set `n_estimators` values
  76. n_estimators_values = [1, 5, 10, 20, 30, 40, 50]
  77. accuracies_trees = np.zeros(len(n_estimators_values), dtype=float)

  78. # Calculate average accuracy for each value of `n_estimators`
  79. for i, n_estimators in enumerate(n_estimators_values):
  80.     lshf = LSHForest(n_estimators=n_estimators, n_neighbors=1)
  81.     # Build the LSH Forest index
  82.     lshf.fit(X_index)
  83.     # Get neighbors
  84.     neighbors_approx = lshf.kneighbors(X_query, return_distance=False)
  85.     accuracies_trees[i] = np.sum(np.equal(neighbors_approx,
  86.                                           neighbors_exact))/n_queries

  87. ###############################################################################
  88. # Plot the accuracy variation with `n_candidates`
  89. plt.figure()
  90. colors = ['c', 'm', 'y']
  91. for i, n_estimators in enumerate(n_estimators_for_candidate_value):
  92.     label = 'n_estimators = %d ' % n_estimators
  93.     plt.plot(n_candidates_values, accuracies_c[i, :],
  94.              'o-', c=colors[i], label=label)
  95.     plt.errorbar(n_candidates_values, accuracies_c[i, :],
  96.                  stds_accuracies[i, :], c=colors[i])

  97. plt.legend(loc='upper left', prop=dict(size='small'))
  98. plt.ylim([0, 1.2])
  99. plt.xlim(min(n_candidates_values), max(n_candidates_values))
  100. plt.ylabel("Accuracy")
  101. plt.xlabel("n_candidates")
  102. plt.grid(which='both')
  103. plt.title("Accuracy variation with n_candidates")

  104. # Plot the accuracy variation with `n_estimators`
  105. plt.figure()
  106. plt.scatter(n_estimators_values, accuracies_trees, c='k')
  107. plt.plot(n_estimators_values, accuracies_trees, c='g')
  108. plt.ylim([0, 1.2])
  109. plt.xlim(min(n_estimators_values), max(n_estimators_values))
  110. plt.ylabel("Accuracy")
  111. plt.xlabel("n_estimators")
  112. plt.grid(which='both')
  113. plt.title("Accuracy variation with n_estimators")

  114. plt.show()

#7
ReneeBK posted on 2016-10-10 07:50:47
  1. """
  2. ================================================
  3. Varying regularization in Multi-layer Perceptron
  4. ================================================

  5. A comparison of different values for regularization parameter 'alpha' on
  6. synthetic datasets. The plot shows that different alphas yield different
  7. decision functions.

  8. Alpha is a parameter for regularization term, aka penalty term, that combats
  9. overfitting by constraining the size of the weights. Increasing alpha may fix
  10. high variance (a sign of overfitting) by encouraging smaller weights, resulting
  11. in a decision boundary plot that appears with lesser curvatures.
  12. Similarly, decreasing alpha may fix high bias (a sign of underfitting) by
  13. encouraging larger weights, potentially resulting in a more complicated
  14. decision boundary.
  15. """
  16. print(__doc__)


  17. # Author: Issam H. Laradji
  18. # License: BSD 3 clause

  19. import numpy as np
  20. from matplotlib import pyplot as plt
  21. from matplotlib.colors import ListedColormap
  22. from sklearn.model_selection import train_test_split
  23. from sklearn.preprocessing import StandardScaler
  24. from sklearn.datasets import make_moons, make_circles, make_classification
  25. from sklearn.neural_network import MLPClassifier

  26. h = .02  # step size in the mesh

  27. alphas = np.logspace(-5, 3, 5)
  28. names = []
  29. for i in alphas:
  30.     names.append('alpha ' + str(i))

  31. classifiers = []
  32. for i in alphas:
  33.     classifiers.append(MLPClassifier(alpha=i, random_state=1))

  34. X, y = make_classification(n_features=2, n_redundant=0, n_informative=2,
  35.                            random_state=0, n_clusters_per_class=1)
  36. rng = np.random.RandomState(2)
  37. X += 2 * rng.uniform(size=X.shape)
  38. linearly_separable = (X, y)

  39. datasets = [make_moons(noise=0.3, random_state=0),
  40.             make_circles(noise=0.2, factor=0.5, random_state=1),
  41.             linearly_separable]

  42. figure = plt.figure(figsize=(17, 9))
  43. i = 1
  44. # iterate over datasets
  45. for X, y in datasets:
  46.     # preprocess dataset, split into training and test part
  47.     X = StandardScaler().fit_transform(X)
  48.     X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.4)

  49.     x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
  50.     y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
  51.     xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
  52.                          np.arange(y_min, y_max, h))

  53.     # just plot the dataset first
  54.     cm = plt.cm.RdBu
  55.     cm_bright = ListedColormap(['#FF0000', '#0000FF'])
  56.     ax = plt.subplot(len(datasets), len(classifiers) + 1, i)
  57.     # Plot the training points
  58.     ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright)
  59.     # and testing points
  60.     ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright, alpha=0.6)
  61.     ax.set_xlim(xx.min(), xx.max())
  62.     ax.set_ylim(yy.min(), yy.max())
  63.     ax.set_xticks(())
  64.     ax.set_yticks(())
  65.     i += 1

  66.     # iterate over classifiers
  67.     for name, clf in zip(names, classifiers):
  68.         ax = plt.subplot(len(datasets), len(classifiers) + 1, i)
  69.         clf.fit(X_train, y_train)
  70.         score = clf.score(X_test, y_test)

  71.         # Plot the decision boundary. For that, we will assign a color to each
  72.         # point in the mesh [x_min, x_max]x[y_min, y_max].
  73.         if hasattr(clf, "decision_function"):
  74.             Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
  75.         else:
  76.             Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]

  77.         # Put the result into a color plot
  78.         Z = Z.reshape(xx.shape)
  79.         ax.contourf(xx, yy, Z, cmap=cm, alpha=.8)

  80.         # Plot also the training points
  81.         ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright)
  82.         # and testing points
  83.         ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright,
  84.                    alpha=0.6)

  85.         ax.set_xlim(xx.min(), xx.max())
  86.         ax.set_ylim(yy.min(), yy.max())
  87.         ax.set_xticks(())
  88.         ax.set_yticks(())
  89.         ax.set_title(name)
  90.         ax.text(xx.max() - .3, yy.min() + .3, ('%.2f' % score).lstrip('0'),
  91.                 size=15, horizontalalignment='right')
  92.         i += 1

  93. figure.subplots_adjust(left=.02, right=.98)
  94. plt.show()

#8
ReneeBK posted on 2016-10-10 07:52:38
  1. """
  2. ==============
  3. Non-linear SVM
  4. ==============

  5. Perform binary classification using non-linear SVC
  6. with RBF kernel. The target to predict is a XOR of the
  7. inputs.

  8. The color map illustrates the decision function learned by the SVC.
  9. """
  10. print(__doc__)

  11. import numpy as np
  12. import matplotlib.pyplot as plt
  13. from sklearn import svm

  14. xx, yy = np.meshgrid(np.linspace(-3, 3, 500),
  15.                      np.linspace(-3, 3, 500))
  16. np.random.seed(0)
  17. X = np.random.randn(300, 2)
  18. Y = np.logical_xor(X[:, 0] > 0, X[:, 1] > 0)

  19. # fit the model
  20. clf = svm.NuSVC()
  21. clf.fit(X, Y)

  22. # plot the decision function for each datapoint on the grid
  23. Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
  24. Z = Z.reshape(xx.shape)

  25. plt.imshow(Z, interpolation='nearest',
  26.            extent=(xx.min(), xx.max(), yy.min(), yy.max()), aspect='auto',
  27.            origin='lower', cmap=plt.cm.PuOr_r)
  28. contours = plt.contour(xx, yy, Z, levels=[0], linewidths=2,
  29.                        linetypes='--')
  30. plt.scatter(X[:, 0], X[:, 1], s=30, c=Y, cmap=plt.cm.Paired)
  31. plt.xticks(())
  32. plt.yticks(())
  33. plt.axis([-3, 3, -3, 3])
  34. plt.show()

#9
ReneeBK posted on 2016-10-10 07:53:27
  1. """
  2. ===================================================================
  3. Support Vector Regression (SVR) using linear and non-linear kernels
  4. ===================================================================

  5. Toy example of 1D regression using linear, polynomial and RBF kernels.

  6. """
  7. print(__doc__)

  8. import numpy as np
  9. from sklearn.svm import SVR
  10. import matplotlib.pyplot as plt

  11. ###############################################################################
  12. # Generate sample data
  13. X = np.sort(5 * np.random.rand(40, 1), axis=0)
  14. y = np.sin(X).ravel()

  15. ###############################################################################
  16. # Add noise to targets
  17. y[::5] += 3 * (0.5 - np.random.rand(8))

  18. ###############################################################################
  19. # Fit regression model
  20. svr_rbf = SVR(kernel='rbf', C=1e3, gamma=0.1)
  21. svr_lin = SVR(kernel='linear', C=1e3)
  22. svr_poly = SVR(kernel='poly', C=1e3, degree=2)
  23. y_rbf = svr_rbf.fit(X, y).predict(X)
  24. y_lin = svr_lin.fit(X, y).predict(X)
  25. y_poly = svr_poly.fit(X, y).predict(X)

  26. ###############################################################################
  27. # look at the results
  28. lw = 2
  29. plt.scatter(X, y, color='darkorange', label='data')
  30. plt.hold('on')
  31. plt.plot(X, y_rbf, color='navy', lw=lw, label='RBF model')
  32. plt.plot(X, y_lin, color='c', lw=lw, label='Linear model')
  33. plt.plot(X, y_poly, color='cornflowerblue', lw=lw, label='Polynomial model')
  34. plt.xlabel('data')
  35. plt.ylabel('target')
  36. plt.title('Support Vector Regression')
  37. plt.legend()
  38. plt.show()

#10
fengyg posted on 2016-10-10 07:56:54
kankan (taking a look)
