Thread starter: ReneeBK

[GitHub] scikit-learn: Python Module for Machine Learning


#1 (OP)
ReneeBK posted on 2016-10-10 07:43:59

scikit-learn

scikit-learn is a Python module for machine learning built on top of SciPy and distributed under the 3-Clause BSD license.

The project was started in 2007 by David Cournapeau as a Google Summer of Code project, and since then many volunteers have contributed. See the AUTHORS.rst file for a complete list of contributors.

It is currently maintained by a team of volunteers.

Website: http://scikit-learn.org

Installation

Dependencies

Scikit-learn requires:

- Python (>= 2.6 or >= 3.3),
- NumPy (>= 1.6.1),
- SciPy (>= 0.9).

scikit-learn also uses CBLAS, the C interface to the Basic Linear Algebra Subprograms library. scikit-learn comes with a reference implementation, but the system CBLAS will be detected by the build system and used if present. CBLAS exists in many implementations; see Linear algebra libraries for known issues.

User installation

If you already have a working installation of NumPy and SciPy, the easiest way to install scikit-learn is with pip:

    pip install -U scikit-learn

or with conda:

    conda install scikit-learn

The documentation includes more detailed installation instructions.
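
After installing, a quick sanity check (a minimal sketch added here, not part of the README above) is to import the package and print the versions of scikit-learn and its dependencies:

    import sklearn
    import numpy
    import scipy

    # Confirm the import works and show which versions are installed
    print("scikit-learn:", sklearn.__version__)
    print("NumPy:", numpy.__version__)
    print("SciPy:", scipy.__version__)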

Development

We welcome new contributors of all experience levels. The scikit-learn community's goals are to be helpful, welcoming, and effective. The Contributor's Guide has detailed information about contributing code, documentation, tests, and more. We've included some basic information in this README.

Important links

- Official source code repo: https://github.com/scikit-learn/scikit-learn
- Download releases: http://sourceforge.net/projects/scikit-learn/files/
- Issue tracker: https://github.com/scikit-learn/scikit-learn/issues

Source code

You can check the latest sources with the command:

    git clone https://github.com/scikit-learn/scikit-learn.git

Setting up a development environment

Quick tutorial on how to go about setting up your environment to contribute to scikit-learn: https://github.com/scikit-learn/ ... ter/CONTRIBUTING.md

Hidden content in this post

scikit-learn-master.zip (5.86 MB, requires 5 forum coins)




#2
ReneeBK posted on 2016-10-10 07:44:38
  1. """
  2. ================================================================
  3. Biclustering documents with the Spectral Co-clustering algorithm
  4. ================================================================

  5. This example demonstrates the Spectral Co-clustering algorithm on the
  6. twenty newsgroups dataset. The 'comp.os.ms-windows.misc' category is
  7. excluded because it contains many posts containing nothing but data.

  8. The TF-IDF vectorized posts form a word frequency matrix, which is
  9. then biclustered using Dhillon's Spectral Co-Clustering algorithm. The
  10. resulting document-word biclusters indicate subsets words used more
  11. often in those subsets documents.

  12. For a few of the best biclusters, its most common document categories
  13. and its ten most important words get printed. The best biclusters are
  14. determined by their normalized cut. The best words are determined by
  15. comparing their sums inside and outside the bicluster.

  16. For comparison, the documents are also clustered using
  17. MiniBatchKMeans. The document clusters derived from the biclusters
  18. achieve a better V-measure than clusters found by MiniBatchKMeans.

  19. Output::

  20.     Vectorizing...
  21.     Coclustering...
  22.     Done in 9.53s. V-measure: 0.4455
  23.     MiniBatchKMeans...
  24.     Done in 12.00s. V-measure: 0.3309

  25.     Best biclusters:
  26.     ----------------
  27.     bicluster 0 : 1951 documents, 4373 words
  28.     categories   : 23% talk.politics.guns, 19% talk.politics.misc, 14% sci.med
  29.     words        : gun, guns, geb, banks, firearms, drugs, gordon, clinton, cdt, amendment

  30.     bicluster 1 : 1165 documents, 3304 words
  31.     categories   : 29% talk.politics.mideast, 26% soc.religion.christian, 25% alt.atheism
  32.     words        : god, jesus, christians, atheists, kent, sin, morality, belief, resurrection, marriage

  33.     bicluster 2 : 2219 documents, 2830 words
  34.     categories   : 18% comp.sys.mac.hardware, 16% comp.sys.ibm.pc.hardware, 16% comp.graphics
  35.     words        : voltage, dsp, board, receiver, circuit, shipping, packages, stereo, compression, package

  36.     bicluster 3 : 1860 documents, 2745 words
  37.     categories   : 26% rec.motorcycles, 23% rec.autos, 13% misc.forsale
  38.     words        : bike, car, dod, engine, motorcycle, ride, honda, cars, bmw, bikes

  39.     bicluster 4 : 12 documents, 155 words
  40.     categories   : 100% rec.sport.hockey
  41.     words        : scorer, unassisted, reichel, semak, sweeney, kovalenko, ricci, audette, momesso, nedved

  42. """
from __future__ import print_function

print(__doc__)

from collections import defaultdict
import operator
import re
from time import time

import numpy as np

from sklearn.cluster.bicluster import SpectralCoclustering
from sklearn.cluster import MiniBatchKMeans
from sklearn.externals.six import iteritems
from sklearn.datasets.twenty_newsgroups import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.cluster import v_measure_score


def number_aware_tokenizer(doc):
    """ Tokenizer that maps all numeric tokens to a placeholder.

    For many applications, tokens that begin with a number are not directly
    useful, but the fact that such a token exists can be relevant.  By applying
    this form of dimensionality reduction, some methods may perform better.
    """
    token_pattern = re.compile(u'(?u)\\b\\w\\w+\\b')
    tokens = token_pattern.findall(doc)
    tokens = ["#NUMBER" if token[0] in "0123456789_" else token
              for token in tokens]
    return tokens

# exclude 'comp.os.ms-windows.misc'
categories = ['alt.atheism', 'comp.graphics',
              'comp.sys.ibm.pc.hardware', 'comp.sys.mac.hardware',
              'comp.windows.x', 'misc.forsale', 'rec.autos',
              'rec.motorcycles', 'rec.sport.baseball',
              'rec.sport.hockey', 'sci.crypt', 'sci.electronics',
              'sci.med', 'sci.space', 'soc.religion.christian',
              'talk.politics.guns', 'talk.politics.mideast',
              'talk.politics.misc', 'talk.religion.misc']
newsgroups = fetch_20newsgroups(categories=categories)
y_true = newsgroups.target

vectorizer = TfidfVectorizer(stop_words='english', min_df=5,
                             tokenizer=number_aware_tokenizer)
cocluster = SpectralCoclustering(n_clusters=len(categories),
                                 svd_method='arpack', random_state=0)
kmeans = MiniBatchKMeans(n_clusters=len(categories), batch_size=20000,
                         random_state=0)

print("Vectorizing...")
X = vectorizer.fit_transform(newsgroups.data)

print("Coclustering...")
start_time = time()
cocluster.fit(X)
y_cocluster = cocluster.row_labels_
print("Done in {:.2f}s. V-measure: {:.4f}".format(
    time() - start_time,
    v_measure_score(y_cocluster, y_true)))

print("MiniBatchKMeans...")
start_time = time()
y_kmeans = kmeans.fit_predict(X)
print("Done in {:.2f}s. V-measure: {:.4f}".format(
    time() - start_time,
    v_measure_score(y_kmeans, y_true)))

feature_names = vectorizer.get_feature_names()
document_names = list(newsgroups.target_names[i] for i in newsgroups.target)


def bicluster_ncut(i):
    rows, cols = cocluster.get_indices(i)
    if not (np.any(rows) and np.any(cols)):
        import sys
        return sys.float_info.max
    row_complement = np.nonzero(np.logical_not(cocluster.rows_[i]))[0]
    col_complement = np.nonzero(np.logical_not(cocluster.columns_[i]))[0]
    # Note: the following is identical to X[rows[:, np.newaxis], cols].sum() but
    # much faster in scipy <= 0.16
    weight = X[rows][:, cols].sum()
    cut = (X[row_complement][:, cols].sum() +
           X[rows][:, col_complement].sum())
    return cut / weight


def most_common(d):
    """Items of a defaultdict(int) with the highest values.

    Like Counter.most_common in Python >=2.7.
    """
    return sorted(iteritems(d), key=operator.itemgetter(1), reverse=True)


bicluster_ncuts = list(bicluster_ncut(i)
                       for i in range(len(newsgroups.target_names)))
best_idx = np.argsort(bicluster_ncuts)[:5]

print()
print("Best biclusters:")
print("----------------")
for idx, cluster in enumerate(best_idx):
    n_rows, n_cols = cocluster.get_shape(cluster)
    cluster_docs, cluster_words = cocluster.get_indices(cluster)
    if not len(cluster_docs) or not len(cluster_words):
        continue

    # categories
    counter = defaultdict(int)
    for i in cluster_docs:
        counter[document_names[i]] += 1
    cat_string = ", ".join("{:.0f}% {}".format(float(c) / n_rows * 100, name)
                           for name, c in most_common(counter)[:3])

    # words
    out_of_cluster_docs = cocluster.row_labels_ != cluster
    out_of_cluster_docs = np.where(out_of_cluster_docs)[0]
    word_col = X[:, cluster_words]
    word_scores = np.array(word_col[cluster_docs, :].sum(axis=0) -
                           word_col[out_of_cluster_docs, :].sum(axis=0))
    word_scores = word_scores.ravel()
    important_words = list(feature_names[cluster_words[i]]
                           for i in word_scores.argsort()[:-11:-1])

    print("bicluster {} : {} documents, {} words".format(
        idx, n_rows, n_cols))
    print("categories   : {}".format(cat_string))
    print("words        : {}\n".format(', '.join(important_words)))

#3
ReneeBK posted on 2016-10-10 07:45:46
  1. """
  2. ==================
  3. Pipeline Anova SVM
  4. ==================

  5. Simple usage of Pipeline that runs successively a univariate
  6. feature selection with anova and then a C-SVM of the selected features.
  7. """
  8. print(__doc__)

  9. from sklearn import svm
  10. from sklearn.datasets import samples_generator
  11. from sklearn.feature_selection import SelectKBest, f_regression
  12. from sklearn.pipeline import make_pipeline

  13. # import some data to play with
  14. X, y = samples_generator.make_classification(
  15.     n_features=20, n_informative=3, n_redundant=0, n_classes=4,
  16.     n_clusters_per_class=2)

  17. # ANOVA SVM-C
  18. # 1) anova filter, take 3 best ranked features
  19. anova_filter = SelectKBest(f_regression, k=3)
  20. # 2) svm
  21. clf = svm.SVC(kernel='linear')

  22. anova_svm = make_pipeline(anova_filter, clf)
  23. anova_svm.fit(X, y)
  24. anova_svm.predict(X)
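A small follow-on sketch (not part of the original example) for inspecting the fitted pipeline above. It assumes anova_svm has already been fit as shown; make_pipeline names its steps after the lowercased class names, so the ANOVA step is reachable as 'selectkbest':

    # Boolean mask over the 20 input features marking the 3 kept by SelectKBest
    support_mask = anova_svm.named_steps['selectkbest'].get_support()
    print("Selected feature indices:", support_mask.nonzero()[0])

    # Mean accuracy of the whole pipeline on the training data
    print("Training accuracy: %.3f" % anova_svm.score(X, y))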

#4
ReneeBK posted on 2016-10-10 07:47:09
  1. """
  2. ==============================
  3. Lasso on dense and sparse data
  4. ==============================

  5. We show that linear_model.Lasso provides the same results for dense and sparse
  6. data and that in the case of sparse data the speed is improved.

  7. """
  8. print(__doc__)

  9. from time import time
  10. from scipy import sparse
  11. from scipy import linalg

  12. from sklearn.datasets.samples_generator import make_regression
  13. from sklearn.linear_model import Lasso


  14. ###############################################################################
  15. # The two Lasso implementations on Dense data
  16. print("--- Dense matrices")

  17. X, y = make_regression(n_samples=200, n_features=5000, random_state=0)
  18. X_sp = sparse.coo_matrix(X)

  19. alpha = 1
  20. sparse_lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=1000)
  21. dense_lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=1000)

  22. t0 = time()
  23. sparse_lasso.fit(X_sp, y)
  24. print("Sparse Lasso done in %fs" % (time() - t0))

  25. t0 = time()
  26. dense_lasso.fit(X, y)
  27. print("Dense Lasso done in %fs" % (time() - t0))

  28. print("Distance between coefficients : %s"
  29.       % linalg.norm(sparse_lasso.coef_ - dense_lasso.coef_))

  30. ###############################################################################
  31. # The two Lasso implementations on Sparse data
  32. print("--- Sparse matrices")

  33. Xs = X.copy()
  34. Xs[Xs < 2.5] = 0.0
  35. Xs = sparse.coo_matrix(Xs)
  36. Xs = Xs.tocsc()

  37. print("Matrix density : %s %%" % (Xs.nnz / float(X.size) * 100))

  38. alpha = 0.1
  39. sparse_lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
  40. dense_lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)

  41. t0 = time()
  42. sparse_lasso.fit(Xs, y)
  43. print("Sparse Lasso done in %fs" % (time() - t0))

  44. t0 = time()
  45. dense_lasso.fit(Xs.toarray(), y)
  46. print("Dense Lasso done in %fs" % (time() - t0))

  47. print("Distance between coefficients : %s"
  48.       % linalg.norm(sparse_lasso.coef_ - dense_lasso.coef_))

#5
ReneeBK posted on 2016-10-10 07:48:12
  1. """
  2. ============================================================
  3. Parameter estimation using grid search with cross-validation
  4. ============================================================

  5. This examples shows how a classifier is optimized by cross-validation,
  6. which is done using the :class:`sklearn.model_selection.GridSearchCV` object
  7. on a development set that comprises only half of the available labeled data.

  8. The performance of the selected hyper-parameters and trained model is
  9. then measured on a dedicated evaluation set that was not used during
  10. the model selection step.

  11. More details on tools available for model selection can be found in the
  12. sections on :ref:`cross_validation` and :ref:`grid_search`.

  13. """

  14. from __future__ import print_function

  15. from sklearn import datasets
  16. from sklearn.model_selection import train_test_split
  17. from sklearn.model_selection import GridSearchCV
  18. from sklearn.metrics import classification_report
  19. from sklearn.svm import SVC

  20. print(__doc__)

  21. # Loading the Digits dataset
  22. digits = datasets.load_digits()

  23. # To apply an classifier on this data, we need to flatten the image, to
  24. # turn the data in a (samples, feature) matrix:
  25. n_samples = len(digits.images)
  26. X = digits.images.reshape((n_samples, -1))
  27. y = digits.target

  28. # Split the dataset in two equal parts
  29. X_train, X_test, y_train, y_test = train_test_split(
  30.     X, y, test_size=0.5, random_state=0)

  31. # Set the parameters by cross-validation
  32. tuned_parameters = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4],
  33.                      'C': [1, 10, 100, 1000]},
  34.                     {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]

  35. scores = ['precision', 'recall']

  36. for score in scores:
  37.     print("# Tuning hyper-parameters for %s" % score)
  38.     print()

  39.     clf = GridSearchCV(SVC(C=1), tuned_parameters, cv=5,
  40.                        scoring='%s_macro' % score)
  41.     clf.fit(X_train, y_train)

  42.     print("Best parameters set found on development set:")
  43.     print()
  44.     print(clf.best_params_)
  45.     print()
  46.     print("Grid scores on development set:")
  47.     print()
  48.     means = clf.cv_results_['mean_test_score']
  49.     stds = clf.cv_results_['std_test_score']
  50.     for mean, std, params in zip(means, stds, clf.cv_results_['params']):
  51.         print("%0.3f (+/-%0.03f) for %r"
  52.               % (mean, std * 2, params))
  53.     print()

  54.     print("Detailed classification report:")
  55.     print()
  56.     print("The model is trained on the full development set.")
  57.     print("The scores are computed on the full evaluation set.")
  58.     print()
  59.     y_true, y_pred = y_test, clf.predict(X_test)
  60.     print(classification_report(y_true, y_pred))
  61.     print()

  62. # Note the problem is too easy: the hyperparameter plateau is too flat and the
  63. # output model is the same for precision and recall with ties in quality.
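As a brief follow-on (not in the original example), the fitted GridSearchCV object also exposes the best mean cross-validated score and, because refit=True by default, the estimator refit on the whole development set; after the loop above, clf holds the search for the last scoring metric:

    # Best mean cross-validated score found during the last search
    print("Best CV score: %0.3f" % clf.best_score_)

    # Estimator refit on the full development set with the best parameters
    best_model = clf.best_estimator_
    print("Evaluation-set accuracy of the refitted model: %0.3f"
          % best_model.score(X_test, y_test))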

#6
ReneeBK posted on 2016-10-10 07:50:00
  1. """
  2. =================================================
  3. Hyper-parameters of Approximate Nearest Neighbors
  4. =================================================

  5. This example demonstrates the behaviour of the
  6. accuracy of the nearest neighbor queries of Locality Sensitive Hashing
  7. Forest as the number of candidates and the number of estimators (trees)
  8. vary.

  9. In the first plot, accuracy is measured with the number of candidates. Here,
  10. the term "number of candidates" refers to maximum bound for the number of
  11. distinct points retrieved from each tree to calculate the distances. Nearest
  12. neighbors are selected from this pool of candidates. Number of estimators is
  13. maintained at three fixed levels (1, 5, 10).

  14. In the second plot, the number of candidates is fixed at 50. Number of trees
  15. is varied and the accuracy is plotted against those values. To measure the
  16. accuracy, the true nearest neighbors are required, therefore
  17. :class:`sklearn.neighbors.NearestNeighbors` is used to compute the exact
  18. neighbors.
  19. """
  20. from __future__ import division
  21. print(__doc__)

  22. # Author: Maheshakya Wijewardena <maheshakya.10@cse.mrt.ac.lk>
  23. #
  24. # License: BSD 3 clause


  25. ###############################################################################
  26. import numpy as np
  27. from sklearn.datasets.samples_generator import make_blobs
  28. from sklearn.neighbors import LSHForest
  29. from sklearn.neighbors import NearestNeighbors
  30. import matplotlib.pyplot as plt


  31. # Initialize size of the database, iterations and required neighbors.
  32. n_samples = 10000
  33. n_features = 100
  34. n_queries = 30
  35. rng = np.random.RandomState(42)

  36. # Generate sample data
  37. X, _ = make_blobs(n_samples=n_samples + n_queries,
  38.                   n_features=n_features, centers=10,
  39.                   random_state=0)
  40. X_index = X[:n_samples]
  41. X_query = X[n_samples:]
  42. # Get exact neighbors
  43. nbrs = NearestNeighbors(n_neighbors=1, algorithm='brute',
  44.                         metric='cosine').fit(X_index)
  45. neighbors_exact = nbrs.kneighbors(X_query, return_distance=False)

  46. # Set `n_candidate` values
  47. n_candidates_values = np.linspace(10, 500, 5).astype(np.int)
  48. n_estimators_for_candidate_value = [1, 5, 10]
  49. n_iter = 10
  50. stds_accuracies = np.zeros((len(n_estimators_for_candidate_value),
  51.                             n_candidates_values.shape[0]),
  52.                            dtype=float)
  53. accuracies_c = np.zeros((len(n_estimators_for_candidate_value),
  54.                          n_candidates_values.shape[0]), dtype=float)

  55. # LSH Forest is a stochastic index: perform several iteration to estimate
  56. # expected accuracy and standard deviation displayed as error bars in
  57. # the plots
  58. for j, value in enumerate(n_estimators_for_candidate_value):
  59.     for i, n_candidates in enumerate(n_candidates_values):
  60.         accuracy_c = []
  61.         for seed in range(n_iter):
  62.             lshf = LSHForest(n_estimators=value,
  63.                              n_candidates=n_candidates, n_neighbors=1,
  64.                              random_state=seed)
  65.             # Build the LSH Forest index
  66.             lshf.fit(X_index)
  67.             # Get neighbors
  68.             neighbors_approx = lshf.kneighbors(X_query,
  69.                                                return_distance=False)
  70.             accuracy_c.append(np.sum(np.equal(neighbors_approx,
  71.                                               neighbors_exact)) /
  72.                               n_queries)

  73.         stds_accuracies[j, i] = np.std(accuracy_c)
  74.         accuracies_c[j, i] = np.mean(accuracy_c)

  75. # Set `n_estimators` values
  76. n_estimators_values = [1, 5, 10, 20, 30, 40, 50]
  77. accuracies_trees = np.zeros(len(n_estimators_values), dtype=float)

  78. # Calculate average accuracy for each value of `n_estimators`
  79. for i, n_estimators in enumerate(n_estimators_values):
  80.     lshf = LSHForest(n_estimators=n_estimators, n_neighbors=1)
  81.     # Build the LSH Forest index
  82.     lshf.fit(X_index)
  83.     # Get neighbors
  84.     neighbors_approx = lshf.kneighbors(X_query, return_distance=False)
  85.     accuracies_trees[i] = np.sum(np.equal(neighbors_approx,
  86.                                           neighbors_exact))/n_queries

  87. ###############################################################################
  88. # Plot the accuracy variation with `n_candidates`
  89. plt.figure()
  90. colors = ['c', 'm', 'y']
  91. for i, n_estimators in enumerate(n_estimators_for_candidate_value):
  92.     label = 'n_estimators = %d ' % n_estimators
  93.     plt.plot(n_candidates_values, accuracies_c[i, :],
  94.              'o-', c=colors[i], label=label)
  95.     plt.errorbar(n_candidates_values, accuracies_c[i, :],
  96.                  stds_accuracies[i, :], c=colors[i])

  97. plt.legend(loc='upper left', prop=dict(size='small'))
  98. plt.ylim([0, 1.2])
  99. plt.xlim(min(n_candidates_values), max(n_candidates_values))
  100. plt.ylabel("Accuracy")
  101. plt.xlabel("n_candidates")
  102. plt.grid(which='both')
  103. plt.title("Accuracy variation with n_candidates")

  104. # Plot the accuracy variation with `n_estimators`
  105. plt.figure()
  106. plt.scatter(n_estimators_values, accuracies_trees, c='k')
  107. plt.plot(n_estimators_values, accuracies_trees, c='g')
  108. plt.ylim([0, 1.2])
  109. plt.xlim(min(n_estimators_values), max(n_estimators_values))
  110. plt.ylabel("Accuracy")
  111. plt.xlabel("n_estimators")
  112. plt.grid(which='both')
  113. plt.title("Accuracy variation with n_estimators")

  114. plt.show()

#7
ReneeBK posted on 2016-10-10 07:50:47
  1. """
  2. ================================================
  3. Varying regularization in Multi-layer Perceptron
  4. ================================================

  5. A comparison of different values for regularization parameter 'alpha' on
  6. synthetic datasets. The plot shows that different alphas yield different
  7. decision functions.

  8. Alpha is a parameter for regularization term, aka penalty term, that combats
  9. overfitting by constraining the size of the weights. Increasing alpha may fix
  10. high variance (a sign of overfitting) by encouraging smaller weights, resulting
  11. in a decision boundary plot that appears with lesser curvatures.
  12. Similarly, decreasing alpha may fix high bias (a sign of underfitting) by
  13. encouraging larger weights, potentially resulting in a more complicated
  14. decision boundary.
  15. """
  16. print(__doc__)


  17. # Author: Issam H. Laradji
  18. # License: BSD 3 clause

  19. import numpy as np
  20. from matplotlib import pyplot as plt
  21. from matplotlib.colors import ListedColormap
  22. from sklearn.model_selection import train_test_split
  23. from sklearn.preprocessing import StandardScaler
  24. from sklearn.datasets import make_moons, make_circles, make_classification
  25. from sklearn.neural_network import MLPClassifier

  26. h = .02  # step size in the mesh

  27. alphas = np.logspace(-5, 3, 5)
  28. names = []
  29. for i in alphas:
  30.     names.append('alpha ' + str(i))

  31. classifiers = []
  32. for i in alphas:
  33.     classifiers.append(MLPClassifier(alpha=i, random_state=1))

  34. X, y = make_classification(n_features=2, n_redundant=0, n_informative=2,
  35.                            random_state=0, n_clusters_per_class=1)
  36. rng = np.random.RandomState(2)
  37. X += 2 * rng.uniform(size=X.shape)
  38. linearly_separable = (X, y)

  39. datasets = [make_moons(noise=0.3, random_state=0),
  40.             make_circles(noise=0.2, factor=0.5, random_state=1),
  41.             linearly_separable]

  42. figure = plt.figure(figsize=(17, 9))
  43. i = 1
  44. # iterate over datasets
  45. for X, y in datasets:
  46.     # preprocess dataset, split into training and test part
  47.     X = StandardScaler().fit_transform(X)
  48.     X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.4)

  49.     x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
  50.     y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
  51.     xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
  52.                          np.arange(y_min, y_max, h))

  53.     # just plot the dataset first
  54.     cm = plt.cm.RdBu
  55.     cm_bright = ListedColormap(['#FF0000', '#0000FF'])
  56.     ax = plt.subplot(len(datasets), len(classifiers) + 1, i)
  57.     # Plot the training points
  58.     ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright)
  59.     # and testing points
  60.     ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright, alpha=0.6)
  61.     ax.set_xlim(xx.min(), xx.max())
  62.     ax.set_ylim(yy.min(), yy.max())
  63.     ax.set_xticks(())
  64.     ax.set_yticks(())
  65.     i += 1

  66.     # iterate over classifiers
  67.     for name, clf in zip(names, classifiers):
  68.         ax = plt.subplot(len(datasets), len(classifiers) + 1, i)
  69.         clf.fit(X_train, y_train)
  70.         score = clf.score(X_test, y_test)

  71.         # Plot the decision boundary. For that, we will assign a color to each
  72.         # point in the mesh [x_min, x_max]x[y_min, y_max].
  73.         if hasattr(clf, "decision_function"):
  74.             Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
  75.         else:
  76.             Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]

  77.         # Put the result into a color plot
  78.         Z = Z.reshape(xx.shape)
  79.         ax.contourf(xx, yy, Z, cmap=cm, alpha=.8)

  80.         # Plot also the training points
  81.         ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright)
  82.         # and testing points
  83.         ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright,
  84.                    alpha=0.6)

  85.         ax.set_xlim(xx.min(), xx.max())
  86.         ax.set_ylim(yy.min(), yy.max())
  87.         ax.set_xticks(())
  88.         ax.set_yticks(())
  89.         ax.set_title(name)
  90.         ax.text(xx.max() - .3, yy.min() + .3, ('%.2f' % score).lstrip('0'),
  91.                 size=15, horizontalalignment='right')
  92.         i += 1

  93. figure.subplots_adjust(left=.02, right=.98)
  94. plt.show()

#8
ReneeBK posted on 2016-10-10 07:52:38
  1. """
  2. ==============
  3. Non-linear SVM
  4. ==============

  5. Perform binary classification using non-linear SVC
  6. with RBF kernel. The target to predict is a XOR of the
  7. inputs.

  8. The color map illustrates the decision function learned by the SVC.
  9. """
  10. print(__doc__)

  11. import numpy as np
  12. import matplotlib.pyplot as plt
  13. from sklearn import svm

  14. xx, yy = np.meshgrid(np.linspace(-3, 3, 500),
  15.                      np.linspace(-3, 3, 500))
  16. np.random.seed(0)
  17. X = np.random.randn(300, 2)
  18. Y = np.logical_xor(X[:, 0] > 0, X[:, 1] > 0)

  19. # fit the model
  20. clf = svm.NuSVC()
  21. clf.fit(X, Y)

  22. # plot the decision function for each datapoint on the grid
  23. Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
  24. Z = Z.reshape(xx.shape)

  25. plt.imshow(Z, interpolation='nearest',
  26.            extent=(xx.min(), xx.max(), yy.min(), yy.max()), aspect='auto',
  27.            origin='lower', cmap=plt.cm.PuOr_r)
  28. contours = plt.contour(xx, yy, Z, levels=[0], linewidths=2,
  29.                        linetypes='--')
  30. plt.scatter(X[:, 0], X[:, 1], s=30, c=Y, cmap=plt.cm.Paired)
  31. plt.xticks(())
  32. plt.yticks(())
  33. plt.axis([-3, 3, -3, 3])
  34. plt.show()

#9
ReneeBK posted on 2016-10-10 07:53:27
  1. """
  2. ===================================================================
  3. Support Vector Regression (SVR) using linear and non-linear kernels
  4. ===================================================================

  5. Toy example of 1D regression using linear, polynomial and RBF kernels.

  6. """
  7. print(__doc__)

  8. import numpy as np
  9. from sklearn.svm import SVR
  10. import matplotlib.pyplot as plt

  11. ###############################################################################
  12. # Generate sample data
  13. X = np.sort(5 * np.random.rand(40, 1), axis=0)
  14. y = np.sin(X).ravel()

  15. ###############################################################################
  16. # Add noise to targets
  17. y[::5] += 3 * (0.5 - np.random.rand(8))

  18. ###############################################################################
  19. # Fit regression model
  20. svr_rbf = SVR(kernel='rbf', C=1e3, gamma=0.1)
  21. svr_lin = SVR(kernel='linear', C=1e3)
  22. svr_poly = SVR(kernel='poly', C=1e3, degree=2)
  23. y_rbf = svr_rbf.fit(X, y).predict(X)
  24. y_lin = svr_lin.fit(X, y).predict(X)
  25. y_poly = svr_poly.fit(X, y).predict(X)

  26. ###############################################################################
  27. # look at the results
  28. lw = 2
  29. plt.scatter(X, y, color='darkorange', label='data')
  30. plt.hold('on')
  31. plt.plot(X, y_rbf, color='navy', lw=lw, label='RBF model')
  32. plt.plot(X, y_lin, color='c', lw=lw, label='Linear model')
  33. plt.plot(X, y_poly, color='cornflowerblue', lw=lw, label='Polynomial model')
  34. plt.xlabel('data')
  35. plt.ylabel('target')
  36. plt.title('Support Vector Regression')
  37. plt.legend()
  38. plt.show()

#10
fengyg posted on 2016-10-10 07:56:54
kankan (taking a look)
