OP: Lisrelchen

Top 20 Python Machine Learning Open Source Projects

Fig. 1: Python machine learning projects on GitHub, with color corresponding to commits/contributors; Bob, IEPY, Nilearn, and NuPIC have the highest such values.

  • scikit-learn, 18845 commits, 404 contributors,
    www.github.com/scikit-learn/scikit-learn
    scikit-learn is a Python module for machine learning built on top of SciPy. It features various classification, regression and clustering algorithms including support vector machines, logistic regression, naive Bayes, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy. (A minimal usage sketch appears after this list.)
  • Pylearn2, 7027 commits, 117 contributors,
    www.github.com/lisa-lab/pylearn2
    Pylearn2 is a library designed to make machine learning research easy. It is a library built on top of Theano.
  • NuPIC, 4392 commits, 60 contributors,
    www.github.com/numenta/nupic
    The Numenta Platform for Intelligent Computing (NuPIC) is a machine intelligence platform that implements the HTM learning algorithms. HTM is a detailed computational theory of the neocortex. At the core of HTM are time-based continuous learning algorithms that store and recall spatial and temporal patterns. NuPIC is suited to a variety of problems, particularly anomaly detection and prediction of streaming data sources.
  • Nilearn, 2742 commits, 28 contributors,
    www.github.com/nilearn/nilearn
    Nilearn is a Python module for fast and easy statistical learning on NeuroImaging data. It leverages the scikit-learn Python toolbox for multivariate statistics with applications such as predictive modeling, classification, decoding, or connectivity analysis.
  • PyBrain, 969 commits, 27 contributors,
    www.github.com/pybrain/pybrain
    PyBrain is short for Python-Based Reinforcement Learning, Artificial Intelligence and Neural Network Library. Its goal is to offer flexible, easy-to-use yet still powerful algorithms for Machine Learning Tasks and a variety of predefined environments to test and compare your algorithms.
  • Pattern, 943 commits, 20 contributors,
    www.github.com/clips/pattern
    Pattern is a web mining module for Python. It has tools for data mining, natural language processing, network analysis and machine learning. It supports the vector space model, clustering, and classification using KNN, SVM and Perceptron.
  • Fuel, 497 commits, 12 contributors,
    www.github.com/mila-udem/fuel
    Fuel provides your machine learning models with the data they need to learn. It has interfaces to common datasets such as MNIST, CIFAR-10 (image datasets), and Google's One Billion Words (text). It gives you the ability to iterate over your data in a variety of ways, such as in minibatches with shuffled or sequential examples. (See the rough iteration sketch after this list.)
  • Bob, 5080 commits, 11 contributors,
    www.github.com/idiap/bob
    Bob is a free signal-processing and machine learning toolbox. The toolbox is written in a mix of Python and C++ and is designed to be both efficient and to reduce development time. It is composed of a reasonably large number of packages that implement tools for image, audio & video processing, machine learning and pattern recognition.
  • skdata, 441 commits, 10 contributors,
    www.github.com/jaberg/skdata
    Skdata is a library of data sets for machine learning and statistics. This module provides standardized Python access to toy problems as well as popular computer vision and natural language processing data sets.
  • MILK, 687 commits, 9 contributors,
    www.github.com/luispedro/milk
    Milk is a machine learning toolkit in Python. Its focus is on supervised classification with several classifiers available: SVMs, k-NN, random forests and decision trees. It also performs feature selection. These classifiers can be combined in many ways to form different classification systems. For unsupervised learning, milk supports k-means clustering and affinity propagation.
  • IEPY, 1758 commits, 9 contributors,
    www.github.com/machinalis/iepy
    IEPY is an open source tool for Information Extraction focused on Relation Extraction.
    It is aimed at users who need to perform Information Extraction on a large dataset, and at scientists who want to experiment with new IE algorithms.
  • Quepy, 131 commits, 9 contributors,
    www.github.com/machinalis/quepy
    Quepy is a Python framework to transform natural language questions into queries in a database query language. It can be easily customized for different kinds of questions in natural language and database queries, so with little coding you can build your own system for natural language access to your database.
    Currently Quepy provides support for the SPARQL and MQL query languages, with plans to extend it to other database query languages.
  • Hebel, 244 commits, 5 contributors,
    www.github.com/hannes-brt/hebel
    Hebel is a library for deep learning with neural networks in Python using GPU acceleration with CUDA through PyCUDA. It implements the most important types of neural network models and offers a variety of different activation functions and training methods such as momentum, Nesterov momentum, dropout, and early stopping.
  • mlxtend, 135 commits, 5 contributors,
    www.github.com/rasbt/mlxtend
    It is a library of useful tools and extensions for day-to-day data science tasks.
  • nolearn, 192 commits, 4 contributors,
    www.github.com/dnouri/nolearn
    This package contains a number of utility modules that are helpful with machine learning tasks. Most of the modules work together with scikit-learn, others are more generally useful.
  • Ramp, 179 commits, 4 contributors,
    www.github.com/kvh/ramp
    Ramp is a Python library for rapid prototyping of machine learning solutions. It's a lightweight pandas-based machine learning framework pluggable with existing Python machine learning and statistics tools (scikit-learn, rpy2, etc.). Ramp provides a simple, declarative syntax for exploring features, algorithms and transformations quickly and efficiently.
  • Feature Forge, 219 commits, 3 contributors,
    www.github.com/machinalis/featureforge
    A set of tools for creating and testing machine learning features, with a scikit-learn compatible API.
    This library provides a set of tools that can be useful in many machine learning applications (classification, clustering, regression, etc.), and is particularly helpful if you use scikit-learn (although it can also work with other frameworks).
  • REP, 50 commits, 3 contributors,
    www.github.com/yandex/rep
    REP is an environment for conducting data-driven research in a consistent and reproducible way. It has a unified classifier wrapper for a variety of implementations such as TMVA, Sklearn, XGBoost and uBoost. It can train classifiers in parallel on a cluster, and it supports interactive plots.
  • Python Machine Learning Samples, 15 commits, 3 contributors,
    www.github.com/awslabs/machine-learning-samples
    A collection of sample applications built using Amazon Machine Learning.
  • Python-ELM, 17 commits, 1 contributor,
    www.github.com/dclambert/Python-ELM
    This is an implementation of the Extreme Learning Machine in Python, based on scikit-learn.
This post used some content from www.pansop.com/1039/
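To make the scikit-learn entry concrete, here is a minimal sketch of my own (not taken from the original post or from the library's documentation): it cross-validates two of the classifiers named above, a linear SVM and a random forest, on scikit-learn's bundled iris data. The dataset and parameter choices are illustrative assumptions. Note that recent scikit-learn releases ship the cross-validation helpers in sklearn.model_selection; the older examples later in this thread import them from sklearn.cross_validation.

# Minimal scikit-learn sketch (illustrative; not part of the original post).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score  # sklearn.cross_validation in old releases
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# A Pipeline chains preprocessing and estimation, the same pattern the
# nilearn decoding examples further down in this thread rely on.
svm_clf = Pipeline([('scale', StandardScaler()), ('svc', SVC(kernel='linear'))])
rf_clf = RandomForestClassifier(n_estimators=100, random_state=0)

for name, clf in [('linear SVM', svm_clf), ('random forest', rf_clf)]:
    scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validated accuracy
    print("%s: %.2f +/- %.2f" % (name, scores.mean(), scores.std()))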
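And a rough Fuel sketch of the minibatch iteration described in the Fuel entry above, again my own illustration rather than the project's official example. It assumes the MNIST data have already been downloaded and converted to Fuel's HDF5 format (via the fuel-download and fuel-convert command-line tools); the class names follow the Fuel documentation, but treat the exact call signatures as assumptions against the version you have installed.

# Rough Fuel sketch (assumption: MNIST has been prepared beforehand with
# `fuel-download mnist` and `fuel-convert mnist`).
from fuel.datasets import MNIST
from fuel.schemes import ShuffledScheme
from fuel.streams import DataStream

train_set = MNIST(which_sets=('train',))

# Iterate over the training set in shuffled minibatches of 128 examples.
stream = DataStream(train_set,
                    iteration_scheme=ShuffledScheme(train_set.num_examples, 128))

for images, targets in stream.get_epoch_iterator():
    print(images.shape, targets.shape)  # e.g. (128, 1, 28, 28) (128, 1)
    break  # one batch is enough for this sketch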

#3
bailihongchen, posted 2016-2-6 18:07:31
Thanks for sharing.

#4
Lisrelchen, posted 2016-4-7 09:15:54
  1. """
  2. The Haxby dataset: face vs house in object recognition
  3. =======================================================
  4. This example does a simple but efficient decoding on the Haxby dataset:
  5. using a feature selection, followed by an SVM.
  6. """

  7. #############################################################################
  8. # Retrieve the files of the Haxby dataset
  9. from nilearn import datasets

  10. haxby_dataset = datasets.fetch_haxby_simple()

  11. # print basic information on the dataset
  12. print('Mask nifti image (3D) is located at: %s' % haxby_dataset.mask)
  13. print('Functional nifti image (4D) is located at: %s' %
  14.       haxby_dataset.func[0])

  15. #############################################################################
  16. # Load the behavioral data
  17. import numpy as np
  18. y, session = np.loadtxt(haxby_dataset.session_target[0]).astype("int").T
  19. conditions = np.recfromtxt(haxby_dataset.conditions_target[0])['f0']

  20. # Restrict to faces and houses
  21. condition_mask = np.logical_or(conditions == b'face', conditions == b'house')
  22. y = y[condition_mask]
  23. conditions = conditions[condition_mask]

  24. # We have 2 conditions
  25. n_conditions = np.size(np.unique(y))

  26. #############################################################################
  27. # Prepare the fMRI data
  28. from nilearn.input_data import NiftiMasker

  29. mask_filename = haxby_dataset.mask
  30. # For decoding, standardizing is often very important
  31. nifti_masker = NiftiMasker(mask_img=mask_filename, sessions=session,
  32.                            smoothing_fwhm=4, standardize=True,
  33.                            memory="nilearn_cache", memory_level=1)
  34. func_filename = haxby_dataset.func[0]
  35. X = nifti_masker.fit_transform(func_filename)
  36. # Apply our condition_mask
  37. X = X[condition_mask]
  38. session = session[condition_mask]

  39. #############################################################################
  40. # Build the decoder

  41. # Define the prediction function to be used.
  42. # Here we use a Support Vector Classification, with a linear kernel
  43. from sklearn.svm import SVC
  44. svc = SVC(kernel='linear')

  45. # Define the dimension reduction to be used.
  46. # Here we use a classical univariate feature selection based on F-test,
  47. # namely Anova. We set the number of features to be selected to 500
  48. from sklearn.feature_selection import SelectKBest, f_classif
  49. feature_selection = SelectKBest(f_classif, k=500)

  50. # We have our classifier (SVC), our feature selection (SelectKBest), and now,
  51. # we can plug them together in a *pipeline* that performs the two operations
  52. # successively:
  53. from sklearn.pipeline import Pipeline
  54. anova_svc = Pipeline([('anova', feature_selection), ('svc', svc)])

  55. #############################################################################
  56. # Fit the decoder and predict

  57. anova_svc.fit(X, y)
  58. y_pred = anova_svc.predict(X)

  59. #############################################################################
  60. # Visualize the results

  61. # Look at the SVC's discriminating weights
  62. coef = svc.coef_
  63. # reverse feature selection
  64. coef = feature_selection.inverse_transform(coef)
  65. # reverse masking
  66. weight_img = nifti_masker.inverse_transform(coef)


  67. # Create the figure
  68. from nilearn import image
  69. from nilearn.plotting import plot_stat_map, show

  70. # Plot the mean image because we have no anatomic data
  71. mean_img = image.mean_img(func_filename)

  72. plot_stat_map(weight_img, mean_img, title='SVM weights')

  73. # Saving the results as a Nifti file may also be important
  74. weight_img.to_filename('haxby_face_vs_house.nii')

  75. #############################################################################
  76. # Obtain prediction scores via cross validation

  77. from sklearn.cross_validation import LeaveOneLabelOut

  78. # Define the cross-validation scheme used for validation.
  79. # Here we use a LeaveOneLabelOut cross-validation on the session label
  80. # divided by 2, which corresponds to a leave-two-session-out
  81. cv = LeaveOneLabelOut(session // 2)

  82. # Compute the prediction accuracy for the different folds (i.e. session)
  83. cv_scores = []
  84. for train, test in cv:
  85.     anova_svc.fit(X[train], y[train])
  86.     y_pred = anova_svc.predict(X[test])
  87.     cv_scores.append(np.sum(y_pred == y[test]) / float(np.size(y[test])))

  88. # Return the corresponding mean prediction accuracy
  89. classification_accuracy = np.mean(cv_scores)

  90. # Print the results
  91. print("Classification accuracy: %.4f / Chance level: %f" %
  92.       (classification_accuracy, 1. / n_conditions))
  93. # Classification accuracy: 0.9861 / Chance level: 0.5000

  94. show()
复制代码


#5
Lisrelchen, posted 2016-4-7 09:22:49
  1. """
  2. Different classifiers in decoding the Haxby dataset
  3. =====================================================
  4. Here we compare different classifiers on a visual object recognition
  5. decoding task.
  6. """

  7. #############################################################################
  8. # We start by loading the data and applying simple transformations to it

  9. # Fetch data using nilearn dataset fetcher
  10. from nilearn import datasets
  11. haxby_dataset = datasets.fetch_haxby(n_subjects=1)

  12. # print basic information on the dataset
  13. print('First subject anatomical nifti image (3D) located is at: %s' %
  14.       haxby_dataset.anat[0])
  15. print('First subject functional nifti image (4D) is located at: %s' %
  16.       haxby_dataset.func[0])

  17. # load labels
  18. import numpy as np
  19. labels = np.recfromcsv(haxby_dataset.session_target[0], delimiter=" ")
  20. stimuli = labels['labels']
  21. # identify resting state labels in order to be able to remove them
  22. resting_state = stimuli == b'rest'

  23. # find names of remaining active labels
  24. categories = np.unique(stimuli[np.logical_not(resting_state)])

  25. # extract tags indicating to which acquisition run a tag belongs
  26. session_labels = labels["chunks"][np.logical_not(resting_state)]

  27. # Load the fMRI data
  28. from nilearn.input_data import NiftiMasker

  29. # For decoding, standardizing is often very important
  30. mask_filename = haxby_dataset.mask_vt[0]
  31. masker = NiftiMasker(mask_img=mask_filename, standardize=True)
  32. func_filename = haxby_dataset.func[0]
  33. masked_timecourses = masker.fit_transform(
  34.     func_filename)[np.logical_not(resting_state)]


  35. #############################################################################
  36. # Then we define the various classifiers that we use

  37. # A support vector classifier
  38. from sklearn.svm import SVC
  39. svm = SVC(C=1., kernel="linear")

  40. # The logistic regression
  41. from sklearn.linear_model import LogisticRegression, RidgeClassifier, \
  42.     RidgeClassifierCV
  43. logistic = LogisticRegression(C=1., penalty="l1")
  44. logistic_50 = LogisticRegression(C=50., penalty="l1")
  45. logistic_l2 = LogisticRegression(C=1., penalty="l2")

  46. # Cross-validated versions of these classifiers
  47. from sklearn.grid_search import GridSearchCV
  48. # GridSearchCV is slow, but note that it takes an 'n_jobs' parameter that
  49. # can significantly speed up the fitting process on computers with
  50. # multiple cores
  51. svm_cv = GridSearchCV(SVC(C=1., kernel="linear"),
  52.                       param_grid={'C': [.1, .5, 1., 5., 10., 50., 100.]},
  53.                       scoring='f1', n_jobs=1)

  54. logistic_cv = GridSearchCV(LogisticRegression(C=1., penalty="l1"),
  55.                            param_grid={'C': [.1, .5, 1., 5., 10., 50., 100.]},
  56.                            scoring='f1')
  57. logistic_l2_cv = GridSearchCV(LogisticRegression(C=1., penalty="l2"),
  58.                               param_grid={
  59.                                   'C': [.1, .5, 1., 5., 10., 50., 100.]},
  60.                               scoring='f1')

  61. # The ridge classifier has a specific 'CV' object that can set it's
  62. # parameters faster than using a GridSearchCV
  63. ridge = RidgeClassifier()
  64. ridge_cv = RidgeClassifierCV()

  65. # A dictionary, to hold all our classifiers
  66. classifiers = {'SVC': svm,
  67.                'SVC cv': svm_cv,
  68.                'log l1': logistic,
  69.                'log l1 50': logistic_50,
  70.                'log l1 cv': logistic_cv,
  71.                'log l2': logistic_l2,
  72.                'log l2 cv': logistic_l2_cv,
  73.                'ridge': ridge,
  74.                'ridge cv': ridge_cv}


  75. #############################################################################
  76. # Here we compute prediction scores and run time for all these
  77. # classifiers

  78. # Make a data splitting object for cross validation
  79. from sklearn.cross_validation import LeaveOneLabelOut, cross_val_score
  80. cv = LeaveOneLabelOut(session_labels)

  81. import time

  82. classifiers_scores = {}

  83. for classifier_name, classifier in sorted(classifiers.items()):
  84.     classifiers_scores[classifier_name] = {}
  85.     print(70 * '_')

  86.     for category in categories:
  87.         task_mask = np.logical_not(resting_state)
  88.         classification_target = (stimuli[task_mask] == category)
  89.         t0 = time.time()
  90.         classifiers_scores[classifier_name][category] = cross_val_score(
  91.             classifier,
  92.             masked_timecourses,
  93.             classification_target,
  94.             cv=cv, scoring="f1")

  95.         print("%10s: %14s -- scores: %1.2f +- %1.2f, time %.2fs" % (
  96.             classifier_name, category,
  97.             classifiers_scores[classifier_name][category].mean(),
  98.             classifiers_scores[classifier_name][category].std(),
  99.             time.time() - t0))
复制代码


#6
Lisrelchen, posted 2016-4-7 09:24:25
  1. """
  2. Voxel-Based Morphometry on Oasis dataset
  3. ========================================
  4. This example uses Voxel-Based Morphometry (VBM) to study the relationship
  5. between aging and gray matter density.
  6. The data come from the `OASIS <http://www.oasis-brains.org/>`_ project.
  7. If you use it, you need to agree with the data usage agreement available
  8. on the website.
  9. It has been run through a standard VBM pipeline (using SPM8 and
  10. NewSegment) to create VBM maps, which we study here.
  11. Predictive modeling analysis: VBM bio-markers of aging?
  12. --------------------------------------------------------
  13. We run a standard SVM-ANOVA nilearn pipeline to predict age from the VBM
  14. data. We use only 100 subjects from the OASIS dataset to limit the memory
  15. usage.
  16. Note that for an actual predictive modeling study of aging, the study
  17. should be ran on the full set of subjects. Also, parameters such as the
  18. smoothing applied to the data and the number of features selected by the
  19. Anova step should be set by nested cross-validation, as they impact
  20. significantly the prediction score.
  21. Brain mapping with mass univariate
  22. -----------------------------------
  23. SVM weights are very noisy, partly because heavy smoothing is detrimental
  24. for the prediction here. A standard analysis using mass-univariate GLM
  25. (here permuted to have exact correction for multiple comparisons) gives a
  26. much clearer view of the important regions.
  27. ____
  28. """
  29. # Authors: Elvis Dhomatob, <elvis.dohmatob@inria.fr>, Apr. 2014
  30. #          Virgile Fritsch, <virgile.fritsch@inria.fr>, Apr 2014
  31. #          Gael Varoquaux, Apr 2014
  32. import numpy as np
  33. from scipy import linalg
  34. import matplotlib.pyplot as plt
  35. from nilearn import datasets
  36. from nilearn.input_data import NiftiMasker

  37. n_subjects = 100  # more subjects requires more memory

  38. ### Load Oasis dataset ########################################################
  39. oasis_dataset = datasets.fetch_oasis_vbm(n_subjects=n_subjects)
  40. gray_matter_map_filenames = oasis_dataset.gray_matter_maps
  41. age = oasis_dataset.ext_vars['age'].astype(float)

  42. # print basic information on the dataset
  43. print('First gray-matter anatomy image (3D) is located at: %s' %
  44.       oasis_dataset.gray_matter_maps[0])  # 3D data
  45. print('First white-matter anatomy image (3D) is located at: %s' %
  46.       oasis_dataset.white_matter_maps[0])  # 3D data

  47. ### Preprocess data ###########################################################
  48. nifti_masker = NiftiMasker(
  49.     standardize=False,
  50.     smoothing_fwhm=2,
  51.     memory='nilearn_cache')  # cache options
  52. gm_maps_masked = nifti_masker.fit_transform(gray_matter_map_filenames)
  53. n_samples, n_features = gm_maps_masked.shape
  54. print("%d samples, %d features" % (n_subjects, n_features))

  55. ### Prediction with SVR #######################################################
  56. print("ANOVA + SVR")
  57. # Define the prediction function to be used.
  58. # Here we use a Support Vector Classification, with a linear kernel
  59. from sklearn.svm import SVR
  60. svr = SVR(kernel='linear')

  61. # Dimension reduction
  62. from sklearn.feature_selection import VarianceThreshold, SelectKBest, \
  63.         f_regression

  64. # Remove features with too low between-subject variance
  65. variance_threshold = VarianceThreshold(threshold=.01)

  66. # Here we use a classical univariate feature selection based on F-test,
  67. # namely Anova.
  68. feature_selection = SelectKBest(f_regression, k=2000)

  69. # We have our predictor (SVR), our feature selection (SelectKBest), and now,
  70. # we can plug them together in a *pipeline* that performs the two operations
  71. # successively:
  72. from sklearn.pipeline import Pipeline
  73. anova_svr = Pipeline([
  74.             ('variance_threshold', variance_threshold),
  75.             ('anova', feature_selection),
  76.             ('svr', svr)])

  77. ### Fit and predict
  78. anova_svr.fit(gm_maps_masked, age)
  79. age_pred = anova_svr.predict(gm_maps_masked)

  80. # Visualization
  81. # Look at the SVR's discriminating weights
  82. coef = svr.coef_
  83. # reverse feature selection
  84. coef = feature_selection.inverse_transform(coef)
  85. # reverse variance threshold
  86. coef = variance_threshold.inverse_transform(coef)
  87. # reverse masking
  88. weight_img = nifti_masker.inverse_transform(coef)

  89. # Create the figure
  90. from nilearn.plotting import plot_stat_map, show
  91. bg_filename = gray_matter_map_filenames[0]
  92. z_slice = 0
  93. from nilearn.image.resampling import coord_transform
  94. affine = weight_img.get_affine()
  95. _, _, k_slice = coord_transform(0, 0, z_slice,
  96.                                 linalg.inv(affine))
  97. k_slice = np.round(k_slice)

  98. fig = plt.figure(figsize=(5.5, 7.5), facecolor='k')
  99. weight_slice_data = weight_img.get_data()[..., k_slice, 0]
  100. vmax = max(-np.min(weight_slice_data), np.max(weight_slice_data)) * 0.5
  101. display = plot_stat_map(weight_img, bg_img=bg_filename,
  102.                         display_mode='z', cut_coords=[z_slice],
  103.                         figure=fig, vmax=vmax)
  104. display.title('SVM weights', y=1.2)

  105. # Measure accuracy with cross validation
  106. from sklearn.cross_validation import cross_val_score
  107. cv_scores = cross_val_score(anova_svr, gm_maps_masked, age)

  108. # Return the corresponding mean prediction accuracy
  109. prediction_accuracy = np.mean(cv_scores)
  110. print("=== ANOVA ===")
  111. print("Prediction accuracy: %f" % prediction_accuracy)
  112. print("")

  113. ### Inference with massively univariate model #################################
  114. print("Massively univariate model")

  115. # Statistical inference
  116. from nilearn.mass_univariate import permuted_ols
  117. data = variance_threshold.fit_transform(gm_maps_masked)
  118. neg_log_pvals, t_scores_original_data, _ = permuted_ols(
  119.     age, data,  # + intercept as a covariate by default
  120.     n_perm=2000,  # 1,000 in the interest of time; 10000 would be better
  121.     n_jobs=1)  # can be changed to use more CPUs
  122. signed_neg_log_pvals = neg_log_pvals * np.sign(t_scores_original_data)
  123. signed_neg_log_pvals_unmasked = nifti_masker.inverse_transform(
  124.     variance_threshold.inverse_transform(signed_neg_log_pvals))

  125. # Show results
  126. threshold = -np.log10(0.1)  # 10% corrected

  127. fig = plt.figure(figsize=(5.5, 7.5), facecolor='k')

  128. display = plot_stat_map(signed_neg_log_pvals_unmasked, bg_img=bg_filename,
  129.                         threshold=threshold, cmap=plt.cm.RdBu_r,
  130.                         display_mode='z', cut_coords=[z_slice],
  131.                         figure=fig)
  132. title = ('Negative $\log_{10}$ p-values'
  133.          '\n(Non-parametric + max-type correction)')
  134. display.title(title, y=1.2)

  135. signed_neg_log_pvals_slice_data = \
  136.     signed_neg_log_pvals_unmasked.get_data()[..., k_slice, 0]
  137. n_detections = (np.abs(signed_neg_log_pvals_slice_data) > threshold).sum()
  138. print('\n%d detections' % n_detections)

  139. show()
复制代码


#7
Lisrelchen, posted 2016-4-7 09:27:28
import os, sys; sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", ".."))

from pattern.web import Google, plaintext
from pattern.web import SEARCH

# The pattern.web module has a SearchEngine class,
# with a SearchEngine.search() method that yields a list of Result objects.
# Each Result has url, title, text, language, author and date properties.
# Subclasses of SearchEngine include:
# Google, Bing, Yahoo, Twitter, Facebook, Wikipedia, Wiktionary, Flickr, ...

# This example retrieves results from Google based on a given query.
# The Google search engine can handle SEARCH type searches.
# Other search engines may also handle IMAGE, NEWS, ...

# Google's "Custom Search API" is a paid service.
# The pattern.web module uses a test account by default,
# with 100 free queries per day shared by all Pattern users.
# If this limit is exceeded, SearchEngineLimitError is raised.
# You should obtain your own license key at:
# https://code.google.com/apis/console/
# Activate "Custom Search API" under "Services" and get the key under "API Access".
# Then use Google(license=[YOUR_KEY]).search().
# This will give you 100 personal free queries, or $5 per 1000 queries.
engine = Google(license=None, language="en")

# Veale & Hao's method for finding similes using wildcards (*):
# http://afflatus.ucd.ie/Papers/LearningFigurative_CogSci07.pdf
# This will match results such as:
# - "as light as a feather",
# - "as cute as a cupcake",
# - "as drunk as a lord",
# - "as snug as a bug", etc.
q = "as * as a *"

# Google is very fast but you can only get up to 100 (10x10) results per query.
for i in range(1, 2):
    for result in engine.search(q, start=i, count=10, type=SEARCH, cached=True):
        print plaintext(result.text) # plaintext() removes all HTML formatting.
        print result.url
        print result.date
        print


#8
Lisrelchen, posted 2016-4-7 09:28:36
import os, sys; sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", ".."))

from pattern.web import Google, plaintext

# A search engine in pattern.web sometimes has custom methods that the others don't.
# For example, Google has Google.translate() and Google.identify().

# This example demonstrates the Google Translate API.
# It will only work with a license key, since it is a paid service.
# In the Google API console (https://code.google.com/apis/console/),
# activate Translate API.

g = Google(license=None) # Enter your license key.
q = "Your mother was a hamster and your father smelled of elderberries!"    # en
#   "Ihre Mutter war ein Hamster und euer Vater roch nach Holunderbeeren!"  # de
print q
print plaintext(g.translate(q, input="en", output="de")) # fr, de, nl, es, cs, ja, ...
print

q = "C'est un lapin, lapin de bois, un cadeau."
print q
print g.identify(q) # (language, confidence)


#9
Lisrelchen, posted 2016-4-7 09:29:49
import os, sys; sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", ".."))

from pattern.web import Bing, asynchronous, plaintext
from pattern.web import SEARCH, IMAGE, NEWS

import time

# This example retrieves results from Bing based on a given query.
# The Bing search engine can retrieve up to 1,000 results (10x100) for a query.

# Bing's "Custom Search API" is a paid service.
# The pattern.web module uses a test account by default,
# with 5000 free queries per month shared by all Pattern users.
# If this limit is exceeded, SearchEngineLimitError is raised.
# You should obtain your own license key at:
# https://datamarket.azure.com/account/
engine = Bing(license=None, language="en")

# Quote a query to match it exactly:
q = "\"is more important than\""

# When you execute a query,
# the script will halt until all results are downloaded.
# In apps with an infinite main loop (e.g., GUI, game),
# it is often more useful if the app keeps on running
# while the search is executed in the background.
# This can be achieved with the asynchronous() function.
# It takes any function and that function's arguments and keyword arguments:
request = asynchronous(engine.search, q, start=1, count=100, type=SEARCH, timeout=10)

# This while-loop simulates an infinite application loop.
# In real life you would have an app.update() or similar
# in which you can check request.done every now and then.
while not request.done:
    time.sleep(0.01)
    print ".",

print
print

# If an error occurred in engine.search(), raise it.
if request.error:
    raise request.error

# Retrieve the list of search results.
for result in request.value:
    print result.text
    print result.url
    print


#10
Lisrelchen, posted 2016-4-7 09:30:57
import os, sys; sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", ".."))

from pattern.web import Twitter, hashtags
from pattern.db  import Datasheet, pprint, pd

# This example retrieves tweets containing given keywords from Twitter.

try:
    # We'll store tweets in a Datasheet.
    # A Datasheet is a table of rows and columns that can be exported as a CSV-file.
    # In the first column, we'll store a unique id for each tweet.
    # We only want to add the latest tweets, i.e., those we haven't seen yet.
    # With an index on the first column we can quickly check if an id already exists.
    # The pd() function returns the parent directory of this script + any given path.
    table = Datasheet.load(pd("cool.csv"))
    index = set(table.columns[0])
except:
    table = Datasheet()
    index = set()

engine = Twitter(language="en")

# With Twitter.search(cached=False), a "live" request is sent to Twitter:
# we get the most recent results instead of those in the local cache.
# Keeping a local cache can also be useful (e.g., while testing)
# because a query is instant when it is executed the second time.
prev = None
for i in range(2):
    print i
    for tweet in engine.search("is cooler than", start=prev, count=25, cached=False):
        print
        print tweet.text
        print tweet.author
        print tweet.date
        print hashtags(tweet.text) # Keywords in tweets start with a "#".
        print
        # Only add the tweet to the table if it doesn't already exist.
        if len(table) == 0 or tweet.id not in index:
            table.append([tweet.id, tweet.text])
            index.add(tweet.id)
        # Continue mining older tweets in next iteration.
        prev = tweet.id

# Create a .csv in pattern/examples/01-web/
table.save(pd("cool.csv"))

print "Total results:", len(table)
print

# Print all the rows in the table.
# Since it is stored as a CSV-file it grows comfortably each time the script runs.
# We can also open the table later on: in other scripts, for further analysis, ...

pprint(table, truncate=100)

# Note: you can also search tweets by author:
# Twitter().search("from:tom_de_smedt")

