Original poster: NewOccidental

Ramp: Python Library for Prototyping of Machine Learning Solutions


Original post
NewOccidental posted on 2016-4-25 03:12:28


Ramp

Ramp is a Python library for rapid prototyping of machine learning solutions. It is a lightweight, pandas-based machine learning framework that plugs into existing Python machine learning and statistics tools (scikit-learn, rpy2, etc.). Ramp provides a simple, declarative syntax for exploring features, algorithms, and transformations quickly and efficiently.
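To make the idea of declarative exploration concrete, here is a minimal sketch that does not use Ramp at all. It declares named feature sets and models, then cross-validates every combination. All names in it (`FEATURE_SETS`, `MODELS`, `NearestCentroid`, `MajorityClass`, `cv_accuracy`) and the synthetic data are hypothetical; it only illustrates the pattern, not Ramp's API.

```python
import itertools
import numpy as np


class MajorityClass(object):
    """Baseline: always predict the most frequent training label."""
    def fit(self, X, y):
        values, counts = np.unique(y, return_counts=True)
        self.label_ = values[np.argmax(counts)]
        return self

    def predict(self, X):
        return np.full(len(X), self.label_)


class NearestCentroid(object):
    """Predict the class whose training mean is closest."""
    def fit(self, X, y):
        self.labels_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.labels_])
        return self

    def predict(self, X):
        # Squared distances, shape (n_samples, n_classes)
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.labels_[d.argmin(axis=1)]


def cv_accuracy(make_model, X, y, folds=5, seed=0):
    """Mean accuracy over `folds` shuffled folds."""
    order = np.random.RandomState(seed).permutation(len(y))
    scores = []
    for test_idx in np.array_split(order, folds):
        train_idx = np.setdiff1d(order, test_idx)
        model = make_model().fit(X[train_idx], y[train_idx])
        scores.append(np.mean(model.predict(X[test_idx]) == y[test_idx]))
    return float(np.mean(scores))


# Synthetic two-class data: two well-separated Gaussian blobs.
rng = np.random.RandomState(42)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(3.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Declare what to explore; the loop below tries every combination.
FEATURE_SETS = {
    "raw": lambda X: X,
    "raw+squares": lambda X: np.hstack([X, X ** 2]),
}
MODELS = {"majority": MajorityClass, "centroid": NearestCentroid}

results = {}
for (f_name, transform), (m_name, make_model) in itertools.product(
        FEATURE_SETS.items(), MODELS.items()):
    results[(f_name, m_name)] = cv_accuracy(make_model, transform(X), y)

for key in sorted(results):
    print(key, round(results[key], 3))
```

Two feature sets and two models give four cross-validated configurations, which is the same feature-sets times estimators bookkeeping the Ramp examples in this thread rely on.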

ramp-master.zip (70.55 KB)



Reply 1
NewOccidental posted on 2016-4-25 03:13:42
import logging

import pandas as pd
from sklearn import decomposition, ensemble, linear_model

import ramp
from ramp.features import *
from ramp.metrics import PositiveRate, Recall

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

# Fetch and clean the iris data from UCI (pandas can read a URL directly).
# The file has no header row, so pass header=None; otherwise the first
# sample would be consumed as column names.
columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
data = pd.read_csv(
    "http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data",
    header=None, names=columns)
data = data.dropna()  # drop the bad (empty) trailing line, if it was parsed

# All raw features, with missing values filled with 0
features = [FillMissing(f, 0) for f in columns[:-1]]

# Raw features, log-transformed features, and interaction terms
expanded_features = (
    features +
    [Log(F(f) + 1) for f in features] +
    [
        F('sepal_width') ** 2,
        combo.Interactions(features),
    ]
)

reporters = [
    ramp.reporters.MetricReporter.factory(Recall(.4)),
    ramp.reporters.DualThresholdMetricReporter.factory(Recall(), PositiveRate())
]

# Define several models and feature sets to explore, run 10-fold
# cross-validation on each, and print the results. With 2 estimators and
# 3 active feature sets (the feature-selection set is commented out),
# this tests 3 * 2 = 6 models.
outcomes = ramp.shortcuts.cv_factory(
    data=data,
    folds=10,
    target=[AsFactor('class')],
    reporter_factories=reporters,

    # Try out two algorithms
    estimator=[
        ensemble.RandomForestClassifier(n_estimators=20),
        linear_model.LogisticRegression(),
    ],

    # and three feature sets (a fourth is commented out)
    features=[
        expanded_features,

        # Feature selection
        # [trained.FeatureSelector(
        #     expanded_features,
        #     # use random forest's importance to trim
        #     ramp.selectors.BinaryFeatureSelector(),
        #     target=AsFactor('class'), # target to use
        #     data=data,
        #     n_keep=5, # keep top 5 features
        #     )],

        # Reduce feature dimension (pointless on this dataset)
        [combo.DimensionReduction(expanded_features,
                                  decomposer=decomposition.PCA(n_components=4))],

        # Normalized features
        [Normalize(f) for f in expanded_features],
    ]
)

print(list(outcomes.values())[0]['reporters'][0])
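For readers without Ramp installed, the following sketch shows roughly what the expanded feature set above amounts to in plain pandas. It assumes that `combo.Interactions` means pairwise products of the input features, which the post does not confirm; the sample rows and the generated column names are made up for illustration.

```python
import itertools

import numpy as np
import pandas as pd

# A few iris-like rows; the values are illustrative only.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3],
    "sepal_width": [3.5, 3.0, 3.3],
    "petal_length": [1.4, 1.4, 6.0],
    "petal_width": [0.2, 0.2, 2.5],
})
raw = list(df.columns)

expanded = df.copy()

# Log(F(f) + 1) for every raw feature
for col in raw:
    expanded["log_" + col] = np.log(df[col] + 1)

# F('sepal_width') ** 2
expanded["sepal_width_sq"] = df["sepal_width"] ** 2

# Assumed meaning of combo.Interactions: one product per feature pair
for a, b in itertools.combinations(raw, 2):
    expanded[a + "*" + b] = df[a] * df[b]

print(expanded.shape)
```

Under that assumption the expansion yields 4 raw, 4 log, 1 squared, and 6 pairwise columns, 15 in total, which is why reducing to 4 principal components is a real reduction even on a 4-column dataset.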

Reply 2
NewOccidental posted on 2016-4-25 03:14:16
import tempfile
from hashlib import md5

import gensim
import pandas
from sklearn import linear_model, naive_bayes

from ramp import *
from ramp.estimators.sk import BinaryProbabilities

try:
    training_data = pandas.read_csv('train.csv')
except IOError:
    raise IOError(
        "You need to download the 'Detecting Insults' dataset "
        "from Kaggle to run this example: "
        "http://www.kaggle.com/c/detecting-insults-in-social-commentary")

# Store intermediate results in a temporary directory
tmpdir = tempfile.mkdtemp()
context = DataContext(
              store=tmpdir,
              data=training_data)

base_config = Configuration(
    target='Insult',
    metrics=[metrics.AUC()],
    )

base_features = [
    Length('Comment'),
    Log(Length('Comment') + 1)
]

factory = ConfigFactory(
    base_config,
    features=[
        # first feature set is basic attributes
        base_features,

        # second feature set adds word features
        base_features + [
            text.NgramCounts(
                text.Tokenizer('Comment'),
                mindocs=5,
                bool_=True)],

        # third feature set creates character 5-grams
        # and then selects the top 1000 most informative
        base_features + [
            trained.FeatureSelector(
                [text.NgramCounts(
                    text.CharGrams('Comment', chars=5),
                    bool_=True,
                    mindocs=30)
                ],
                selector=selectors.BinaryFeatureSelector(),
                n_keep=1000,
                target=F('Insult')),
            ],

        # the fourth feature set creates 100 latent vectors
        # from the character 5-grams
        base_features + [
            text.LSI(
                text.CharGrams('Comment', chars=5),
                mindocs=30,
                num_topics=100),
            ]
    ],

    # we'll try two estimators (and wrap them so
    # we get class probabilities as output):
    model=[
        BinaryProbabilities(
            linear_model.LogisticRegression()),
        BinaryProbabilities(
            naive_bayes.GaussianNB())
    ]
)

# Cross-validate every (feature set, model) configuration
for config in factory:
    models.cv(config, context, folds=5, repeat=2,
              print_results=True)


def probability_of_insult(config, ctx, txt):
    # create a unique index for this text
    # (on Python 3, hash txt.encode('utf-8') instead of txt)
    idx = int(md5(txt).hexdigest()[:10], 16)

    # add the new comment to our DataFrame
    d = pandas.DataFrame(
            {'Comment': [txt]},
            index=pandas.Index([idx]))
    ctx.data = pandas.concat([ctx.data, d])

    # Specify which instances to predict with predict_index
    # and make the prediction
    pred, predict_x, predict_y = models.predict(
            config,
            ctx,
            predict_index=pandas.Index([idx]))

    return pred[idx]
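The character 5-gram features in the third and fourth feature sets can be pictured with a small standard-library sketch. It assumes that `text.CharGrams(..., chars=5)` yields overlapping 5-character substrings and that `NgramCounts(bool_=True, mindocs=k)` keeps a 0/1 indicator for each n-gram seen in at least `k` documents; neither is confirmed by the post. The helper names and toy comments are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n=5):
    """All overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def binary_ngram_matrix(docs, n=5, mindocs=2):
    """Binary document-by-n-gram matrix.

    Keeps only n-grams that occur in at least `mindocs` documents.
    """
    per_doc = [set(char_ngrams(d.lower(), n)) for d in docs]
    doc_freq = Counter(g for grams in per_doc for g in grams)
    vocab = sorted(g for g, c in doc_freq.items() if c >= mindocs)
    rows = [[int(g in grams) for g in vocab] for grams in per_doc]
    return vocab, rows


comments = [
    "you are an idiot",
    "what an idiot move",
    "have a nice day",
]
vocab, rows = binary_ngram_matrix(comments, n=5, mindocs=2)
print(vocab)
print(rows)
```

Only the 5-grams inside the shared substring " an idiot" survive the `mindocs=2` filter here, so the third comment maps to an all-zero row. On real data a much higher threshold, such as the `mindocs=30` used above, keeps the vocabulary manageable before feature selection.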

Reply 3
NewOccidental posted on 2016-4-25 03:14:58
# class Filter(Storable):
#
#     def __init__(self, exclude_func=None, include_func=None):
#         self.exclude_func = exclude_func
#         self.include_func = include_func
#
#     def filter(self, df):
#         if self.include_func is not None:
#             df =

def filter_incomplete(df):
    """Return df without the rows that contain missing values."""
    return df.dropna()
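A quick way to see what `filter_incomplete` does is to run it on a tiny frame. The column names and values below are made up for illustration; the `dropna(subset=...)` variant is shown only as a less aggressive alternative and is not part of the posted code.

```python
import numpy as np
import pandas as pd


def filter_incomplete(df):
    """Return df without the rows that contain missing values."""
    return df.dropna()


df = pd.DataFrame({
    "Comment": ["fine", None, "rude"],
    "Insult": [0.0, 1.0, np.nan],
})

# Drops every row with a missing value in any column
complete = filter_incomplete(df)

# Alternative: only require the target column to be present
labeled = df.dropna(subset=["Insult"])

print(len(df), len(complete), len(labeled))
```

Dropping on every column is the simplest policy, but it discards rows whose only gap is in a feature that could have been filled instead, for example with `FillMissing` as in the iris example.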

Reply 4
NewOccidental posted on 2016-4-25 03:16:36
import types

import numpy as np


__all__ = ['Wrapper', 'Estimator', 'ConstantClassifier', 'ConstantRegressor',
           'Probabilities', 'BinaryProbabilities', 'wrap_sklearn_like_estimator']


class Wrapper(object):
    def __init__(self, obj):
        self._obj = obj

    def __getattr__(self, attr):
        # Called only when normal lookup fails: delegate to the wrapped object.
        if hasattr(self._obj, attr):
            attr_value = getattr(self._obj, attr)
            if isinstance(attr_value, types.MethodType):
                # Expose bound methods through a plain function
                def method_proxy(*args, **kwargs):
                    return attr_value(*args, **kwargs)
                return method_proxy
            return attr_value
        raise AttributeError(attr)

    # Defined explicitly so pickling does not go through __getattr__
    def __getstate__(self):
        return self.__dict__

    def __setstate__(self, d):
        self.__dict__.update(d)


class Estimator(Wrapper):
    def __init__(self, estimator):
        self.base_estimator_ = estimator
        super(Estimator, self).__init__(estimator)

    def __repr__(self):
        return repr(self.base_estimator_)

    def fit(self, x, y, **kwargs):
        # x and y are pandas objects; the base estimator gets plain arrays
        return self.base_estimator_.fit(x.values, y.values, **kwargs)

    def predict_maxprob(self, x, **kwargs):
        """
        Most likely value. Generally equivalent to predict.
        """
        return self.base_estimator_.predict(x.values, **kwargs)

    def predict(self, x, **kwargs):
        """
        Model output. Not always the same as scikit-learn's predict. E.g., in
        the case of logistic regression, the Probabilities subclass returns
        the probability of each outcome.
        """
        return self.base_estimator_.predict(x.values, **kwargs)


class Probabilities(Estimator):
    """
    Wraps a scikit-learn-like estimator to return probabilities (if
    it supports it)
    """
    def __init__(self, estimator, binary=False):
        """
        binary: If True, predict returns only the probability
            for the positive class. If False, returns probabilities for
            all classes.
        """
        self.binary = binary
        super(Probabilities, self).__init__(estimator)

    def __str__(self):
        return u"Probabilities for %s" % self.base_estimator_

    def predict(self, x):
        probs = self.base_estimator_.predict_proba(x)
        # Two-class output is always reduced to the positive-class column
        if probs.shape[1] == 2 or self.binary:
            return probs[:, 1]
        return probs


class BinaryProbabilities(Probabilities):
    def __init__(self, estimator):
        super(BinaryProbabilities, self).__init__(estimator, binary=True)


class ConstantClassifier(object):
    """Baseline that predicts a constant derived from the training target."""

    def __init__(self, func):
        self.func = func
        self.constant = None

    def fit(self, x, y, **kwargs):
        self.constant = self.func(y)

    def predict(self, x, **kwargs):
        return np.full((x.shape[0],), int(self.constant > .5))

    def predict_proba(self, x, **kwargs):
        p = np.zeros((x.shape[0], 2))
        p[:, 0] = 1 - self.constant
        p[:, 1] = self.constant
        return p


class ConstantRegressor(object):
    """Baseline that predicts a constant derived from the training target."""

    def __init__(self, func):
        self.func = func
        self.constant = None

    def fit(self, x, y, **kwargs):
        self.constant = self.func(y)

    def predict(self, x, **kwargs):
        return np.full((x.shape[0],), self.constant)


def wrap_sklearn_like_estimator(estimator):
    if isinstance(estimator, Estimator):
        return estimator
    elif estimator is None:
        return None
    elif not (hasattr(estimator, "fit") and (hasattr(estimator, "predict")
                                          or hasattr(estimator, "predict_proba"))):
        raise ValueError("Invalid estimator: %s" % estimator)
    elif hasattr(estimator, "predict_proba"):
        return Probabilities(estimator)
    else:
        return Estimator(estimator)
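The `Wrapper` class above relies on `__getattr__` delegation, and its explicit `__getstate__`/`__setstate__` hint at a known pitfall: while an object is being copied or unpickled, `_obj` is not set yet, so a naive `__getattr__` can recurse forever. The stripped-down sketch below shows the same pattern with a guard instead. `Delegate` and `Mean` are hypothetical names for illustration, not part of Ramp.

```python
import copy


class Delegate(object):
    """Forward unknown attribute lookups to a wrapped object."""

    def __init__(self, obj):
        self._obj = obj

    def __getattr__(self, attr):
        # __getattr__ runs only when normal lookup fails. If _obj itself is
        # missing (e.g. mid-copy, before __dict__ is restored), stop here
        # instead of recursing back into __getattr__.
        if attr == "_obj":
            raise AttributeError(attr)
        return getattr(self._obj, attr)


class Mean(object):
    """Toy estimator: predicts the mean of the values it was fit on."""

    def fit(self, y):
        self.value_ = sum(y) / float(len(y))
        return self

    def predict(self, n):
        return [self.value_] * n


wrapped = Delegate(Mean())
wrapped.fit([1, 2, 3])          # forwarded to Mean.fit
print(wrapped.predict(2))       # forwarded to Mean.predict
print(wrapped.value_)           # fitted attribute of the wrapped object

clone = copy.deepcopy(wrapped)  # exercises lookups on a half-built instance
print(clone.value_)
```

Either approach works: the guard keeps the class small, while defining `__getstate__` and `__setstate__` on the class, as the posted code does, means those lookups never reach `__getattr__` at all.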
