OP: Lisrelchen

Milk: Machine Learning Toolkit in Python


OP
Lisrelchen posted on 2016-4-25 03:20:51

MILK

Milk is a machine learning toolkit in Python. Its focus is on supervised classification, with several classifiers available: SVMs, k-NN, random forests, and decision trees. It also performs feature selection. These classifiers can be combined in many ways to form different classification systems. For unsupervised learning, milk supports k-means clustering and affinity propagation.

Hidden content in this post:

milk-master.zip (197.44 KB)
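For orientation, here is a minimal quick-start sketch using milk's documented top-level API (milk.defaultclassifier for supervised classification and milk.kmeans for clustering). The toy data is made up for illustration; exact behaviour depends on the milk version you install from the archive above.

import numpy as np
import milk

# Toy data: 100 examples with 10 features; the second half is shifted and labelled 1.
features = np.random.rand(100, 10)
labels = np.zeros(100, int)
features[50:] += .5
labels[50:] = 1

# Supervised: train milk's default classification pipeline and classify new points.
learner = milk.defaultclassifier()
model = learner.train(features, labels)
print(model.apply(features[0]))    # should typically print 0
print(model.apply(features[-1]))   # should typically print 1

# Unsupervised: k-means clustering into two groups.
cluster_ids, centroids = milk.kmeans(features, 2)
print(np.bincount(cluster_ids))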



Keywords: machine learning, toolkit, Python


2
Lisrelchen posted on 2016-4-25 03:21:33
# -*- coding: utf-8 -*-
# Copyright (C) 2008-2012, Luis Pedro Coelho <luis@luispedro.org>
# vim: set ts=4 sts=4 sw=4 expandtab smartindent:
# License: MIT. See COPYING.MIT file in the milk distribution

from __future__ import division
import numpy as np
from .normalise import normaliselabels
from .base import supervised_model

'''
AdaBoost

Simple implementation of Adaboost

Learner
-------
boost_learner
'''

__all__ = [
    'boost_learner',
    ]

def _adaboost(features, labels, base, max_iters):
    m = len(features)
    D = np.ones(m, dtype=float)
    D /= m
    Y = np.ones(len(labels), dtype=float)
    names = np.array([-1, +1])
    Y = names[labels]
    H = []
    A = []
    for t in range(max_iters):
        Ht = base.train(features, labels, weights=D)
        train_out = np.array(list(map(Ht.apply, features)))
        train_out = names[train_out.astype(int)]
        Et = np.dot(D, (Y != train_out))
        if Et > .5:
            # early return
            break
        At = .5 * np.log((1. + Et) / (1. - Et))
        D *= np.exp((-At) * Y * train_out)
        D /= np.sum(D)
        A.append(At)
        H.append(Ht)
    return H, A


class boost_model(supervised_model):
    def __init__(self, H, A, names):
        self.H = H
        self.A = A
        self.names = names

    def apply(self, f):
        v = sum((a*h.apply(f)) for h,a in zip(self.H, self.A))
        v /= np.sum(self.A)
        return self.names[v > .5]


class boost_learner(object):
    '''
    learner = boost_learner(weak_learner_type(), max_iters=100)
    model = learner.train(features, labels)
    test = model.apply(f)

    AdaBoost learner

    Attributes
    ----------
    base : learner
        Weak learner
    max_iters : integer
        Nr of iterations (default: 100)
    '''
    def __init__(self, base, max_iters=100):
        self.base = base
        self.max_iters = max_iters

    def train(self, features, labels, normalisedlabels=False, names=(0,1), weights=None, **kwargs):
        if not normalisedlabels:
            labels,names = normaliselabels(labels)
        H,A = _adaboost(features, labels, self.base, self.max_iters)
        return boost_model(H, A, names)
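A minimal usage sketch for boost_learner, assuming the file above is installed as milk.supervised.adaboost (its location in the milk source tree). The decision stump below is a toy weak learner written only for this example; it is not part of milk. Any learner whose train() accepts a weights argument and whose model's apply() returns a 0/1 label index will do.

import numpy as np
from milk.supervised.adaboost import boost_learner

class toy_stump_model(object):
    # Predicts the label index 0/1 by thresholding a single feature.
    def __init__(self, idx, cut, flip):
        self.idx, self.cut, self.flip = idx, cut, flip
    def apply(self, f):
        pred = int(f[self.idx] > self.cut)
        return 1 - pred if self.flip else pred

class toy_stump_learner(object):
    # Exhaustively picks the (feature, cut, polarity) with the lowest weighted error.
    def train(self, features, labels, weights=None, **kwargs):
        features = np.asarray(features)
        labels = np.asarray(labels)
        if weights is None:
            weights = np.ones(len(labels), float) / len(labels)
        best = None
        for idx in range(features.shape[1]):
            for cut in np.unique(features[:, idx]):
                pred = (features[:, idx] > cut).astype(int)
                err = np.dot(weights, pred != labels)
                for flip, e in ((False, err), (True, 1. - err)):
                    if best is None or e < best[0]:
                        best = (e, idx, cut, flip)
        _, idx, cut, flip = best
        return toy_stump_model(idx, cut, flip)

# Toy data where no single stump is perfect (the label depends on two features).
np.random.seed(0)
features = np.random.rand(200, 2)
labels = (features.sum(1) > 1.).astype(int)

learner = boost_learner(toy_stump_learner(), max_iters=10)
model = learner.train(features, labels)
print(model.apply(np.array([.1, .1])))   # should typically print 0
print(model.apply(np.array([.9, .9])))   # should typically print 1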

3
Lisrelchen posted on 2016-4-25 03:22:20
# -*- coding: utf-8 -*-
# Copyright (C) 2011, Luis Pedro Coelho <luis@luispedro.org>
# vim: set ts=4 sts=4 sw=4 expandtab smartindent:
# License: MIT. See COPYING.MIT file in the milk distribution

from __future__ import division

class supervised_model(object):
    def apply_many(self, fs):
        '''
        labels = model.apply_many( examples )

        This is equivalent to ``map(model.apply, examples)`` but may be
        implemented in a faster way.

        Parameters
        ----------
        examples : sequence of training examples

        Returns
        -------
        labels : sequence of labels
        '''
        return list(map(self.apply, fs))


class base_adaptor(object):
    def __init__(self, base):
        self.base = base

    def set_option(self, k, v):
        self.base.set_option(k, v)
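A small illustration of the supervised_model protocol above, assuming the file is milk/supervised/base.py: a subclass only needs to define apply() for a single example and inherits apply_many() for free. The constant_model class is hypothetical, written just for this example.

import numpy as np
from milk.supervised.base import supervised_model

class constant_model(supervised_model):
    # Hypothetical model that predicts the same label for every example.
    def __init__(self, label):
        self.label = label
    def apply(self, f):
        return self.label

model = constant_model(1)
print(model.apply_many(np.zeros((4, 3))))   # [1, 1, 1, 1]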

4
Lisrelchen posted on 2016-4-25 03:22:57
from __future__ import division
import numpy as np
from .normalise import normaliselabels
from .base import supervised_model

__all__ = ['normaliselabels', 'ctransforms']

class threshold_model(object):
    '''
    threshold_model

    Attributes
    ----------
    threshold : float
        threshold value
    '''
    def __init__(self, threshold=.5):
        self.threshold = threshold

    def apply(self, f):
        return f >= self.threshold

    def __repr__(self):
        return 'threshold_model({})'.format(self.threshold)
    __str__ = __repr__

class fixed_threshold_learner(object):
    def __init__(self, threshold=.5):
        self.threshold = threshold

    def train(self, features, labels, **kwargs):
        return threshold_model(self.threshold)

    def __repr__(self):
        return 'fixed_threshold_learner({})'.format(self.threshold)
    __str__ = __repr__


class ctransforms_model(supervised_model):
    '''
    model = ctransforms_model(models)

    A model that consists of a series of transformations.

    See Also
    --------
      ctransforms
    '''
    def __init__(self, models):
        self.models = models

    def apply_many(self, features):
        if len(features) == 0:
            return features
        for m in self.models:
            features = m.apply_many(features)
        return features

    def __repr__(self):
        return 'ctransforms_model({})'.format(self.models)
    __str__ = __repr__

    def __getitem__(self, ix):
        return self.models[ix]

    def apply(self, features):
        for T in self.models:
            features = T.apply(features)
        return features

class ctransforms(object):
    '''
    ctransf = ctransforms(c0, c1, c2, ...)

    Concatenate transforms.
    '''
    def __init__(self, *args):
        self.transforms = args

    def train(self, features, labels, **kwargs):
        models = []
        model = None
        for T in self.transforms:
            if model is not None:
                features = np.array([model.apply(f) for f in features])
            model = T.train(features, labels, **kwargs)
            models.append(model)
        return ctransforms_model(models)

    def __repr__(self):
        return 'ctransforms(*{})'.format(self.transforms)
    __str__ = __repr__

    def set_option(self, opt, val):
        idx, opt = opt
        self.transforms[idx].set_option(opt, val)
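A sketch of chaining with ctransforms, assuming the file above is milk.supervised.classifier. The scaling step is a toy learner written only for this example (not part of milk); the final step is milk's kNN classifier from milk.supervised.knn, which is pasted further down in this thread.

import numpy as np
from milk.supervised.classifier import ctransforms
from milk.supervised.knn import kNN

class scale_model(object):
    # Toy standardisation model: subtract the training mean, divide by the std.
    def __init__(self, mu, sd):
        self.mu, self.sd = mu, sd
    def apply(self, f):
        return (f - self.mu) / self.sd
    def apply_many(self, fs):
        return [(f - self.mu) / self.sd for f in fs]

class scale_learner(object):
    def train(self, features, labels, **kwargs):
        features = np.asarray(features)
        return scale_model(features.mean(0), features.std(0) + 1e-9)

features = np.vstack([np.random.rand(20, 4), np.random.rand(20, 4) + 2.])
labels = np.repeat([0, 1], 20)

pipeline = ctransforms(scale_learner(), kNN(k=3))   # scale first, then classify
model = pipeline.train(features, labels)
print(model.apply(features[0]), model.apply(features[-1]))   # typically: 0 1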

5
Lisrelchen posted on 2016-4-25 03:24:42
from __future__ import division
import numpy as np
from .classifier import normaliselabels

__all__ = [
    'sda',
    'linearly_independent_subset',
    'linear_independent_features',
    'filterfeatures',
    'featureselector',
    'sda_filter',
    'rank_corr',
    'select_n_best',
    ]

def _sweep(A, k, flag):
    Akk = A[k,k]
    if Akk == 0:
        Akk = 1.e-5

    # cross[i,j] = A[i,k] * A[k,j]
    cross = (A[:,k][:, np.newaxis] * A[k])
    B = A - cross/Akk

    # currently: B[i,j] = A[i,j] - A[i,k]*A[k,j]/Akk
    # Now fix row k and col k, followed by Bkk
    B[k] = flag * A[k]/A[k,k]
    B[:,k] = flag * A[:,k]/A[k,k]
    B[k,k] = -1./Akk
    return B

def sda(features, labels, tolerance=.01, significance_in=.05, significance_out=.05, loose=False):
    '''
    features_idx = sda(features, labels, tolerance=.01, significance_in=.05, significance_out=.05)

    Stepwise Discriminant Analysis for feature selection

    Pre-filter the feature matrix to remove linearly dependent features
    before calling this function. Behaviour is undefined otherwise.

    This implements the algorithm described in Jennrich, R.I. (1977), "Stepwise
    Regression" & "Stepwise Discriminant Analysis," both in Statistical Methods
    for Digital Computers, eds. K. Enslein, A. Ralston, and H. Wilf, New York;
    John Wiley & Sons, Inc.

    Parameters
    ----------
    features : ndarray
        feature matrix. There should not be any perfectly correlated features.
    labels : 1-array
        labels
    tolerance : float, optional
    significance_in : float, optional
    significance_out : float, optional

    Returns
    -------
    features_idx : sequence
        sequence of integer indices
    '''
    from scipy import stats

    assert len(features) == len(labels), 'milk.supervised.featureselection.sda: length of features not the same as length of labels'
    N, m = features.shape
    labels,labelsu = normaliselabels(labels)
    q = len(labelsu)

    df = features - features.mean(0)
    T = np.dot(df.T, df)

    dfs = [(features[labels == i] - features[labels == i].mean(0)) for i in range(q)]
    W = sum(np.dot(d.T, d) for d in dfs)

    ignoreidx = ( W.diagonal() == 0 )
    if ignoreidx.any():
        idxs, = np.where(~ignoreidx)
        if not len(idxs):
            return np.arange(m)
        selected = sda(features[:,~ignoreidx],labels)
        return idxs[selected]
    output = []
    D = W.diagonal()
    df1 = q-1
    last_enter_k = -1
    while True:
        V = W.diagonal()/T.diagonal()
        W_d = W.diagonal()
        V_neg = (W_d < 0)
        p = V_neg.sum()
        if V_neg.any():
            V_m = V[V_neg].min()
            k, = np.where(V == V_m)
            k = k[0]
            Fremove = (N-p-q+1)/(q-1)*(V_m-1)
            df2 = N-p-q+1
            PrF = 1 - stats.f.cdf(Fremove,df1,df2)
            if PrF > significance_out:
                #print 'removing ',k, 'V(k)', 1./V_m, 'Fremove', Fremove, 'df1', df1, 'df2', df2, 'PrF', PrF
                if k == last_enter_k:
                    # We are going into an infinite loop.
                    import warnings
                    warnings.warn('milk.featureselection.sda: infinite loop detected (maybe bug?).')
                    break
                W = _sweep(W,k,1)
                T = _sweep(T,k,1)
                continue
        ks = ( (W_d / D) > tolerance)
        if ks.any():
            V_m = V[ks].min()
            k, = np.where(V==V_m)
            k = k[0]
            Fenter = (N-p-q)/(q-1) * (1-V_m)/V_m
            df2 = N-p-q
            PrF = 1 - stats.f.cdf(Fenter,df1,df2)
            if PrF < significance_in:
                #print 'adding ',k, 'V(k)', 1./V_m, 'Fenter', Fenter, 'df1', df1, 'df2', df2, 'PrF', PrF
                W = _sweep(W,k,-1)
                T = _sweep(T,k,-1)
                if loose or (PrF < 0.0001):
                    output.append((Fenter,k))
                last_enter_k = k
                continue
        break

    output.sort(reverse=True)
    return np.array([idx for _,idx in output])


def linearly_independent_subset(V, threshold=1.e-5, return_orthogonal_basis=False):
    '''
    subset = linearly_independent_subset(V, threshold=1.e-5)
    subset,U = linearly_independent_subset(V, threshold=1.e-5, return_orthogonal_basis=True)

    Discover a linearly independent subset of `V`

    Parameters
    ----------
    V : sequence of input vectors
    threshold : float, optional
        vectors with 2-norm smaller or equal to this are considered zero
        (default: 1e-5)
    return_orthogonal_basis : Boolean, optional
        whether to return orthogonal basis set

    Returns
    -------
    subset : ndarray of integers
        indices used for basis
    U : 2-array
        orthogonal basis into span{V}

    Implementation Reference
    ------------------------
    Use Gram-Schmidt with a check for when the v_k is close enough to zero to ignore

    See http://en.wikipedia.org/wiki/Gram-Schmidt_process
    '''
    V = np.array(V, copy=True)
    orthogonal = []
    used = []
    for i,u in enumerate(V):
        for v in orthogonal:
            u -= np.dot(u,v)/np.dot(v,v) * v
        if np.dot(u,u) > threshold:
            orthogonal.append(u)
            used.append(i)
    if return_orthogonal_basis:
        return np.array(used),np.array(orthogonal)
    return np.array(used)


def linear_independent_features(features, labels=None):
    '''
    indices = linear_independent_features(features, labels=None)

    Returns the indices of a set of linearly independent features (columns).

    Parameters
    ----------
    features : ndarray
    labels : ignored
        This argument is only here to conform to the learner interface.

    Returns
    -------
    indices : ndarray of integers
        indices of features to keep

    See Also
    --------
    `linearly_independent_subset` :
        this function is equivalent to `linearly_independent_subset(features.T)`
    '''
    return linearly_independent_subset(features.T)


class filterfeatures(object):
    '''
    selector = filterfeatures(idxs)

    Returns a transformer which selects the features given by idxs. I.e.,
    ``apply(features)`` is equivalent to ``features[idxs]``

    Parameters
    ----------
    idxs : ndarray
        This can be either an array of integers (positions) or an array of booleans
    '''
    def __init__(self, idxs):
        self.idxs = idxs

    def apply(self, features):
        return features[self.idxs]

    def apply_many(self, features):
        if len(features) == 0:
            return features
        features = np.asanyarray(features)
        return features[:,self.idxs]

    def __repr__(self):
        return 'filterfeatures(%s)' % self.idxs

class featureselector(object):
    '''
    selector = featureselector(function)

    Returns a transformer which selects features according to

        selected_idxs = function(features,labels)
    '''
    def __init__(self, selector):
        self.selector = selector

    def train(self, features, labels, **kwargs):
        idxs = self.selector(features, labels)
        if len(idxs) == 0:
            import warnings
            warnings.warn('milk.featureselection: No features selected! Using all features as fall-back.')
            idxs = np.arange(len(features[0]))
        return filterfeatures(idxs)

    def __repr__(self):
        return 'featureselector(%s)' % self.selector

def sda_filter():
    return featureselector(sda)

def rank_corr(features, labels):
    '''
    rs = rank_corr(features, labels)

    Computes the following expression::

        rs[i] = max_e COV2(rank(features[:,i]), labels == e)

    This is appropriate for numeric features and categorical labels.

    Parameters
    ----------
    features : ndarray
        feature matrix
    labels : sequence

    Returns
    -------
    rs : ndarray of float
        rs are the rank correlations
    '''
    features = np.asanyarray(features)
    labels = np.asanyarray(labels)

    n = len(features)
    ranks = features.argsort(0)
    ranks = ranks.astype(float)
    binlabels = np.array([(labels == ell) for ell in set(labels)], dtype=float)
    mx = ranks.mean(0)
    my = binlabels.mean(1)
    sx = ranks.std(0)
    sy = binlabels.std(1)

    r = np.dot(binlabels,ranks)
    r -= np.outer(n*my, mx)
    r /= np.outer(sy, sx)
    r /= n # Use n [instead of n-1] to match numpy's corrcoef
    r **= 2
    return r.max(0)

class select_n_best(object):
    '''
    select_n_best(n, measure)

    Selects the `n` features that score the highest in `measure`
    '''
    def __init__(self, n, measure):
        self.n = n
        self.measure = measure

    def train(self, features, labels, **kwargs):
        values = self.measure(features, labels)
        values = values.argsort()
        return filterfeatures(values[-self.n:])
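A usage sketch, assuming the file above is milk.supervised.featureselection. The synthetic data is made up for illustration: the label depends on the first three columns and the last column duplicates column 0.

import numpy as np
from milk.supervised.featureselection import (linear_independent_features,
                                              select_n_best, rank_corr)

np.random.seed(0)
features = np.random.rand(200, 8)
labels = (features[:, :3].sum(1) > 1.5).astype(int)
features[:, 7] = features[:, 0]             # make the last column linearly dependent

# Indices of a linearly independent subset of columns (drops the duplicate).
print(linear_independent_features(features))

# Keep the 3 columns with the highest rank correlation to the labels.
selector = select_n_best(3, rank_corr)
model = selector.train(features, labels)
print(model.idxs)                           # indices of the selected columns
print(model.apply(features[0]))             # the first example, reduced to 3 features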

6
Lisrelchen posted on 2016-4-25 03:25:27
from __future__ import division
from collections import defaultdict
from milk.utils import get_nprandom
import numpy as np
from .base import supervised_model

__all__ = [
    'kNN',
    'knn_learner',
    'approximate_knn_learner',
    ]

def _plurality(xs):
    counts = defaultdict(int)
    for x in xs: counts[x] += 1
    best,_ = max(iter(counts.items()), key=(lambda k_v: k_v[1]))
    return best

class kNN(object):
    '''
    k-Nearest Neighbour Classifier

    Naive implementation of a k-nearest neighbour classifier.

    C = kNN(k)

    Attributes:
    -----------
    k : integer
        number of neighbours to consider
    '''

    def __init__(self, k=1):
        self.k = k

    def train(self, features, labels, normalisedlabels=False, copy_features=False):
        features = np.asanyarray(features)
        labels = np.asanyarray(labels)
        if copy_features:
            features = features.copy()
            labels = labels.copy()
        features2 = np.sum(features**2, axis=1)
        return kNN_model(self.k, features, features2, labels)

knn_learner = kNN

class kNN_model(supervised_model):
    def __init__(self, k, features, features2, labels):
        self.k = k
        self.features = features
        self.f2 = features2
        self.labels = labels

    def apply(self, features):
        features = np.asanyarray(features)
        diff2 = np.dot(self.features, (-2.)*features)
        diff2 += self.f2
        neighbours = diff2.argsort()[:self.k]
        labels = self.labels[neighbours]
        return _plurality(labels)


class approximate_knn_model(supervised_model):
    def __init__(self, k, X, projected):
        self.k = k
        self.X = X
        self.projected = projected
        self.p2 = np.array([np.dot(p,p) for p in projected])

    def apply(self, t):
        tx = np.dot(self.X.T, t)
        d = np.dot(self.projected, tx)
        d *= -2
        d += self.p2
        if self.k == 1:
            return np.array([d.argmin()])
        d = d.argsort()
        return d[:self.k]

class approximate_knn_classification_model(supervised_model):
    def __init__(self, k, X, projected, labels):
        self.base = approximate_knn_model(k, X, projected)
        self.labels = labels

    def apply(self, f):
        idxs = self.base.apply(f)
        return _plurality(self.labels[idxs])

class approximate_knn_learner(object):
    '''
    approximate_knn_learner

    Learns a k-nearest neighbour classifier, where the proximity is approximate
    as it is computed on a small dimensional subspace (random subspace
    projection). For many datasets, this is acceptable.
    '''

    def __init__(self, k, ndims=8):
        self.k = k
        self.ndims = ndims

    def train(self, features, labels, **kwargs):
        labels = np.asanyarray(labels)
        R = get_nprandom(kwargs.get('R'))
        _, n_features = features.shape
        X = R.random_sample((n_features, self.ndims))
        projected = np.dot(features, X)
        return approximate_knn_classification_model(self.k, X, projected, labels.copy())
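A usage sketch for the exact and approximate k-NN learners, assuming the file above is milk.supervised.knn; the two-cluster data is made up for illustration.

import numpy as np
from milk.supervised.knn import kNN, approximate_knn_learner

np.random.seed(0)
features = np.vstack([np.random.rand(30, 10), np.random.rand(30, 10) + 1.])
labels = np.repeat([0, 1], 30)

model = kNN(k=3).train(features, labels)
print(model.apply(features[0] + .05))           # typically 0 (near the first cluster)

# Approximate variant: distances are computed in a random 4-dimensional projection.
approx = approximate_knn_learner(k=3, ndims=4).train(features, labels)
print(approx.apply(features[-1]))               # typically 1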

7
Lisrelchen posted on 2016-4-25 03:36:01
# -*- coding: utf-8 -*-
import numpy as np
from . import _lasso
from .base import supervised_model
from milk.unsupervised import center

def lasso(X, Y, B=None, lam=1., max_iter=None, tol=None):
    '''
    B = lasso(X, Y, B={np.zeros()}, lam=1., max_iter={1024}, tol={1e-5})

    Solve LASSO Optimisation

        B* = arg min_B ½/n || Y - BX ||₂² + λ||B||₁

    where `n` is the number of samples.

    Milk uses coordinate descent, looping through the coordinates in order
    (with an active set strategy to update only non-zero βs, if possible). The
    problem is convex and the solution is guaranteed to be optimal (within
    floating point accuracy).

    Parameters
    ----------
    X : ndarray
        Design matrix
    Y : ndarray
        Matrix of outputs
    B : ndarray, optional
        Starting values for approximation. This can be used for a warm start if
        you have an estimate of where the solution should be. If used, the
        solution might be written in-place (if the array has the right format).
    lam : float, optional
        λ (default: 1.0)
    max_iter : int, optional
        Maximum nr of iterations (default: 1024)
    tol : float, optional
        Tolerance. Whenever a parameter is to be updated by a value smaller
        than ``tolerance``, that is considered a null update. Be careful that
        if the value is too small, performance will degrade horribly.
        (default: 1e-5)

    Returns
    -------
    B : ndarray
    '''
    X = np.ascontiguousarray(X, dtype=np.float32)
    Y = np.ascontiguousarray(Y, dtype=np.float32)
    if B is None:
        B = np.zeros((Y.shape[0],X.shape[0]), np.float32)
    else:
        B = np.ascontiguousarray(B, dtype=np.float32)
    if max_iter is None:
        max_iter = 1024
    if tol is None:
        tol = 1e-5
    if X.shape[0] != B.shape[1] or \
        Y.shape[0] != B.shape[0] or \
        X.shape[1] != Y.shape[1]:
        raise ValueError('milk.supervised.lasso: Dimensions do not match')
    if np.any(np.isnan(X)) or np.any(np.isnan(B)):
        raise ValueError('milk.supervised.lasso: NaNs are only supported in the ``Y`` matrix')
    W = np.ascontiguousarray(~np.isnan(Y), dtype=np.float32)
    Y = np.nan_to_num(Y)
    n = Y.size
    _lasso.lasso(X, Y, W, B, max_iter, float(2*n*lam), float(tol))
    return B

def lasso_walk(X, Y, B=None, nr_steps=None, start=None, step=None, tol=None, return_lams=False):
    '''
    Bs = lasso_walk(X, Y, B={np.zeros()}, nr_steps={64}, start={automatically inferred}, step={.9}, tol=None, return_lams=False)
    Bs,lams = lasso_walk(X, Y, B={np.zeros()}, nr_steps={64}, start={automatically inferred}, step={.9}, tol=None, return_lams=True)

    Repeatedly solve LASSO Optimisation

        B* = arg min_B ½/n || Y - BX ||₂² + λ||B||₁

    for different values of λ.

    Parameters
    ----------
    X : ndarray
        Design matrix
    Y : ndarray
        Matrix of outputs
    B : ndarray, optional
        Starting values for approximation. This can be used for a warm start if
        you have an estimate of where the solution should be.
    start : float, optional
        first λ to use (by default, it is inferred from the data)
    nr_steps : int, optional
        How many steps in the path (default is 64)
    step : float, optional
        Multiplicative step to take (default is 0.9)
    tol : float, optional
        This is the tolerance parameter. It is passed to the lasso function
        unmodified.
    return_lams : bool, optional
        Whether to return the values of λ used (default: False)

    Returns
    -------
    Bs : ndarray
    '''
    if nr_steps is None:
        nr_steps = 64
    if step is None:
        step = .9
    if start is None:
        n = Y.size
        start = 0.5/n*np.nanmax(np.abs(Y))*np.abs(X).max()

    lam = start
    lams = []
    Bs = []
    for i in range(nr_steps):
        # The central idea is that each iteration is already "warm" and this
        # should be faster than starting from zero each time
        B = lasso(X, Y, B, lam=lam, tol=tol)
        lams.append(lam)
        Bs.append(B.copy())
        lam *= step
    if return_lams:
        return np.array(Bs), np.array(lams)
    return np.array(Bs)

def _dict_subset(mapping, keys):
    return dict(
            [(k,mapping[k]) for k in keys])

class lasso_model(supervised_model):
    def __init__(self, betas, mean):
        self.betas = betas
        self.mean = mean

    def retrain(self, features, labels, lam, **kwargs):
        features, mean = center(features)
        betas = lasso(features, labels, self.betas.copy(), lam=lam, **_dict_subset(kwargs, ['tol', 'max_iter']))
        return lasso_model(betas, mean)

    def apply(self, features):
        return np.dot(self.betas, features) + self.mean


class lasso_learner(object):
    def __init__(self, lam=1.0):
        self.lam = lam

    def train(self, features, labels, betas=None, **kwargs):
        labels, mean = center(labels, axis=1)
        betas = lasso(features, labels, betas, lam=self.lam)
        return lasso_model(betas, mean)

def lasso_model_walk(X, Y, B=None, nr_steps=64, start=None, step=.9, tol=None, return_lams=False):
    Y, mean = center(Y, axis=1)
    Bs,lams = lasso_walk(X,Y, B, nr_steps, start, step, tol, return_lams=True)
    models = [lasso_model(B, mean) for B in Bs]
    if return_lams:
        return models, lams
    return models
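A usage sketch, assuming the file above is milk.supervised.lasso and that milk's compiled _lasso extension is available (it is built when milk is installed). Note the convention in this module: X is (features x samples), Y is (outputs x samples) and the model is Y ≈ BX. The synthetic problem below is made up: only columns 2 and 7 of B are truly non-zero.

import numpy as np
from milk.supervised.lasso import lasso, lasso_walk

np.random.seed(0)
X = np.random.rand(10, 100).astype(np.float32)      # 10 features, 100 samples
true_B = np.zeros((1, 10), np.float32)
true_B[0, 2] = 2.
true_B[0, 7] = -1.
Y = np.dot(true_B, X) + .01 * np.random.randn(1, 100)

B = lasso(X, Y, lam=.05)
print(np.round(B, 2))            # should be sparse, with weight mostly on columns 2 and 7

# Regularisation path: sparser solutions for large lambda, denser as lambda shrinks.
Bs, lams = lasso_walk(X, Y, nr_steps=8, return_lams=True)
print([int((b != 0).sum()) for b in Bs])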

8
Lisrelchen posted on 2016-4-25 03:37:11
from __future__ import division
import numpy as np
from .normalise import normaliselabels
from .base import supervised_model

__all__ = [
    'logistic_learner',
    ]

@np.vectorize
def _sigmoidal(z):
    if (z > 300): return 1.
    if z < -300: return 0.
    return 1./(1+np.exp(-z))

class logistic_model(supervised_model):
    def __init__(self, bs):
        self.bs = bs

    def apply(self, fs):
        return _sigmoidal(self.bs[0] + np.dot(fs, self.bs[1:]))

class logistic_learner(object):
    '''
    learner = logistic_learner(alpha=0.0)

    Logistic regression learner

    There are two implementations:

    1. One which depends on ``scipy.optimize``. This is the default and is
       extremely fast.
    2. If ``import scipy`` fails, then we fall back to a Python only
       gradient-descent. This gives good results, but is many times slower.

    Properties
    ----------
    alpha : real, optional
        penalty for L2-normalisation. Default is zero, for no penalty.
    '''
    def __init__(self, alpha=0.0):
        self.alpha = alpha

    def train(self, features, labels, normalisedlabels=False, names=None, **kwargs):
        def error(bs):
            response = bs[0] + np.dot(features, bs[1:])
            response = _sigmoidal(response)
            diff = response - labels
            log_like = np.dot(diff, diff)
            L2_penalty = self.alpha * np.dot(bs, bs)
            return log_like + L2_penalty

        def error_prime(bs):
            fB = np.dot(features, bs[1:])
            response = _sigmoidal(bs[0] + fB)
            sprime = response * (1-response)
            ds = (response - labels) * sprime
            b0p = np.sum(ds)
            b1p = np.dot(features.T, ds)
            bp = np.concatenate( ([b0p], b1p) )
            return 2.*(bp + self.alpha*bs)

        features = np.asanyarray(features)
        if not normalisedlabels:
            labels, _ = normaliselabels(labels)
        N,f = features.shape
        bs = np.zeros(f+1)
        try:
            from scipy import optimize
            # Some testing revealed that this was a good combination
            # call fmin_cg twice first and then fmin
            # I do not understand why 100%, but there it is
            bs = optimize.fmin_cg(error, bs, error_prime, disp=False)
            bs = optimize.fmin_cg(error, bs, error_prime, disp=False)
            bs = optimize.fmin(error, bs, disp=False)
        except ImportError:
            import warnings
            warnings.warn('''\
milk.supervised.logistic.train: Could not import scipy.optimize.
Fall back to very simple gradient descent (which is slow).''')
            bs = np.zeros(f+1)
            cur = 1.e-6
            ebs = error(bs)
            for i in range(1000000):
                dir = error_prime(bs)
                step = (lambda e : bs - e*dir)
                enbs = ebs + 1
                while enbs > ebs:
                    cur /= 2.
                    if cur == 0.:
                        break
                    nbs = step(cur)
                    enbs = error(nbs)
                while cur < 10.:
                    cur *= 2
                    nnbs = step(cur)
                    ennbs = error(nnbs)
                    if ennbs < enbs:
                        nbs = nnbs
                        enbs = ennbs
                    else:
                        break
                bs = nbs
                ebs = enbs
        return logistic_model(bs)
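A usage sketch, assuming the file above is milk.supervised.logistic. With scipy installed the fast scipy.optimize path is used; without it, training falls back to the slow pure-Python gradient descent. The data is a made-up separable problem.

import numpy as np
from milk.supervised.logistic import logistic_learner

np.random.seed(0)
features = np.vstack([np.random.randn(40, 3) - 1., np.random.randn(40, 3) + 1.])
labels = np.repeat([0, 1], 40)

learner = logistic_learner(alpha=.01)               # small L2 penalty
model = learner.train(features, labels)
print(model.apply(np.array([-1., -1., -1.])))       # probability near 0
print(model.apply(np.array([1., 1., 1.])))          # probability near 1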

9
Lisrelchen posted on 2016-4-25 03:38:47
# -*- coding: utf-8 -*-
# Copyright (C) 2008-2011, Luis Pedro Coelho <luis@luispedro.org>
# vim: set ts=4 sts=4 sw=4 expandtab smartindent:
#
# License: MIT. See COPYING.MIT file in the milk distribution

from __future__ import division
import numpy as np

def get_parzen_rbf_loocv(features, labels):
    '''
    f = get_parzen_rbf_loocv(features, labels)

    Returns a function ``f(sigma)`` which computes the leave-one-out accuracy
    of a Parzen-window classifier with RBF kernel ``exp(-||x-y||^2/sigma)`` on
    ``features`` and 0/1 ``labels``. Useful for selecting the kernel width.
    '''
    # Pairwise squared distances between all training examples
    xij = np.dot(features, features.T)
    f2 = np.sum(features**2, 1)
    d = f2 - 2*xij
    d = d.T + f2
    d_argsorted = d.argsort(1)
    d_sorted = d.copy()
    d_sorted.sort(1)
    e_d = np.exp(-d_sorted)
    labels_sorted = labels[d_argsorted].astype(np.double)
    labels_sorted *= 2
    labels_sorted -= 1
    def f(sigma):
        # Skip column 0 (each point itself) to get the leave-one-out prediction
        k = e_d ** (1./sigma)
        return (((k[:,1:] * labels_sorted[:,1:]).sum(1) > 0) == labels).mean()
    return f
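get_parzen_rbf_loocv returns a closure f(sigma) that scores a kernel width by leave-one-out accuracy, which makes it handy for a quick grid search over sigma. A sketch with made-up binary data, assuming the file above is milk.supervised.parzen:

import numpy as np
from milk.supervised.parzen import get_parzen_rbf_loocv

np.random.seed(0)
features = np.vstack([np.random.randn(30, 4), np.random.randn(30, 4) + 2.])
labels = np.repeat([0, 1], 30)

loocv = get_parzen_rbf_loocv(features, labels)
for sigma in (.1, 1., 10., 100.):
    print(sigma, loocv(sigma))      # leave-one-out accuracy for each kernel width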

10
Lisrelchen posted on 2016-4-25 03:39:28
# -*- coding: utf-8 -*-
# Copyright (C) 2011, Luis Pedro Coelho <luis@luispedro.org>
# vim: set ts=4 sts=4 sw=4 expandtab smartindent:
#
# License: MIT. See COPYING.MIT file in the milk distribution

import numpy as np
from .classifier import normaliselabels
from .base import supervised_model
from . import _perceptron

class perceptron_model(supervised_model):
    def __init__(self, w):
        self.w = w

    def apply(self, f):
        f = np.asanyarray(f)
        v = self.w[0] + np.dot(f, self.w[1:])
        return v > 0

class perceptron_learner(object):
    def __init__(self, eta=.1, max_iters=128):
        self.eta = eta
        self.max_iters = max_iters

    def train(self, features, labels, normalisedlabels=False, **kwargs):
        if not normalisedlabels:
            labels, _ = normaliselabels(labels)
        features = np.asanyarray(features)
        if features.dtype not in (np.float32, np.float64):
            features = features.astype(np.float64)
        weights = np.zeros(features.shape[1]+1, features.dtype)
        for i in range(self.max_iters):
            errors = _perceptron.perceptron(features, labels, weights, self.eta)
            if not errors:
                break
        return perceptron_model(weights)
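A usage sketch, assuming the file above is milk.supervised.perceptron and that milk's compiled _perceptron extension is available. The toy data is linearly separable; apply() returns a boolean indicating which side of the learned hyperplane an example falls on.

import numpy as np
from milk.supervised.perceptron import perceptron_learner

np.random.seed(0)
features = np.vstack([np.random.rand(40, 2), np.random.rand(40, 2) + 1.5])
labels = np.repeat([0, 1], 40)

learner = perceptron_learner(eta=.1, max_iters=128)
model = learner.train(features, labels)
print(model.apply(np.array([.2, .2])))     # one side of the learned hyperplane
print(model.apply(np.array([2., 2.])))     # the other side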
