OP: ReneeBK

[GitHub] Advanced Machine Learning

Advanced Machine Learning

This folder contains many algorithms that I wrote for the graduate Machine Learning class at Stony Brook University.


Algorithms:
  • Naive Bayes vs. Logistic Regression

  • Adaboost

  • kNN vs. SVM

  • Expectation Maximization

  • Hidden Markov Model


It also contains the theory from my homework. For the final project on classifying complex networks, please refer to its specific repository.


Installation

$ pip install -r requirements.txt

License

When referencing my work, please credit my Twitter handle b_t_3 or my website.


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Hidden content in this post:

https://github.com/bt3gl/Advanced-Machine-Learning





#2
ReneeBK posted on 2017-4-22 04:29:11
#!/usr/bin/python
# marina von steinkirch @2014
# steinkirch at gmail

import math
import numpy as np


class AdaBoost(object):
    ''' Implements AdaBoost with the chosen weak classifier '''

    def __init__(self, weak_classifier):
        self.WeakClassifier = weak_classifier

    def ada_train(self, T, X, Y, optional=False):
        ''' AdaBoost training '''
        # Define the variables
        self.weak_classifier_ens = []
        self.alpha = []
        self.X = X
        self.Y = Y
        self.T = T
        self.e = []
        N = len(self.Y)

        # Initialize with equal weights
        Z = (1.0/N)*np.ones(N)

        # T iterations
        for t in range(T):
            # Methods are inside the decision stump class
            weak_learner = self.WeakClassifier()
            weak_learner.set_training_sample(X, Y)
            weak_learner.weights = Z

            # extra plotting for the homework
            if t < 10 and optional:
                print("For t = ", t+1)
                print("Y = ", int(Y[t]))
                opt = True
            else:
                opt = False

            # train the decision stump
            weak_learner.stump_train(opt)
            self.weak_classifier_ens.append(weak_learner)

            # Predict, so that wrongly classified points get more weight
            Y_p = weak_learner.stump_predict(X)

            # Calculate the weighted training error
            epsilon = sum(0.5*Z*abs(Y - Y_p))/sum(Z)
            self.e.append(epsilon)  # self.e is a list: keep the error history

            # Calculate alpha; the small constant in the denominator
            # guards against division by zero when epsilon == 0
            inside = abs((1 - epsilon)/(epsilon + 1e-5))
            alpha = 0.5*math.log(inside)
            self.alpha.append(alpha)

            # Update and renormalize the weights
            Z *= np.exp(-alpha*Y*Y_p)
            Z /= sum(Z)

    def ada_predict(self, X=None):
        ''' AdaBoost prediction '''
        if X is None:
            return
        X = np.array(X)
        N, d = X.shape
        Y = np.zeros(N)
        score = []

        # T iterations: accumulate the weighted votes of the weak learners
        for t in range(self.T):
            weak_learner = self.weak_classifier_ens[t]
            Y += self.alpha[t]*weak_learner.stump_predict(X)
            score.append(np.sign(Y))

        return score

    def run_adaboost(self, X_train, Y_train, T, X_test=None, optional=False):
        ''' train, then predict on both the training and the test data '''
        self.ada_train(T, X_train, Y_train, optional)
        return self.ada_predict(X_train), self.ada_predict(X_test)
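A minimal usage sketch (my addition, not from the repo): the decision stump class is not included in this post, so the Stump below is a stand-in I wrote with the interface that AdaBoost expects (set_training_sample, a weights attribute, stump_train, stump_predict). It thresholds a single feature and exists only to make the example self-contained.

# Stand-in weak learner (hypothetical, not from the repo): a one-feature
# decision stump exposing the interface the AdaBoost class expects.
import numpy as np

class Stump(object):
    def set_training_sample(self, X, Y):
        self.X, self.Y = np.array(X), np.array(Y)

    def stump_train(self, opt=False):
        # exhaustively pick the (feature, threshold, sign) with the
        # lowest weighted error under the current weights
        best = (np.inf, 0, 0.0, 1)
        for f in range(self.X.shape[1]):
            for thr in np.unique(self.X[:, f]):
                for sign in (1, -1):
                    pred = sign*np.where(self.X[:, f] > thr, 1, -1)
                    err = np.sum(self.weights*(pred != self.Y))
                    if err < best[0]:
                        best = (err, f, thr, sign)
        _, self.f, self.thr, self.sign = best

    def stump_predict(self, X):
        X = np.array(X)
        return self.sign*np.where(X[:, self.f] > self.thr, 1, -1)

# toy data: labels in {-1, +1}, separable on the single feature
X = np.array([[0.], [1.], [2.], [3.]])
Y = np.array([-1, -1, 1, 1])
booster = AdaBoost(Stump)
train_scores, _ = booster.run_adaboost(X, Y, T=5)
print(train_scores[-1])   # expected: [-1. -1.  1.  1.]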

#3
ReneeBK posted on 2017-4-22 04:30:07
'''
   based on: http://nipunbatra.github.io/2013/05/simulating-a-discrete-hidden-markov-model/
   Unfair casino problem: there are two dice, one fair and one biased.
   The biased die is much more likely to produce a 6 than the other faces (p = 0.5).
   The observer only sees the values of the die being thrown, without
   knowing whether the fair or the biased die was used:

   observed states: 1 to 6 on the die faces
   hidden states: fair or biased die
   prior: probability that the first throw is made with the fair or the biased die
   transition matrix A: matrix encoding the probabilities of the 4 possible
       transitions between the fair and biased die
   emission matrix B: matrix encoding the probability of an observation given
       the hidden state
'''

import random
import numpy as np
import matplotlib
import matplotlib.pyplot as plt

# plot setup
matplotlib.rcParams.update({'font.size': 11})


'''
    setting the components of the HMM
    prior: a fair die is twice as likely as a biased die
    A:
            1. Fair -> Fair: .95
            2. Fair -> Biased: 1-.95 = .05
            3. Biased -> Biased: .90
            4. Biased -> Fair: 1-.90 = .10
    B (biased die):
            Pr(6) = 0.5
            Pr(1) = Pr(2) = Pr(3) = Pr(4) = Pr(5) = 0.1
'''

prior = np.array([2.0/3, 1.0/3])
A = np.array([[.95, .05], [.1, .9]])
B = np.array([[1.0/6 for i in range(6)], [.1, .1, .1, .1, .1, .5]])


# return the next state sampled from the weighted probability array
def next_state(weights):
    choice = random.random() * sum(weights)
    for i, w in enumerate(weights):
        choice -= w
        if choice < 0:
            return i


def create_hidden_sequence(prior, A, length):
    out = [None]*length
    out[0] = next_state(prior)
    for i in range(1, length):
        out[i] = next_state(A[out[i-1]])
    return out


def create_observation_sequence(hidden_sequence, B):
    length = len(hidden_sequence)
    out = [None]*length
    for i in range(length):
        out[i] = next_state(B[hidden_sequence[i]])
    return out


# group all runs of contiguous values into (first, last) tuples
def group(L):
    first = last = L[0]
    for n in L[1:]:
        if n - 1 == last:
            last = n
        else:
            yield first, last
            first = last = n
    yield first, last


# create tuples of the form (start, number_of_contiguous_values)
def create_tuple(x):
    return [(a, b - a + 1) for (a, b) in x]


if __name__ == '__main__':
    count = 0
    num_calls = 500

    for i in range(num_calls):
        count += next_state(prior)

    print("Expected number of Fair states:", num_calls - count)
    print("Expected number of Biased states:", count)

    # Create the sequences
    hidden = np.array(create_hidden_sequence(prior, A, num_calls))
    observed = np.array(create_observation_sequence(hidden, B))
    print('Observed: ', observed)
    print('Hidden: ', hidden)

    # Tuples of the form (start index, run length) for the Fair state
    indices_hidden_fair = np.where(hidden == 0)[0]
    tuples_contiguous_values_fair = list(group(indices_hidden_fair))
    tuples_start_break_fair = create_tuple(tuples_contiguous_values_fair)

    # Tuples of the form (start index, run length) for the Biased state
    indices_hidden_biased = np.where(hidden == 1)[0]
    tuples_contiguous_values_biased = list(group(indices_hidden_biased))
    tuples_start_break_biased = create_tuple(tuples_contiguous_values_biased)

    # Tuples for the observations
    observation_tuples = []
    for i in range(6):
        observation_tuples.append(create_tuple(group(list(np.where(observed == i)[0]))))

    # Make the plots
    plt.subplot(2, 1, 1)
    plt.xlim((0, num_calls))
    plt.title('Observations')
    for i in range(6):
        plt.broken_barh(observation_tuples[i], (i+0.5, 1), facecolor='k')
    plt.subplot(2, 1, 2)
    plt.xlim((0, num_calls))
    plt.title('Hidden states (blue: fair, red: biased)')
    plt.broken_barh(tuples_start_break_fair, (0, 1), facecolor='b')
    plt.broken_barh(tuples_start_break_biased, (0, 1), facecolor='r')
    plt.savefig('hmm.png')
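A quick sanity check on the simulation (my addition, not part of the repo): the long-run fraction of time the hidden chain spends in each state should approach the stationary distribution of A, i.e. the left eigenvector of A with eigenvalue 1.

# Sanity check (my addition): compute the stationary distribution of A
# and compare it with the empirical state frequencies of a long run.
import numpy as np

A = np.array([[.95, .05], [.1, .9]])
evals, evecs = np.linalg.eig(A.T)                     # left eigenvectors of A
stationary = np.real(evecs[:, np.argmax(np.real(evals))])
stationary /= stationary.sum()
print("stationary distribution:", stationary)        # approximately [0.667, 0.333]

With these parameters the stationary distribution equals the chosen prior (2/3, 1/3), so the chain is simulated from its stationary regime from the very first throw.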

#4
ReneeBK posted on 2017-4-22 04:30:54
'''
Calculates the k-nearest neighbors (kNN) algorithm
'''

import numpy as np
import scipy.io

from calculate_cosine_distance import cosineDistance

__author__ = """Mari Wahl"""


###########################################

def loading_data(filename):
    '''
       Load a MATLAB file and return its dict variables
    '''
    f = scipy.io.loadmat(filename)
    traindata = f['traindata']
    trainlabels = f['trainlabels']
    testdata = f['testdata']
    evaldata = f['evaldata']
    testlabels = f['testlabels']
    return traindata, trainlabels, testdata, evaldata, testlabels

###########################################

def calculate_distances(trainlabels, traindata):
    '''
       Calculate the pairwise distances for all the input examples
    '''
    distances = []
    for i in range(len(trainlabels)):
        example_i = traindata[i]
        aux = []
        for j in range(len(trainlabels)):
            if i == j:
                # placeholder for the self-distance, so that positions in
                # aux stay aligned with the indices of trainlabels
                aux.append(np.inf)
            else:
                example_j = traindata[j]
                aux.append(cosineDistance(example_i, example_j))
        distances.append(aux)
    return distances

###########################################

def get_closest_k_points(D, k):
    '''
       Get the k+1 closest points
    '''
    return sorted(D)[:k+1]

###########################################

def get_index_vec(points_list, D):
    '''
       Return the indices of the closest points
    '''
    index_vec = []
    while points_list:
        point = points_list.pop()
        for j, point_D in enumerate(D):
            if np.isclose(point_D, point, rtol=1e-8, atol=1e-8, equal_nan=False) and j not in set(index_vec):
                index_vec.append(j)
                break
    return index_vec

###########################################

def knn_search(x, D, k):
    '''
       Find the k nearest neighbors of example x among the distances D
       (x itself is unused here: D already holds its precomputed distances)
    '''
    points_list = get_closest_k_points(D, k)
    return get_index_vec(points_list, D)

############################################

def knn_classify(index_vec, trainlabels):
    '''
       Return the majority classification of the given indices
    '''
    label1 = trainlabels[0]
    label2 = None
    neg, pos = 0, 0

    for i in index_vec[1:]:
        if trainlabels[i] == label1:
            pos += 1
        else:
            neg += 1
            label2 = trainlabels[i]
    if pos >= neg:
        return label1
    else:
        return label2

#########################################

def calculate_knn(traindata, trainlabels, k):
    '''
       Return the kNN classification accuracy on the training data
    '''
    distances = calculate_distances(trainlabels, traindata)

    # calculate knn for the entire data set
    correct = 0.0
    total = len(traindata)
    for x in range(len(traindata)):
        index_vec = knn_search(x, distances[x], k)
        classification = knn_classify(index_vec, trainlabels)
        if trainlabels[x] == classification[0]:
            correct += 1
    return correct/total

###########################################

if __name__ == '__main__':
    import sys

    # sys.argv values are strings, so k must be cast to int
    k = int(sys.argv[1]) if len(sys.argv) >= 2 else 5
    datafile = sys.argv[2] if len(sys.argv) == 3 else 'cvdataset.mat'

    traindata, trainlabels, testdata, evaldata, testlabels = loading_data(datafile)

    print(calculate_knn(traindata, trainlabels, k))
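The script expects a local module calculate_cosine_distance and a cvdataset.mat file, neither of which is included in the post. Below is a sketch of what the helper presumably computes (cosine distance, i.e. one minus cosine similarity) plus a toy run; the repo's actual definition may differ.

# Sketch of the assumed cosineDistance helper and a toy run (my addition).
import numpy as np

def cosineDistance(u, v):
    u, v = np.ravel(u), np.ravel(v)
    return 1.0 - np.dot(u, v)/(np.linalg.norm(u)*np.linalg.norm(v))

# two well-separated clusters; labels stored as rows, the way scipy.io.loadmat
# returns MATLAB column vectors
traindata = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]])
trainlabels = np.array([[1], [1], [2], [2]])
print(calculate_knn(traindata, trainlabels, k=1))   # fraction classified correctly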

#5
ReneeBK posted on 2017-4-22 04:31:39
#!/usr/bin/python
# marina von steinkirch @2014
# steinkirch at gmail

from __future__ import division
import numpy as np
from scipy.sparse import csc_matrix


def train_logistic_regression(features, labels, l_rate, target_delta, reg_constant):
    ''' gradient ascent algorithm to train the logistic regression params,
        where offset and weights are the parameters '''
    offset = 0
    w_delta = 100000

    # old_ws and ws are 1 x d zero matrices (d = number of features)
    old_ws = np.zeros((1, np.size(features, axis=1)-1))
    ws = np.zeros((1, np.size(features, axis=1)))

    # regularizer is a 1 x d matrix filled with reg_constant
    # (np.ones, not np.zeros: multiplying zeros by reg_constant
    # would silently disable the regularization)
    regularizer = np.ones((1, np.size(features, axis=1)))*reg_constant

    ''' calculate the probability that each instance is classified as 1;
        first calculate the weighted sums '''
    # create an N x 1 array with the value of offset
    # (same as repmat(offset, size(features,1), 1) in MATLAB)
    w_1 = np.ones((np.size(features, axis=0), 1))*offset

    # create an N x d array with the values of ws
    # (same as repmat(ws, size(features,1), 1) in MATLAB)
    w_2 = np.ones((np.size(features, axis=0), 1))*ws

    # multiply elementwise by the features and sum over the features
    w_3 = (np.multiply(w_2, features.todense())).sum(1)

    # finally, the weighted sums (N x 1)
    w_sums = w_1 + w_3

    ''' now compute the probabilities (the logistic function) '''
    den = np.exp(w_sums) + 1
    num = np.exp(w_sums)
    probs = num/den

    ''' calculate the current log-likelihood '''
    # expand the labels into an N x 1 column; the last instance is sliced
    # off everywhere below, so the labels are sliced to match
    l_2d = np.expand_dims(labels, 1)
    c_aux = np.multiply(w_sums[:np.size(w_sums, axis=0)-1],
                        l_2d[:np.size(l_2d, axis=0)-1])
    c_aux2 = np.log(1 + np.exp(w_sums[:np.size(w_sums, axis=0)-1]))
    c_aux3 = (c_aux - c_aux2).sum()
    c_aux4 = np.multiply(np.multiply(ws, ws), regularizer/2)
    c_aux5 = c_aux4.sum()
    current_ll = c_aux3 - c_aux5

    print("Training logistic regression. Initial: ", current_ll)

    ''' start the iterations '''
    iter_n = 0
    probss = probs[:np.size(probs)-1]
    featuress = features[:np.size(features, axis=0)-1, :np.size(features, axis=1)-1]
    wss = ws[:, :np.size(ws, axis=1)-1]
    regularizers = regularizer[:, :np.size(regularizer, axis=1)-1]
    l_2ds = l_2d[:np.size(l_2d, axis=0)-1]

    while w_delta > target_delta:
        old_ws[:] = wss[:]

        # calculate the gradient
        grad_aux0 = l_2ds - probss
        grad_aux00 = np.ones((1, np.size(features, axis=1)-1))
        grad_aux = grad_aux0*grad_aux00
        grad_aux1 = np.multiply(grad_aux, featuress.todense())
        grad_aux2 = csc_matrix(grad_aux1.sum(0))
        grad_aux3 = np.multiply(regularizers, wss)
        grad = grad_aux2 - grad_aux3

        # scale the gradient by the learning rate
        grad = l_rate*grad
        iter_n += 1

        # update the offset and the weights
        offset = offset + l_rate*grad_aux0.sum()
        wss = wss + grad

        # using the current weights, recalculate the probability of each instance
        w_1 = np.ones((np.size(featuress, axis=0), 1))*offset
        w_2 = np.ones((np.size(featuress, axis=0), 1))*wss
        w_3 = (np.multiply(w_2, featuress.todense())).sum(1)
        w_sums = w_1 + w_3

        # logistic function
        den = np.exp(w_sums) + 1
        num = np.exp(w_sums)
        probss = num/den

        # update the log-likelihood
        c_aux = np.multiply(w_sums, l_2ds)
        c_aux2 = np.log(1 + np.exp(w_sums))
        c_aux3 = (c_aux - c_aux2).sum()
        c_aux4 = np.multiply(np.multiply(wss, wss), regularizers/2)
        c_aux5 = c_aux4.sum()
        current_ll = c_aux3 - c_aux5

        # update the weight delta
        w_delta = np.sqrt((np.multiply(old_ws - wss, old_ws - wss)).sum())

        # progress report every 100 iterations
        if np.mod(iter_n, 100) == 0:
            print('Log-likelihood, weight delta: ', current_ll, w_delta)

    print('Final ll:', current_ll)
    return offset, wss


def run_logistic_regression(offset, wss, features):
    featuress = features[:np.size(features, axis=0)-1, :np.size(features, axis=1)-1]
    # using the trained weights, calculate the probability for each instance
    w_1 = np.ones((np.size(featuress, axis=0), 1))*offset
    w_2 = np.ones((np.size(featuress, axis=0), 1))*wss
    w_3 = (np.multiply(w_2, featuress.todense())).sum(1)
    w_sums = w_1 + w_3

    posinds = (w_sums > 0).nonzero()[0]

    laux = np.zeros((np.size(features, axis=0), 1))
    labels = np.squeeze(np.asarray(laux))

    labels[posinds] = 1
    return labels
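A toy run of the trainer (my sketch, not from the repo). The functions slice off the last row and column of the matrices internally, so the toy data carries one padding row and one padding column; the learning rate and regularization constant here are arbitrary and may need tuning on real data.

# Toy run (my addition). The trainer drops the last row/column internally,
# so the data includes one padding row and one padding column.
import numpy as np
from scipy.sparse import csc_matrix

X = csc_matrix(np.array([
    [1.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0],   # padding row (sliced off internally)
]))                    # last column is padding (sliced off internally)
y = np.array([1.0, 1.0, 0.0, 0.0, 0.0])

offset, weights = train_logistic_regression(
    X, y, l_rate=0.1, target_delta=1e-3, reg_constant=0.1)
print(run_logistic_regression(offset, weights, X))   # expect 1s for the first two instances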

#6
ReneeBK posted on 2017-4-22 04:32:13
#!/usr/bin/python
# marina von steinkirch @2014
# steinkirch at gmail

from __future__ import division
import numpy as np
np.set_printoptions(threshold=np.inf)


def naive_bayes(features, labels):
    ''' compute the naive Bayes learner '''
    # count each label for the prior
    neginds = (labels == 0.0).nonzero()[0]
    posinds = (labels == 1.0).nonzero()[0]

    a1 = len(neginds)/len(labels)
    a2 = len(posinds)/len(labels)
    label_counts = np.array([a1, a2]).flatten()
    label_prob = np.log(label_counts)

    # divide the feature data into positive and negative instances
    aux_neg = features[neginds, :]
    aux_pos = features[posinds, :]

    # sum over each document (first axis), with add-one smoothing
    # and logs to prevent underflow
    param_neg = np.log((1 + aux_neg.sum(0)) / (1 + aux_neg.sum(0)).sum())
    param_pos = np.log((1 + aux_pos.sum(0)) / (1 + aux_pos.sum(0)).sum())
    pn = np.squeeze(np.asarray(param_neg))
    pp = np.squeeze(np.asarray(param_pos))
    params = [pn, pp]

    return label_prob, params


def classify_naive_bayes(params, label_probs, features):
    ''' compute the naive Bayes classifier '''
    # create a zero array the size of the test set
    laux = np.zeros((np.size(features, axis=0), 1))
    labels = np.squeeze(np.asarray(laux))

    # find the conditional log-probability for each class
    # and pick the most likely label for each instance
    for i in range(np.size(features, axis=0)):
        cp1 = features[i, :]*params[0] + label_probs[0]
        cp2 = features[i, :]*params[1] + label_probs[1]
        if cp1 > cp2:
            j = 0
        else:
            j = 1
        labels[i] = j
    return labels
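A toy run of the two functions above (my sketch, not from the repo): four tiny "documents" as word-count rows of a sparse matrix, with labels 0/1.

# Toy run (my addition): word-count matrix of 4 documents x 3 vocabulary words.
import numpy as np
from scipy.sparse import csc_matrix

X = csc_matrix(np.array([
    [3.0, 0.0, 1.0],
    [2.0, 1.0, 0.0],
    [0.0, 2.0, 3.0],
    [1.0, 0.0, 4.0],
]))
y = np.array([0.0, 0.0, 1.0, 1.0])

label_prob, params = naive_bayes(X, y)
print(classify_naive_bayes(params, label_prob, X))   # expected: [0. 0. 1. 1.]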

#7
fengyg (business-verified) posted on 2017-4-22 13:25:18

Taking a look.

#8
franky_sas posted on 2017-4-22 14:23:32

#9
h2h2 posted on 2017-4-22 14:33:49

Thanks for sharing.

#10
william9225 (student-verified) posted on 2017-4-22 22:35:15 from mobile

Thanks for sharing.
