[GitHub]Advanced Machine Learning - winbugs及其他软件专版

1关注
62粉丝

VIP

学术权威

14%

还不是VIP/贵宾

-

TA的文库 其他...

R资源总汇

Panel Data Analysis

Experimental Design

0%

威望: 1 级
论坛币: 49407 个
通用积分: 51.8104
学术水平: 370 点
热心指数: 273 点
信用等级: 335 点
经验: 57815 点
帖子: 4006
精华: 21
在线时间: 582 小时
注册时间: 2005-5-8
最后登录: 2023-11-26

楼主

ReneeBK 发表于 2017-4-22 04:28:45 |只看作者 |坛友微信交流群|倒序 |AI写论文

是否 +2 论坛币

k人参与回答

经管之家送您一份

应届毕业生专属福利!

求职就业群

赵安豆老师微信：zhaoandou666

经管之家联合CDA

送您一个全额奖学金名额~ !

立即领取

感谢您参与论坛问题回答

经管之家送您两个论坛币！

+2 论坛币

Advanced Machine Learning

This folder contain many algorithms that I wrote for the graduate Machine Learning class at Stony Brook University.

Algorithms:

Naive Bayes vs. Logistic Regression
Adaboost
kNN vs. SVM
Expectation Maximization
Hidden Markov Chain

It also contains the theory from mt homework. For the final project about classifying complex networks, please refer to the specific repository.

Installation$ pip install -r requirements.txt

License

When making a reference to my work, please use my twitter handle b_t_3 or my website.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License

本帖隐藏的内容

https://github.com/bt3gl/Advanced-Machine-Learning

扫码加我拉你入群

请注明：姓名-公司-职位

以便审核进群资格，未注明则拒绝

分享0 收藏0 回帖

关键词：reference specific complex contain folder

相关帖子

使用道具举报

沙发

ReneeBK 发表于 2017-4-22 04:29:11 |只看作者 |坛友微信交流群

#!/usr/bin/python
# marina von steinkirch @2014
# steinkirch at gmail
import sys
import math
import numpy as np
class AdaBoost(object):
''' Implements adaboost with the chosen classifier '''
def __init__(self, weak_classifier):
self.WeakClassifier = weak_classifier
def ada_train(self, T, X, Y, optional=False):
''' adaboost training '''
# Defines variables
self.weak_classifier_ens = []
self.alpha = []
self.X = X
self.Y = Y
self.T = T
self.e = []
N = len(self.Y)
# Initializes with equal weigths
Z = (1.0/N)*np.ones(N)
# T iterations
for t in range(T):
# Methods are inside the decision stump class
weak_learner = self.WeakClassifier()
weak_learner.set_training_sample(X,Y)
weak_learner.weights = Z
# extra plottings for the homework
if t < 10 and optional:
print("For t = ", t+1)
print('Y= ', int(Y[t]))
opt = True
else: opt = False
# train the decision stump
weak_learner.stump_train(opt)
self.weak_classifier_ens.append(weak_learner)
# Predict so that wrong value will give more weight
Y_p= weak_learner.stump_predict(X)
# Calculates weighted training error
epsilon = sum(0.5*Z*abs((Y-Y_p)))/sum(Z)
self.e = epsilon
# Calculates alpha
inside = abs( (1-epsilon)/(epsilon*1.0)+0.00001 )
print inside
alpha = 0.5*math.log(inside)
self.alpha.append(alpha)
# Updates the weights
Z *= np.exp(-alpha*Y*Y_p)
Z /= sum(Z)
def ada_predict(self, X=[]):
''' adaboost predicting '''
if X == None: return
X = np.array(X)
N, d = X.shape
Y = np.zeros(N)
score = []
# T iterations
for t in range(self.T):
weak_learner = self.weak_classifier_ens[t]
Y += self.alpha[t]*weak_learner.stump_predict(X)
score.append(np.sign(Y))
return score
def run_adaboost(self, X_train, Y_train, T, X_test=None, optional=False):
''' test in training and test '''
self.ada_train(T, X_train, Y_train, optional)
return self.ada_predict(X_train), self.ada_predict(X_test)

复制代码

使用道具举报

藤椅

ReneeBK 发表于 2017-4-22 04:30:07 |只看作者 |坛友微信交流群

'''
based on: http://nipunbatra.github.io/2013/05/simulating-a-discrete-hidden-markov-model/
Unfair cassino problem: there may be two die, one fair and other biased.
The biased die is much more likely to produce a 6 than others (p=0.5).
The observer is only able to observe the values of die being thrown without
having a knowledge whether a fair or biased die were used:
observed states: 1 to 6 on the die faces
hidden states: fair or biased die
prior: probability that the first thworn is made from a fair or biased die
transition matrix A: matrix enconding the prob of the 4 possible transition
between fair and biased die
emission matrix B: matrix enconding the prob of an obsevation given the hidden
state
'''
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
import random
# plot setup
matplotlib.rcParams.update({'font.size': 11})
'''
setting the components of HMM
prior: a fair die is twice as likely as biased die
A :
1. Fair -> Fair: .95
2. Fair -> Biased: 1-.95 =.05
3. Biased -> Biased: .90
4. Biased -> Biased: 1-.90 =.10
B:
Pr(6)= 0.5
Pr(1)= Pr(2)= Pr(3)= Pr(4)= Pr(5)= 0.1
'''
prior = np.array([2.0/3,1.0/3])
A = np.array([[.95,.05],[.1,.9]])
B = np.array([[1.0/6 for i in range(6)],[.1,.1,.1,.1,.1,.5]])
# return next state to the weighted probability array
def next_state(weights):
choice = random.random() * sum(weights)
for i, w in enumerate(weights):
choice -= w
if choice < 0:
return i
def create_hidden_sequence(prior, A, length):
out=[None]*length
out[0]=next_state(prior)
for i in range(1,length):
out[i]=next_state(A[out[i-1]])
return out
def create_observation_sequence(hidden_sequence, B):
length=len(hidden_sequence)
out=[None]*length
for i in range(length):
out[i]=next_state(B[hidden_sequence[i]])
return out
# group all contiguous values in tuple
def group(L):
first = last = L[0]
for n in L[1:]:
if n - 1 == last:
last = n
else:
yield first, last
first = last = n
yield first, last
# create tuples of the form (start, number_of_continuous values)
def create_tuple(x):
return [(a,b-a+1) for (a,b) in x]
if __name__ == '__main__':
count = 0
num_calls = 500
for i in range(num_calls):
count += next_state(prior)
print("Expected number of Fair states:", num_calls-count)
print("Expected number of Biased states:", count)
# Create the sequences
hidden = np.array(create_hidden_sequence(prior, A, num_calls))
observed = np.array(create_observation_sequence(hidden, B))
print('Observed: ', observed)
print('Hidden: ', hidden)
# Tuples of form index value, number of continuous values corresponding to Fair State
indices_hidden_fair = np.where(hidden==0)[0]
tuples_contiguous_values_fair = list(group(indices_hidden_fair))
tuples_start_break_fair = create_tuple(tuples_contiguous_values_fair)
# Tuples of form index value, number of continuous values corresponding to Biased State
indices_hidden_biased = np.where(hidden==1)[0]
tuples_contiguous_values_biased = list(group(indices_hidden_biased))
tuples_start_break_biased = create_tuple(tuples_contiguous_values_biased)
# Tuples for observations
observation_tuples=[]
for i in range(6):
observation_tuples.append(create_tuple(group(list(np.where(observed==i)[0]))))
# Make plots
plt.subplot(2,1,1)
plt.xlim((0, num_calls));
plt.title('Observations');
for i in range(6):
plt.broken_barh(observation_tuples[i],(i+0.5,1),facecolor='k');
plt.subplot(2,1,2);
plt.xlim((0, num_calls));
plt.title('Hidden States Blue:Fair, Red: Biased');
plt.broken_barh(tuples_start_break_fair,(0,1),facecolor='b');
plt.broken_barh(tuples_start_break_biased,(0,1),facecolor='r');
plt.savefig('hmm.png')

复制代码

使用道具举报

板凳

ReneeBK 发表于 2017-4-22 04:30:54 |只看作者 |坛友微信交流群

'''
Calculates the k-nearest neighbor (kNN) algorithm
'''
import math
import numpy as np
import scipy.io
from calculate_cosine_distance import cosineDistance
__author__ = """Mari Wahl"""
###########################################
def loading_data(filename):
'''
This function load a MATLAB file and get the dict variables
'''
f = scipy.io.loadmat(filename)
traindata = f['traindata']
trainlabels = f['trainlabels']
testdata = f['testdata']
evaldata = f['evaldata']
testlabels = f['testlabels']
return traindata, trainlabels, testdata, evaldata, testlabels
###########################################
def calculate_distances(trainlabels, traindata):
'''
This function calculate the distances for all the input examples
'''
distances = []
for i in range(len(trainlabels)):
first_train_example_class1 = traindata[i]
aux = []
for j in range (len(trainlabels)):
if i != j:
first_train_example_class2 = traindata[j]
d = cosineDistance(first_train_example_class1, first_train_example_class2)
aux.append(d)
distances.append(aux)
return distances
###########################################
def get_closest_k_points(D, k):
'''
Get the k closest points
'''
return sorted(D)[:k+1]
###########################################
def get_index_vec(points_list, D):
'''
Return the index of the k closest points
'''
index_vec = []
while points_list:
point = points_list.pop()
for j, point_D in enumerate(D):
if np.isclose(point_D, point, rtol=1e-8, atol=1e-08, equal_nan=False) and j not in set(index_vec):
index_vec.append(j)
break
return index_vec
###########################################
def knn_search(x, D, k):
'''
Find the K nearest neighbours of data among distance D
'''
points_list = get_closest_k_points(D, k)
return get_index_vec(points_list, D)
############################################
def knn_classify(index_vec, trainlabels):
'''
Return the classification of the indices
'''
label1 = trainlabels[0]
label2 = None
neg, pos = 0,0
for i in index_vec[1:]:
if trainlabels[i] == label1:
pos += 1
else:
neg += 1
label2 = trainlabels[i]
if pos >= neg: return label1
else: return label2
#########################################
def calculate_knn(traindata, trainlabels, k):
'''
Return knn classification
'''
distances = calculate_distances(trainlabels, traindata)
# calculate knn for the entire data
correct = 0.0
total = len(traindata)
for x in range(len(traindata)):
index_vec = knn_search(x, distances[x], k)
classification = knn_classify(index_vec, trainlabels)
if trainlabels[x] == classification[0]:
correct += 1
return correct/total
###########################################
if __name__ == '__main__':
import sys
k = sys.argv[1] if len(sys.argv) == 2 else 5
datafile = sys.argv[2] if len(sys.argv) == 3 else 'cvdataset.mat'
traindata, trainlabels, testdata, evaldata, testlabels = loading_data(datafile)
print(calculate_knn(traindata, trainlabels, k))

复制代码

使用道具举报

报纸

ReneeBK 发表于 2017-4-22 04:31:39 |只看作者 |坛友微信交流群

#!/usr/bin/python
# marina von steinkirch @2014
# steinkirch at gmail
from __future__ import division
import numpy as np
from scipy.sparse import csc_matrix
def train_logistic_regression(features, labels, l_rate, target_delta, reg_constant):
''' gradient ascent algorithm to train logistic regression params,
where offset and weights are the parameters '''
offset = 0
w_delta = 100000
# old_ws and ws are matrices 1x 10770 with zeros
old_ws = np.zeros((1, np.size(features, axis=1)-1))
ws = np.zeros((1, np.size(features, axis=1)))
# regularizer is a matrix 1x10770 multiplied by reg_constant
regularizer = np.zeros((1, np.size(features, axis=1)))*reg_constant
''' calculate the probability that each instance is classified as 1 '''
''' first calculate the weight sums '''
# create a 200x1 array with the value of offset
# same as repmat(offset, size(features,1),1) in MATLAB
w_1 = np.ones((np.size(features, axis=0),1))* offset
# create a 200x10770 aarray with the value of ws
# same as repmat(ws, size(features,1),1) in MATLAB
w_2 = np.ones((np.size(features, axis=0), 1)) * ws
# multiply these by features (which is 200x10770)
w_3 = (np.multiply(w_2, features.todense())).sum(1)
# finally calculates the weight sums 200x1
w_sums = w_1 + w_3
''' now comput the probabilities '''
# creating logistic
den = np.exp(w_sums) + 1
num = np.exp(w_sums)
probs = num/den
''' calculating current ll '''
# expand 1 dim in labels to 200x1
l_2d = np.expand_dims(labels,1)
c_aux = np.multiply(w_sums[:np.size(w_sums, axis=0)-1], l_2d)
c_aux2 = np.log(1 + np.exp(w_sums[:np.size(w_sums, axis=0)-1]))
c_aux3 = (c_aux - c_aux2).sum()
c_aux4 = np.multiply(np.multiply(ws,ws), (regularizer//2))
c_aux5 = c_aux4.sum()
current_ll = c_aux3 - c_aux5
print("Training logistic regression. Initial: ", current_ll)
'''starting iterations '''
iter_n = 0
probss = probs[:np.size(probs)-1]
featuress = features [:np.size(features, axis=0)-1,:np.size(features, axis=1)-1]
wss= ws[:, :np.size(ws, axis=1)-1]
regularizers = regularizer[:,:np.size(regularizer, axis=1)-1]
while (w_delta > target_delta):
old_ws[:] = wss[:]
# calculating the gradient
grad_aux0 = (l_2d - probss)
grad_aux00 = np.ones((1, np.size(features, axis=1)-1))
grad_aux = grad_aux0*grad_aux00
grad_aux1 = np.multiply(grad_aux, featuress.todense())
grad_aux2 = csc_matrix(grad_aux1.sum(0))
grad_aux3 = np.multiply(regularizers, wss)
grad_aux4 = grad_aux3[:, :np.size(grad_aux3, axis=1)-1]
grad = grad_aux2 - grad_aux3
# magnitude limit gradient
grad = l_rate*grad
iter_n += 1
# update ws with previous labe prob
offset = offset + l_rate*grad_aux0.sum()
wss = wss + grad
# using the current weights, calculate the proba for instance
w_1 = np.ones((np.size(featuress, axis=0),1))* offset
w_2 = np.ones((np.size(featuress, axis=0), 1)) * wss
w_3 = (np.multiply(w_2, featuress.todense())).sum(1)
w_sums = w_1 + w_3
# iterating logist
den = np.exp(w_sums) + 1
num = np.exp(w_sums)
probss = num/den
# update likelihood
l_2d = np.expand_dims(labels,1)
c_aux = np.multiply(w_sums, l_2d)
c_aux2 = np.log(1 + np.exp(w_sums))
c_aux3 = (c_aux - c_aux2).sum()
c_aux4 = np.multiply(np.multiply(wss,wss), (regularizers//2))
c_aux5 = c_aux4.sum()
current_ll = c_aux3 - c_aux5
# update weight delta
w_delta = np.sqrt(( np.multiply((old_ws - wss),(old_ws - wss)) ).sum() )
# print?
if (np.mod(iter_n, 100) == 0):
print('Log-likelihood, weight delta: ', current_ll, w_delta)
print('Final ll:', current_ll)
return offset, wss
def run_logistic_regression(offset, wss, features):
featuress = features [:np.size(features, axis=0)-1,:np.size(features, axis=1)-1]
# using the current weights, calculate the proba for instance
w_1 = np.ones((np.size(featuress, axis=0),1))* offset
w_2 = np.ones((np.size(featuress, axis=0), 1)) * wss
w_3 = (np.multiply(w_2, featuress.todense())).sum(1)
w_sums = w_1 + w_3
posinds = (w_sums > 0).nonzero()[0]
laux = np.zeros((np.size(features, axis=0), 1))
labels = np.squeeze(np.asarray(laux))
labels[posinds] = 1
return labels

复制代码

使用道具举报

地板

ReneeBK 发表于 2017-4-22 04:32:13 |只看作者 |坛友微信交流群

#!/usr/bin/python
# marina von steinkirch @2014
# steinkirch at gmail
from __future__ import division
import numpy as np
from scipy.sparse import csc_matrix
np.set_printoptions(threshold='nan')
def naive_bayes(features, labels):
''' compute the naive bayes learner '''
# count number each label for prior
neginds = (labels == 0.0).nonzero()[0]
posinds = (labels == 1.0).nonzero()[0]
a1 = len(neginds)/len(labels)
a2 = len(posinds)/len(labels)
label_counts = np.array([a1, a2]).flatten()
label_prob = np.log(label_counts)
# dividing the feature data into pos and neg
aux_neg = features[neginds, :]
aux_pos = features[posinds, :]
# summing over for each document (first field)
# adding smoothing, log to prevent overflow
param_neg = np.log((1 + aux_neg.sum(0)) / (1+aux_neg.sum(0)).sum())
param_pos = np.log((1 + aux_pos.sum(0)) / (1+aux_pos.sum(0)).sum())
pn = np.squeeze(np.asarray(param_neg))
pp = np.squeeze(np.asarray(param_pos))
params = [pn, pp]
return label_prob, params
def classify_naive_bayes(params, label_probs, features):
''' compute the naive bayes classifier '''
#create an array with zeros in the size of test (200)
laux = np.zeros((np.size(features, axis=0), 1))
labels = np.squeeze(np.asarray(laux))
# find conditional proba. for each class
# find most likely label for each instance
for i in range(np.size(features, axis=0)):
cp1 = features[i,:]*params[0]+label_probs[0]
cp2 = features[i,:]*params[1]+label_probs[1]
if cp1 > cp2:
j = 0
else:
j = 1
labels[i]=j
return labels

复制代码

使用道具举报

7楼

fengyg

发表于 2017-4-22 13:25:18 |只看作者 |坛友微信交流群

kankan

使用道具举报

8楼

franky_sas 发表于 2017-4-22 14:23:32 |只看作者 |坛友微信交流群

使用道具举报

9楼

h2h2 发表于 2017-4-22 14:33:49 |只看作者 |坛友微信交流群

谢谢分享

使用道具举报

10楼

william9225

发表于 2017-4-22 22:35:15 来自手机 |只看作者 |坛友微信交流群

谢谢分享

使用道具举报

[GitHub]Advanced Machine Learning [推广有奖]

经管之家送您一份

经管之家联合CDA

感谢您参与论坛问题回答

本帖隐藏的内容

扫码加我拉你入群

相关帖子

本版微信群

[GitHub]Advanced Machine Learning [推广有奖]

经管之家送您一份

经管之家联合CDA

感谢您参与论坛问题回答

本帖隐藏的内容

扫码加我 拉你入群

相关帖子

本版微信群

扫码加我拉你入群