Thread starter: ReneeBK

[GitHub] Machine Learning for the Web


#1 ReneeBK (OP) posted on 2017-05-20 19:30:44

Machine Learning For The Web

Each folder contains the code discussed in the corresponding chapter of the book.

Hidden content in this post:

Machine Learning for the Web.zip (99.55 MB)



Keywords: machine learning, GitHub


#2 ReneeBK posted on 2017-05-20 19:31:39
import numpy as np
from sklearn import mixture
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.cluster.hierarchy import fcluster
from sklearn.cluster import KMeans
from sklearn.cluster import MeanShift
from sklearn.metrics import homogeneity_completeness_v_measure
from sklearn.metrics import silhouette_score
from matplotlib import pyplot as plt

# generate two clusters of 100 points each:
np.random.seed(4711)  # for repeatability
c1 = np.random.multivariate_normal([10, 0], [[3, 1], [1, 4]], size=100)
l1 = np.zeros(100)
l2 = np.ones(100)
c2 = np.random.multivariate_normal([0, 10], [[3, 1], [1, 4]], size=100)
print(c1.shape)

# add noise:
np.random.seed(1)  # for repeatability
noise1x = np.random.normal(0, 2, 100)
noise1y = np.random.normal(0, 8, 100)
noise2 = np.random.normal(0, 8, 100)
c1[:, 0] += noise1x
c1[:, 1] += noise1y
c2[:, 1] += noise2

# plot the two labelled classes
fig = plt.figure(figsize=(20, 15))
ax = fig.add_subplot(111)
ax.set_xlabel('x', fontsize=30)
ax.set_ylabel('y', fontsize=30)
fig.suptitle('classes', fontsize=30)
labels = np.concatenate((l1, l2))
X = np.concatenate((c1, c2))
pp1 = ax.scatter(c1[:, 0], c1[:, 1], s=50, color='r')
pp2 = ax.scatter(c2[:, 0], c2[:, 1], s=50, color='g')
ax.legend((pp1, pp2), ('class 1', 'class 2'), fontsize=35)
fig.savefig('classes.png')

# start a 2x2 comparison figure
fig.clf()  # reset plt
fig, ((axis1, axis2), (axis3, axis4)) = plt.subplots(2, 2, sharex='col', sharey='row')

# k-means
kmeans = KMeans(n_clusters=2)
kmeans.fit(X)
pred_kmeans = kmeans.labels_
print('kmeans:', np.unique(kmeans.labels_))
print('kmeans:', homogeneity_completeness_v_measure(labels, pred_kmeans))
axis1.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, cmap='prism')  # cluster-dependent colors
axis1.set_ylabel('y', fontsize=40)
axis1.set_title('k-means', fontsize=20)

# mean-shift
ms = MeanShift(bandwidth=7)
ms.fit(X)
pred_ms = ms.labels_
axis2.scatter(X[:, 0], X[:, 1], c=pred_ms, cmap='prism')
axis2.set_title('mean-shift', fontsize=20)
print('ms:', homogeneity_completeness_v_measure(labels, pred_ms))
print('ms:', np.unique(ms.labels_))

# gaussian mixture
# note: mixture.GMM was removed from scikit-learn; GaussianMixture replaces it
g = mixture.GaussianMixture(n_components=2)
g.fit(X)
print(g.means_)
pred_gmm = g.predict(X)
print('gmm:', homogeneity_completeness_v_measure(labels, pred_gmm))
axis3.scatter(X[:, 0], X[:, 1], c=pred_gmm, cmap='prism')
axis3.set_xlabel('x', fontsize=40)
axis3.set_ylabel('y', fontsize=40)
axis3.set_title('gaussian mixture', fontsize=20)

# hierarchical: generate the linkage matrix and cut it
Z = linkage(X, 'ward')
max_d = 20
pred_h = fcluster(Z, max_d, criterion='distance')
print('clusters:', np.unique(pred_h))
k = 2
fcluster(Z, k, criterion='maxclust')  # alternative: cut the tree into exactly k clusters
print('h:', homogeneity_completeness_v_measure(labels, pred_h))
axis4.scatter(X[:, 0], X[:, 1], c=pred_h, cmap='prism')
axis4.set_xlabel('x', fontsize=40)
axis4.set_title('hierarchical ward', fontsize=20)

fig.set_size_inches(18.5, 10.5)
fig.savefig('comp_clustering.png', dpi=100)

# dendrogram of the hierarchical clustering
fig.clf()  # reset plt
fig = plt.figure(figsize=(20, 15))
plt.title('Hierarchical Clustering Dendrogram', fontsize=30)
plt.xlabel('data point index (or cluster index)', fontsize=30)
plt.ylabel('distance (ward)', fontsize=30)
dendrogram(
    Z,
    truncate_mode='lastp',  # show only the last p merged clusters
    p=12,
    leaf_rotation=90.,
    leaf_font_size=12.,
    show_contracted=True,  # to get a distribution impression in truncated branches
)
fig.savefig('dendrogram.png')

# summary measures for all four methods:
res = homogeneity_completeness_v_measure(labels, pred_kmeans)
print('kmeans measures, homogeneity:', res[0], ' completeness:', res[1],
      ' v-measure:', res[2], ' silhouette score:', silhouette_score(X, pred_kmeans))
res = homogeneity_completeness_v_measure(labels, pred_ms)
print('mean-shift measures, homogeneity:', res[0], ' completeness:', res[1],
      ' v-measure:', res[2], ' silhouette score:', silhouette_score(X, pred_ms))
res = homogeneity_completeness_v_measure(labels, pred_gmm)
print('gaussian mixture model measures, homogeneity:', res[0], ' completeness:', res[1],
      ' v-measure:', res[2], ' silhouette score:', silhouette_score(X, pred_gmm))
res = homogeneity_completeness_v_measure(labels, pred_h)
print('hierarchical (ward) measures, homogeneity:', res[0], ' completeness:', res[1],
      ' v-measure:', res[2], ' silhouette score:', silhouette_score(X, pred_h))
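One way to read the triples printed above: the v-measure is the harmonic mean of homogeneity and completeness, v = 2hc/(h + c). A minimal sketch with made-up toy labels:

from sklearn.metrics import homogeneity_completeness_v_measure

labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 1, 1]
h, c, v = homogeneity_completeness_v_measure(labels_true, labels_pred)
assert abs(v - 2 * h * c / (h + c)) < 1e-12  # v-measure = harmonic mean of h and c
print(h, c, v)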

#3 ReneeBK posted on 2017-05-20 19:32:05
import numpy as np
from matplotlib import pyplot as plt
from sklearn.decomposition import PCA

# line y = 5*x
x = np.arange(1, 101, 1).astype(float)
y = 5 * np.arange(1, 101, 1).astype(float)
# add noise
noise = np.random.normal(0, 10, 100)
y += noise

fig = plt.figure(figsize=(10, 10))
# plot
plt.plot(x, y, 'ro')
plt.axis([0, 102, -20, 220])
plt.quiver(60, 100, 10, 20, scale_units='xy', scale=1)
plt.arrow(60, 100, 10, 20, head_width=2.5, head_length=2.5, fc='k', ec='k')
plt.text(70, 110, r'$v^1$', fontsize=20)

# save
ax = fig.add_subplot(111)
ax.axis([0, 102, -20, 220])
ax.set_xlabel('x', fontsize=40)
ax.set_ylabel('y', fontsize=40)
fig.suptitle('2 dimensional dataset', fontsize=40)
fig.savefig('pca_data.png')

# calc PCA by hand: standardize, then eigendecompose the covariance matrix
mean_x = np.mean(x)
mean_y = np.mean(y)
mean_vector = np.array([[mean_x], [mean_y]])
u_x = (x - mean_x) / np.std(x)
u_y = (y - mean_y) / np.std(y)
sigma = np.cov([u_x, u_y])
print(sigma)
eig_vals, eig_vecs = np.linalg.eig(sigma)

eig_pairs = [(np.abs(eig_vals[i]), eig_vecs[:, i])
             for i in range(len(eig_vals))]
# sort eigenvector pairs by decreasing eigenvalue
eig_pairs.sort(key=lambda p: p[0], reverse=True)
print(eig_pairs)
v1 = eig_pairs[0][1]
# leading eigenvector, mapped back to the original (unstandardized) scale:
x_v1 = v1[0] * np.std(x) + mean_x
y_v1 = v1[1] * np.std(y) + mean_y
print(x_v1, '-', y_v1, 'slope:', y_v1 / x_v1)

# same reduction with scikit-learn
X = np.array([u_x, u_y])
X = X.T
pca = PCA(n_components=1)
pca.fit(X)
V = pca.components_
print(V, '-', V[0][1] / V[0][0])
# transform into the reduced space
X_red_sklearn = pca.fit_transform(X)
print(X_red_sklearn.shape)

W = np.array(v1.reshape(2, 1))
X_red = W.T.dot(X.T)
# check the two reductions agree (up to the sign of the eigenvector)
assert np.allclose(np.abs(X_red.T.ravel()), np.abs(X_red_sklearn.ravel())), \
    'problem with the pca algorithm'
print(X_red.T[0], '-', X_red_sklearn[0])
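A side note on why the leading eigenvector comes out along the diagonal: after standardization, the covariance matrix is (up to the ddof normalization) the correlation matrix [[1, r], [r, 1]], whose eigenvalues are 1 + r and 1 - r with eigenvectors along (1, 1) and (1, -1). A quick sketch; r = 0.9 is an arbitrary illustrative value, not a value from the post:

import numpy as np

r = 0.9
sigma = np.array([[1.0, r], [r, 1.0]])
w, V = np.linalg.eigh(sigma)  # eigh returns eigenvalues in ascending order
print(w)          # [0.1, 1.9], i.e. 1 - r and 1 + r
print(V[:, -1])   # leading eigenvector, (1, 1)/sqrt(2) up to sign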

#4 ReneeBK posted on 2017-05-20 19:32:35
import numpy as np
from copy import copy

class HMM:
    def __init__(self, pi, A, B):
        self.pi = pi  # initial state probabilities
        self.A = A    # state transition probabilities
        self.B = B    # observation probabilities

    def MostLikelyStateSequence(self, observations):
        # brute force: enumerate all N**T state sequences
        N = self.A.shape[0]
        T = len(observations)
        sequences = [str(i) for i in range(N)]
        probs = np.array([self.pi[i] * self.B[i, observations[0]] for i in range(N)])
        print(probs)
        for i in range(1, T):
            newsequences = []
            newprobs = np.array([])
            for s in range(len(sequences)):
                for j in range(N):
                    newsequences.append(sequences[s] + str(j))
                    bef = int(sequences[s][-1])
                    tmpprob = probs[s] * self.A[bef, j] * self.B[j, observations[i]]
                    newprobs = np.append(newprobs, [tmpprob])
                    print(sequences[s] + str(j), '-', tmpprob)
            sequences = newsequences
            probs = newprobs
        return max((probs[i], sequences[i]) for i in range(len(sequences)))

    def ViterbiSequence(self, observations):
        deltas = [{}]
        seq = {}
        N = self.A.shape[0]
        states = [i for i in range(N)]
        T = len(observations)
        # initialization
        for s in states:
            deltas[0][s] = self.pi[s] * self.B[s, observations[0]]
            seq[s] = [s]
        # Viterbi recursion: keep only the best path into each state
        for t in range(1, T):
            deltas.append({})
            newseq = {}
            for s in states:
                (delta, state) = max((deltas[t-1][s0] * self.A[s0, s] * self.B[s, observations[t]], s0)
                                     for s0 in states)
                deltas[t][s] = delta
                newseq[s] = seq[state] + [s]
            seq = newseq

        (delta, state) = max((deltas[T-1][s], s) for s in states)
        return delta, ' sequence: ', seq[state]

    def maxProbSequence(self, observations):
        N = self.A.shape[0]
        states = [i for i in range(N)]
        T = len(observations)
        M = self.B.shape[1]
        # alpha_t(i) = P(O_1 O_2 ... O_t, q_t = S_i | hmm)
        # Initialize alpha
        alpha = np.zeros((N, T))
        c = np.zeros(T)  # scale factors
        alpha[:, 0] = self.pi.T * self.B[:, observations[0]]
        c[0] = 1.0 / np.sum(alpha[:, 0])
        alpha[:, 0] = c[0] * alpha[:, 0]
        # Update alpha for each observation step
        for t in range(1, T):
            alpha[:, t] = np.dot(alpha[:, t-1].T, self.A).T * self.B[:, observations[t]]
            c[t] = 1.0 / np.sum(alpha[:, t])
            alpha[:, t] = c[t] * alpha[:, t]

        # beta_t(i) = P(O_t+1 O_t+2 ... O_T | q_t = S_i , hmm)
        # Initialize beta
        beta = np.zeros((N, T))
        beta[:, T-1] = 1
        beta[:, T-1] = c[T-1] * beta[:, T-1]
        # Update beta backwards from the end of the sequence
        for t in range(len(observations)-1, 0, -1):
            beta[:, t-1] = np.dot(self.A, (self.B[:, observations[t]] * beta[:, t]))
            beta[:, t-1] = c[t-1] * beta[:, t-1]

        norm = np.sum(alpha[:, T-1])
        seq = ''
        for t in range(T):
            g, state = max(((beta[i, t] * alpha[i, t]) / norm, i) for i in states)
            seq += str(state)

        return seq

    def simulate(self, time):

        def drawFrom(probs):
            # draw one sample from a categorical distribution
            return np.where(np.random.multinomial(1, probs) == 1)[0][0]

        observations = np.zeros(time, dtype=int)
        states = np.zeros(time, dtype=int)
        states[0] = drawFrom(self.pi)
        observations[0] = drawFrom(self.B[states[0], :])
        for t in range(1, time):
            states[t] = drawFrom(self.A[states[t-1], :])
            observations[t] = drawFrom(self.B[states[t], :])
        return observations, states

    def train(self, observations, criterion):
        # Baum-Welch (EM) re-estimation of pi, A, B
        N = self.A.shape[0]
        T = len(observations)
        M = self.B.shape[1]

        A = self.A
        B = self.B
        pi = copy(self.pi)

        convergence = False
        while not convergence:

            # alpha_t(i) = P(O_1 O_2 ... O_t, q_t = S_i | hmm)
            alpha = np.zeros((N, T))
            c = np.zeros(T)  # scale factors
            alpha[:, 0] = pi.T * self.B[:, observations[0]]
            c[0] = 1.0 / np.sum(alpha[:, 0])
            alpha[:, 0] = c[0] * alpha[:, 0]
            # Update alpha for each observation step
            for t in range(1, T):
                alpha[:, t] = np.dot(alpha[:, t-1].T, self.A).T * self.B[:, observations[t]]
                c[t] = 1.0 / np.sum(alpha[:, t])
                alpha[:, t] = c[t] * alpha[:, t]

            # P(O = O_0, O_1, ..., O_T-1 | hmm)
            P_O = np.sum(alpha[:, T-1])
            # beta_t(i) = P(O_t+1 O_t+2 ... O_T | q_t = S_i, hmm)
            beta = np.zeros((N, T))
            beta[:, T-1] = 1
            beta[:, T-1] = c[T-1] * beta[:, T-1]
            # Update beta backwards from the end of the sequence
            for t in range(len(observations)-1, 0, -1):
                beta[:, t-1] = np.dot(self.A, (self.B[:, observations[t]] * beta[:, t]))
                beta[:, t-1] = c[t-1] * beta[:, t-1]

            gi = np.zeros((N, N, T-1))

            for t in range(T-1):
                for i in range(N):
                    gamma_num = alpha[i, t] * self.A[i, :] * self.B[:, observations[t+1]].T * \
                            beta[:, t+1].T
                    gi[i, :, t] = gamma_num / P_O

            # gamma_t(i) = P(q_t = S_i | O, hmm)
            gamma = np.squeeze(np.sum(gi, axis=1))
            # need the final gamma element for the new B
            prod = (alpha[:, T-1] * beta[:, T-1]).reshape((-1, 1))
            gamma_T = prod / P_O
            gamma = np.hstack((gamma, gamma_T))  # append one more column to gamma

            newpi = gamma[:, 0]
            newA = np.sum(gi, 2) / np.sum(gamma[:, :-1], axis=1).reshape((-1, 1))
            newB = copy(B)

            sumgamma = np.sum(gamma, axis=1)
            for ob_k in range(M):
                list_k = observations == ob_k
                newB[:, ob_k] = np.sum(gamma[:, list_k], axis=1) / sumgamma

            if np.max(abs(pi - newpi)) < criterion and \
                   np.max(abs(A - newA)) < criterion and \
                   np.max(abs(B - newB)) < criterion:
                convergence = True

            A[:], B[:], pi[:] = newA, newB, newpi

        self.A[:] = newA
        self.B[:] = newB
        self.pi[:] = newpi
        self.gamma = gamma


if __name__ == '__main__':

    pi = np.array([0.6, 0.4])
    A = np.array([[0.7, 0.3],
                  [0.6, 0.4]])
    B = np.array([[0.7, 0.1, 0.2],
                  [0.1, 0.6, 0.3]])
    hmmguess = HMM(pi, A, B)
    print('Viterbi sequence:', hmmguess.ViterbiSequence(np.array([0, 1, 0, 2])))
    print('max prob sequence:', hmmguess.maxProbSequence(np.array([0, 1, 0, 2])))
    #obs, states = hmmguess.simulate(4)

    hmmguess.train(np.array([0, 1, 0, 2]), 0.000001)

    print('Estimated initial probabilities:', hmmguess.pi)
    print('Estimated state transition probabilities:', hmmguess.A)
    print('Estimated observation probabilities:', hmmguess.B)
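A note on the two decoders above: MostLikelyStateSequence enumerates all N**T state sequences, while ViterbiSequence keeps only the best path into each state at each step, costing O(T*N**2); both maximize the same joint probability, so they should return the same best path. A short sketch, assuming the HMM class above is in scope and reusing the parameters from the __main__ block:

import numpy as np

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.6, 0.4]])
B = np.array([[0.7, 0.1, 0.2], [0.1, 0.6, 0.3]])
hmm = HMM(pi, A, B)
obs = np.array([0, 1, 0, 2])
print(hmm.MostLikelyStateSequence(obs))  # exhaustive: (probability, sequence string)
print(hmm.ViterbiSequence(obs))          # same path, found without enumeration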

#5 ReneeBK posted on 2017-05-20 19:37:28
import pandas as pd
import numpy as np
from sklearn import svm
# note: sklearn.cross_validation was removed; model_selection replaces it
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import f1_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score

# read data in
df = pd.read_csv('data_cars.csv', header=None)
for i in range(len(df.columns)):
    df[i] = df[i].astype('category')
df.head()

# map categories to values (kept for reference: the integer code of each level)
map0 = dict(zip(df[0].cat.categories, range(len(df[0].cat.categories))))
map1 = dict(zip(df[1].cat.categories, range(len(df[1].cat.categories))))
map2 = dict(zip(df[2].cat.categories, range(len(df[2].cat.categories))))
map3 = dict(zip(df[3].cat.categories, range(len(df[3].cat.categories))))
map4 = dict(zip(df[4].cat.categories, range(len(df[4].cat.categories))))
map5 = dict(zip(df[5].cat.categories, range(len(df[5].cat.categories))))
map6 = dict(zip(df[6].cat.categories, range(len(df[6].cat.categories))))

cat_cols = df.select_dtypes(['category']).columns
df[cat_cols] = df[cat_cols].apply(lambda x: x.cat.codes)

df = df.iloc[np.random.permutation(len(df))]
print(df.head())

# The original post never defines X, Y, or CalcMeasures; what follows is a
# minimal reconstruction assuming the last column (6) holds the class label.
X = df.iloc[:, :6].values
Y = df[6].values

def CalcMeasures(method, y_pred, y):
    # weighted averages, since the labels are multi-class
    print(method,
          ' f1:', f1_score(y, y_pred, average='weighted'),
          ' precision:', precision_score(y, y_pred, average='weighted'),
          ' recall:', recall_score(y, y_pred, average='weighted'))

cv = 10
method = 'linear support vector machine'
clf = svm.SVC(kernel='linear', C=50)
y_pred = cross_val_predict(clf, X, Y, cv=cv)
CalcMeasures(method, y_pred, Y)

method = 'rbf support vector machine'
clf = svm.SVC(kernel='rbf', C=50)
y_pred = cross_val_predict(clf, X, Y, cv=cv)
CalcMeasures(method, y_pred, Y)

method = 'poly support vector machine'
clf = svm.SVC(kernel='poly', C=50)
y_pred = cross_val_predict(clf, X, Y, cv=cv)
CalcMeasures(method, y_pred, Y)

method = 'decision tree'
clf = DecisionTreeClassifier(random_state=0)
y_pred = cross_val_predict(clf, X, Y, cv=cv)
CalcMeasures(method, y_pred, Y)

method = 'random forest'
clf = RandomForestClassifier(n_estimators=50, random_state=0, max_features=None)
y_pred = cross_val_predict(clf, X, Y, cv=cv)
CalcMeasures(method, y_pred, Y)

method = 'naive bayes'
clf = MultinomialNB()
y_pred = cross_val_predict(clf, X, Y, cv=cv)
CalcMeasures(method, y_pred, Y)

method = 'logistic regression'
clf = LogisticRegression()
y_pred = cross_val_predict(clf, X, Y, cv=cv)
CalcMeasures(method, y_pred, Y)

method = 'k nearest neighbours'
clf = KNeighborsClassifier(weights='distance', n_neighbors=5)
y_pred = cross_val_predict(clf, X, Y, cv=cv)
CalcMeasures(method, y_pred, Y)
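A caveat on the encoding above: cat.codes assigns each category an integer, which imposes an arbitrary ordering that the linear models then treat as a magnitude. One alternative is one-hot encoding; a minimal sketch (treating column 6 as the class label is an assumption, matching the reconstruction above):

import pandas as pd

df_raw = pd.read_csv('data_cars.csv', header=None)
X_onehot = pd.get_dummies(df_raw.iloc[:, :6]).values  # one indicator column per category level
Y = df_raw[6].astype('category').cat.codes.values
print(X_onehot.shape)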

#6 ReneeBK posted on 2017-05-20 19:39:47
# In[1]:
import numpy as np
from matplotlib import pyplot as plt
%matplotlib inline

# In[2]:
# line y = 2*x
x = np.arange(1, 101, 1).astype(float)
y = 2 * np.arange(1, 101, 1).astype(float)
# add noise
noise = np.random.normal(0, 10, 100)
y += noise

fig = plt.figure(figsize=(10, 10))
# plot
plt.plot(x, y, 'ro')
plt.axis([0, 102, -20, 220])
plt.quiver(60, 100, 10, 20, scale_units='xy', scale=1)
plt.arrow(60, 100, 10, 20, head_width=2.5, head_length=2.5, fc='k', ec='k')
plt.text(70, 110, r'$v^1$', fontsize=20)

# save
ax = fig.add_subplot(111)
ax.axis([0, 102, -20, 220])
ax.set_xlabel('x', fontsize=40)
ax.set_ylabel('y', fontsize=40)
fig.suptitle('2 dimensional dataset', fontsize=40)
fig.savefig('pca_data.png')

# In[3]:
# calc PCA
mean_x = np.mean(x)
mean_y = np.mean(y)
mean_vector = np.array([[mean_x], [mean_y]])
u_x = (x - mean_x) / np.std(x)
u_y = (y - mean_y) / np.std(y)
sigma = np.cov([u_x, u_y])
print(sigma)
eig_vals, eig_vecs = np.linalg.eig(sigma)

eig_pairs = [(np.abs(eig_vals[i]), eig_vecs[:, i])
             for i in range(len(eig_vals))]
# sort by decreasing eigenvalue
eig_pairs.sort(key=lambda p: p[0], reverse=True)
print(eig_pairs)
v1 = eig_pairs[0][1]
# leading eigenvector, back on the original scale:
x_v1 = v1[0] * np.std(x) + mean_x
y_v1 = v1[1] * np.std(y) + mean_y
print('slope:', y_v1 / x_v1)

from sklearn.decomposition import PCA
X = np.array([u_x, u_y])
X = X.T
pca = PCA(n_components=1)
pca.fit(X)
V = pca.components_
print(V, '-', V[0][1] / V[0][0])
# transform into the reduced space
X_red_sklearn = pca.fit_transform(X)
print(X_red_sklearn.shape)

W = np.array(v1.reshape(2, 1))
X_red = W.T.dot(X.T)
# check the two reductions agree (up to the sign of the eigenvector)
assert np.allclose(np.abs(X_red.T.ravel()), np.abs(X_red_sklearn.ravel())), \
    'problem with the pca algorithm'
print(X_red.T[0], '-', X_red_sklearn[0])

#7 auirzxp (verified student) posted on 2017-05-20 19:58:36
Notice: this author has been banned or deleted; the content was automatically hidden.

#8 taotao@sina posted on 2017-05-20 20:41:02
Thanks to the OP for sharing.

#9 h2h2 posted on 2017-05-20 23:29:27
Thanks for sharing.

#10 yrhkxg posted on 2017-05-21 00:24:37
