File name: Spark GMM-master.zip
Download link: https://bbs.pinggu.org/a-2233571.html
GMM
Gaussian Mixture Model Implementation in PySpark

The GMM algorithm models the entire data set as a finite mixture of Gaussian distributions, each parameterized by a mean vector, a covariance matrix, and a mixture weight. The probability that each point belongs to each cluster is computed along with the cluster statistics. This distributed implementation of GMM in PySpark estimates the parameters using the Expectation-Maximization algorithm and considers only a diagonal covariance matrix for each component.

How to Run

There are two ways to run this code.
You can train the GMM model by invoking the function GMMModel.trainGMM(data,k,n_iter,ct), where data is an RDD (of dense or sparse vectors), k is the number of components/clusters, n_iter is the number of iterations (default 100), and ct is the convergence threshold (default 1e-3).

To use this library in your program, simply download GMMModel.py and GMMClustering.py and add them as Python files along with your own user code as shown below:

```
wget https://raw.githubusercontent.com/FlytxtRnD/GMM/master/GMMModel.py
wget https://raw.githubusercontent.com/FlytxtRnD/GMM/master/GMMClustering.py
./bin/spark-submit --master <master> --py-files GMMModel.py,GMMClustering.py <your-program.py> <input_file> <num_of_clusters> [--n_iter <num_of_iterations>] [--ct <convergence_threshold>]
```

The returned object "model" has the following attributes: **model.Means**, **model.Covars**, **model.Weights**.

To get the cluster labels and responsibility matrix (membership values):

responsibility_matrix, cluster_labels = GMMModel.resultPredict(model, data)
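To make the responsibility matrix and the diagonal-covariance EM updates more concrete, here is a minimal single-machine sketch in plain Python. It is not the library's code and does not use Spark; the helper names `log_diag_gauss`, `e_step`, `m_step` and the variance floor `min_var` are hypothetical, chosen only for illustration. The parameter lists `means`, `covars`, `weights` are assumed to play the same role as the model attributes described above, with each entry of `covars` holding only the diagonal of a component's covariance matrix.

```python
import math

def log_diag_gauss(x, mean, var):
    """Log density at x of a Gaussian whose covariance is diag(var)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def e_step(data, means, covars, weights):
    """Return (responsibility_matrix, cluster_labels) for the given parameters."""
    resp, labels = [], []
    for x in data:
        # log(weight_j * N(x | mean_j, diag(covar_j))) for each component j
        logs = [math.log(w) + log_diag_gauss(x, m, v)
                for m, v, w in zip(means, covars, weights)]
        top = max(logs)
        # Subtract the maximum before exponentiating for numerical stability.
        unnorm = [math.exp(l - top) for l in logs]
        total = sum(unnorm)
        row = [u / total for u in unnorm]
        resp.append(row)
        labels.append(row.index(max(row)))  # hard label = most responsible component
    return resp, labels

def m_step(data, resp, min_var=1e-6):
    """Re-estimate means, diagonal variances and weights from responsibilities."""
    n, d, k = len(data), len(data[0]), len(resp[0])
    means, covars, weights = [], [], []
    for j in range(k):
        nj = sum(r[j] for r in resp)  # effective number of points in component j
        mean = [sum(r[j] * x[i] for r, x in zip(resp, data)) / nj
                for i in range(d)]
        var = [max(sum(r[j] * (x[i] - mean[i]) ** 2
                       for r, x in zip(resp, data)) / nj, min_var)
               for i in range(d)]
        means.append(mean)
        covars.append(var)
        weights.append(nj / n)
    return means, covars, weights

# Toy data: two well-separated groups of 2-D points.
data = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0],
        [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
means = [[0.5, 0.5], [4.0, 4.0]]      # rough initial guesses
covars = [[1.0, 1.0], [1.0, 1.0]]     # diagonal entries only
weights = [0.5, 0.5]

for _ in range(10):                   # fixed iteration count, no convergence check
    resp, labels = e_step(data, means, covars, weights)
    means, covars, weights = m_step(data, resp)

resp, labels = e_step(data, means, covars, weights)
print(labels)
```

Each row of `resp` sums to 1 and gives the membership values of one point; the label is simply the index of the largest entry in that row. A distributed version would compute the per-component sums in the M-step as aggregations over the RDD rather than Python loops.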