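Abstract translation:
Motivation: Clustering based on similarity measures is a key problem in scientific data analysis. Recently, Frey and Dueck \cite{Frey07} proposed affinity propagation (AP), a powerful new algorithm based on message-passing techniques. In AP, each cluster is identified by a common exemplar to which all other data points of the same cluster refer, and an exemplar must refer to itself. Although it has proved powerful, AP in its current form has a number of drawbacks. The hard constraint of exactly one exemplar per cluster restricts AP to regularly shaped clusters and leads to suboptimal performance, for example when analyzing gene-expression data. Results: Relaxing the hard constraints of AP overcomes this limitation. A new parameter controls the importance of the constraints relative to the goal of maximizing the overall similarity, and it interpolates between the simple case in which each data point selects its nearest neighbor as its exemplar and the original AP. The resulting soft-constraint affinity propagation (SCAP) is more informative and more accurate, and it yields more stable clusterings. Although a new {\it a priori} free parameter is introduced, the algorithm's overall dependence on external tuning is reduced, because robustness increases and an optimal strategy for parameter selection emerges more naturally. SCAP is tested on biological benchmark data, including in particular microarray data related to various cancer types. The results show that the algorithm efficiently reveals the hierarchical cluster structure present in the data sets. Furthermore, it allows sparse gene-expression signatures to be extracted for each cluster.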
---
English title:
Clustering by soft-constraint affinity propagation: Applications to gene-expression data
---
Authors:
Michele Leone, Sumedha, Martin Weigt
---
Year of latest submission:
2007
---
Classification:
Top-level category: Quantitative Biology
Second-level category: Quantitative Methods
Category description: All experimental, numerical, statistical and mathematical contributions of value to biology
--
Top-level category: Physics
Second-level category: Statistical Mechanics
Category description: Phase transitions, thermodynamics, field theory, non-equilibrium phenomena, renormalization group and scaling, integrable models, turbulence
--
Top-level category: Physics
Second-level category: Data Analysis, Statistics and Probability
Category description: Methods, software and hardware for physics data analysis: data processing and storage; measurement methodology; statistical and mathematical aspects such as parametrization and uncertainties.
--
---
English abstract:
Motivation: Similarity-measure based clustering is a crucial problem appearing throughout scientific data analysis. Recently, a powerful new algorithm called Affinity Propagation (AP) based on message-passing techniques was proposed by Frey and Dueck \cite{Frey07}. In AP, each cluster is identified by a common exemplar all other data points of the same cluster refer to, and exemplars have to refer to themselves. Albeit its proved power, AP in its present form suffers from a number of drawbacks. The hard constraint of having exactly one exemplar per cluster restricts AP to classes of regularly shaped clusters, and leads to suboptimal performance, {\it e.g.}, in analyzing gene expression data. Results: This limitation can be overcome by relaxing the AP hard constraints. A new parameter controls the importance of the constraints compared to the aim of maximizing the overall similarity, and allows to interpolate between the simple case where each data point selects its closest neighbor as an exemplar and the original AP. The resulting soft-constraint affinity propagation (SCAP) becomes more informative, accurate and leads to more stable clustering. Even though a new {\it a priori} free-parameter is introduced, the overall dependence of the algorithm on external tuning is reduced, as robustness is increased and an optimal strategy for parameter selection emerges more naturally. SCAP is tested on biological benchmark data, including in particular microarray data related to various cancer types. We show that the algorithm efficiently unveils the hierarchical cluster structure present in the data sets. Further on, it allows to extract sparse gene expression signatures for each cluster.
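The interpolation described in the abstract, between every point picking its nearest neighbor and the hard self-reference constraint of AP, can be illustrated on a toy problem. The sketch below is not the paper's message-passing algorithm and does not reproduce its exact formulation of the soft constraint: it assumes a simple additive `penalty` per violated constraint, uses hypothetical helper names (`score`, `best_assignment`), and maximizes by brute force over all assignments, which is only feasible for a handful of points.

```python
from itertools import product


def score(S, c, penalty):
    """Total similarity of assignment c, minus `penalty` per violated constraint.

    c[i] is the exemplar chosen by point i. A constraint is violated when
    some point refers to k while k does not refer to itself.
    (The additive penalty is an illustrative assumption.)
    """
    total = sum(S[i][c[i]] for i in range(len(c)))
    violations = sum(1 for k in set(c) if c[k] != k)
    if violations:  # guard avoids 0 * inf when penalty is infinite
        total -= penalty * violations
    return total


def best_assignment(S, penalty):
    """Brute-force maximizer over all n**n assignments (toy sizes only)."""
    n = len(S)
    return max(product(range(n), repeat=n), key=lambda c: score(S, c, penalty))


# Four points on a line; similarity is the negative squared distance and the
# diagonal holds the self-similarity ("preference"), chosen here as -20.
x = [0.0, 1.0, 3.0, 7.0]
preference = -20.0
S = [[preference if i == j else -(xi - xj) ** 2 for j, xj in enumerate(x)]
     for i, xi in enumerate(x)]

nearest = best_assignment(S, penalty=0.0)        # (1, 0, 1, 2)
hard = best_assignment(S, penalty=float("inf"))  # (1, 1, 1, 3)
print(nearest, hard)
```

With `penalty=0.0` every point simply refers to its closest neighbor, so exemplars need not refer to themselves; with an infinite penalty the hard constraint is recovered and the points at 0, 1 and 3 share the exemplar at index 1 while the outlier at 7 becomes its own exemplar. Finite penalties fall between these two extremes.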
---
PDF link:
https://arxiv.org/pdf/0705.2646

