Title:
CoHSI III: Long proteins and implications for protein evolution
---
Authors:
Les Hatton, Gregory Warr
---
Latest submission year:
2018
---
Classification:
Primary category: Quantitative Biology
Secondary category: Other Quantitative Biology
Category description: Work in quantitative biology that does not fit into the other q-bio classifications
---
Abstract:
The length distribution of proteins measured in amino acids follows the CoHSI (Conservation of Hartley-Shannon Information) probability distribution. In previous papers we have verified various predictions of this using the Uniprot database, but here we explore a novel predicted relationship between the longest proteins and evolutionary time. We demonstrate from both theory and experiment that the longest protein and the total number of proteins are intimately related by Information Theory, and we give a simple formula for this. We stress that no evolutionary explanation is necessary; it is an intrinsic property of a CoHSI system. While the CoHSI distribution favors the appearance of proteins with fewer than 750 amino acids (characteristic of most functional proteins or their constituent domains), its intrinsic asymptotic power-law also favors the appearance of unusually long proteins; we predict that there are as yet undiscovered proteins longer than 45,000 amino acids. In so doing, we draw an analogy between the process of protein folding, driven by favorable pathways (or funnels) through the energy landscape of protein conformations, and the preferential information pathways through which CoHSI exerts its constraints in discrete systems. Finally, we show that CoHSI predicts the recent appearance in evolutionary time of the longest proteins, specifically in eukaryotes because of their richer unique alphabet of amino acids, and by merging with independent phylogenetic data, we confirm a predicted consistent relationship between the longest proteins and documented and potential undocumented mass extinctions.
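The abstract says that the longest protein and the total number of proteins are linked by a simple formula, but it does not give that formula. The Python sketch below is a generic illustration only and is not the authors' CoHSI formula. It shows the standard extreme-value argument for a pure power-law tail P(X > x) = (x / x_min)^(-alpha): the length at which the expected number of longer components drops to one is x_min * N^(1/alpha), so the longest component grows with the total count N. The tail exponent `alpha`, `x_min`, and both function names are hypothetical choices made for this illustration.

```python
import random


def characteristic_max(n_components: int, alpha: float, x_min: float = 1.0) -> float:
    """Length at which the expected number of longer components equals 1.

    Hypothetical model: a pure power-law tail P(X > x) = (x / x_min) ** -alpha
    for x >= x_min. Solving n * (x / x_min) ** -alpha = 1 gives
    x = x_min * n ** (1 / alpha).
    """
    if n_components < 1 or alpha <= 0 or x_min <= 0:
        raise ValueError("need n_components >= 1, alpha > 0, x_min > 0")
    return x_min * n_components ** (1.0 / alpha)


def simulated_max(n_components: int, alpha: float, x_min: float = 1.0, seed: int = 0) -> float:
    """Monte Carlo check: the largest of n_components draws from the same tail."""
    rng = random.Random(seed)
    # random.paretovariate(alpha) samples P(X > x) = x ** -alpha on x >= 1,
    # so scaling by x_min gives the tail assumed in characteristic_max.
    return max(x_min * rng.paretovariate(alpha) for _ in range(n_components))


if __name__ == "__main__":
    alpha = 1.5  # hypothetical tail exponent, not a value taken from the paper
    for n in (10**3, 10**4, 10**5):
        print(n, round(characteristic_max(n, alpha)), round(simulated_max(n, alpha)))
```

With the same seed, a larger sample contains all the draws of a smaller one, so its maximum can only stay the same or grow. Individual simulated maxima scatter widely around the characteristic value because heavy tails have large fluctuations.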
---
PDF link:
https://arxiv.org/pdf/1810.08614