Title:
Power-law distributions in empirical data
---
Authors:
Aaron Clauset, Cosma Rohilla Shalizi, M. E. J. Newman
---
Year of latest submission:
2009
---
Categories:
Top-level category: Physics
Subcategory: Data Analysis, Statistics and Probability
Category description: Methods, software and hardware for physics data analysis: data processing and storage; measurement methodology; statistical and mathematical aspects such as parametrization and uncertainties.
--
Top-level category: Physics
Subcategory: Disordered Systems and Neural Networks
Category description: Glasses and spin glasses; properties of random, aperiodic and quasiperiodic systems; transport in disordered media; localization; phenomena mediated by defects and disorder; neural networks.
--
Top-level category: Statistics
Subcategory: Applications
Category description: Biology, Education, Epidemiology, Engineering, Environmental Sciences, Medical, Physical Sciences, Quality Control, Social Sciences.
--
Top-level category: Statistics
Subcategory: Methodology
Category description: Design, Surveys, Model Selection, Multiple Testing, Multivariate Methods, Signal and Image Processing, Time Series, Smoothing, Spatial Statistics, Survival Analysis, Nonparametric and Semiparametric Methods.
--
---
Abstract:
Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the detection and characterization of power laws is complicated by the large fluctuations that occur in the tail of the distribution -- the part of the distribution representing large but rare events -- and by the difficulty of identifying the range over which power-law behavior holds. Commonly used methods for analyzing power-law data, such as least-squares fitting, can produce substantially inaccurate estimates of parameters for power-law distributions, and even in cases where such methods return accurate answers they are still unsatisfactory because they give no indication of whether the data obey a power law at all. Here we present a principled statistical framework for discerning and quantifying power-law behavior in empirical data. Our approach combines maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov-Smirnov statistic and likelihood ratios. We evaluate the effectiveness of the approach with tests on synthetic data and give critical comparisons to previous approaches. We also apply the proposed methods to twenty-four real-world data sets from a range of different disciplines, each of which has been conjectured to follow a power-law distribution. In some cases we find these conjectures to be consistent with the data while in others the power law is ruled out.
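
The combination of maximum-likelihood fitting and a Kolmogorov-Smirnov distance mentioned in the abstract can be illustrated with a minimal sketch for the continuous case. This is not the authors' code: the function names (`sample_power_law`, `fit_alpha`, `ks_distance`) are hypothetical, the lower cutoff `xmin` is assumed to be known in advance, and the sketch stops at the KS distance without computing a p-value or a likelihood-ratio comparison.

```python
import math
import random


def sample_power_law(n, alpha, xmin, rng):
    """Draw n values from a continuous power law p(x) ~ x^(-alpha), x >= xmin,
    by inverse-transform sampling (illustrative helper, not from the paper's code)."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def fit_alpha(data, xmin):
    """Maximum-likelihood estimate of the exponent for a continuous power law,
    assuming xmin is known: alpha = 1 + n / sum(ln(x_i / xmin))."""
    tail = [x for x in data if x >= xmin]
    n = len(tail)
    log_sum = sum(math.log(x / xmin) for x in tail)
    alpha = 1.0 + n / log_sum
    # Leading-order standard error of the estimate.
    sigma = (alpha - 1.0) / math.sqrt(n)
    return alpha, sigma


def ks_distance(data, alpha, xmin):
    """Kolmogorov-Smirnov distance between the empirical CDF of the tail
    and the fitted power-law CDF P(x) = 1 - (x / xmin)^(-(alpha - 1))."""
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        model = 1.0 - (x / xmin) ** (-(alpha - 1.0))
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs(model - i / n), abs(model - (i + 1) / n))
    return d


# Synthetic check: the true exponent 2.5 and xmin = 1.0 are arbitrary choices.
rng = random.Random(0)
data = sample_power_law(10000, 2.5, 1.0, rng)
alpha_hat, sigma = fit_alpha(data, 1.0)
d = ks_distance(data, alpha_hat, 1.0)
print(f"alpha = {alpha_hat:.3f} +/- {sigma:.3f}, KS distance = {d:.4f}")
```

On synthetic data like this, the estimated exponent should land close to the value used for sampling and the KS distance should be small. For real data the cutoff is usually unknown, so a fixed `xmin` as used here is a simplification.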
---
PDF link:
https://arxiv.org/pdf/0706.1062

