Abstract (translated):
Recent research has studied the role of sparsity in high-dimensional regression and signal reconstruction, establishing theoretical limits for recovering sparse models from sparse data. This line of work shows that $\ell_1$-regularized least squares regression can accurately estimate a sparse linear model from $n$ noisy examples in $p$ dimensions, even when $p$ is much larger than $n$. This paper studies a variant of that problem in which the original $n$ input variables are compressed by a random linear transformation to $m \ll n$ examples in $p$ dimensions, and establishes conditions under which a sparse linear model can be successfully recovered from the compressed data. A primary motivation for this compression procedure is to anonymize the data and preserve privacy by revealing little information about the original data. We characterize the number of random projections required for $\ell_1$-regularized compressed regression to identify the nonzero coefficients of the true model with probability approaching one, a property called "sparsistence." In addition, we show that $\ell_1$-regularized compressed regression asymptotically predicts as well as an oracle linear model, a property called "persistence." Finally, we characterize the privacy properties of the compression procedure in information-theoretic terms, establishing upper bounds on the mutual information between the compressed and uncompressed data that decay to zero.
---
English title:
Compressed Regression
---
Authors:
Shuheng Zhou, John Lafferty, Larry Wasserman
---
Latest submission year:
2008
---
Classification:
Primary category: Statistics
Secondary category: Machine Learning
Category description: Covers machine learning papers (supervised, unsupervised, semi-supervised learning, graphical models, reinforcement learning, bandits, high-dimensional inference, etc.) with a statistical or theoretical grounding.
--
Primary category: Computer Science
Secondary category: Information Theory
Category description: Covers theoretical and experimental aspects of information theory and coding. Includes material in ACM Subject Class E.4 and intersects with H.1.1.
--
Primary category: Mathematics
Secondary category: Information Theory
Category description: math.IT is an alias for cs.IT. Covers theoretical and experimental aspects of information theory and coding.
---
English abstract:
Recent research has studied the role of sparsity in high dimensional regression and signal reconstruction, establishing theoretical limits for recovering sparse models from sparse data. This line of work shows that $\ell_1$-regularized least squares regression can accurately estimate a sparse linear model from $n$ noisy examples in $p$ dimensions, even if $p$ is much larger than $n$. In this paper we study a variant of this problem where the original $n$ input variables are compressed by a random linear transformation to $m \ll n$ examples in $p$ dimensions, and establish conditions under which a sparse linear model can be successfully recovered from the compressed data. A primary motivation for this compression procedure is to anonymize the data and preserve privacy by revealing little information about the original data. We characterize the number of random projections that are required for $\ell_1$-regularized compressed regression to identify the nonzero coefficients in the true model with probability approaching one, a property called ``sparsistence.'' In addition, we show that $\ell_1$-regularized compressed regression asymptotically predicts as well as an oracle linear model, a property called ``persistence.'' Finally, we characterize the privacy properties of the compression procedure in information-theoretic terms, establishing upper bounds on the mutual information between the compressed and uncompressed data that decay to zero.
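To make the compression step concrete, below is a minimal sketch (not from the paper) of compressed regression as the abstract describes it: the $n$ rows of the data are multiplied by a random Gaussian projection $\Phi \in \mathbb{R}^{m \times n}$ with $m \ll n$, and $\ell_1$-regularized least squares (the lasso) is then fit to the compressed pair $(\Phi X, \Phi y)$. All dimensions, the noise level, and the regularization weight are illustrative assumptions, not values from the paper.

```python
# Sketch of compressed regression under the setup in the abstract:
# n noisy examples in p dimensions, compressed to m << n examples by a
# random Gaussian projection before l1-regularized fitting.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, m, s = 200, 500, 80, 5          # examples, dimensions, projections, nonzeros

# Sparse ground-truth model and noisy observations y = X beta + noise.
beta = np.zeros(p)
beta[rng.choice(p, size=s, replace=False)] = rng.normal(0.0, 2.0, size=s)
X = rng.normal(size=(n, p))
y = X @ beta + 0.5 * rng.normal(size=n)

# Compress the rows: Phi is m x n with i.i.d. N(0, 1/m) entries, so the
# compressed data (Phi X, Phi y) contains only m "examples".
Phi = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
X_c, y_c = Phi @ X, Phi @ y

# l1-regularized least squares on the compressed data; alpha is an
# illustrative choice of regularization weight.
lasso = Lasso(alpha=0.1).fit(X_c, y_c)

# Compare the recovered support with the true one ("sparsistence" asks
# that these coincide with probability approaching one).
support_true = sorted(np.flatnonzero(beta))
support_hat = sorted(np.flatnonzero(np.abs(lasso.coef_) > 1e-3))
print("true support:     ", support_true)
print("recovered support:", support_hat)
```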
---
PDF link:
https://arxiv.org/pdf/0706.0534

