Original poster: 能者818

[Statistical Data] A smoothing model for sample disclosure risk estimation

Posted by 能者818 on 2022-3-7 16:55:00 (from mobile)

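Abstract (translated):
When a sample frequency table is published, disclosure risk arises if some individuals can be identified from their values on certain attributes in the table, called key variables; their values on other attributes can then be inferred, violating their privacy. Working from the sample to be released, and possibly from partial knowledge of the whole population, the agency considering the release has to estimate this disclosure risk. The risk comes from non-empty sample cells that represent small population cells, and in particular from population uniques. Risk estimation therefore requires assessing how many of the relevant population cells are likely to be small. Various methods have been proposed for this task. The authors present a method that estimates a population cell frequency by smoothing over a local neighborhood of that cell, that is, over cells with similar or close values in all attributes. They give some preliminary results and experiments with this method and compare it with two other methods: 1. A log-linear model approach, in which inference on a given cell is based on a "neighborhood" of cells determined by the log-linear model. Such neighborhoods share one or several attributes with the cell in question, but some other attributes may differ significantly. 2. The Argus method, in which inference on a given cell is based only on the sample frequency in that cell, the sample design, and some known marginal distributions of the population, without learning from any kind of "neighborhood" of the cell or from any model that uses the structure of the table.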
---
English title:
A smoothing model for sample disclosure risk estimation
---
Authors:
Yosef Rinott, Natalie Shlomo
---
Year of latest submission:
2007
---
Classification:

Primary category: Statistics
Secondary category: Methodology
Category description: Design, Surveys, Model Selection, Multiple Testing, Multivariate Methods, Signal and Image Processing, Time Series, Smoothing, Spatial Statistics, Survival Analysis, Nonparametric and Semiparametric Methods
---
English abstract:
When a sample frequency table is published, disclosure risk arises when some individuals can be identified on the basis of their values in certain attributes in the table, called key variables; their values in other attributes may then be inferred, and their privacy is violated. On the basis of the sample to be released, and possibly some partial knowledge of the whole population, an agency which considers releasing the sample has to estimate the disclosure risk. Risk arises from non-empty sample cells which represent small population cells, and from population uniques in particular. Therefore risk estimation requires assessing how many of the relevant population cells are likely to be small. Various methods have been proposed for this task, and we present a method in which estimation of a population cell frequency is based on smoothing using a local neighborhood of this cell, that is, cells having similar or close values in all attributes. We provide some preliminary results and experiments with this method. Comparisons are made to two other methods: 1. A log-linear models approach in which inference on a given cell is based on a "neighborhood" of cells determined by the log-linear model. Such neighborhoods have one or some common attributes with the cell in question, but some other attributes may differ significantly. 2. The Argus method, in which inference on a given cell is based only on the sample frequency in the specific cell, on the sample design and on some known marginal distributions of the population, without learning from any type of "neighborhood" of the given cell, nor from any model which uses the structure of the table.
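For readers who want a feel for the local-neighborhood idea, here is a rough Python sketch. It is not the authors' model. It assumes a two-way table over two ordinal key variables, a plain unweighted average over a square window as the smoother, a known sampling fraction used to scale sample counts up to population counts, and an arbitrary threshold for calling a population cell "small". The function names, the toy table and the parameter values are all made up for illustration.

```python
from itertools import product


def smooth_cell(table, i, j, radius=1):
    """Average the sample counts of all cells within `radius` of (i, j).

    Assumption: a plain unweighted window average stands in for the
    smoothing step; the paper's actual smoother may differ.
    """
    rows, cols = len(table), len(table[0])
    window = [
        table[r][c]
        for r, c in product(range(i - radius, i + radius + 1),
                            range(j - radius, j + radius + 1))
        if 0 <= r < rows and 0 <= c < cols  # drop positions outside the table
    ]
    return sum(window) / len(window)


def estimate_population_counts(table, sampling_fraction, radius=1):
    """Scale each smoothed sample count by 1 / sampling_fraction.

    Assumption: every individual is sampled with the same known probability.
    """
    return [
        [smooth_cell(table, i, j, radius) / sampling_fraction
         for j in range(len(table[0]))]
        for i in range(len(table))
    ]


def risky_sample_uniques(table, population_estimates, threshold=2.0):
    """Return sample-unique cells whose estimated population count is small.

    Assumption: "small" means at most `threshold`, an illustrative cut-off.
    """
    return [
        (i, j)
        for i, row in enumerate(table)
        for j, count in enumerate(row)
        if count == 1 and population_estimates[i][j] <= threshold
    ]


# Hypothetical 3x4 sample table over two ordinal key variables.
sample_table = [
    [5, 4, 1, 0],
    [6, 3, 0, 0],
    [4, 2, 0, 1],
]
estimates = estimate_population_counts(sample_table, sampling_fraction=0.5)
flagged = risky_sample_uniques(sample_table, estimates, threshold=2.0)
```

In this toy table both cells (0, 2) and (2, 3) are sample uniques, but only (2, 3) is flagged. The first sits next to well-populated cells, so its smoothed population estimate is larger. The second is surrounded by empty cells, so it looks more likely to be a small population cell.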
---
PDF link:
https://arxiv.org/pdf/708.098
Keywords: neighborhood, Multivariate, distribution, Time Series, comparisons, sample, table, estimation, disclosure, some
