Publication number: US20040049517A1
Publication date: 2004-03-11
Application number: US10232372
Filing date: 2002-09-03
Applicant: Research Triangle Institute
Inventor: Avinash C. Singh
IPC: G06F007/00
CPC classification number: G06F21/6254, G06F17/30303, Y10S707/99945
Abstract: A method and system for ensuring statistical disclosure limitation (SDL) of categorical or continuous micro data, while maintaining the analytical quality of the micro data. The new SDL methodology exploits the analogy between (1) taking a sample (instead of a census), along with some adjustments, including imputation, for missing information, and (2) releasing a subset (instead of the original data set), along with some adjustments for records still at disclosure risk. Survey sampling reduces monetary cost in comparison to a census, but entails some loss of information. Similarly, releasing a subset reduces disclosure cost in comparison to releasing the full database, but entails some loss of information. Thus, optimal survey sampling methods can be used for statistical disclosure limitation. The method includes partitioning the database into risk strata, optimal probabilistic substitution, optimal probabilistic subsampling, and optimal sampling weight calibration.
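The four steps named in the abstract (risk strata, probabilistic substitution, probabilistic subsampling, weight calibration) can be illustrated with a minimal Python sketch. This is not the patented procedure: the function names, the stratum cut-offs, and the per-stratum rates `keep_prob` and `swap_prob` are hypothetical, the rates are supplied by hand rather than optimized, and the calibration step is a plain within-stratum ratio adjustment standing in for the optimal calibration the abstract refers to.

```python
import random
from collections import Counter

def risk_stratum(records, keys):
    """Label each record by how many records share its identifying-key values.
    The cut-offs (1 and 3) are illustrative assumptions, not from the patent."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    labels = []
    for r in records:
        n = counts[tuple(r[k] for k in keys)]
        labels.append("high" if n == 1 else "medium" if n <= 3 else "low")
    return labels

def sdl_sketch(records, keys, keep_prob, swap_prob, seed=0):
    """Return (released_records, original_stratum_counts)."""
    rng = random.Random(seed)
    strata = risk_stratum(records, keys)
    released = []
    for rec, s in zip(records, strata):
        out = dict(rec)
        # Probabilistic substitution: with probability swap_prob[s], replace
        # the identifying values with those of a randomly chosen donor.
        if rng.random() < swap_prob[s]:
            donor = rng.choice(records)
            for k in keys:
                out[k] = donor[k]
        # Probabilistic subsampling: keep the record with probability
        # keep_prob[s] and attach the inverse-probability sampling weight.
        if rng.random() < keep_prob[s]:
            out["stratum"] = s
            out["weight"] = 1.0 / keep_prob[s]
            released.append(out)
    # Weight calibration (simple ratio adjustment, an assumption): rescale
    # weights so each released stratum reproduces its original record count.
    original = Counter(strata)
    weighted = Counter()
    for r in released:
        weighted[r["stratum"]] += r["weight"]
    for r in released:
        r["weight"] *= original[r["stratum"]] / weighted[r["stratum"]]
    return released, original
```

In this sketch, high-risk strata would typically be given a lower `keep_prob` and a higher `swap_prob`, which mirrors the trade-off described in the abstract: lower disclosure cost in exchange for some loss of information, partly recovered through the calibrated weights.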