A Differential-Privacy-Oriented k-Means Clustering Method

Granted Invention Patent

Patent title: A Differential-Privacy-Oriented k-Means Clustering Method
Application number: CN201810347108.6

Filing date: 2018-04-18
Publication (announcement) number: CN108280491B

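Publication (announcement) date: 2020-03-06
Inventors: 杨庚, 胡闯, 白云璐, 王璇, 唐海霞
Applicant: 东莞市盟大塑化科技有限公司
Applicant address: Rooms 701-703, 7th Floor, Gaosheng Technology Building, Gaosheng Science Park Phase II, No. 5 Longxi Road, Zhouxi, Nancheng District, Dongguan, Guangdong Province
Patentee: 东莞市盟大塑化科技有限公司
Current patentee: 东莞盟大集团有限公司
Current patentee address: Rooms 701-703, 7th Floor, Gaosheng Technology Building, Gaosheng Science Park Phase II, No. 5 Longxi Road, Zhouxi, Nancheng District, Dongguan, Guangdong Province
Agency: 北京权智天下知识产权代理事务所
Agent: 王新爱
Main classification: G06K9/62
IPC classification: G06K9/62; G06F21/62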

Abstract:

The invention discloses a differential-privacy-oriented k-means clustering method. The method comprises: preprocessing the data; letting C denote the set of cluster centers obtained by clustering, and evaluating the sum of squared errors for the given data set X and the cluster centers C; comparing the magnitude of this sum of squared errors; repeating until retry exceeds the given maximum number of retries, retrymax, and then returning the best set of centers, Cbest; traversing every point in the data set X and assigning it to its nearest center; setting the random noise to be added; recomputing, for each cluster, the sum of its data points and the number of its points, adding the noise, and then updating the cluster centroid; and repeating these steps until the sum of squared errors converges or the number of iterations reaches its upper limit. By adding appropriate random noise drawn from a specific distribution during the iterations of the k-means algorithm, the invention distorts the clustering result to a controlled degree, which achieves privacy protection while preserving the utility of the data.
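
The steps listed in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented procedure: the abstract does not name the noise distribution, the sensitivity, or how the privacy budget is split, so Laplace noise, min-max scaling to [0, 1], a sensitivity of d + 1 per iteration, and an even budget split across iterations are all assumptions made here. The function names (`sse`, `pick_initial_centers`, `dp_kmeans`) and the parameters `epsilon`, `max_iter`, and `tol` are hypothetical. The sketch makes no claim to a formal privacy guarantee, since the scaling, the center selection, and the convergence check all read the raw data.

```python
import numpy as np


def sse(X, C):
    """Sum of squared distances from each point to its nearest center."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())


def pick_initial_centers(X, k, retry_max, rng):
    """Draw k random data points retry_max times; keep the draw with the lowest SSE."""
    c_best, best = None, np.inf
    for _ in range(retry_max):
        C = X[rng.choice(len(X), size=k, replace=False)]
        score = sse(X, C)
        if score < best:
            c_best, best = C.copy(), score
    return c_best


def dp_kmeans(X, k, epsilon, max_iter=10, retry_max=5, tol=1e-4, seed=0):
    """Noisy k-means sketch; returns centers in the scaled [0, 1] space."""
    rng = np.random.default_rng(seed)

    # Preprocessing (assumption): min-max scale every feature to [0, 1].
    lo, hi = X.min(axis=0), X.max(axis=0)
    X = (X - lo) / np.where(hi > lo, hi - lo, 1.0)
    d = X.shape[1]

    C = pick_initial_centers(X, k, retry_max, rng)

    # Assumption: epsilon is split evenly over max_iter iterations, and one
    # point changes the per-cluster sums by at most d (L1) and one count by 1.
    scale = (d + 1) * max_iter / epsilon

    prev = np.inf
    for _ in range(max_iter):
        # Assign every point to its nearest center.
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)

        # Recompute each cluster's sum and count, add noise, update centroid.
        for j in range(k):
            members = X[labels == j]
            noisy_sum = members.sum(axis=0) + rng.laplace(0.0, scale, size=d)
            noisy_count = len(members) + rng.laplace(0.0, scale)
            if noisy_count >= 1.0:  # otherwise keep the previous center
                C[j] = np.clip(noisy_sum / noisy_count, 0.0, 1.0)

        # Stop when the SSE stops changing or the iteration cap is reached.
        cur = sse(X, C)
        if abs(prev - cur) < tol:
            break
        prev = cur
    return C
```

Calling `dp_kmeans(X, k=2, epsilon=1.0)` on an `(n, d)` float array returns a `(2, d)` array of centers. A smaller `epsilon` gives a larger noise scale and therefore more distorted centroids, which is the privacy/utility trade-off the abstract describes.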

Published/Granted Documents

CN108280491A: A Differential-Privacy-Oriented k-Means Clustering Method. Publication/grant date: 2018-07-13

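IPC classification:

G	Physics
G06	Computing; calculating or counting
G06K	Graphical data reading (image or video recognition or understanding G06V); presentation of data; record carriers; handling record carriers
G06K9/00	Methods or arrangements for recognising patterns (methods or arrangements for graph-reading or for converting the pattern of mechanical parameters, e.g. force or presence, into electrical signals G06K11/00) (image or video recognition or understanding G06V) (speech recognition G10L15/00)
G06K9/62	. Methods or arrangements for recognition using electronic means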