-
1.
Publication number: US20190005099A1
Publication date: 2019-01-03
Application number: US16121696
Application date: 2018-09-05
Applicant: Danny Harnik , Kty Khaitzin , Dmitry Sotnikov
Inventor: Danny Harnik , Kty Khaitzin , Dmitry Sotnikov
IPC: G06F17/30
Abstract: Methods, computing systems and computer program products implement embodiments of the present invention that include partitioning a dataset into a full set of logical data units, and selecting a sample subset of the full set, the sample subset including a random sample of the full set based on a sampling ratio. A set of target hash values are selected from a full range of hash values, and, using a hash function, a respective unit hash value is calculated for each of the logical data units in the sample subset. A histogram is computed that indicates a duplication count of each of the unit hash values that matches a given target hash value, and based on the histogram, a number of distinct logical data units in the full set is estimated.
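The steps named in this abstract (sample the units, restrict to a set of target hash values, build a duplication histogram) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the patented method: the function names, the use of truncated SHA-256 as the hash function, and the choice of the lowest slice of the 64-bit range as the target set are all hypothetical. The abstract does not give the final estimator, so the sketch stops at a naive scale-up for the distinct count within the sample and leaves extrapolation to the full set open.

```python
import hashlib
import random
from collections import Counter


def unit_hash(unit: bytes) -> int:
    """64-bit hash of a logical data unit (assumption: first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(unit).digest()[:8], "big")


def duplication_histogram(units, sampling_ratio, target_fraction, seed=0):
    """Return histogram[k] = number of target hash values seen exactly k times.

    units           -- iterable of bytes objects (the logical data units)
    sampling_ratio  -- probability of keeping each unit in the sample subset
    target_fraction -- share of the full hash range used as target values
                       (assumption: the targets are the lowest hash values)
    """
    rng = random.Random(seed)
    threshold = int(target_fraction * 2**64)  # targets are [0, threshold)
    per_hash = Counter()
    for unit in units:
        if rng.random() >= sampling_ratio:
            continue  # unit is not part of the random sample
        h = unit_hash(unit)
        if h < threshold:  # only target hash values are kept in memory
            per_hash[h] += 1
    return Counter(per_hash.values())


def distinct_in_sample_estimate(histogram, target_fraction):
    """Naive scale-up: distinct target hashes seen, divided by the target share.

    This estimates distinct units in the SAMPLE only; going from the sample
    to the full set needs a further estimator the abstract does not specify.
    """
    return sum(histogram.values()) / target_fraction
```

Restricting the counters to target hash values is what keeps memory low: only a `target_fraction` share of the distinct hashes is ever stored.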
-
2.
Publication number: US20140052699A1
Publication date: 2014-02-20
Application number: US13589197
Application date: 2012-08-20
Applicant: Danny Harnik , Oded Margalit , Dalit Naor , Dmitry Sotnikov , Gil Vernik
Inventor: Danny Harnik , Oded Margalit , Dalit Naor , Dmitry Sotnikov , Gil Vernik
CPC classification number: G06F3/0605 , G06F3/0608 , G06F3/0641 , G06F3/0653 , G06F3/067
Abstract: Systems and methods for estimating a data reduction ratio for a data set are provided. The method comprises selecting a plurality of m elements from a data set comprising a plurality of N elements; associating an identifier h_i with each of the plurality of m elements; associating an identifier h_e with each of the plurality of elements in the data set; tracking the number of times an element i appears in a base set that includes the plurality of m elements selected from the data set; calculating a value count_i that indicates the number of times an identifier h_e matches an identifier h_i; and estimating the data reduction ratio for the plurality of N elements in the data set, based on the number m of elements selected from the data set and the value count_i.
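The two-pass procedure in this abstract can be sketched as below. This is a rough sketch under assumptions, not the claimed method: the abstract does not state the estimating formula, so the code uses a common unbiased estimator, the average of 1/count_i over the m sampled elements, and defines the ratio as distinct elements divided by N. The function names, the SHA-256 identifiers, and sampling with replacement are hypothetical choices.

```python
import hashlib
import random
from collections import Counter


def identifier(element: bytes) -> str:
    """Identifier for an element (assumption: SHA-256 hex digest)."""
    return hashlib.sha256(element).hexdigest()


def estimate_reduction_ratio(data, m, seed=0):
    """Estimate distinct(data) / len(data) from m sampled elements.

    data -- a sequence of bytes objects (the N elements)
    m    -- number of elements to sample (with replacement, an assumption)
    """
    rng = random.Random(seed)
    n = len(data)

    # Pass 1: build the base set. The Counter tracks how many times each
    # sampled identifier h_i was drawn into the base set.
    base = Counter(identifier(data[rng.randrange(n)]) for _ in range(m))

    # Pass 2: scan every element, compute h_e, and increment count_i
    # whenever h_e matches an identifier h_i in the base set.
    count = Counter()
    for element in data:
        h_e = identifier(element)
        if h_e in base:
            count[h_e] += 1

    # Assumed estimator: mean of 1/count_i over the m draws.
    return sum(times / count[h_i] for h_i, times in base.items()) / m
```

Only identifiers in the base set need counters during the full scan, so memory grows with m rather than with N.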
-
3.
Publication number: US08650163B1
Publication date: 2014-02-11
Application number: US13589197
Application date: 2012-08-20
Applicant: Danny Harnik , Oded Margalit , Dalit Naor , Dmitry Sotnikov , Gil Vernik
Inventor: Danny Harnik , Oded Margalit , Dalit Naor , Dmitry Sotnikov , Gil Vernik
IPC: G06F17/30
CPC classification number: G06F3/0605 , G06F3/0608 , G06F3/0641 , G06F3/0653 , G06F3/067
Abstract: Systems and methods for estimating a data reduction ratio for a data set are provided. The method comprises selecting a plurality of m elements from a data set comprising a plurality of N elements; associating an identifier h_i with each of the plurality of m elements; associating an identifier h_e with each of the plurality of elements in the data set; tracking the number of times an element i appears in a base set that includes the plurality of m elements selected from the data set; calculating a value count_i that indicates the number of times an identifier h_e matches an identifier h_i; and estimating the data reduction ratio for the plurality of N elements in the data set, based on the number m of elements selected from the data set and the value count_i.
-