-
Publication No.: US08688655B2
Publication date: 2014-04-01
Application No.: US13218530
Filing date: 2011-08-26
CPC classification: G06F17/30324
Abstract: A method for providing a compressed index for a stream of binary data records comprises the steps of indexing a field from each record in a bitmap index, compressing the stored bits in each column of the bitmap index by replacing a group of successive bits with a code, and outputting the code. At least one of two codes is provided: a first code that replaces a sequence of a first filling, a literal, and a second filling; and a second code that replaces a sequence of a first literal, a filling, and a second literal. In this context, a filling is a sequence of bits that all have the same value, and a literal is a sequence of bits with differing values.
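The indexing step and the filling/literal distinction can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: records are modeled as dicts, the field name `proto` and the 2-bit group size are hypothetical, and each bitmap column is kept as a string of '0'/'1' characters for readability rather than packed into machine words.

```python
from typing import Dict, List, Sequence


def bitmap_index(records: Sequence[dict], field: str) -> Dict[object, str]:
    """Build one bitmap column per distinct value of `field`.

    Bit i of a column is '1' iff record i holds that value.
    (Assumption: columns stored as '0'/'1' strings, not packed words.)
    """
    values = sorted({r[field] for r in records})
    return {
        v: "".join("1" if r[field] == v else "0" for r in records)
        for v in values
    }


def classify_groups(column: str, group_size: int) -> List[str]:
    """Label each group of successive bits as a 'filling' or a 'literal'.

    A filling has all bits equal; a literal has differing bits.
    (Assumption: the group size is a free, hypothetical parameter.)
    """
    labels = []
    for i in range(0, len(column), group_size):
        group = column[i:i + group_size]
        labels.append("filling" if len(set(group)) == 1 else "literal")
    return labels


# Hypothetical toy records with a hypothetical field name.
records = [{"proto": "tcp"}, {"proto": "tcp"}, {"proto": "udp"}, {"proto": "tcp"}]
index = bitmap_index(records, "proto")
print(index)                              # {'tcp': '1101', 'udp': '0010'}
print(classify_groups(index["tcp"], 2))   # ['filling', 'literal']
```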
-
Publication No.: US20120054160A1
Publication date: 2012-03-01
Application No.: US13218530
Filing date: 2011-08-26
IPC classification: G06F7/00
CPC classification: G06F17/30324
Abstract: A method for providing a compressed index for a stream of binary data records comprises the steps of indexing a field from each record in a bitmap index, compressing the stored bits in each column of the bitmap index by replacing a group of successive bits with a code, and outputting the code. At least one of two codes is provided: a first code that replaces a sequence of a first filling, a literal, and a second filling; and a second code that replaces a sequence of a first literal, a filling, and a second literal. In this context, a filling is a sequence of bits that all have the same value, and a literal is a sequence of bits with differing values.
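The first and second codes can be illustrated with a toy encoder. The token tuples below (`FILL`, `LIT`, `FLF`, `LFL`) are a hypothetical in-memory representation chosen for clarity; the abstract does not specify a bit-level layout, a group size, or how codes are packed into words. The encoder scans greedily from left to right and emits a composite code whenever it sees filling-literal-filling or literal-filling-literal.

```python
def tokenize(column: str, group_size: int) -> list:
    """Split a bitmap column into groups of `group_size` bits.

    Consecutive all-0 (or all-1) full groups merge into one
    ("FILL", bit, count) token; any other group becomes ("LIT", bits).
    (Hypothetical token format, not the patent's bit layout.)
    """
    tokens = []
    for i in range(0, len(column), group_size):
        group = column[i:i + group_size]
        if len(group) == group_size and len(set(group)) == 1:
            bit = group[0]
            if tokens and tokens[-1][0] == "FILL" and tokens[-1][1] == bit:
                tokens[-1] = ("FILL", bit, tokens[-1][2] + 1)
            else:
                tokens.append(("FILL", bit, 1))
        else:
            tokens.append(("LIT", group))
    return tokens


def encode(tokens: list) -> list:
    """Replace filling-literal-filling with an FLF code and
    literal-filling-literal with an LFL code; pass other tokens through."""
    codes, i = [], 0
    while i < len(tokens):
        kinds = tuple(t[0] for t in tokens[i:i + 3])
        if kinds == ("FILL", "LIT", "FILL"):
            codes.append(("FLF",) + tuple(tokens[i:i + 3]))
            i += 3
        elif kinds == ("LIT", "FILL", "LIT"):
            codes.append(("LFL",) + tuple(tokens[i:i + 3]))
            i += 3
        else:
            codes.append(tokens[i])
            i += 1
    return codes


def decode(codes: list, group_size: int) -> str:
    """Invert encode(): expand every code back into its bit string."""
    out = []

    def emit(tok):
        if tok[0] == "FILL":
            out.append(tok[1] * group_size * tok[2])
        else:
            out.append(tok[1])

    for code in codes:
        for tok in (code[1:] if code[0] in ("FLF", "LFL") else (code,)):
            emit(tok)
    return "".join(out)


column = "0000" "0000" "0110" "1111"
codes = encode(tokenize(column, 4))
print(codes)  # [('FLF', ('FILL', '0', 2), ('LIT', '0110'), ('FILL', '1', 1))]
```

A greedy scan is the simplest strategy; a real encoder might instead choose among overlapping triples to minimize total output size.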
-
Publication No.: US08782012B2
Publication date: 2014-07-15
Application No.: US13218566
Filing date: 2011-08-26
CPC classification: G06F17/30324, H04L69/04
Abstract: Methods and a device for providing a compressed index of binary records. A method includes: sorting the records by the content of a predetermined field of the record; indexing the field from one of the records in a line of a bitmap index; and compressing bits in a column of the bitmap index by replacing a group of successive bits with a code. The sorting includes assigning each record to a hash bucket of a hash table on the basis of a locality-sensitive hash function applied to the contents of the predetermined field, so that the probability of two records being assigned to the same hash bucket increases with the similarity of their predetermined-field contents. At least one step of the computer-implemented method is executed on a computer device.
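The LSH-based sorting step can be sketched with bit sampling, a standard locality-sensitive hash family for Hamming distance. The abstract does not say which LSH family is used, so this choice, the field name `payload`, and the parameters `n_bits`, `k`, and `seed` are assumptions. With k distinct sampled positions, two n-bit contents that differ in d positions land in the same bucket with probability C(n-d, k)/C(n, k), which grows as d shrinks, matching the property the abstract requires.

```python
import random
from typing import Callable, List, Sequence


def make_bit_sampling_lsh(n_bits: int, k: int, seed: int = 0) -> Callable[[str], str]:
    """Bit-sampling LSH (assumed family): the bucket key is the bits
    found at k fixed, randomly chosen positions of an n-bit string."""
    positions = sorted(random.Random(seed).sample(range(n_bits), k))

    def bucket(bits: str) -> str:
        return "".join(bits[p] for p in positions)

    return bucket


def lsh_sort(records: Sequence[dict], field: str,
             bucket: Callable[[str], str]) -> List[dict]:
    """Order records by LSH bucket so similar contents end up close together.
    Ties inside a bucket are broken by the content itself (a design choice)."""
    return sorted(records, key=lambda r: (bucket(r[field]), r[field]))


h = make_bit_sampling_lsh(n_bits=8, k=3, seed=1)
records = [{"payload": p} for p in ["11110000", "00001111", "11110000", "00001111"]]
ordered = [r["payload"] for r in lsh_sort(records, "payload", h)]
print(ordered)
```

Breaking ties by content guarantees that identical values are adjacent, which lengthens the fillings the compressor sees in each bitmap column.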
-
Publication No.: US20120054161A1
Publication date: 2012-03-01
Application No.: US13218566
Filing date: 2011-08-26
IPC classification: G06F17/30
CPC classification: G06F17/30324, H04L69/04
Abstract: Methods and a device for providing a compressed index of binary records. A method includes: sorting the records by the content of a predetermined field of the record; indexing the field from one of the records in a line of a bitmap index; and compressing bits in a column of the bitmap index by replacing a group of successive bits with a code. The sorting includes assigning each record to a hash bucket of a hash table on the basis of a locality-sensitive hash function applied to the contents of the predetermined field, so that the probability of two records being assigned to the same hash bucket increases with the similarity of their predetermined-field contents. At least one step of the computer-implemented method is executed on a computer device.
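Why sorting before compression helps can be checked by counting runs of identical bits in each bitmap column: fewer, longer runs give the compressor longer fillings to replace with a single code. In this toy check, exact sorting by value stands in for LSH bucketing (the degenerate case where only identical contents collide), and the helper names are hypothetical.

```python
from itertools import groupby
from typing import Sequence


def bitmap_column(values: Sequence[str], target: str) -> str:
    """Bitmap column for one value: '1' where the record holds `target`."""
    return "".join("1" if v == target else "0" for v in values)


def run_count(bits: str) -> int:
    """Number of maximal runs of identical bits in a column."""
    return sum(1 for _ in groupby(bits))


def total_runs(values: Sequence[str]) -> int:
    """Sum of run counts over all bitmap columns of the field."""
    return sum(run_count(bitmap_column(values, t)) for t in set(values))


unsorted_values = ["a", "b", "a", "b"]
print(total_runs(unsorted_values))          # 8  ('1010' and '0101')
print(total_runs(sorted(unsorted_values)))  # 4  ('1100' and '0011')
```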
-
Publication No.: US09286333B2
Publication date: 2016-03-15
Application No.: US13553457
Filing date: 2012-07-19
IPC classification: G06F7/00, G06F17/00, G06K9/36, G06K9/46, H04N7/12, H04N11/02, G06F17/30, G06F9/45, G06F17/22, H03M7/30, G06F3/12
CPC classification: G06F17/30306, G06F3/1297, G06F8/4434, G06F17/2252, G06F17/30516, H03M7/30
Abstract: A method for compressing a sequence of records, each record comprising a sequence of fields, comprises the steps of buffering a record in a line of a matrix; reordering the lines of the matrix according to locality-sensitive hash values of the buffered records, so that records with similar contents in corresponding fields are placed close to one another; and consolidating the fields in the columns of the matrix into a block of codes. The consolidation yields codes of either a first type, comprising a sequence of individual fields, or a second type, comprising a sequence of fields with at least one repetition. A code of the second type comprises a presence field indicating the repeated fields and an iteration field indicating the number of respective repetitions. Decompression of the records from the resulting block of codes is also described.
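The consolidation step and its inverse can be sketched per matrix column. The code names `T1`/`T2`, the window of four distinct runs per code, and representing the presence field as a list of booleans are hypothetical choices for illustration; the abstract specifies only that a second-type code carries a presence field marking the repeated fields and an iteration field giving the repetition counts. The rows are assumed to have already been reordered by LSH value, and the IP/port rows are toy data.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def consolidate(column: Sequence[str], window: int = 4) -> list:
    """Turn one matrix column into a block of codes (hypothetical layout).

    ("T1", values)                         -> fields with no repetition
    ("T2", values, presence, iterations)   -> presence[i] marks a repeated
                                              field; iterations lists its counts
    """
    runs = [(v, len(list(g))) for v, g in groupby(column)]
    block = []
    for i in range(0, len(runs), window):
        chunk = runs[i:i + window]
        values = [v for v, _ in chunk]
        if all(n == 1 for _, n in chunk):
            block.append(("T1", values))
        else:
            presence = [n > 1 for _, n in chunk]
            iterations = [n for _, n in chunk if n > 1]
            block.append(("T2", values, presence, iterations))
    return block


def expand(block: list) -> list:
    """Decompress a block of codes back into the original column."""
    column = []
    for code in block:
        if code[0] == "T1":
            column.extend(code[1])
        else:
            _, values, presence, iterations = code
            counts = iter(iterations)
            for v, repeated in zip(values, presence):
                column.extend([v] * (next(counts) if repeated else 1))
    return column


def compress_matrix(rows: Sequence[Tuple[str, ...]]) -> List[list]:
    """Consolidate every column of the (already reordered) record matrix."""
    return [consolidate(list(col)) for col in zip(*rows)]


def decompress_matrix(blocks: List[list]) -> List[Tuple[str, ...]]:
    """Rebuild the records line by line from the per-column blocks."""
    return [tuple(row) for row in zip(*(expand(b) for b in blocks))]


rows = [("10.0.0.1", "80"), ("10.0.0.1", "80"),
        ("10.0.0.1", "443"), ("10.0.0.2", "443")]
blocks = compress_matrix(rows)
print(blocks[0])  # [('T2', ['10.0.0.1', '10.0.0.2'], [True, False], [3])]
```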
-