Compression of sorted value indexes using common prefixes
    1.
    Granted patent
    Compression of sorted value indexes using common prefixes (in force)

    Publication No.: US08255398B2

    Publication date: 2012-08-28

    Application No.: US12241458

    Filing date: 2008-09-30

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30631 H03M7/30

    Abstract: A method, information processing system, and computer program storage product for compressing sorted values are disclosed. At least a first prefix and a second prefix in a plurality of prefixes are compared. Each prefix comprises at least a portion of a plurality of sorted values. A respective prefix comprises a set of consecutive characters including at least a first character of a respective sorted value. The respective sorted value further comprises a respective suffix comprising the consecutive characters of the respective sorted value that follow the respective prefix. At least a respective first character of the first prefix and a respective first character of the second prefix are determined to be substantially identical. The first prefix is merged with the second prefix into a single prefix comprising the first character. A set of suffixes associated with the first prefix is updated to reflect an association with the second prefix.
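
    The prefix/suffix split described in the abstract can be illustrated with a minimal sketch. It is not the patented procedure: it assumes a fixed prefix length (the hypothetical parameter prefix_len) and simply stores each shared prefix once with the list of suffixes that follow it. The function names are illustrative only.

```python
from itertools import groupby


def compress_sorted(values, prefix_len=1):
    """Group already-sorted strings by their first prefix_len characters.

    Each shared prefix is stored once, followed by the suffixes of the
    values that start with it. prefix_len is an assumption of this sketch.
    """
    compressed = []
    for prefix, group in groupby(values, key=lambda v: v[:prefix_len]):
        suffixes = [v[prefix_len:] for v in group]
        compressed.append((prefix, suffixes))
    return compressed


def decompress(compressed):
    """Rebuild the original sorted values from (prefix, suffixes) pairs."""
    return [prefix + suffix
            for prefix, suffixes in compressed
            for suffix in suffixes]


values = ["apple", "apply", "apt", "bat", "bath"]
packed = compress_sorted(values, prefix_len=2)
restored = decompress(packed)
```

    Because the input is sorted, values that share a prefix are adjacent, so a single pass is enough to merge them under one prefix.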

    COMPRESSION OF SORTED VALUE INDEXES USING COMMON PREFIXES
    2.
    Patent application
    COMPRESSION OF SORTED VALUE INDEXES USING COMMON PREFIXES (in force)

    Publication No.: US20100082545A1

    Publication date: 2010-04-01

    Application No.: US12241458

    Filing date: 2008-09-30

    IPC classification: G06F7/00

    CPC classification: G06F17/30631 H03M7/30

    Abstract: A method, information processing system, and computer program storage product for compressing sorted values are disclosed. At least a first prefix and a second prefix in a plurality of prefixes are compared. Each prefix comprises at least a portion of a plurality of sorted values. A respective prefix comprises a set of consecutive characters including at least a first character of a respective sorted value. The respective sorted value further comprises a respective suffix comprising the consecutive characters of the respective sorted value that follow the respective prefix. At least a respective first character of the first prefix and a respective first character of the second prefix are determined to be substantially identical. The first prefix is merged with the second prefix into a single prefix comprising the first character. A set of suffixes associated with the first prefix is updated to reflect an association with the second prefix.

    Compressibility estimation of non-unique indexes in a database management system
    3.
    Granted patent
    Compressibility estimation of non-unique indexes in a database management system (lapsed)

    Publication No.: US07895171B2

    Publication date: 2011-02-22

    Application No.: US12057055

    Filing date: 2008-03-27

    IPC classification: G06F17/30

    CPC classification: G06F17/30312 H03M7/30

    Abstract: A method, information processing system, and computer readable storage product estimate a compression factor. A set of key values within an index is analyzed. Each key value is associated with a record identifier (“RID”) list comprising a set of RIDs. The index is in an uncompressed format and has a total byte length. The number of RIDs associated with each key value in the set of key values is estimated. For each RID list, a total byte length for all RID deltas between consecutive RIDs within the list is estimated based on the number of RIDs that has been determined. The total byte length estimated for each RID list is accumulated. A compression factor associated with the index is determined by dividing the accumulated total byte length by the byte length of the index.
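
    The final arithmetic in the abstract (accumulate the delta sizes of every RID list, then divide by the uncompressed index size) can be sketched as follows. The sketch makes assumptions the abstract does not state: it walks the actual RID lists instead of estimating their lengths, and it sizes each delta with a 7-bits-per-byte variable-length code. All names are hypothetical.

```python
def varint_len(n):
    """Bytes needed for a non-negative integer at 7 payload bits per byte."""
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length


def rid_list_delta_bytes(rids):
    """Total encoded size of the gaps between consecutive RIDs in a sorted list."""
    return sum(varint_len(b - a) for a, b in zip(rids, rids[1:]))


def compression_factor(index, uncompressed_bytes):
    """Accumulated delta bytes over all RID lists, divided by the index size."""
    total = sum(rid_list_delta_bytes(rids) for rids in index.values())
    return total / uncompressed_bytes


# Toy index: key value -> sorted RID list.
index = {"a": [100, 105, 300], "b": [7, 8, 9, 10]}
# Assumed uncompressed size: 7 RIDs at 4 bytes each.
factor = compression_factor(index, uncompressed_bytes=28)
```

    In this toy index the deltas for key "a" are 5 and 195 (1 and 2 bytes) and the deltas for key "b" are three 1s (3 bytes), so 6 bytes are accumulated against an assumed 28-byte uncompressed index.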

    COMPRESSABILITY ESTIMATION OF NON-UNIQUE INDEXES IN A DATABASE MANAGEMENT SYSTEM
    4.
    Patent application
    COMPRESSABILITY ESTIMATION OF NON-UNIQUE INDEXES IN A DATABASE MANAGEMENT SYSTEM (lapsed)

    Publication No.: US20090248725A1

    Publication date: 2009-10-01

    Application No.: US12057055

    Filing date: 2008-03-27

    IPC classification: G06F17/30

    CPC classification: G06F17/30312 H03M7/30

    Abstract: A method, information processing system, and computer readable storage product estimate a compression factor. A set of key values within an index is analyzed. Each key value is associated with a record identifier (“RID”) list comprising a set of RIDs. The index is in an uncompressed format and has a total byte length. The number of RIDs associated with each key value in the set of key values is estimated. For each RID list, a total byte length for all RID deltas between consecutive RIDs within the list is estimated based on the number of RIDs that has been determined. The total byte length estimated for each RID list is accumulated. A compression factor associated with the index is determined by dividing the accumulated total byte length by the byte length of the index.

    Method and apparatus for selecting an optimal delete-safe compression method on list of delta encoded integers
    5.
    Granted patent
    Method and apparatus for selecting an optimal delete-safe compression method on list of delta encoded integers (in force)

    Publication No.: US08990173B2

    Publication date: 2015-03-24

    Application No.: US12056979

    Filing date: 2008-03-27

    IPC classification: G06F7/00 H03M7/30

    CPC classification: H03M7/30

    Abstract: Techniques are disclosed for selecting a delete-safe compression method for a plurality of delta encoded data values (e.g., delta encoded integers or deltas). For example, a computer-implemented method for selecting an optimal delete-safe compression algorithm from among two or more compression algorithms for use on a plurality of delta encoded data values includes the following steps. The maximum number of data values eliminated by each of the two or more compression algorithms is computed. For the plurality of delta encoded data values to be compressed, the minimum size of the plurality of delta encoded data values before compression thereof is computed. A delete-safe threshold value is computed based on the minimum size of the plurality of delta encoded data values. Then, the compression algorithm that achieves the delete-safe threshold value is selected from the two or more compression algorithms.
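
    The abstract does not define "delete-safe". The sketch below assumes one plausible reading: an encoding is delete-safe for a list if removing any single value never makes the encoded list larger. Deleting a value from a delta-encoded list merges two gaps into one larger gap, which can widen some encodings. The brute-force check here is a stand-in for illustration only; it does not reproduce the threshold computation the abstract describes, and both candidate encodings and all names are hypothetical.

```python
def varint_len(n):
    """Bytes needed for a non-negative integer at 7 payload bits per byte."""
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length


def varint_bits(deltas):
    """Size in bits when each delta is stored as its own variable-length integer."""
    return sum(8 * varint_len(d) for d in deltas)


def packed_bits(deltas):
    """Size in bits when every delta uses the bit width of the largest delta."""
    if not deltas:
        return 0
    return max(1, max(deltas).bit_length()) * len(deltas)


def delete_at(deltas, i):
    """Remove the value that ends delta i; the next delta absorbs its gap."""
    merged = list(deltas)
    gap = merged.pop(i)
    if i < len(merged):
        merged[i] += gap
    return merged


def is_delete_safe(deltas, size_fn):
    """True if no single deletion makes the encoded list larger (assumed meaning)."""
    before = size_fn(deltas)
    return all(size_fn(delete_at(deltas, i)) <= before
               for i in range(len(deltas)))


def choose_delete_safe(deltas, candidates):
    """Pick the smallest encoding among those that pass the delete-safe check."""
    safe = {name: fn(deltas) for name, fn in candidates.items()
            if is_delete_safe(deltas, fn)}
    return min(safe, key=safe.get) if safe else None


deltas = [3, 3, 3, 3]
candidates = {"fixed_width": packed_bits, "varint": varint_bits}
choice = choose_delete_safe(deltas, candidates)
```

    For these deltas the fixed-width form is smaller (8 bits against 32) but grows to 9 bits once a deletion merges two gaps into 6, so the sketch falls back to the larger variable-length form.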

    Method and apparatus for encoding list of variable length structures to support bi-directional scans
    6.
    Granted patent
    Method and apparatus for encoding list of variable length structures to support bi-directional scans (lapsed)

    Publication No.: US08126929B2

    Publication date: 2012-02-28

    Application No.: US12057012

    Filing date: 2008-03-27

    IPC classification: G06F17/30

    CPC classification: G06F17/30958 G06F17/30286

    Abstract: Techniques are disclosed for encoding a variable length structure such that it facilitates forward and reverse scans of a list of such structures as needed. While the techniques are applicable to a wide variety of applications, they are particularly well-suited for use with structures such as those found in compressed database indexes. For example, a computer-implemented method for processing one or more variable length data structures includes the following steps. Each variable length data structure is obtained. Each variable length data structure comprises one or more data blocks. A variable length encoding process is applied to the one or more blocks of each variable length data structure, which comprises setting a continuation data value in each block to a first value or a second value, wherein the setting of the continuation data values enables bi-directional scanning of each variable length data structure.
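
    One common way to obtain the property described in the abstract is a byte-oriented code in which a continuation bit is 1 on every byte except the last byte of a value. The sketch below assumes that scheme (7 payload bits per byte, high bit as the continuation flag); the abstract does not confirm that the patent uses this exact layout. A reverse scan works because the byte just before the start of any value always has its flag cleared.

```python
def encode(n):
    """Encode a non-negative integer; flag 0x80 is set on all bytes but the last."""
    out = []
    while True:
        chunk = n & 0x7F
        n >>= 7
        if n:
            out.append(chunk | 0x80)  # more bytes follow
        else:
            out.append(chunk)         # final byte of this value
            break
    return bytes(out)


def scan_forward(buf):
    """Decode every value from first to last."""
    values, value, shift = [], 0, 0
    for b in buf:
        value |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
        else:
            values.append(value)
            value, shift = 0, 0
    return values


def scan_backward(buf):
    """Decode every value from last to first using only the continuation flags."""
    values, end = [], len(buf)
    while end > 0:
        start = end - 1
        # Step back over bytes that are flagged as "more bytes follow".
        while start > 0 and buf[start - 1] & 0x80:
            start -= 1
        values.append(scan_forward(buf[start:end])[0])
        end = start
    return values


numbers = [5, 300, 16384, 0]
buf = b"".join(encode(n) for n in numbers)
forward = scan_forward(buf)
backward = scan_backward(buf)
```

    The same buffer is read in both directions without any side table of offsets, which is the point of placing a flag in every block.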

    Method and Apparatus for Encoding List of Variable Length Structures to Support Bi-Directional Scans
    7.
    Patent application
    Method and Apparatus for Encoding List of Variable Length Structures to Support Bi-Directional Scans (lapsed)

    Publication No.: US20090248724A1

    Publication date: 2009-10-01

    Application No.: US12057012

    Filing date: 2008-03-27

    IPC classification: G06F17/30

    CPC classification: G06F17/30958 G06F17/30286

    Abstract: Techniques are disclosed for encoding a variable length structure such that it facilitates forward and reverse scans of a list of such structures as needed. While the techniques are applicable to a wide variety of applications, they are particularly well-suited for use with structures such as those found in compressed database indexes. For example, a computer-implemented method for processing one or more variable length data structures includes the following steps. Each variable length data structure is obtained. Each variable length data structure comprises one or more data blocks. A variable length encoding process is applied to the one or more blocks of each variable length data structure, which comprises setting a continuation data value in each block to a first value or a second value, wherein the setting of the continuation data values enables bi-directional scanning of each variable length data structure.

    Method and Apparatus for Selecting an Optimal Delete-Safe Compression Method on List of Delta Encoded Integers
    8.
    Patent application
    Method and Apparatus for Selecting an Optimal Delete-Safe Compression Method on List of Delta Encoded Integers (in force)

    Publication No.: US20090248723A1

    Publication date: 2009-10-01

    Application No.: US12056979

    Filing date: 2008-03-27

    IPC classification: G06F17/30

    CPC classification: H03M7/30

    Abstract: Techniques are disclosed for selecting a delete-safe compression method for a plurality of delta encoded data values (e.g., delta encoded integers or deltas). For example, a computer-implemented method for selecting an optimal delete-safe compression algorithm from among two or more compression algorithms for use on a plurality of delta encoded data values includes the following steps. The maximum number of data values eliminated by each of the two or more compression algorithms is computed. For the plurality of delta encoded data values to be compressed, the minimum size of the plurality of delta encoded data values before compression thereof is computed. A delete-safe threshold value is computed based on the minimum size of the plurality of delta encoded data values. Then, the compression algorithm that achieves the delete-safe threshold value is selected from the two or more compression algorithms.

    Method and apparatus for organizing data sources
    9.
    Granted patent
    Method and apparatus for organizing data sources (in force)

    Publication No.: US07529740B2

    Publication date: 2009-05-05

    Application No.: US11503713

    Filing date: 2006-08-14

    IPC classification: G06F17/30

    Abstract: A method for organizing deep Web services is provided. In one aspect, the method obtains a collection of sources and their associated attributes and/or input modes, for instance, using a crawling algorithm. The method uses this information to organize the sources into communities. A mining algorithm such as the hyperclique mining algorithm is used to obtain cliques of highly correlated attributes. A clustering algorithm such as the hierarchical agglomerative clustering algorithm is used to further cluster the cliques of attributes into larger cliques, which in the present disclosure are referred to as signatures. The sources that are associated with each signature form a community, and a graph representation of the communities is constructed, where the vertices are communities and the edges are the shared attributes.

    SYSTEM AND METHOD FOR SEARCHING DEEP WEB SERVICES
    10.
    Patent application
    SYSTEM AND METHOD FOR SEARCHING DEEP WEB SERVICES (under examination, published)

    Publication No.: US20080270367A1

    Publication date: 2008-10-30

    Application No.: US12173545

    Filing date: 2008-07-15

    IPC classification: G06F17/30

    CPC classification: G06F16/958 Y10S707/99933

    Abstract: A system and method for searching deep web services are provided. The system and method in one aspect allow organizing communities, sources and schema attributes in a multi-tier containment relationship; searching representative schema attributes in one or more communities; searching representative services in one or more communities; searching for related schema attributes; and searching for related communities.