Storing compression units in relational tables
    1.
    Granted patent
    Status: in force

    Publication No.: US08645337B2

    Publication date: 2014-02-04

    Application No.: US12769205

    Filing date: 2010-04-28

    IPC classification: G06F7/00 G06F17/00 G06F17/30

    Abstract: A database server stores compressed units in data blocks of a database. A table (or data from a plurality of rows thereof) is first compressed into a “compression unit” using any of a wide variety of compression techniques. The compression unit is then stored in one or more data block rows across one or more data blocks. As a result, a single data block row may comprise compressed data for a plurality of table rows, as encoded within the compression unit. Storage of compression units in data blocks maintains compatibility with existing data block-based databases, thus allowing the use of compression units in preexisting databases without modification to the underlying format of the database. The compression units may, for example, co-exist with uncompressed tables. Various techniques allow a database server to optimize access to data in the compression unit, so that the compression is virtually transparent to the user.
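
    The storage scheme in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: zlib and JSON stand in for "any of a wide variety of compression techniques", and the chunk size BLOCK_ROW_CAPACITY and all function names are hypothetical.

```python
import json
import zlib

# Hypothetical payload capacity of one data block row, in bytes.
BLOCK_ROW_CAPACITY = 64


def build_compression_unit(table_rows):
    """Compress many table rows into one opaque byte string."""
    return zlib.compress(json.dumps(table_rows).encode("utf-8"))


def split_into_block_rows(unit, capacity=BLOCK_ROW_CAPACITY):
    """Cut the compression unit into chunks, one chunk per data block row."""
    return [unit[i:i + capacity] for i in range(0, len(unit), capacity)]


def read_table_rows(block_rows):
    """Reassemble the chunks and decompress them back into table rows."""
    return json.loads(zlib.decompress(b"".join(block_rows)).decode("utf-8"))


table_rows = [[i, f"customer-{i}", "ACTIVE"] for i in range(200)]
block_rows = split_into_block_rows(build_compression_unit(table_rows))

# Each data block row now carries compressed data for many table rows.
assert read_table_rows(block_rows) == table_rows
```

    Because each chunk is an ordinary byte payload, a block format that already stores rows of bytes would not need to change, which is the compatibility point the abstract makes.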

    STORING COMPRESSION UNITS IN RELATIONAL TABLES
    2.
    Patent application
    Status: in force

    Publication No.: US20100281004A1

    Publication date: 2010-11-04

    Application No.: US12769205

    Filing date: 2010-04-28

    IPC classification: G06F17/30

    Abstract: A database server stores compressed units in data blocks of a database. A table (or data from a plurality of rows thereof) is first compressed into a “compression unit” using any of a wide variety of compression techniques. The compression unit is then stored in one or more data block rows across one or more data blocks. As a result, a single data block row may comprise compressed data for a plurality of table rows, as encoded within the compression unit. Storage of compression units in data blocks maintains compatibility with existing data block-based databases, thus allowing the use of compression units in preexisting databases without modification to the underlying format of the database. The compression units may, for example, co-exist with uncompressed tables. Various techniques allow a database server to optimize access to data in the compression unit, so that the compression is virtually transparent to the user.

    COMPRESSION ANALYZER
    3.
    Patent application
    Status: in force

    Publication No.: US20100281079A1

    Publication date: 2010-11-04

    Application No.: US12769508

    Filing date: 2010-04-28

    IPC classification: G06F17/30

    CPC classification: H03M7/30 G06F17/30595

    Abstract: Techniques are described herein for automatically selecting the compression techniques to be used on tabular data. A compression analyzer gives users high-level control over the selection process without requiring the user to know details about the specific compression techniques that are available to the compression analyzer. Users are able to specify, for a given set of data, a “balance point” along the spectrum between “maximum performance” and “maximum compression”. The point thus selected is used by the compression analyzer in a variety of ways. For example, in one embodiment, the compression analyzer uses the user-specified balance point to determine which of the available compression techniques qualify as “candidate techniques” for the given set of data. The compression analyzer selects the compression technique to use on a set of data by actually testing the candidate compression techniques against samples from the set of data. After testing the candidate compression techniques against the samples, the resulting compression ratios are compared. The compression technique to use on the set of data is then selected based, in part, on the compression ratios achieved during the compression tests performed on the sample data.
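
    The selection loop described in this abstract might look roughly like the following Python sketch. It rests on several assumptions that are not in the patent: the balance point is modeled as a number from 0.0 (maximum performance) to 1.0 (maximum compression), the candidate list uses standard-library codecs, the relative cost figures are illustrative guesses, and the winner is picked on compression ratio alone, whereas the abstract says the ratio is only part of the decision.

```python
import bz2
import lzma
import zlib

# (name, compress function, relative CPU cost from 0 to 1).
# The cost values are illustrative guesses, not measurements.
TECHNIQUES = [
    ("zlib-1", lambda data: zlib.compress(data, 1), 0.1),
    ("zlib-9", lambda data: zlib.compress(data, 9), 0.4),
    ("bz2", bz2.compress, 0.7),
    ("lzma", lzma.compress, 1.0),
]


def choose_technique(samples, balance_point):
    """Pick a technique by test-compressing sample data.

    balance_point: 0.0 = maximum performance, 1.0 = maximum compression.
    Returns the winning name and the ratio measured for each candidate.
    """
    # The balance point filters the available techniques down to candidates;
    # the cheapest technique is always kept so the list is never empty.
    candidates = [t for t in TECHNIQUES if t[2] <= balance_point] or TECHNIQUES[:1]

    raw_size = sum(len(sample) for sample in samples)
    ratios = {}
    for name, compress, _cost in candidates:
        compressed_size = sum(len(compress(sample)) for sample in samples)
        ratios[name] = raw_size / compressed_size

    best = max(ratios, key=ratios.get)
    return best, ratios


samples = [b"2010-04-28,ACTIVE,US\n" * 500, b"2010-08-30,CLOSED,DE\n" * 500]
best, ratios = choose_technique(samples, balance_point=0.5)
print(best, ratios)
```

    A caller only supplies the samples and the balance point; it never names a codec, which mirrors the "high-level control" the abstract describes.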

    Compression analyzer
    4.
    Granted patent
    Status: in force

    Publication No.: US08356060B2

    Publication date: 2013-01-15

    Application No.: US12769508

    Filing date: 2010-04-28

    IPC classification: G06F7/00

    CPC classification: H03M7/30 G06F17/30595

    Abstract: Techniques are described herein for automatically selecting the compression techniques to be used on tabular data. A compression analyzer gives users high-level control over the selection process without requiring the user to know details about the specific compression techniques that are available to the compression analyzer. Users are able to specify, for a given set of data, a “balance point” along the spectrum between “maximum performance” and “maximum compression”. The point thus selected is used by the compression analyzer in a variety of ways. For example, in one embodiment, the compression analyzer uses the user-specified balance point to determine which of the available compression techniques qualify as “candidate techniques” for the given set of data. The compression analyzer selects the compression technique to use on a set of data by actually testing the candidate compression techniques against samples from the set of data. After testing the candidate compression techniques against the samples, the resulting compression ratios are compared. The compression technique to use on the set of data is then selected based, in part, on the compression ratios achieved during the compression tests performed on the sample data.

    Techniques for compression and processing optimizations by using data transformations
    5.
    Granted patent
    Status: in force

    Publication No.: US08239421B1

    Publication date: 2012-08-07

    Application No.: US12871862

    Filing date: 2010-08-30

    IPC classification: G06F17/20

    CPC classification: H03M7/30 H03M7/3084

    Abstract: Described herein are compression and processing optimizations by using data transformation techniques. In example embodiments, a byte-wise differential transformation is applied to columnar data represented as a list of length-value pairs to determine a list of delta pairs that is subsequently compressed and stored on persistent storage. A length separation transformation is applied to separate a list of length-value pairs into a length array and a corresponding data value array, where these two arrays are subsequently compressed and stored separately on persistent storage. A native number transformation is applied to a set of number values to remove the lengths stored in the number values, where the transformed set is stored on persistent storage instead of the original set of number values. A native datetime-type transformation is applied to a set of datetime values to generate an encoding that is used to encode the set of datetime values into an encoded set that is stored on persistent storage instead of the original set.
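
    Two of these transformations can be sketched in Python as follows. The abstract does not define the delta encoding precisely, so the byte-wise difference used here (each value minus the previous value, byte by byte, modulo 256, with the previous value zero-padded) is an assumption, as are all function names; zlib stands in for whatever compressor is applied afterwards.

```python
import zlib


def separate_lengths(pairs):
    """Length separation: split [(length, value), ...] into two arrays."""
    lengths = [length for length, _ in pairs]
    values = b"".join(value for _, value in pairs)
    return lengths, values


def join_lengths(lengths, values):
    """Inverse of separate_lengths: rebuild the length-value pairs."""
    pairs, offset = [], 0
    for length in lengths:
        pairs.append((length, values[offset:offset + length]))
        offset += length
    return pairs


def bytewise_delta(pairs):
    """Replace each value by its byte-wise difference from the previous value."""
    deltas, previous = [], b""
    for length, value in pairs:
        padded = previous.ljust(length, b"\0")
        delta = bytes((v - p) % 256 for v, p in zip(value, padded))
        deltas.append((length, delta))
        previous = value
    return deltas


def bytewise_undelta(deltas):
    """Inverse of bytewise_delta: rebuild the original values."""
    pairs, previous = [], b""
    for length, delta in deltas:
        padded = previous.ljust(length, b"\0")
        value = bytes((d + p) % 256 for d, p in zip(delta, padded))
        pairs.append((length, value))
        previous = value
    return pairs


column = [b"2010-04-28", b"2010-04-29", b"2010-08-30"]
pairs = [(len(value), value) for value in column]

# Similar neighbouring values turn into runs of zero bytes.
delta_pairs = bytewise_delta(pairs)

# The two arrays are compressed separately (lengths must be < 256 for bytes()).
lengths, values = separate_lengths(delta_pairs)
stored = (zlib.compress(bytes(lengths)), zlib.compress(values))

assert bytewise_undelta(join_lengths(lengths, values)) == pairs
```

    Both transformations are lossless and exist only to hand the compressor more regular input: the length array and the mostly-zero delta bytes each tend to compress better on their own than interleaved.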

    STRUCTURE OF HIERARCHICAL COMPRESSED DATA STRUCTURE FOR TABULAR DATA
    6.
    Patent application
    Status: under examination (published)

    Publication No.: US20120143833A1

    Publication date: 2012-06-07

    Application No.: US13371354

    Filing date: 2012-02-10

    IPC classification: G06F7/00

    CPC classification: G06F16/221

    Abstract: A highly flexible and extensible structure is provided for physically storing tabular data. The structure, referred to as a compression unit, may be used to store tabular data that logically resides in any type of table-like structure. According to one embodiment, compression units are recursive. Thus, a compression unit may have a “parent” compression unit to which it belongs, and may have one or more “child” compression units that belong to it. In one embodiment, compression units include metadata that indicates how the tabular data is stored within them. The metadata for a compression unit may indicate, for example, whether the data is stored in row-major or column-major format, the order of the columns within the compression unit (which may differ from the logical order of the columns dictated by the definition of their logical container), a compression technique for the compression unit, the child compression units (if any), etc.
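
    One way to picture the recursive structure and its metadata is the Python sketch below. The class and field names are hypothetical and chosen only to mirror the items listed in the abstract (layout, column order, compression technique, children); the patent does not prescribe this representation.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class CompressionUnit:
    """Hypothetical in-memory model of a compression unit and its metadata."""

    layout: str                        # "row-major" or "column-major"
    column_order: List[str]            # physical order, may differ from the table's
    compression: Optional[str] = None  # name of the technique applied to this unit
    payload: bytes = b""               # stored data for this unit
    children: List["CompressionUnit"] = field(default_factory=list)
    # Excluded from repr and comparison to avoid parent/child recursion.
    parent: Optional["CompressionUnit"] = field(default=None, repr=False, compare=False)

    def add_child(self, child: "CompressionUnit") -> "CompressionUnit":
        """Attach a child unit and record this unit as its parent."""
        child.parent = self
        self.children.append(child)
        return child

    def depth(self) -> int:
        """Number of ancestors above this unit."""
        return 0 if self.parent is None else 1 + self.parent.depth()

    def walk(self) -> Iterator["CompressionUnit"]:
        """Yield this unit and all descendants, parents before children."""
        yield self
        for child in self.children:
            yield from child.walk()


root = CompressionUnit(layout="column-major", column_order=["status", "id", "name"])
root.add_child(CompressionUnit("row-major", ["id", "name"], compression="zlib"))
root.add_child(CompressionUnit("column-major", ["status"], compression="bz2"))

for unit in root.walk():
    print(unit.depth(), unit.layout, unit.column_order, unit.compression)
```

    Each level carries its own metadata, so a parent can group columns one way while its children pick their own layout and compression technique.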

    DDL and DML support for hybrid columnar compressed tables
    7.
    Granted patent
    Status: in force

    Publication No.: US08583692B2

    Publication date: 2013-11-12

    Application No.: US12871882

    Filing date: 2010-08-30

    IPC classification: G06F17/30

    CPC classification: G06F17/30595

    Abstract: Techniques for storing and manipulating tabular data are provided. According to one embodiment, a user may control whether tabular data is stored in row-major or column-major format. Furthermore, the user may control the level of data compression to achieve an optimal balance between query performance and compression ratios. Tabular data from within the same table may be stored in both column-major and row-major format and compressed at different levels. In addition, tabular data can migrate between column-major format and row-major format in response to various events. For example, in response to a request to update or lock a row stored in column-major format, the row may be migrated and subsequently stored into row-major format. In one embodiment, table partitions are used to enhance data compression techniques. For example, compression tests are performed on a representative table partition, and a compression map is generated and applied to other table partitions.
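
    The migration example in this abstract (an update moving a row from column-major to row-major storage) could be modeled as in the Python sketch below. The HybridTable class and its methods are hypothetical; compression and locking are omitted so that only the format migration is shown.

```python
class HybridTable:
    """Toy table that keeps bulk data column-major and updated rows row-major."""

    def __init__(self, column_names, rows):
        self.column_names = list(column_names)
        # Column-major storage: one list of values per column.
        self.columns = {
            name: [row[i] for row in rows]
            for i, name in enumerate(self.column_names)
        }
        # Row-major storage for migrated rows: row id -> {column: value}.
        self.migrated = {}

    def read(self, row_id):
        """Return the current version of a row from whichever format holds it."""
        if row_id in self.migrated:
            return dict(self.migrated[row_id])
        return {name: self.columns[name][row_id] for name in self.column_names}

    def update(self, row_id, **changes):
        """Migrate the row to row-major storage on first update, then apply changes."""
        if row_id not in self.migrated:
            self.migrated[row_id] = self.read(row_id)
        self.migrated[row_id].update(changes)


table = HybridTable(["id", "status"], [(1, "NEW"), (2, "NEW"), (3, "NEW")])
table.update(1, status="DONE")

print(table.read(0))  # still served from column-major storage
print(table.read(1))  # served from the migrated row-major copy
```

    The column-major data is never rewritten by the update; only the touched row changes format, so untouched rows keep their original storage.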

DDL and DML support for hybrid columnar compressed tables
    8.

    Publication No.: US08521784B2

    Publication date: 2013-08-27

    Application No.: US12871882

    Filing date: 2010-08-30

    IPC classification: G06F17/30

    Abstract: Techniques for storing and manipulating tabular data are provided. According to one embodiment, a user may control whether tabular data is stored in row-major or column-major format. Furthermore, the user may control the level of data compression to achieve an optimal balance between query performance and compression ratios. Tabular data from within the same table may be stored in both column-major and row-major format and compressed at different levels. In addition, tabular data can migrate between column-major format and row-major format in response to various events. For example, in response to a request to update or lock a row stored in column-major format, the row may be migrated and subsequently stored into row-major format. In one embodiment, table partitions are used to enhance data compression techniques. For example, compression tests are performed on a representative table partition, and a compression map is generated and applied to other table partitions.

    Structure of hierarchical compressed data structure for tabular data
    9.
    Granted patent
    Status: in force

    Publication No.: US08935223B2

    Publication date: 2015-01-13

    Application No.: US12617669

    Filing date: 2009-11-12

    IPC classification: G06F17/30 G06F7/00

    CPC classification: G06F17/30315

    Abstract: A highly flexible and extensible structure is provided for physically storing tabular data. The structure, referred to as a compression unit, may be used to store tabular data that logically resides in any type of table-like structure. According to one embodiment, compression units are recursive. Thus, a compression unit may have a “parent” compression unit to which it belongs, and may have one or more “child” compression units that belong to it. In one embodiment, compression units include metadata that indicates how the tabular data is stored within them. The metadata for a compression unit may indicate, for example, whether the data is stored in row-major or column-major format, the order of the columns within the compression unit (which may differ from the logical order of the columns dictated by the definition of their logical container), a compression technique for the compression unit, the child compression units (if any), etc.

    LAZY OPERATIONS ON HIERARCHICAL COMPRESSED DATA STRUCTURE FOR TABULAR DATA
    10.
    Patent application
    Status: under examination (published)

    Publication No.: US20120117038A1

    Publication date: 2012-05-10

    Application No.: US13296435

    Filing date: 2011-11-15

    IPC classification: G06F7/00

    CPC classification: G06F16/221

    Abstract: A highly flexible and extensible structure is provided for physically storing tabular data. The structure, referred to as a compression unit, may be used to physically store tabular data that logically resides in any type of table-like structure. Techniques are employed to avoid changing tabular data within existing compression units. Deleting tabular data within compression units is avoided by merely tracking deletion requests, without actually deleting the data. Inserting new tabular data into existing compression units is avoided by storing the new data external to the compression units. If the number of deletions exceeds a threshold, and/or the number of new inserts exceeds a threshold, new compression units may be generated. When new compression units are generated, the previously-existing compression units may be discarded to reclaim storage, or retained to allow reconstruction of prior states of the tabular data.
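
    The lazy delete and insert handling in this abstract can be sketched in Python as below. The LazyUnitStore class, its threshold parameters, and the use of zlib over JSON are all assumptions made for illustration; the sketch retains old units rather than discarding them, which is one of the two options the abstract mentions.

```python
import json
import zlib


class LazyUnitStore:
    """Toy store: one immutable compression unit plus lazily tracked changes."""

    def __init__(self, rows, max_deletes=2, max_inserts=2):
        self.unit = self._compress(rows)
        self.deleted = set()   # positions in the unit that are logically deleted
        self.external = []     # new rows kept outside the compression unit
        self.max_deletes = max_deletes
        self.max_inserts = max_inserts
        self.old_units = []    # retained to allow reconstruction of prior states

    @staticmethod
    def _compress(rows):
        return zlib.compress(json.dumps(rows).encode("utf-8"))

    def _unit_rows(self):
        return json.loads(zlib.decompress(self.unit).decode("utf-8"))

    def rows(self):
        """Current logical contents: surviving unit rows plus external rows."""
        live = [row for i, row in enumerate(self._unit_rows()) if i not in self.deleted]
        return live + self.external

    def delete(self, position):
        """Track the deletion request without touching the compressed data."""
        self.deleted.add(position)
        self._maybe_rebuild()

    def insert(self, row):
        """Store the new row outside the existing compression unit."""
        self.external.append(row)
        self._maybe_rebuild()

    def _maybe_rebuild(self):
        """Generate a new unit once either threshold is exceeded."""
        if len(self.deleted) > self.max_deletes or len(self.external) > self.max_inserts:
            current = self.rows()
            self.old_units.append(self.unit)
            self.unit = self._compress(current)
            self.deleted.clear()
            self.external.clear()


store = LazyUnitStore([["a"], ["b"], ["c"], ["d"]], max_deletes=1, max_inserts=1)
store.delete(0)        # tracked only; the unit is unchanged
store.insert(["e"])    # stored externally; the unit is unchanged
store.delete(1)        # second deletion exceeds the threshold and triggers a rebuild
print(store.rows(), len(store.old_units))
```

    Until a threshold is crossed, every change costs only a set entry or a list append, and the compressed bytes are never rewritten.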
