摘要:
Techniques for storing and manipulating tabular data are provided. According to one embodiment, a user may control whether tabular data is stored in row-level or column-major format. Furthermore, the user may control the level of data compression to achieve an optimal balance between query performance and compression ratios. Tabular data from within the same table may be stored in both column-major and row-major format and compressed at different levels. In addition, tabular data can migrate between column-major format and row-major format in response to various events. For example, in response to a request to update or lock a row stored in column-major format, the row may be migrated and subsequently stored into row-major format. In one embodiment, table partitions are used to enhance data compression techniques. For example, compression tests are performed on a representative table partition, and a compression map is generated and applied to other table partitions.
摘要:
A database server stores compressed units in data blocks of a database. A table (or data from a plurality of rows thereof) is first compressed into a “compression unit” using any of a wide variety of compression techniques. The compression unit is then stored in one or more data block rows across one or more data blocks. As a result, a single data block row may comprise compressed data for a plurality of table rows, as encoded within the compression unit. Storage of compression units in data blocks maintains compatibility with existing data block-based databases, thus allowing the use of compression units in preexisting databases without modification to the underlying format of the database. The compression units may, for example, co-exist with uncompressed tables. Various techniques allow a database server to optimize access to data in the compression unit, so that the compression is virtually transparent to the user.
摘要:
A highly flexible and extensible structure is provided for physically storing tabular data. The structure, referred to as a compression unit, may be used to store tabular data that logically resides in any type of table-like structure. According to one embodiment, compression units are recursive. Thus, a compression unit may have a “parent” compression unit to which it belongs, and may have one or more “child” compression units that belong to it. In one embodiment, compression units include metadata that indicates how the tabular data is stored within them. The metadata for a compression unit may indicate, for example, whether the data is stored in row-major or column major-format the order of the columns within the compression unit (which may differ from the logical order of the columns dictated by the definition of their logical container), a compression technique for the compression unit, the child compression units (if any), etc.
摘要:
A database server stores compressed units in data blocks of a database. A table (or data from a plurality of rows thereof) is first compressed into a “compression unit” using any of a wide variety of compression techniques. The compression unit is then stored in one or more data block rows across one or more data blocks. As a result, a single data block row may comprise compressed data for a plurality of table rows, as encoded within the compression unit. Storage of compression units in data blocks maintains compatibility with existing data block-based databases, thus allowing the use of compression units in preexisting databases without modification to the underlying format of the database. The compression units may, for example, co-exist with uncompressed tables. Various techniques allow a database server to optimize access to data in the compression unit, so that the compression is virtually transparent to the user.
摘要:
Techniques for storing and manipulating tabular data are provided. According to one embodiment, a user may control whether tabular data is stored in row-level or column-major format. Furthermore, the user may control the level of data compression to achieve an optimal balance between query performance and compression ratios. Tabular data from within the same table may be stored in both column-major and row-major format and compressed at different levels. In addition, tabular data can migrate between column-major format and row-major format in response to various events. For example, in response to a request to update or lock a row stored in column-major format, the row may be migrated and subsequently stored into row-major format. In one embodiment, table partitions are used to enhance data compression techniques. For example, compression tests are performed on a representative table partition, and a compression map is generated and applied to other table partitions.
摘要:
Techniques for storing and manipulating tabular data are provided. According to one embodiment, a user may control whether tabular data is stored in row-level or column-major format. Furthermore, the user may control the level of data compression to achieve an optimal balance between query performance and compression ratios. Tabular data from within the same table may be stored in both column-major and row-major format and compressed at different levels. In addition, tabular data can migrate between column-major format and row-major format in response to various events. For example, in response to a request to update or lock a row stored in column-major format, the row may be migrated and subsequently stored into row-major format. In one embodiment, table partitions are used to enhance data compression techniques. For example, compression tests are performed on a representative table partition, and a compression map is generated and applied to other table partitions.
摘要:
A method and apparatus is provided for optimizing queries received by a database system that relies on an intelligent data storage server to manage storage for the database system. Storing compression units in hybrid columnar format, the storage manager evaluates simple predicates and only returns data blocks containing rows that satisfy those predicates. The returned data blocks are not necessarily stored persistently on disk. That is, the storage manager is not limited to returning disc block images. The hybrid columnar format enables optimizations that provide better performance when processing typical database workloads including both fetching rows by identifier and performing table scans.
摘要:
A highly flexible and extensible structure is provided for physically storing tabular data. The structure, referred to as a compression unit, may be used to store tabular data that logically resides in any type of table-like structure. According to one embodiment, compression units are recursive. Thus, a compression unit may have a “parent” compression unit to which it belongs, and may have one or more “child” compression units that belong to it. In one embodiment, compression units include metadata that indicates how the tabular data is stored within them. The metadata for a compression unit may indicate, for example, whether the data is stored in row-major or column major-format, the order of the columns within the compression unit (which may differ from the logical order of the columns dictated by the definition of their logical container), a compression technique for the compression unit, the child compression units (if any), etc.
摘要:
A highly flexible and extensible structure is provided for physically storing tabular data. The structure, referred to as a compression unit, may be used to physically store tabular data that logically resides in any type of table-like structure. Techniques are employed to avoid changing tabular data within existing compression units. Deleting tabular data within compression units is avoided by merely tracking deletion requests, without actually deleting the data. Inserting new tabular data into existing compression units is avoided by storing the new data external to the compression units. If the number of deletions exceeds a threshold, and/or the number of new inserts exceeds a threshold, new compression units may be generated. When new compression units are generated, the previously-existing compression units may be discarded to reclaim storage, or retained to allow reconstruction of prior states of the tabular data.
摘要:
Techniques are described herein for automatically selecting the compression techniques to be used on tabular data. A compression analyzer gives users high-level control over the selection process without requiring the user to know details about the specific compression techniques that are available to the compression analyzer. Users are able to specify, for a given set of data, a “balance point” along the spectrum between “maximum performance” and “maximum compression”. The point thus selected is used by the compression analyzer in a variety of ways. For example, in one embodiment, the compression analyzer uses the user-specified balance point to determine which of the available compression techniques qualify as “candidate techniques” for the given set of data. The compression analyzer selects the compression technique to use on a set of data by actually testing the candidate compression techniques against samples from the set of data. After testing the candidate compression techniques against the samples, the resulting compression ratios are compared. The compression technique to use on the set of data is then selected based, in part, on the compression ratios achieved during the compression tests performed on the sample data.