摘要:
A database server stores compressed units in data blocks of a database. A table (or data from a plurality of rows thereof) is first compressed into a “compression unit” using any of a wide variety of compression techniques. The compression unit is then stored in one or more data block rows across one or more data blocks. As a result, a single data block row may comprise compressed data for a plurality of table rows, as encoded within the compression unit. Storage of compression units in data blocks maintains compatibility with existing data block-based databases, thus allowing the use of compression units in preexisting databases without modification to the underlying format of the database. The compression units may, for example, co-exist with uncompressed tables. Various techniques allow a database server to optimize access to data in the compression unit, so that the compression is virtually transparent to the user.
摘要:
For automatic data placement of database data, a plurality of access-tracking data is maintained. The plurality of access-tracking data respectively corresponds to a plurality of data rows that are managed by a database server. While the database server is executing normally, it is automatically determined whether a data row, which is stored in first one or more data blocks, has been recently accessed based on the access-tracking data that corresponds to that data row. After determining that the data row has been recently accessed, the data row is automatically moved from the first one or more data blocks to one or more hot data blocks that are designated for storing those data rows, from the plurality of data rows, that have been recently accessed.
摘要:
A highly flexible and extensible structure is provided for physically storing tabular data. The structure, referred to as a compression unit, may be used to store tabular data that logically resides in any type of table-like structure. According to one embodiment, compression units are recursive. Thus, a compression unit may have a “parent” compression unit to which it belongs, and may have one or more “child” compression units that belong to it. In one embodiment, compression units include metadata that indicates how the tabular data is stored within them. The metadata for a compression unit may indicate, for example, whether the data is stored in row-major or column major-format the order of the columns within the compression unit (which may differ from the logical order of the columns dictated by the definition of their logical container), a compression technique for the compression unit, the child compression units (if any), etc.
摘要:
A database server stores compressed units in data blocks of a database. A table (or data from a plurality of rows thereof) is first compressed into a “compression unit” using any of a wide variety of compression techniques. The compression unit is then stored in one or more data block rows across one or more data blocks. As a result, a single data block row may comprise compressed data for a plurality of table rows, as encoded within the compression unit. Storage of compression units in data blocks maintains compatibility with existing data block-based databases, thus allowing the use of compression units in preexisting databases without modification to the underlying format of the database. The compression units may, for example, co-exist with uncompressed tables. Various techniques allow a database server to optimize access to data in the compression unit, so that the compression is virtually transparent to the user.
摘要:
For automatic data placement of database data, a plurality of access-tracking data is maintained. The plurality of access-tracking data respectively corresponds to a plurality of data rows that are managed by a database server. While the database server is executing normally, it is automatically determined whether a data row, which is stored in first one or more data blocks, has been recently accessed based on the access-tracking data that corresponds to that data row. After determining that the data row has been recently accessed, the data row is automatically moved from the first one or more data blocks to one or more hot data blocks that are designated for storing those data rows, from the plurality of data rows, that have been recently accessed.
摘要:
A method and apparatus is provided for optimizing queries received by a database system that relies on an intelligent data storage server to manage storage for the database system. Storing compression units in hybrid columnar format, the storage manager evaluates simple predicates and only returns data blocks containing rows that satisfy those predicates. The returned data blocks are not necessarily stored persistently on disk. That is, the storage manager is not limited to returning disc block images. The hybrid columnar format enables optimizations that provide better performance when processing typical database workloads including both fetching rows by identifier and performing table scans.
摘要:
Techniques are described herein for automatically selecting the compression techniques to be used on tabular data. A compression analyzer gives users high-level control over the selection process without requiring the user to know details about the specific compression techniques that are available to the compression analyzer. Users are able to specify, for a given set of data, a “balance point” along the spectrum between “maximum performance” and “maximum compression”. The point thus selected is used by the compression analyzer in a variety of ways. For example, in one embodiment, the compression analyzer uses the user-specified balance point to determine which of the available compression techniques qualify as “candidate techniques” for the given set of data. The compression analyzer selects the compression technique to use on a set of data by actually testing the candidate compression techniques against samples from the set of data. After testing the candidate compression techniques against the samples, the resulting compression ratios are compared. The compression technique to use on the set of data is then selected based, in part, on the compression ratios achieved during the compression tests performed on the sample data.
摘要:
Described herein are compression and processing optimizations by using data transformation techniques. In example embodiments, a byte-wise differential transformation is applied to columnar data represented as a list of length-value pairs to determine a list of delta pairs that is subsequently compressed and stored on persistent storage. A length separation transformation is applied to separate a list of length-value pairs into a length array and a corresponding data value array, where these two arrays are subsequently compressed and stored separately on persistent storage. A native number transformation is applied to a set of number values to remove the lengths stored in the number values, where the transformed set is stored on persistent storage instead of the original set of number values. A native datetime-type transformation is applied to a set of datetime values to generate an encoding that is used to encode the set of datetime values into an encoded set that is stored on persistent storage instead of the original set.
摘要:
A highly flexible and extensible structure is provided for physically storing tabular data. The structure, is referred to as a compression unit, and may be used to physically store tabular data that logically resides in any type of table-like structure. According to one embodiment, compression units are recursive. Thus, a compression unit may have a “parent” compression unit to which it belongs, and may have one or more “child” compression units that belong to it. In one embodiment, compression units include metadata that indicates how the tabular data is stored within them. The metadata for a compression unit may indicate, for example, whether the data within the compression unit is stored in row-major or column major-format (or some combination thereof), the order of the columns within the compression unit (which may differ from the logical order of the columns dictated by the definition of their logical container), a compression technique for the compression unit, the child compression units (if any), etc.
摘要:
Techniques for storing and manipulating tabular data are provided. According to one embodiment, a user may control whether tabular data is stored in row-level or column-major format. Furthermore, the user may control the level of data compression to achieve an optimal balance between query performance and compression ratios. Tabular data from within the same table may be stored in both column-major and row-major format and compressed at different levels. In addition, tabular data can migrate between column-major format and row-major format in response to various events. For example, in response to a request to update or lock a row stored in column-major format, the row may be migrated and subsequently stored into row-major format. In one embodiment, table partitions are used to enhance data compression techniques. For example, compression tests are performed on a representative table partition, and a compression map is generated and applied to other table partitions.