Abstract:
Methods and apparatus, including computer program products, for block compression of tables with repeated values. In general, value identifiers representing a compressed column of data may be sorted to render repeated values contiguous, and block dictionaries may be generated. A block dictionary may be generated for each block of value identifiers. Each block dictionary may include a list of block identifiers, where each block identifier is associated with a value identifier and there is a block identifier for each unique value in a block. Blocks may have standard sizes and block dictionaries may be reused for multiple blocks.
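The block-dictionary idea can be illustrated with a minimal sketch. The function names, the fixed `block_size` parameter, and the choice to sort a single column in isolation are assumptions made for illustration; the abstract does not specify how row order is tracked, and dictionary reuse across blocks is not shown.

```python
from typing import List, Tuple

Block = Tuple[List[int], List[int]]  # (block dictionary, block identifiers)


def block_compress(value_ids: List[int], block_size: int) -> List[Block]:
    """Sort value IDs, cut them into fixed-size blocks, and encode each
    block against its own small dictionary (hypothetical helper)."""
    ordered = sorted(value_ids)  # sorting makes repeated values contiguous
    blocks = []
    for start in range(0, len(ordered), block_size):
        block = ordered[start:start + block_size]
        # Block dictionary: list position = block identifier,
        # list entry = the value identifier it stands for.
        dictionary = sorted(set(block))
        lookup = {vid: bid for bid, vid in enumerate(dictionary)}
        encoded = [lookup[vid] for vid in block]
        blocks.append((dictionary, encoded))
    return blocks


def block_decompress(blocks: List[Block]) -> List[int]:
    """Map block identifiers back to value identifiers."""
    return [dictionary[bid] for dictionary, encoded in blocks for bid in encoded]
```

Because a block usually holds far fewer unique values than the whole column, its block identifiers can be stored in fewer bits than the original value identifiers.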
Abstract:
In a business system, one or more methods can be used to reduce an amount of redundancy in the storage of data. One implementation includes a method of reducing a memory footprint of a database table having multiple rows and one or more columns, in which each of the one or more columns has a cardinality, and the cardinality is a total number of different values in the rows of each column. The method includes comparing the cardinality with a total number of possible values in the rows of at least one column based on a width of the column. The method also includes reducing the width of the column if the cardinality is less than a threshold based on the total number of possible values in the rows of the column.
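The cardinality comparison can be sketched as follows. The helper names and the `threshold_fraction` parameter are hypothetical; the abstract says only that the threshold is based on the total number of possible values, so the 0.5 default is an assumption.

```python
def minimal_width_bits(cardinality: int) -> int:
    """Smallest bit width able to encode `cardinality` distinct values."""
    return max(1, (cardinality - 1).bit_length())


def should_reduce_width(cardinality: int, width_bits: int,
                        threshold_fraction: float = 0.5) -> bool:
    """Compare the column's cardinality with the number of values its
    current width could represent (threshold_fraction is an assumption)."""
    possible_values = 2 ** width_bits
    return cardinality < threshold_fraction * possible_values


def reduced_width(column, width_bits: int) -> int:
    """Return the width to use for the column: narrower if it qualifies."""
    cardinality = len(set(column))  # number of different values in the rows
    if should_reduce_width(cardinality, width_bits):
        return minimal_width_bits(cardinality)
    return width_bits
```

For example, a 32-bit column that holds only three distinct values would be narrowed to 2 bits under these assumptions.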
Abstract:
A first data storage schema, in which a characteristic in a first dimension table is mapped by a first table and a second table, can be received, and a second data storage schema can be received. The first table maps the characteristic to a first object that includes attributes to which time information is irrelevant to data processing activities, and the second table maps the characteristic to a second object that includes attributes to which time information is relevant to data processing activities. The second data storage schema includes a fact table including at least some facts drawn from the first data storage schema and a second dimension table that includes at least some characteristics drawn from at least one of the first object and the second object.
Abstract:
Methods and apparatus, including computer systems and program products, relating to an information management system and aggregating data by performing table scans. In general, in one aspect, the technique includes receiving a query for a response to a search on a database, loading data from the database into memory, filtering the data based on the query to generate a list of results, buffering at least one key figure corresponding to a result, buffering at least one dimension value corresponding to each key figure, aggregating the dimension values to generate an aggregate key, aggregating key figures corresponding to the same aggregate key to generate one or more aggregate key figures, and displaying the response to the search on a display device. Loading the data may include compressing the data. Filtering the data may be performed blockwise.
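The scan-and-aggregate steps above can be sketched as a single pass over in-memory rows. The function name, the dict-per-row layout, and the `block_size` parameter are assumptions for illustration; compression and display are left out.

```python
from collections import defaultdict


def scan_aggregate(rows, predicate, dimensions, key_figure, block_size=1024):
    """Filter rows blockwise, build an aggregate key from the dimension
    values, and sum the key figure per aggregate key (hypothetical helper)."""
    totals = defaultdict(float)
    for start in range(0, len(rows), block_size):
        block = rows[start:start + block_size]  # blockwise filtering
        for row in block:
            if not predicate(row):
                continue
            # Aggregate key: the combined dimension values for this result.
            aggregate_key = tuple(row[d] for d in dimensions)
            totals[aggregate_key] += row[key_figure]
    return dict(totals)
```

A query such as "revenue per region for 2023" would pass a predicate on the year, `dimensions=["region"]`, and `key_figure="revenue"`.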
Abstract:
The subject matter disclosed herein provides methods for the dual storage of data using an in-memory array and an on-disk page structure. An in-memory array holding a column of data can be maintained. One or more pages can be maintained. Each of the one or more pages can have one or more rows for storing the column of data. Random access can be provided to a subset of the one or more rows by at least loading the subset of rows from the one or more pages to the in-memory array without loading all of the rows from the one or more pages. Related apparatus, systems, techniques, and articles are also described.
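The partial-loading behavior can be sketched with a small class in which a list of lists stands in for the on-disk pages. The class name, its attributes, and the page-at-a-time loading granularity are assumptions; the abstract does not describe the actual page format.

```python
class PagedColumn:
    """Column kept both as 'on-disk' pages and as a sparse in-memory array
    (a sketch; self.pages merely simulates persistent storage)."""

    def __init__(self, values, rows_per_page):
        self.rows_per_page = rows_per_page
        self.pages = [values[i:i + rows_per_page]
                      for i in range(0, len(values), rows_per_page)]
        self.in_memory = [None] * len(values)  # nothing loaded yet
        self.loaded_pages = set()

    def get(self, row):
        """Random access to one row, loading only the page that holds it."""
        page_no = row // self.rows_per_page
        if page_no not in self.loaded_pages:
            start = page_no * self.rows_per_page
            page = self.pages[page_no]
            self.in_memory[start:start + len(page)] = page
            self.loaded_pages.add(page_no)
        return self.in_memory[row]
```

Reading a single row touches one page, so the rest of the in-memory array stays unpopulated until it is needed.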
Abstract:
Methods and apparatus, including computer program products, for compression of tables based on occurrence of values. In general, a number representing the count of occurrences of a frequently occurring value in a group of adjacent rows of a column is generated, a vector representing whether the frequently occurring value exists in a row of the column is generated, and the number and the vector are stored to enable searches of the data represented by the number and the vector. The vector may omit a portion representing the group of adjacent rows. The values may be dictionary-based compression values representing business data such as business objects. The compression may be performed in-memory, in parallel, to improve memory utilization, network bandwidth consumption, and processing performance.
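One way to read the scheme is sketched below, assuming the group of adjacent rows is a leading run of the most frequent value and the column is non-empty. The function names and the separate `others` list for the remaining values are assumptions, not details taken from the abstract.

```python
from collections import Counter


def compress_by_occurrence(column):
    """Return (frequent value, length of its leading run, bit vector for the
    remaining rows, remaining non-frequent values). Assumes a non-empty column."""
    frequent = Counter(column).most_common(1)[0][0]
    prefix = 0
    while prefix < len(column) and column[prefix] == frequent:
        prefix += 1  # count the adjacent rows holding the frequent value
    rest = column[prefix:]
    # The vector omits the leading run; the stored number covers it instead.
    bits = [value == frequent for value in rest]
    others = [value for value in rest if value != frequent]
    return frequent, prefix, bits, others


def decompress_by_occurrence(frequent, prefix, bits, others):
    """Rebuild the original column from the compressed parts."""
    remaining = iter(others)
    return [frequent] * prefix + [frequent if bit else next(remaining) for bit in bits]
```

The longer the leading run, the shorter the bit vector, which is why sorting rows to bring the frequent value together pays off.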
Abstract:
In business systems, one or more methods can be used to reduce an amount of redundant data. In one implementation, a method to reduce redundancy within a data model in a database, in which the data model is represented by at least one table, includes determining a number of distinct values of partial keys in a table. Each partial key represents at least one row in the table. The method includes reordering one or more columns of the table by cardinality of partial keys, in which the cardinality of a partial key represents a number of distinct values of the partial key. The method further includes determining whether pairs of partial keys are functionally dependent and eliminating one or more columns having functional dependencies from the table.
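The dependency check can be sketched by counting distinct values: column A functionally determines column B when the number of distinct (A, B) pairs equals the number of distinct A values. The function names, the dict-per-row layout, and the restriction to single-column partial keys are assumptions; a real system would also need to keep the A-to-B mapping somewhere before dropping B.

```python
def distinct_count(rows, cols):
    """Number of distinct value combinations of the given columns."""
    return len({tuple(row[c] for c in cols) for row in rows})


def determines(rows, a, b):
    """True if every value of column a maps to exactly one value of column b."""
    return distinct_count(rows, [a, b]) == distinct_count(rows, [a])


def reduce_columns(rows, columns):
    """Order columns by descending cardinality, then drop each column that is
    functionally dependent on a column already kept (hypothetical helper)."""
    ordered = sorted(columns, key=lambda c: distinct_count(rows, [c]), reverse=True)
    kept, dropped = [], {}
    for col in ordered:
        determinant = next((k for k in kept if determines(rows, k, col)), None)
        if determinant is None:
            kept.append(col)
        else:
            dropped[col] = determinant  # col can be derived from determinant
    return kept, dropped
```

Ordering by cardinality first means a higher-cardinality column is always tested as the determinant of a lower-cardinality one, never the reverse.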