Abstract:
In business systems, one or more methods can be used to reduce the amount of redundant data. In one implementation, a method to reduce redundancy within a data model in a database, in which the data model is represented by at least one table, includes determining the number of distinct values of partial keys in a table. Each partial key represents at least one row in the table. The method includes reordering one or more columns of the table by the cardinality of the partial keys, where the cardinality of a partial key is the number of distinct values of that partial key. The method further includes determining whether pairs of partial keys are functionally dependent and eliminating from the table one or more columns that have functional dependencies.
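The steps in this abstract can be illustrated with a minimal sketch, under simplifying assumptions that are not taken from the source: each partial key is a single column, rows are Python dictionaries, and an eliminated column is moved into a separate lookup mapping keyed by the column that determines it. The names `cardinality`, `determines`, and `split_dependent_columns` are hypothetical.

```python
def cardinality(rows, col):
    """Number of distinct values in one column."""
    return len({row[col] for row in rows})


def determines(rows, a, b):
    """True if every value of column a maps to exactly one value of column b."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[a], row[b]) != row[b]:
            return False
    return True


def split_dependent_columns(rows, columns):
    """Order columns by descending cardinality, then move every column that is
    functionally dependent on an already kept column into a lookup mapping."""
    ordered = sorted(columns, key=lambda c: cardinality(rows, c), reverse=True)
    kept, lookups = [], {}
    for col in ordered:
        determinant = next((k for k in kept if determines(rows, k, col)), None)
        if determinant is None:
            kept.append(col)
        else:
            # Assumption: the dependency is stored once instead of once per row.
            lookups[(determinant, col)] = {r[determinant]: r[col] for r in rows}
    reduced = [{c: row[c] for c in kept} for row in rows]
    return kept, reduced, lookups


rows = [
    {"customer": "C1", "city": "Berlin", "country": "DE", "product": "P1"},
    {"customer": "C1", "city": "Berlin", "country": "DE", "product": "P2"},
    {"customer": "C2", "city": "Paris", "country": "FR", "product": "P1"},
    {"customer": "C3", "city": "Berlin", "country": "DE", "product": "P3"},
    {"customer": "C4", "city": "Munich", "country": "DE", "product": "P2"},
]
kept, reduced, lookups = split_dependent_columns(
    rows, ["customer", "city", "country", "product"]
)
print(kept)
print(sorted(lookups))
```

In this sample data, `city` and `country` depend on `customer`, so only `customer` and `product` remain in the reduced table. A column that is unique per row would determine every other column, so a real implementation would need a policy for such keys; the sketch does not address that.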
Abstract:
Methods and apparatus, including computer program products, for compression of tables based on occurrence of values. In general, a number representing how often a frequently occurring value occurs in a group of adjacent rows of a column is generated, a vector representing whether the frequently occurring value exists in a row of the column is generated, and the number and the vector are stored to enable searches of the data represented by the number and the vector. The vector may omit a portion representing the group of adjacent rows. The values may be dictionary-based compression values representing business data such as business objects. The compression may be performed in-memory, in parallel, to improve memory utilization, network bandwidth consumption, and processing performance.
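One possible reading of this scheme is sketched below. It assumes, beyond what the abstract states, that the group of adjacent rows is the leading run of the most frequent value, that the vector covers only the rows after that run, and that the remaining values are kept in a plain list. The function names are hypothetical.

```python
from collections import Counter


def compress(column):
    """Return (frequent value, length of leading run, bit vector, other values).
    Assumes a non-empty column of dictionary-encoded value IDs."""
    if not column:
        raise ValueError("column must not be empty")
    frequent, _ = Counter(column).most_common(1)[0]
    prefix = 0
    while prefix < len(column) and column[prefix] == frequent:
        prefix += 1
    rest = column[prefix:]
    # The vector omits the leading run; the run is represented by its length.
    bits = [value == frequent for value in rest]
    others = [value for value in rest if value != frequent]
    return frequent, prefix, bits, others


def decompress(frequent, prefix, bits, others):
    """Rebuild the original column from the compressed parts."""
    remaining = iter(others)
    return [frequent] * prefix + [frequent if b else next(remaining) for b in bits]


def count_matches(compressed, value):
    """Count rows equal to value without decompressing the column."""
    frequent, prefix, bits, others = compressed
    if value == frequent:
        return prefix + sum(bits)
    return others.count(value)


column = [0, 0, 0, 0, 3, 0, 5, 0, 0, 2]
compressed = compress(column)
print(compressed)
print(count_matches(compressed, 0))
```

The search in `count_matches` reads only the stored number and the vector when the query value is the frequent one, which is the property the abstract points to.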
Abstract:
A query having multiple parts may be processed to form an intermediate results set. This intermediate results set may be partitioned into a plurality of groups. Thereafter, the groups may be sorted into a plurality of containers so that each container contains data sufficient to calculate one requested result in the multipart query. Related techniques, apparatuses, systems, and computer program products are also described.
Abstract:
Methods and apparatus, including computer systems and program products, for executing a query on a subset of data, for example, to facilitate a fast search with a very large result set. In one general aspect, a method of executing a query includes receiving a query for execution on data in the data repository; generating an estimate of a number of results of the query; defining a subset of data in the data repository; determining whether to execute the query on the subset of the data; executing the query on the subset of the data to generate a partial set of results if the query is to be executed on the subset of the data, otherwise executing the query on the data repository to generate a complete set of results; and providing query results.
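The decision flow described here might look like the following sketch. The estimation technique (scaling the hit rate of an evenly spaced sample), the choice of the first `subset_size` rows as the subset, and the names `estimate_hits`, `execute`, and `max_results` are all assumptions made for illustration; the abstract does not specify them.

```python
def estimate_hits(rows, predicate, sample_size):
    """Estimate the number of matching rows from an evenly spaced sample."""
    step = max(1, len(rows) // sample_size)
    sample = rows[::step]
    if not sample:
        return 0
    hits = sum(1 for row in sample if predicate(row))
    return round(hits / len(sample) * len(rows))


def execute(rows, predicate, max_results, subset_size, sample_size=100):
    """Run the query on a subset if the estimated result set is too large,
    otherwise on all rows. Returns (results, is_complete)."""
    estimate = estimate_hits(rows, predicate, sample_size)
    if estimate > max_results:
        subset = rows[:subset_size]  # assumption: the subset is a simple prefix
        return [row for row in subset if predicate(row)], False
    return [row for row in rows if predicate(row)], True


repository = list(range(10_000))

# A broad query: the estimate exceeds max_results, so only the subset is searched.
partial, complete = execute(repository, lambda x: x % 3 == 0,
                            max_results=1000, subset_size=500)
print(len(partial), complete)

# A narrow query: the estimate is small, so the whole repository is searched.
full, complete_full = execute(repository, lambda x: x > 9990,
                              max_results=1000, subset_size=500)
print(len(full), complete_full)
```

Returning a completeness flag alongside the results lets the caller tell a partial result set from a complete one.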
Abstract:
A method is disclosed for modeling application-level objects in terms of join graphs defined over tables containing structured data residing in a relational database. In accordance with the disclosed metamodel, each object is modeled logically as a join graph. A query received from an application that requests the return of objects meeting certain conditions is reformulated to refer to the metamodel. The metamodel includes an index structure having a plurality of indexes and a set of join conditions that specify relationships between the indexes. Some series of join conditions form join paths, such that each join path originates on an anchor table and ends on a table corresponding to one of the plurality of indexes. The metamodel further includes at least one view representing a subgraph of the join graph having at least one anchor table as a key.
Abstract:
Methods and apparatus, including computer program products, for block compression of tables with repeated values. In general, value identifiers representing a compressed column of data may be sorted to render repeated values contiguous, and block dictionaries may be generated. A block dictionary may be generated for each block of value identifiers. Each block dictionary may include a list of block identifiers, where each block identifier is associated with a value identifier and there is a block identifier for each unique value in a block. Blocks may have standard sizes and block dictionaries may be reused for multiple blocks.
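A minimal sketch of block dictionaries follows, assuming fixed-size blocks, block identifiers that are positions within a per-block tuple of value identifiers, and reuse of a dictionary whenever two blocks contain the same set of unique values. The sketch sorts a single column and does not track how rows were reordered, which a full implementation would need to do. The function names are hypothetical.

```python
def block_compress(value_ids, block_size):
    """Sort value IDs, cut them into blocks, and encode each block against a
    (possibly shared) block dictionary. Returns (dictionaries, blocks)."""
    ordered = sorted(value_ids)
    dictionaries = []  # each entry: tuple of value IDs, indexed by block ID
    dict_number = {}   # dictionary contents -> position in `dictionaries`
    blocks = []        # each entry: (dictionary number, list of block IDs)
    for start in range(0, len(ordered), block_size):
        chunk = ordered[start:start + block_size]
        block_dict = tuple(sorted(set(chunk)))
        number = dict_number.setdefault(block_dict, len(dictionaries))
        if number == len(dictionaries):
            dictionaries.append(block_dict)  # first time this dictionary is seen
        block_id = {value: i for i, value in enumerate(block_dict)}
        blocks.append((number, [block_id[value] for value in chunk]))
    return dictionaries, blocks


def block_decompress(dictionaries, blocks):
    """Expand block IDs back into the sorted value IDs."""
    return [dictionaries[number][b] for number, ids in blocks for b in ids]


value_ids = [3, 8, 3, 8, 8, 3, 3, 8, 8, 3, 8, 3, 3, 8, 3, 8]
dictionaries, blocks = block_compress(value_ids, block_size=4)
print(dictionaries)
print(blocks)
```

Because sorting makes repeated values contiguous, most blocks contain very few distinct values, so block identifiers can be much narrower than the original value identifiers.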
Abstract:
A compression manager may store, within an index vector, a plurality of value identifiers (IDs), each value ID representing a value within a database. A page generator may designate a number of the value IDs as defining a page within the index vector, so that the index vector includes a plurality of pages, each page including the number of value IDs. The page generator may store the index vector in a secondary memory of a main memory database. An iterator may access a requested value ID, and a page loader may load a corresponding page of the index vector that contains the requested value ID into the main memory database.
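The paging behavior can be sketched as follows. The class `PagedIndexVector` is hypothetical, and an in-process list stands in for secondary memory so that the example stays self-contained; a real system would read pages from disk or another storage tier.

```python
class PagedIndexVector:
    """Index vector of value IDs split into fixed-size pages that are loaded
    into main memory only when a position on the page is requested."""

    def __init__(self, value_ids, page_size):
        self.page_size = page_size
        self.length = len(value_ids)
        # Stand-in for secondary memory (assumption for this sketch).
        self._stored_pages = [
            value_ids[i:i + page_size]
            for i in range(0, len(value_ids), page_size)
        ]
        self._loaded = {}  # page number -> page held in main memory

    def get(self, position):
        """Return the value ID at a position, loading its page if needed."""
        if not 0 <= position < self.length:
            raise IndexError(position)
        page_number, offset = divmod(position, self.page_size)
        if page_number not in self._loaded:
            self._loaded[page_number] = list(self._stored_pages[page_number])
        return self._loaded[page_number][offset]

    def loaded_pages(self):
        """Page numbers currently held in main memory."""
        return sorted(self._loaded)


vector = PagedIndexVector(list(range(100, 120)), page_size=8)
print(vector.get(10), vector.loaded_pages())
print(vector.get(0), vector.loaded_pages())
```

Only the pages that were actually touched end up in main memory, which is the point of paging the index vector.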