摘要:
System, method and computer program products for storing data by computing a plurality of hash functions of data values in a data item, and determining a corresponding memory location for one of the plurality of hash functions of data values in the data item. Each memory location is of a cacheline size wherein a data item is stored in a memory location. Each memory location can store a plurality of data items. A key portion of all data items is contiguously stored within the memory location, and a payload portion is contiguously stored within the memory location. Payload portions are packed as bit-aligned in a fixed-sized memory location, comprising a bucket in a bucketized hash table, each bucket sized to store multiple key portions and payload portions that are packed as bit-aligned in a fixed-sized bucket. Corresponding key portions are stored as compressed keys in said fixed-sized bucket.
摘要:
Embodiments of the present invention provide query processing for column stores by accumulating table record attributes during application of query plan operators on a table. The attributes and associated attribute values are compacted when said attribute values are to be consumed for an operation in the query plan, during the execution of the query plan. Table column record values are materialized late in query plan execution.
摘要:
Embodiments of the present invention provide query processing for column stores by accumulating table record attributes during application of query plan operators on a table. The attributes and associated attribute values are compacted when said attribute values are to be consumed for an operation in the query plan, during the execution of the query plan. Table column record values are materialized late in query plan execution.
摘要:
A query processing method intersects two or more unsorted lists based on a conjunction of predicates. Each list comprises a union of multiple sorted segments. The method performs lazy segment merging and an adaptive n-ary intersecting process. The lazy segment merging comprises starting with each list being a union of completely unmerged segments, such that lookups into a given list involve separate lookups into each segment of the given list. The method intersects the lists according to the predicates while performing the lazy segment merging, such that the lazy segment merging reads in only those portions of each segment that are needed for the intersecting. As the intersecting proceeds and the lookups are performed, the intersecting selectively merges the segments together, based on a cost-benefit analysis of the cost of merging compared to the benefit produced by reducing a number of lookups.
摘要:
According to one embodiment of the present invention, a method for dictionary encoding data without using three-valued logic is provided. According to one embodiment of the invention, a method includes encoding data in a database table using a dictionary, wherein the data includes values representing NULLs. A query having a predicate is received and the predicate is evaluated on the encoded data, whereby the predicate is evaluated on both the encoded data and on the encoded NULLs.
摘要:
A cell-specific dictionary is applied adaptively to adequate cells, where the cell-specific dictionary subsequently optimizes the handling of frequency-partitioned multi-dimensional data. This includes improved data partitioning with super cells or adjusting resulting cells by sub-dividing very large cells and merging multiple small cells, both of which avoid the highly skewed data distribution in cells and improve the query processing. In addition, more efficient encoding is taught within a cell in case the distinct values that actually appear in that cell are much smaller than the size of the column dictionary.
摘要:
A computer-implemented method for scan sharing across multiple cores in a business intelligence (BI) query. The method includes receiving a plurality of BI queries, storing a block of data in a first cache, scanning the block of data in the first cache against a first batch of queries on a first processor core, and scanning the block of data against a second batch of queries on a second processor core. The first cache is associated with a first processor core. The block of data includes a subset of data stored in an in-memory database (IMDB). The first batch of queries includes two or more of the BI queries. The second batch of queries includes one or more of the BI queries that are not included in the first batch of queries.
摘要:
A system is described for creating compact aggregation working areas for efficient grouping and aggregation using multi-core CPUs. The system implements operations including computing a running aggregate for a group within a business intelligence (BI) query, and identifying a location to store running aggregate information within an aggregation working area of a cache. The aggregation working area includes first and second data structures. The first data structure stores running aggregate information that is associated with a group that is accessed frequently relative to a threshold. The second data structure stores running aggregate information that is associated with a group that is accessed infrequently relative to the threshold. The operations also include storing the running aggregate information in either the first or second data structure of the aggregation working area based on a characterization of the group as a frequently or infrequently accessed group.
摘要:
A novel method is described for applying various hash methods used in conjunction with a query with a Group By clause. A plurality of drawers are identified, wherein each of the drawers is made up of a collection of cells from a single partition of a Group By column and each of the drawers being defined for a specific query. A separate hash table is independently computed for each of the drawers and a hashing scheme (picked from among a plurality of hashing schemes) is independently applied for each of the drawers.
摘要:
A method, system, and article are provided for employment of a hybrid layout of representation of data objects in computer memory. Columns of the database are separated based upon a classification of the columns. A vertical partition in the form of a bank is provided to receive an assignment of one or more data objects identified in the columns. Each bank is sized to be a divisor of a size of an associated hardware register. Assignment of data objects to banks organizes the data in a manner that support efficient query processing that mitigates the quantity of banks required to respond to the query.