Abstract:
In a database, a database manager can generate a view, which can be regarded as a subset of the database that is placed outside the database for use without disturbing it. However, because views are separate from the database, they do not reflect later changes to it. A process called “refreshing” solves this problem by keeping the views consistent with the data in the database. Refreshing approaches differ: some views must be refreshed immediately when the database changes, others can be refreshed later, and still others can be refreshed at different times and intervals. The invention presents a system that keeps data consistent among the views and the database despite these different refresh times.
Abstract:
In a database, a database manager can generate a view, which is conceptually a subset of the database placed outside the database, so that it can be used without disturbing the database and without disturbance by others using the database. The subset, or view, can be understood as a collection of rows, or tuples, of data copied from the database. Once views exist, multiple copies of the data exist: the original in the database, and copies in the views. If one copy is changed without corresponding changes to the others, inconsistencies arise, which cannot be tolerated. Under the invention, when a user seeks a lock on a view, indicating that a change may be imminent, the invention locks a superset of the tuples in the database from which the view is derived. A superset is a set that contains the tuples of the view, plus possibly others. Thus, more tuples are locked than strictly necessary. The excess locking is tolerated because other benefits are obtained.
Abstract:
A method of incrementally maintaining a first materialized view of data in a database by means of an additional materialized view first determines whether the cost in time of incrementally maintaining the first materialized view with the additional materialized view is less than the cost of maintaining it without the additional materialized view. The method creates the additional materialized view only if doing so lowers that cost. One way of making this determination uses an expression directed acyclic graph that corresponds to the first materialized view. Another prunes the expression directed acyclic graph to produce a single expression tree and uses that tree to determine whether the cost is less. Both the expression directed acyclic graph and the single expression tree contain equivalence nodes. One or more possible materialized views are selected by marking the equivalence nodes and materializing one or more views corresponding to the marked equivalence nodes. One or more possible materialized views are also selected by determining which of the views, if materialized, would result in the lowest cost of incrementally maintaining the first materialized view. The method is also used to reduce the cost in time of maintaining a first materialized view employed to check an integrity constraint of the database.
Abstract:
A technique for efficiently joining multiple large tables in a database system. The technique uses a join index and minimizes the number of input/output operations while maximizing the use of a small main memory through a buffer allocation process based on the join index entries. The technique uses multi-dimensional partitioning and assigns to each buffer a partition identifier, which is used to coordinate the resultant output files when the technique is complete. The output is vertically fragmented, with one fragment for each input table, which further allows the individual processing of each input table. The technique performs self-joins very efficiently because the records of the input table need to be read only once.
Abstract:
A system and method for managing a cache includes monitoring a temperature of regions on a secondary storage based on a cumulative cost to access pages from each region of the secondary storage. Similar temperature pages are grouped in logical blocks. Data is written to a cache in a logical block granularity by overwriting cooler blocks with hotter blocks.
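The temperature-based policy described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `TemperatureCache`, the fixed `region_size`, the use of a block's mean page temperature, and the "replace the coolest cached block only if the newcomer is hotter" rule are all hypothetical choices, not details taken from the abstract.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Block = Tuple[float, List[int]]  # (block temperature, page numbers in the block)


class TemperatureCache:
    """Hypothetical sketch: region temperatures drive block-granularity caching."""

    def __init__(self, region_size: int, block_pages: int, cache_blocks: int) -> None:
        self.region_size = region_size      # pages per secondary-storage region (assumed fixed)
        self.block_pages = block_pages      # pages per logical block
        self.cache_blocks = cache_blocks    # cache capacity, counted in logical blocks
        self.region_temp: Dict[int, float] = defaultdict(float)
        self.cache: List[Block] = []

    def record_access(self, page: int, cost: float) -> None:
        # Temperature of a region = cumulative cost of accessing its pages.
        self.region_temp[page // self.region_size] += cost

    def temperature(self, page: int) -> float:
        # A page inherits the temperature of the region it lives in.
        return self.region_temp.get(page // self.region_size, 0.0)

    def make_blocks(self, pages: List[int]) -> List[Block]:
        # Sorting by temperature puts similar-temperature pages next to each
        # other, so consecutive slices form logical blocks of similar heat.
        ranked = sorted(pages, key=self.temperature, reverse=True)
        blocks: List[Block] = []
        for i in range(0, len(ranked), self.block_pages):
            group = ranked[i:i + self.block_pages]
            mean = sum(self.temperature(p) for p in group) / len(group)
            blocks.append((mean, group))
        return blocks

    def admit(self, block: Block) -> bool:
        # Write a whole block; when full, overwrite the coolest cached block,
        # but only if the incoming block is hotter.
        if len(self.cache) < self.cache_blocks:
            self.cache.append(block)
            return True
        coolest = min(range(len(self.cache)), key=lambda i: self.cache[i][0])
        if self.cache[coolest][0] < block[0]:
            self.cache[coolest] = block
            return True
        return False
```

Writing at block granularity means one replacement decision covers several pages at once, which is why the sketch compares block temperatures rather than individual page temperatures.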
Abstract:
A technique for efficiently joining multiple large tables in a database system with a processor using a small main memory. The technique utilizes a join index and minimizes the number of input/output (I/O) operations while maximizing the use of the small main memory through a buffer allocation process. Three embodiments of the technique are described, all of which use the parallel-merge operation. The first embodiment, slam-join, joins two tables and does not require any pre-allocation of buffers to perform the join operation. The second embodiment, multi-slam-join, joins three or more tables and adds the parallel-merge technique to a join technique that partitions memory for an efficient join operation. The third embodiment, called parallel-join, processes each input table completely independently using the parallel-merge technique. The parallel-merge technique identifies the lowest value from multiple files and orders all the values from lowest to highest. This enables sequential reading of input files, which saves I/O operations.
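The parallel-merge step, as described, repeatedly picks the lowest value across several inputs and emits all values in ascending order. A minimal sketch of that idea is a heap-based k-way merge, shown below. The function name `parallel_merge` and the use of in-memory lists in place of files are assumptions made for illustration, and each input is assumed to be already sorted.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

_DONE = object()  # sentinel marking an exhausted input


def parallel_merge(runs: List[Iterable[int]]) -> Iterator[int]:
    """Merge individually sorted inputs into one ascending stream."""
    iterators = [iter(run) for run in runs]
    heap: List[Tuple[int, int]] = []
    # Prime the heap with the first value of every non-empty input.
    for idx, it in enumerate(iterators):
        first = next(it, _DONE)
        if first is not _DONE:
            heap.append((first, idx))
    heapq.heapify(heap)
    while heap:
        value, idx = heapq.heappop(heap)   # lowest value across all inputs
        yield value
        nxt = next(iterators[idx], _DONE)  # advance only the input it came from
        if nxt is not _DONE:
            heapq.heappush(heap, (nxt, idx))


merged = list(parallel_merge([[1, 4, 9], [2, 3, 10], [], [5]]))
```

Each input is consumed strictly front to back, which corresponds to the sequential reading of input files that the abstract credits with saving I/O operations.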
Abstract:
Cache sensitive search tree (CSS-tree) index structures for providing improved search and lookup performance compared with conventional searching schemes. The CSS-tree index structures include a directory tree structure which is stored in an array (216) and serves as an index for a sorted array of elements. The nodes (215) in the directory tree structure may be of sizes selected to correspond to the cache line size in the computer system utilizing the CSS-tree index structures. Child nodes (213) within the directory tree structure are located by performing arithmetic operations on array offsets. Thus, it is not necessary to store internal child node pointers, thereby reducing memory storage requirements. In addition, the CSS-tree index structures are organized so that traversing each level in the tree yields good data reference locality, and therefore relatively few cache misses. Thus, the CSS-tree index structures consider cache-related parameters such as reference locality and cache behavior, without requiring substantial additional amounts of memory.
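To illustrate how child nodes can be located by arithmetic on array offsets instead of stored pointers, the sketch below builds a flat directory over a sorted array. It is a simplified layout, not necessarily the one claimed: the leaf level is padded to a full tree, `m` (keys per node) stands in for whatever fits a cache line, and the names `build_directory` and `css_search` are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple

INF = float("inf")


def build_directory(sorted_keys: List[int], m: int) -> Tuple[List[float], int]:
    """Build a pointer-free directory with m keys per node over sorted_keys.

    Returns (directory, n_dir). Node b keeps its keys at
    directory[b*m : b*m + m]; key j is the largest key under child j.
    """
    fanout = m + 1
    n_leaves = max(1, -(-len(sorted_keys) // m))  # ceil(len / m) leaf chunks
    padded = 1                                    # leaf count rounded up to a power of fanout
    while padded < n_leaves:
        padded *= fanout
    n_dir = (padded - 1) // m                     # 1 + fanout + fanout**2 + ...
    # sub_max[node] = largest key below node, or -INF if the subtree is padding.
    sub_max: List[float] = [-INF] * (n_dir + padded)
    for leaf in range(padded):
        chunk = sorted_keys[leaf * m:(leaf + 1) * m]
        if chunk:
            sub_max[n_dir + leaf] = chunk[-1]
    directory: List[float] = [INF] * (n_dir * m)
    for b in range(n_dir - 1, -1, -1):
        first_child = b * fanout + 1              # child offsets come from arithmetic alone
        for j in range(m):
            if sub_max[first_child + j] > -INF:
                directory[b * m + j] = sub_max[first_child + j]
        sub_max[b] = max(sub_max[first_child:first_child + fanout])
    return directory, n_dir


def css_search(sorted_keys: List[int], directory: List[float],
               n_dir: int, m: int, key: int) -> int:
    """Return the index of key in sorted_keys, or -1 if it is absent."""
    node = 0
    while node < n_dir:                           # descend through directory nodes
        j = 0
        while j < m and key > directory[node * m + j]:
            j += 1
        node = node * (m + 1) + 1 + j             # j-th child, computed not stored
    lo = (node - n_dir) * m                       # leaf number -> offset in sorted array
    if lo >= len(sorted_keys):
        return -1                                 # landed in a padding leaf
    hi = min(lo + m, len(sorted_keys))
    pos = bisect_left(sorted_keys, key, lo, hi)
    return pos if pos < hi and sorted_keys[pos] == key else -1
```

Because the keys of one node are contiguous in the flat array, each level of the descent touches one small, contiguous run of memory, which is the locality property the abstract emphasizes.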
Abstract:
A method and apparatus for calculating data cubes are shown, in which a data set is partitioned into memory-sized data fragments and cuboid tuples are calculated from the data fragments. A search lattice of the data cube is used as a basis for ordering calculations of lower-dimensional cuboids in the data cube. A minimum number of paths through the lattice that is sufficient to traverse all nodes in the lattice is identified by iteratively making two duplicates of all paths in a lower-dimensional space, distributing a new attribute to the first duplicate, moving end points from paths of the second duplicate to the corresponding paths in the first duplicate, and merging the first and second duplicates.
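One possible reading of the path-construction step is sketched below. It assumes that each lattice node is a set of attributes, that paths are listed from the lowest-dimensional node upward, and that the end point moved from the second duplicate is its lowest-dimensional node; the abstract does not state these details, and the function name `lattice_paths` is hypothetical.

```python
from math import comb
from typing import FrozenSet, List

Path = List[FrozenSet[str]]


def lattice_paths(attributes: List[str]) -> List[Path]:
    """Cover every node of the cube lattice with paths, one attribute at a time."""
    paths: List[Path] = [[frozenset()]]           # zero attributes: the single empty node
    for attr in attributes:
        next_paths: List[Path] = []
        for path in paths:
            first = [node | {attr} for node in path]  # duplicate 1: new attribute distributed
            second = list(path)                       # duplicate 2: left unchanged
            first.insert(0, second.pop(0))            # move an end point of duplicate 2 to duplicate 1
            next_paths.append(first)
            if second:                                # drop paths emptied by the move
                next_paths.append(second)
        paths = next_paths                            # merge both duplicates
    return paths


paths = lattice_paths(["A", "B", "C", "D"])
```

Under these assumptions, four attributes yield `comb(4, 2)` paths, each node of the 16-node lattice lies on exactly one path, and consecutive nodes on a path differ by a single attribute. No smaller cover is possible, because the `comb(4, 2)` two-attribute nodes are pairwise incomparable and must lie on different paths.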
Abstract:
A technique for efficiently joining multiple large tables in a database system with a processor using a small main memory. The technique utilizes a join index and minimizes the number of Input/Output operations while maximizing the use of the small main memory through a buffer allocation process. The technique partitions available main memory into buffers and assigns conditions to the buffers to ensure that each buffer will receive a substantially equal amount of data in the join result. The technique then processes each input table separately based on the assigned conditions and sequentially reads and processes each input table. The output is vertically fragmented with one fragment for each input table which further allows the individual processing of each input table. Also described is a method for creating a join index if one is not present.