Abstract:
A space-efficient system and method for generating an approximate φ-quantile data element of a data set in a single pass over the data set, without a priori knowledge of the size of the data set. The approximate φ-quantile is guaranteed to lie within a user-specified approximation error ε of the true quantile being sought with a probability of at least 1−δ, with δ being a user-defined probability of failure. A total of b buffers, each having a capacity of k elements, are initially filled with elements from the data set, with the values of b and k depending on the approximation error ε and the probability δ. The buffers are then collapsed into an output buffer, with the remaining buffers then being refilled with elements, collapsed (along with the previous output buffer), and so on until the entire data set has been processed and a single output buffer remains. The element of the output buffer corresponding to the approximate quantile is then output as the approximate quantile. In later iterations (when the height of the tree is at least equal to a predetermined height that depends on δ and ε), the data is sampled non-uniformly to populate the buffers to achieve the desired performance. Parallel processors can be used, with the final output buffers of the processors being sent to a collecting processor P0 as input buffers to the collecting processor P0.
Abstract:
A system and method for finding an ε-approximate φ-quantile data element of a data set with N data elements in a single pass over the data set. The ε-approximate φ-quantile data element is guaranteed to lie within a user-specified approximation error ε of a true φ-quantile data element being sought. A total of b buffers, each having a capacity of k elements, are initially filled with sorted data elements from the data set, with the values of b and k depending on ε and N. The buffers are then collapsed into an output buffer, with the remaining buffers then being refilled with data elements, collapsed (along with the previous output buffer), and so on until the entire data set has been processed and a single output buffer remains. A data element of the output buffer corresponding to the ε-approximate φ-quantile is then output as the approximate φ-quantile data element. If desired, the system and method can be practiced with sampling to further reduce the amount of space required to find a desired ε-approximate φ-quantile data element.
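The fill-and-collapse loop described above can be illustrated with a minimal sketch. The abstract does not state the collapse rule, how b and k are derived from ε and N, or which buffers are collapsed at each step, so everything below is an assumption: the collapse picks evenly spaced elements from a weighted merge, all full buffers are collapsed together once b of them exist, and b and k are passed in by the caller. The names `collapse` and `approx_quantile` are hypothetical.

```python
import math
from typing import Iterable, List, Tuple

Buffer = Tuple[int, List[float]]  # (weight, sorted elements)


def collapse(buffers: List[Buffer], k: int) -> Buffer:
    """Merge weighted buffers into one buffer of at most k elements.

    Assumed rule: treat each element as repeated `weight` times, then keep
    the elements found at evenly spaced positions of that merged sequence.
    """
    total_weight = sum(w for w, _ in buffers)
    merged = sorted((x, w) for w, elems in buffers for x in elems)
    out: List[float] = []
    target = (total_weight + 1) // 2  # first position to keep
    seen = 0                          # weighted count consumed so far
    for x, w in merged:
        seen += w
        while len(out) < k and seen >= target:
            out.append(x)
            target += total_weight    # stride equals the output weight
    return total_weight, out


def approx_quantile(data: Iterable[float], phi: float, b: int, k: int) -> float:
    """Single pass over data using at most b buffers of k elements each."""
    if b < 2:
        raise ValueError("this sketch needs at least two buffers")
    full: List[Buffer] = []
    current: List[float] = []
    for x in data:
        current.append(x)
        if len(current) == k:
            full.append((1, sorted(current)))
            current = []
            if len(full) >= b:
                # No empty buffer left: collapse everything into one buffer.
                full = [collapse(full, k)]
    if current:
        full.append((1, sorted(current)))  # final, partially filled buffer
    # Weighted selection over whatever buffers remain.
    merged = sorted((x, w) for w, elems in full for x in elems)
    total = sum(w for _, w in merged)
    rank = max(1, math.ceil(phi * total))
    seen = 0
    for x, w in merged:
        seen += w
        if seen >= rank:
            return x
    raise ValueError("empty data set")
```

For example, `approx_quantile(range(1000), 0.5, b=5, k=100)` holds at most 500 elements at any time and returns a value close to the true median. The sketch makes no claim about meeting a particular ε.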
Abstract:
Provided are techniques for partitioning a physical index into one or more physical partitions; assigning each of the one or more physical partitions to a node in a cluster of nodes; for each received document, assigning an assigned-doc-ID comprising an integer document identifier; and, in response to assigning the assigned-doc-ID to a document, determining a cut-off of assignment of new documents to a current virtual-index-epoch comprising a first set of physical partitions and placing the new documents into a new virtual-index-epoch comprising a second set of physical partitions by inserting each new document to a specific one of the physical partitions in the second set using one or more functions that direct the placement based on one of the assigned-doc-id, a field value derived from a set of fields obtained from the document, and a combination of the assigned-doc-id and the field value.
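A minimal sketch of the epoch cut-off and placement idea, assuming the simplest of the three placement options (a function of the assigned-doc-ID alone) and using modulo as that function. The abstract does not specify the placement functions or how the cut-off is chosen, so `VirtualIndex`, `Epoch`, `start_epoch`, and the modulo rule are hypothetical.

```python
import bisect
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Epoch:
    first_doc_id: int      # cut-off: first assigned-doc-ID placed in this epoch
    partitions: List[str]  # physical partitions backing this epoch


class VirtualIndex:
    def __init__(self, partitions: List[str]) -> None:
        self.epochs = [Epoch(0, partitions)]
        self.next_doc_id = 0

    def start_epoch(self, partitions: List[str]) -> None:
        """Cut off the current epoch; later documents go to `partitions`."""
        self.epochs.append(Epoch(self.next_doc_id, partitions))

    def _place(self, epoch: Epoch, doc_id: int) -> str:
        # Assumed placement function: doc ID modulo the partition count.
        return epoch.partitions[doc_id % len(epoch.partitions)]

    def insert(self) -> Tuple[int, str]:
        """Assign the next integer doc ID and place it in the newest epoch."""
        doc_id = self.next_doc_id
        self.next_doc_id += 1
        return doc_id, self._place(self.epochs[-1], doc_id)

    def locate(self, doc_id: int) -> str:
        """Find the partition of an existing document from its ID alone."""
        starts = [e.first_doc_id for e in self.epochs]
        epoch = self.epochs[bisect.bisect_right(starts, doc_id) - 1]
        return self._place(epoch, doc_id)
```

Because each epoch records only its first doc ID, a document can be located from its ID without moving older documents when partitions are added.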
Abstract:
A system and method for joining a build table to a probe table in response to a query for data includes over-partitioning the build table into “N” build partitions using a uniform hash function and writing the build partitions into main memory of a database computer. When the main memory becomes full, one or more partitions are selected as victim partitions to be written to disk storage, and the process continues until all build table rows or tuples have either been written into main memory or spilled to disk. Then, a packing algorithm is used to initially designate never-spilled partitions as “winners” and spilled partitions as “losers”, and then to randomly select one or more winners for prospective swapping with one or more losers. The I/O savings associated with each prospective swap are determined and, if any savings would be realized, the winners are designated as losers and the losers are designated as winners. The swap determination can be made multiple times, e.g., 256, after which losers are moved entirely to disk and winners are moved entirely to memory. At the end of the swapping, probe table rows associated with winner partitions are joined to rows in the winner build partitions while probe table rows associated with loser partitions are spilled to disk. Then, the loser build partitions are written to main memory for joining with corresponding probe table partitions, to undertake the requested join of the build table and probe table in an I/O- and memory-efficient manner.
Abstract:
A method, apparatus, and article of manufacture for providing a signature hash for checking versions of abstract data types. An identifier is constructed for the abstract data type that is substantially unique to the abstract data type, wherein the identifier comprises a concatenation of various attributes for the abstract data type. The constructed identifier is hashed to generate a signature hash value for the abstract data type, which is then stored both in the database and in a class definition for the abstract data type. When the class definition is instantiated as a library function, it accesses the abstract data type from the database, and compares the signature hash value from the database and the signature hash value from the class definition in order to verify that the class definition is not outdated. The class definition is outdated when the abstract data type has been altered without the signature hash value being re-generated and re-stored in the database and the class definition.
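A minimal sketch of the check, assuming the identifier concatenates the type name with attribute names and types, and assuming SHA-256 as the hash. The abstract names neither the attributes used nor the hash function, so both choices and the function names are hypothetical.

```python
import hashlib
from typing import Sequence, Tuple


def signature_hash(type_name: str, attributes: Sequence[Tuple[str, str]]) -> str:
    """Hash an identifier built by concatenating the type's attributes."""
    # Assumed identifier layout: name(attr1:type1,attr2:type2,...)
    identifier = type_name + "(" + ",".join(f"{n}:{t}" for n, t in attributes) + ")"
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


def is_outdated(hash_in_database: str, hash_in_class_definition: str) -> bool:
    """The class definition is outdated when the two stored hashes differ."""
    return hash_in_database != hash_in_class_definition


# Hash stored when the class definition was generated.
generated = signature_hash("address", [("street", "varchar(40)"), ("zip", "char(5)")])
# The type is later altered in the database; the class is not regenerated.
altered = signature_hash("address", [("street", "varchar(40)"), ("zip", "char(9)")])
```

Here `is_outdated(altered, generated)` is true, which is the condition the library function would detect on instantiation.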
Abstract:
A system and method for joining a build table to a probe table in response to a query for data includes executing a hash loops join of the build table and the probe table. Matched rows are joined and output when the rows match each other by satisfying a join predicate. In an outer join, unmatched rows in the probe table are joined to NULL build table field values and output, such that all rows of the probe table are output regardless of whether they have matched rows in the build table. In an early-out join, on the other hand, a “match once” table defines the probe table and, in response to a query for unique probe table outputs, the joining of a probe table row, once joined and output a first time, to any other rows in the other table is prevented regardless of whether the row might match other rows. In both the hash loops early-out join and the hash loops outer join, when the build table is larger than main memory, the roles of the build and probe tables are reversed.
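The outer and early-out semantics can be illustrated on a plain in-memory hash join with an equality predicate. This sketch is not the hash loops join itself and does not cover the role reversal for build tables larger than memory; the function name, the row layout, and the `outer` and `early_out` flags are assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Iterator, List, Optional, Tuple

Row = Tuple[Hashable, str]  # (join key, payload)


def hash_join(
    build: Iterable[Row],
    probe: Iterable[Row],
    outer: bool = False,
    early_out: bool = False,
) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (probe payload, build payload) pairs for rows with equal keys."""
    table: Dict[Hashable, List[str]] = defaultdict(list)
    for key, payload in build:
        table[key].append(payload)
    for key, payload in probe:
        matches = table.get(key, [])
        if not matches:
            if outer:
                # Outer join: unmatched probe rows are paired with NULL.
                yield (payload, None)
            continue
        for build_payload in matches:
            yield (payload, build_payload)
            if early_out:
                # Early-out: a probe row is output at most once.
                break


build_rows = [(1, "a"), (1, "b"), (2, "c")]
probe_rows = [(1, "x"), (3, "y")]
```

With these rows, the plain join pairs "x" with both "a" and "b", the outer join additionally emits "y" with `None`, and the early-out join emits "x" only once.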
Abstract:
The system, method, and program of this invention avoid potential write/write conflicts and read/write conflicts when a subcomponent of a composite object (e.g., an ADT) is mutated. The embodiments of this invention define a copy semantic for the mutation function. In one embodiment, a copy function is inserted prior to any mutation function. In another embodiment, a global compile-time analysis is performed to determine if a write/write or read/write conflict exists, and to eliminate redundant copy constructors if a conflict does exist. In a preferred embodiment, only a local analysis is performed during the parsing phase, thereby avoiding a global compile-time analysis. A mutation safe flag is associated with each parse tree node. The flag of a read target leaf parse tree node is set to false, while non-leaf parse tree nodes (functions) derive their value from an incoming node, except that the flags of constructor and copy constructor functions are always true. Whether or not a copy is made of the composite object (i.e., whether or not a copy constructor is inserted) prior to a mutation is determined according to the setting of the mutation safe flags and according to the following. If a mutation safe flag for a mutation function is false, a copy constructor is inserted for the mutated composite object and the mutation safe flag is set to true. In addition, for update and trigger statements, the mutation safe flag for a mutated target is defaulted to true. Furthermore, related update entries are grouped together and a copy is generated for the common target. The generated copy is used as the common target for all of the mutations caused by the grouped update entries, in order to accumulate all of the desired mutations in the same copy of the composite object.
Abstract:
Disclosed is a data processing system implemented method, a data processing system and an article of manufacture for executing a query having a union operator. The data processing system implemented method directs the data processing system to execute a query against a database having data objects. The query has sub-queries and a union operator. The union operator is operable on sub-queries associated with the query. The database is operatively coupled to the data processing system. The data processing system implemented method includes grouping the sub-queries of the union operator according to identified structural similarities, the identified structural similarities being based on an analysis of the sub-queries, grouping the data objects of the database according to the grouped sub-queries, replacing the grouped data objects and any sub-queries associated with the grouped data objects with a reference to a representative data object and a representative sub-query, and accessing at least one member of the grouped data objects, the accessing of the at least one member of the grouped data objects being based on the reference.
Abstract:
Disclosed is a data processing system implemented method, a data processing system and an article of manufacture for executing a query having a union operator. The data processing system implemented method directs the data processing system to process a query against data objects. The data objects are operatively coupled to the data processing system. The query includes a parent operator. The parent operator references a union operator. The union operator references sub-queries. The sub-queries reference the data objects. The data processing system implemented method includes noting a set of partitionings for the union operator, the noted set of partitionings being based on the sub-queries and being based on the data objects referenced by the sub-queries, and executing the query having the union operator, the execution of the query being based on the noted set of partitionings and the parent operator.
Abstract:
A method, apparatus, and article of manufacture for a computer implemented storage mechanism for persistent objects in a database management system. A statement is executed in a computer. The statement is performed by the computer to manipulate data in a database stored on a data storage device connected to the computer. It is determined that an object is to be stored in an inline buffer. When the object can be entirely stored in the inline buffer, the object is stored in the inline buffer. When the object cannot be entirely stored in the inline buffer, a selected portion of the object is stored in the inline buffer and the remaining portion of the object is stored as a large object.
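A minimal sketch of the inline-or-overflow decision, assuming the "selected portion" kept inline is the leading bytes of the object and that the remainder is held as a separate byte string standing in for the large object. The function names and the `inline_capacity` parameter are hypothetical.

```python
from typing import Optional, Tuple


def store_object(data: bytes, inline_capacity: int) -> Tuple[bytes, Optional[bytes]]:
    """Return (inline part, large-object part or None)."""
    if len(data) <= inline_capacity:
        # The whole object fits in the inline buffer.
        return data, None
    # Assumed split: leading bytes inline, the rest stored as a large object.
    return data[:inline_capacity], data[inline_capacity:]


def load_object(inline: bytes, large_object: Optional[bytes]) -> bytes:
    """Reassemble the object from its inline and large-object parts."""
    return inline if large_object is None else inline + large_object
```

Small objects are then read without touching large-object storage, while larger ones need one additional fetch for the remainder.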