Abstract:
A system and associated method load an input data stream into a multi-dimensional clustering (MDC) table, or another structure containing data clustered along one or more dimensions, by assembling blocks of data in a partial block cache in which each partial block is associated with a distinct logical cell. A minimum threshold number of partial blocks may be maintained. Partial blocks may be spilled from the partial block cache to make room for new logical cells. The last partial pages of spilled partial blocks may be stored in a partial page cache to limit I/O if the cell associated with a spilled block is encountered later in the input data stream. Buffers may be reassigned from the partial block cache to the partial page cache if the latter becomes full. Parallelism may be used to speed up the sorting of input data subsets and the writing of blocks to secondary storage.
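The spill-and-resume behaviour described above can be illustrated with a minimal sketch, under several assumptions that are not from the source. Cells are plain keys. A block holds a fixed number of fixed-size pages (`ROWS_PER_PAGE` and `PAGES_PER_BLOCK` are hypothetical values). The cache evicts the least recently used partial block. "Secondary storage" is an in-memory dict. All names (`MDCLoader`, `add_row`, `reads_avoided`) are illustrative only. The minimum threshold, buffer reassignment and parallelism are not modeled.

```python
from collections import OrderedDict

ROWS_PER_PAGE = 4     # hypothetical page capacity, in rows
PAGES_PER_BLOCK = 2   # hypothetical block size, in pages
BLOCK_ROWS = ROWS_PER_PAGE * PAGES_PER_BLOCK


class MDCLoader:
    """Hypothetical loader: one partial block per logical cell, LRU spilling."""

    def __init__(self, max_partial_blocks=2):
        self.max_partial_blocks = max_partial_blocks
        self.partial_blocks = OrderedDict()  # cell -> rows not yet final on disk
        self.partial_pages = {}              # cell -> last partial page of a spilled block
        self.committed = {}                  # cell -> rows on disk in full (final) pages
        self.disk = {}                       # cell -> rows as stored on "secondary storage"
        self.reads_avoided = 0               # resumes served from the partial page cache

    def _write(self, cell, rows):
        # Overwrite everything after the cell's last full page, then append rows.
        stored = self.disk.setdefault(cell, [])
        del stored[self.committed.get(cell, 0):]
        stored.extend(rows)

    def _spill_lru(self):
        cell, rows = self.partial_blocks.popitem(last=False)
        self._write(cell, rows)              # full pages plus the trailing partial page
        full = len(rows) - len(rows) % ROWS_PER_PAGE
        self.committed[cell] = self.committed.get(cell, 0) + full
        if rows[full:]:
            # Keep the partial page in memory so a later resume needs no re-read.
            self.partial_pages[cell] = rows[full:]

    def add_row(self, cell, row):
        if cell in self.partial_blocks:
            self.partial_blocks.move_to_end(cell)
        else:
            if len(self.partial_blocks) >= self.max_partial_blocks:
                self._spill_lru()            # make room for a new logical cell
            if cell in self.partial_pages:
                self.reads_avoided += 1
            self.partial_blocks[cell] = self.partial_pages.pop(cell, [])
        rows = self.partial_blocks[cell]
        rows.append(row)
        if self.committed.get(cell, 0) % BLOCK_ROWS + len(rows) == BLOCK_ROWS:
            self._write(cell, rows)          # block complete: flush it
            self.committed[cell] = self.committed.get(cell, 0) + len(rows)
            del self.partial_blocks[cell]

    def finish(self):
        # Flush remaining partial blocks; cached partial pages are already on disk.
        for cell, rows in self.partial_blocks.items():
            self._write(cell, rows)
        self.partial_blocks.clear()
        self.partial_pages.clear()
        return self.disk
```

In this sketch, a spilled block writes its trailing partial page to disk and also keeps a copy in `partial_pages`. If the cell reappears, loading resumes from that copy, and the page is later overwritten in place rather than read back from storage.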
Abstract:
A method and system for optimizing data redistribution in a database. In one embodiment, the method includes moving, during a first scan, outgoing records from a sending partition to one or more receiving partitions, which creates free space in the sending partition as those records leave. The method also includes filling, during the same first scan, some of that free space with remaining records that do not leave the sending partition.
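The single-scan idea can be sketched in Python, assuming the sending partition is modeled as a flat list of record slots and a caller-supplied function (`target_of`, hypothetical) maps each record to its destination partition id. This sketch uses an ordinary stable two-pointer compaction: records that stay are moved into the earliest holes left by departed records, all in the same pass.

```python
from collections import defaultdict


def redistribute_single_scan(records, my_id, target_of):
    """Hypothetical sketch: send outgoing records and compact survivors in one scan."""
    outgoing = defaultdict(list)  # receiving partition id -> records sent to it
    write = 0                     # index of the earliest free slot in the sending partition
    for read in range(len(records)):
        rec = records[read]
        dest = target_of(rec)
        if dest != my_id:
            outgoing[dest].append(rec)  # record leaves, freeing its slot
        else:
            records[write] = rec        # staying record fills the earliest free slot
            write += 1
    del records[write:]                 # freed space now sits contiguously at the tail
    return dict(outgoing)
```

Because each record is read exactly once, no second pass is needed to reclaim the space that outgoing records vacated.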
Abstract:
A system and method for data redistribution. In one embodiment, the method includes dividing data into batches at a sending partition; populating a first data structure with first pages and first control information; storing the first data structure in a cache at the sending partition; sending the changes over a network to a receiving partition; receiving a notification that the changes have been successfully stored on a second hard disk at the receiving partition; and, in response to the notification, storing the changes on a first hard disk at the sending partition.
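The ordering constraint here is that the sender persists only after the receiver acknowledges. A minimal Python sketch can show it, with several assumptions not taken from the source. The "network" is a direct method call. The hard disks are in-memory lists. The control information is a hypothetical dict. What the sender persists on acknowledgement is assumed to be that batch's control information. `Batch`, `Sender`, `Receiver` and their fields are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    batch_id: int
    pages: list      # "first pages": rows grouped into pages (hypothetical layout)
    control: dict    # "first control information" (hypothetical fields)


class Receiver:
    def __init__(self):
        self.disk = []                   # stands in for the second hard disk

    def receive(self, batch):
        self.disk.extend(batch.pages)    # persist the changes first...
        return batch.batch_id            # ...then send the notification


class Sender:
    def __init__(self, rows, batch_size, rows_per_page):
        self.rows = rows
        self.batch_size = batch_size
        self.rows_per_page = rows_per_page
        self.cache = {}                  # batch_id -> Batch awaiting acknowledgement
        self.disk = []                   # stands in for the first hard disk

    def batches(self):
        # Divide the data into batches, each split into pages.
        for b, start in enumerate(range(0, len(self.rows), self.batch_size)):
            chunk = self.rows[start:start + self.batch_size]
            pages = [chunk[i:i + self.rows_per_page]
                     for i in range(0, len(chunk), self.rows_per_page)]
            yield Batch(b, pages, {"first_row": start, "num_rows": len(chunk)})

    def run(self, receiver):
        for batch in self.batches():
            self.cache[batch.batch_id] = batch      # hold the batch in the sending cache
            acked = receiver.receive(batch)         # simulated network send + notification
            if acked == batch.batch_id:
                done = self.cache.pop(acked)
                self.disk.append(done.control)      # sender persists only after the ack
```

Here the call is synchronous, so at most one batch sits in the cache. An asynchronous transport would let several unacknowledged batches wait there at once.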
Abstract:
A method and system for facilitating an undo operation. In one embodiment, the method includes generating a plurality of control files, where each control file is associated with a batch of data received from a sending partition during a redistribution process and includes a list of pages together with the ranges of rows of data that have been appended to those pages. The method also includes writing each control file to persistent memory once all of the rows associated with its respective consistency point have been appended to pages and written to persistent memory. The method also includes, in response to an interruption in the redistribution process, identifying, based on the plurality of control files, the pages and rows to be deleted during the undo operation.
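One way the undo step could turn control files into a deletion list is sketched below in Python. It assumes each control file is a list of `(page_id, first_row, last_row)` tuples with inclusive row bounds. That layout and the function name `rows_to_undo` are hypothetical. The sketch merges overlapping or adjacent ranges per page so that each row is deleted once.

```python
from collections import defaultdict


def rows_to_undo(control_files):
    """Hypothetical sketch: merge appended row ranges per page across control files."""
    ranges = defaultdict(list)
    for control_file in control_files:
        for page_id, first, last in control_file:
            ranges[page_id].append((first, last))
    merged = {}
    for page_id, spans in ranges.items():
        spans.sort()
        out = [list(spans[0])]
        for first, last in spans[1:]:
            if first <= out[-1][1] + 1:       # overlapping or adjacent: extend the run
                out[-1][1] = max(out[-1][1], last)
            else:
                out.append([first, last])     # gap: start a new run
        merged[page_id] = [tuple(s) for s in out]
    return merged
```

Which control files feed this function (for example, only those persisted at completed consistency points) is left to the caller. The sketch does not decide it.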