Abstract:
Herein are resource-constrained techniques that plan ahead for resiliently moving pluggable databases between container databases after a failure in a high-availability database cluster. In an embodiment that has a database cluster that hierarchically contains many pluggable databases in many container databases in many virtual machines, a computer identifies many alternative placements that respectively assign each pluggable database instance (PDB) to a respective container database management system (CDBMS). For each alternative placement, a respective placement score is calculated based on the PDBs and the CDBMSs. Based on the placement scores of the alternative placements, a particular placement is selected with a best placement score that indicates optimal resilience for accommodating adversity such as failover and overcrowding.
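As a rough illustration, the sketch below enumerates alternative placements of pluggable databases across container database management systems, scores each one, and keeps the highest-scoring placement. The scoring formula, the demand and capacity figures, and all names (placement_score, pdb_demand, cdbms_capacity) are assumptions for illustration; the abstract does not specify how the placement score is computed.

```python
# Illustrative sketch only: a hypothetical score that penalizes overcrowded
# container databases and rewards spare capacity for failover.
from itertools import product

def enumerate_placements(pdbs, cdbms_list):
    """Yield every assignment of each PDB to one CDBMS (exhaustive for small inputs)."""
    for combo in product(cdbms_list, repeat=len(pdbs)):
        yield dict(zip(pdbs, combo))

def placement_score(placement, pdb_demand, cdbms_capacity):
    """Hypothetical score: higher is better; overcrowding is heavily penalized."""
    load = {c: 0.0 for c in cdbms_capacity}
    for pdb, cdbms in placement.items():
        load[cdbms] += pdb_demand[pdb]
    score = 0.0
    for cdbms, used in load.items():
        headroom = cdbms_capacity[cdbms] - used
        score += headroom if headroom >= 0 else 10 * headroom  # overcrowding penalty
    return score

def best_placement(pdbs, cdbms_list, pdb_demand, cdbms_capacity):
    return max(
        enumerate_placements(pdbs, cdbms_list),
        key=lambda p: placement_score(p, pdb_demand, cdbms_capacity),
    )

# Example: two container databases, three pluggable databases.
pdb_demand = {"pdb1": 4, "pdb2": 2, "pdb3": 3}
cdbms_capacity = {"cdb_a": 6, "cdb_b": 6}
print(best_placement(list(pdb_demand), list(cdbms_capacity), pdb_demand, cdbms_capacity))
```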
Abstract:
A shared-nothing database system is provided in which parallelism and workload balancing are increased by assigning the rows of each table to “slices”, and storing multiple copies (“duplicas”) of each slice across the persistent storage of multiple nodes of the shared-nothing database system. When the data for a table is distributed among the nodes of a shared-nothing system in this manner, requests to read data from a particular row of the table may be handled by any node that stores a duplica of the slice to which the row is assigned. For each slice, a single duplica of the slice is designated as the “primary duplica”. All DML operations (e.g. inserts, deletes, updates, etc.) that target a particular row of the table are performed by the node that has the primary duplica of the slice to which the particular row is assigned. The changes made by the DML operations are then propagated from the primary duplica to the other duplicas (“secondary duplicas”) of the same slice.
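A minimal sketch of the data layout described above, assuming hash partitioning of rows into slices and an in-memory stand-in for per-node persistent storage: reads may be served by any node holding a duplica of the relevant slice, while DML is applied at the primary duplica and then propagated to the secondary duplicas. Class and variable names (ShardedTable, duplica_map, primary_map) are illustrative, and propagation here is synchronous only for simplicity.

```python
# Illustrative sketch only: slice assignment by hash, reads served by any node
# holding a duplica, writes routed to the primary duplica and then propagated.
import random
from collections import defaultdict

NUM_SLICES = 8

def slice_of(row_key):
    """Assign a row to a slice (hash partitioning is one possible scheme)."""
    return hash(row_key) % NUM_SLICES

class ShardedTable:
    def __init__(self, duplica_map, primary_map):
        # duplica_map: slice -> list of node ids holding a duplica
        # primary_map: slice -> node id holding the primary duplica
        self.duplica_map = duplica_map
        self.primary_map = primary_map
        self.storage = defaultdict(dict)  # (node, slice) -> {row_key: row}

    def read(self, row_key):
        s = slice_of(row_key)
        node = random.choice(self.duplica_map[s])   # any duplica can serve reads
        return self.storage[(node, s)].get(row_key)

    def write(self, row_key, row):
        s = slice_of(row_key)
        primary = self.primary_map[s]               # DML goes to the primary duplica
        self.storage[(primary, s)][row_key] = row
        for node in self.duplica_map[s]:            # propagate to secondary duplicas
            if node != primary:
                self.storage[(node, s)][row_key] = row

# Two nodes, each holding a duplica of every slice; node "n1" is primary everywhere.
table = ShardedTable(
    duplica_map={s: ["n1", "n2"] for s in range(NUM_SLICES)},
    primary_map={s: "n1" for s in range(NUM_SLICES)},
)
table.write("row-42", {"id": 42, "name": "example"})
print(table.read("row-42"))
```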
Abstract:
A hashing scheme includes a cache-friendly, latchless, non-blocking, dynamically resizable hash index with constant-time lookup operations that is also amenable to fast lookups via remote memory access. Specifically, the hashing scheme provides each of the following features: latchless reads, fine-grained lightweight locks for writers, non-blocking dynamic resizability, cache-friendly access, constant-time lookup operations, and amenability to remote memory access via the RDMA protocol through one-sided read operations, as well as non-RDMA access.
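One way to realize latchless reads alongside fine-grained writer locks is a per-bucket version counter in the style of a seqlock, sketched below. This is an assumed, simplified realization: it omits the non-blocking resize and the RDMA one-sided read path, and all names are illustrative.

```python
# Illustrative sketch only: readers proceed without latches and retry if a
# bucket's version changed mid-read; writers lock only their own bucket.
import threading

class LatchlessHashIndex:
    def __init__(self, num_buckets=16):
        self.buckets = [dict() for _ in range(num_buckets)]
        self.versions = [0] * num_buckets            # even = stable, odd = write in progress
        self.locks = [threading.Lock() for _ in range(num_buckets)]

    def _bucket(self, key):
        return hash(key) % len(self.buckets)

    def get(self, key):
        b = self._bucket(key)
        while True:
            v1 = self.versions[b]
            if v1 % 2:                                # writer active, retry
                continue
            value = self.buckets[b].get(key)
            if self.versions[b] == v1:                # version unchanged: read was consistent
                return value

    def put(self, key, value):
        b = self._bucket(key)
        with self.locks[b]:                           # fine-grained lightweight writer lock
            self.versions[b] += 1                     # odd: signal in-progress write
            self.buckets[b][key] = value
            self.versions[b] += 1                     # even again: stable
```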
Abstract:
A method and apparatus for reconfiguring hardware structures to pipeline the execution of multiple special-purpose hardware-implemented functions, without saving intermediate results to memory, is provided. Pipelining functions in a program is typically performed by a first function saving its results (the “intermediate results”) to memory, and a second function subsequently accessing the memory to use the intermediate results as input. Saving and accessing intermediate results stored in memory incurs a heavy performance penalty, requires more power, consumes more memory bandwidth, and increases the memory footprint. Due to the ability to redirect the input and output of the hardware structures, intermediate results are passed directly from one special-purpose hardware-implemented function to another without storing the intermediate results in memory. Consequently, a program that utilizes the method or apparatus reduces power consumption, consumes less memory bandwidth, and reduces the program's memory footprint.
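As a software analogy only (the abstract concerns special-purpose hardware units), the generator pipeline below streams each value directly from one stage to the next instead of materializing an intermediate buffer, mirroring the idea of redirecting one function's output into another's input. The stage names are purely illustrative.

```python
# Illustrative software analogy only: each value flows straight from one stage
# to the next; no intermediate result set is written out and re-read.
def decompress(blocks):
    for block in blocks:
        yield block.lower()          # stand-in for a hardware decompress unit

def scan(rows, predicate):
    for row in rows:
        if predicate(row):
            yield row                # stand-in for a hardware scan/filter unit

blocks = ["ALPHA", "BETA", "GAMMA"]
# Materialized: decompress() output would be stored, then re-read by scan().
# Pipelined: each row flows directly through both stages.
for row in scan(decompress(blocks), lambda r: r.startswith("g")):
    print(row)                       # -> gamma
```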
Abstract:
Techniques for processing a query are provided. One or more operations that are required to process a query are performed by a coprocessor that is separate from a general purpose microprocessor that executes query processing software. The query processing software receives a query, determines one or more operations that are required to be executed to fully process the query, and issues one or more commands to one or more coprocessors that are programmed to perform one of the operations, such as a table scan operation and/or a lookup operation. The query processing software obtains results from the coprocessor(s) and performs one or more additional operations thereon to generate a final result of the query.
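A minimal sketch of the division of labor described above, assuming a hypothetical coprocessor interface: the query-processing software issues a scan command, collects the coprocessor's partial result, and performs the remaining operation (here, an aggregation) on the general-purpose processor. The TableScanCoprocessor class and its scan method are illustrative stand-ins, not a real device API.

```python
# Illustrative sketch only: the coprocessor interface below is hypothetical.
class TableScanCoprocessor:
    """Stand-in for a separate hardware unit that filters rows of a column."""
    def scan(self, column, predicate):
        return [v for v in column if predicate(v)]

def run_query(column, threshold):
    copro = TableScanCoprocessor()
    matching = copro.scan(column, lambda v: v > threshold)   # offloaded operation
    return sum(matching)                                      # final step in query software

print(run_query([3, 9, 12, 5, 20], threshold=8))  # -> 41
```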
Abstract:
One or more engine instances are executed on each host to form an engine cluster. A plurality of control instances are executed on a first set of hosts to form a control cluster and comprise a control instance leader and one or more control instance followers. In response to a first host indicating a failure of a neighbor host, a pair-wise focused investigation is initiated to check peer-to-peer connections between the first host and the neighbor host. In response to one or more additional hosts indicating failures of neighbor hosts while the pair-wise focused investigation is being performed, a wide investigation is performed to check connections between the control cluster and the plurality of hosts. One or more hosts are added to an eviction list and an eviction protocol is performed to evict the one or more hosts from the engine cluster using the eviction list.
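The escalation logic can be sketched as follows, under stated assumptions: a single neighbor-failure report triggers a pair-wise focused investigation between the two hosts involved, additional reports arriving during that investigation escalate to a wide investigation from the control cluster to all hosts, and unreachable hosts are collected into an eviction list. The connectivity checks and function names are hypothetical.

```python
# Illustrative sketch only: the connectivity checks and thresholds are assumed.
def focused_investigation(reporter, neighbor, can_reach):
    """Pair-wise check of peer-to-peer connectivity between the two hosts."""
    return can_reach(reporter, neighbor) and can_reach(neighbor, reporter)

def wide_investigation(control_cluster, hosts, can_reach):
    """Check connectivity from the control cluster to every host in the engine cluster."""
    return {h for h in hosts
            if not any(can_reach(c, h) for c in control_cluster)}

def handle_failure_reports(reports, control_cluster, hosts, can_reach):
    """reports: list of (reporting_host, failed_neighbor) pairs."""
    eviction_list = set()
    first_reporter, first_neighbor = reports[0]
    if len(reports) == 1:
        if not focused_investigation(first_reporter, first_neighbor, can_reach):
            eviction_list.add(first_neighbor)
    else:
        # More reports arrived while the focused investigation was running:
        # escalate to a wide investigation across all hosts.
        eviction_list |= wide_investigation(control_cluster, hosts, can_reach)
    return eviction_list   # hosts on this list are evicted via the eviction protocol
```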