Abstract:
A bit vector having a bit vector length is accessed. A select operator directory tree can be generated using the bit vector. The select operator directory tree includes a first level of superblocks including large superblocks and small superblocks, a second level of blocks including large blocks and small blocks, each block associated with one of the superblocks, and a third level of sub-blocks, each sub-block associated with a block. The large superblocks each have a length greater than a first constant that is independent of the bit vector length, and the large blocks each have a length greater than a second constant that is independent of the bit vector length. The select operator directory tree can be stored. Related apparatus, systems, techniques and articles are also described.
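The large/small distinction can be illustrated with a minimal sketch. It is not the structure claimed above: it keeps only a single level of superblocks, and the class name `SelectDirectory`, the parameter `ones_per_superblock`, and the constant `large_threshold` are hypothetical. A superblock whose span exceeds the fixed constant stores its positions explicitly; a small one stores only its start and is scanned, which is bounded work because the span is bounded by that constant.

```python
class SelectDirectory:
    """Simplified, single-level select directory (illustrative only)."""

    def __init__(self, bits, ones_per_superblock=4, large_threshold=16):
        self.bits = list(bits)
        self.k = ones_per_superblock
        positions = [i for i, b in enumerate(self.bits) if b]
        self.count = len(positions)
        # One entry per superblock, each covering up to k consecutive set bits.
        self.entries = []
        for s in range(0, len(positions), self.k):
            group = positions[s:s + self.k]
            span = group[-1] - group[0] + 1
            if span > large_threshold:
                # Large superblock: store every answer explicitly.
                self.entries.append(("large", group))
            else:
                # Small superblock: store only where it starts.
                self.entries.append(("small", group[0]))

    def select(self, j):
        """Return the position of the j-th set bit (0-based)."""
        if not 0 <= j < self.count:
            raise IndexError(j)
        kind, data = self.entries[j // self.k]
        remaining = j % self.k
        if kind == "large":
            return data[remaining]
        pos = data
        while True:  # scan is bounded by large_threshold bits
            if self.bits[pos]:
                if remaining == 0:
                    return pos
                remaining -= 1
            pos += 1
```

Because `large_threshold` is a fixed constant rather than a function of the bit vector length, the scan in a small superblock costs the same regardless of how long the vector is.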
Abstract:
A method and system are described. The system includes a plurality of machines, each having a processor and a main memory component; a shared distributed storage facility storing a set of data and accessible by the plurality of machines over a communication network; a controller to select, in response to a state of a query execution plan comprising a plurality of executable jobs for the set of data, which one of a set of scheduling algorithms to execute; and an execution engine to execute the selected scheduling algorithm to determine, for each job in the plurality of jobs, which server to schedule to execute the respective job. An indication of the servers scheduled to execute the jobs is provided.
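The controller and execution engine roles might be sketched as follows. Everything concrete here is an assumption made for illustration: the abstract does not name the scheduling algorithms or the plan state that drives the choice, so `round_robin`, `least_loaded`, and the cost-skew rule in `select_algorithm` are hypothetical stand-ins.

```python
from itertools import cycle


def round_robin(jobs, machines):
    """Hypothetical algorithm 1: assign jobs to machines in turn."""
    return dict(zip(jobs, cycle(machines)))


def least_loaded(jobs, machines):
    """Hypothetical algorithm 2: place costly jobs first on the idlest machine."""
    load = {m: 0 for m in machines}
    plan = {}
    for name, cost in sorted(jobs.items(), key=lambda kv: -kv[1]):
        target = min(load, key=load.get)
        plan[name] = target
        load[target] += cost
    return plan


def select_algorithm(jobs):
    """Controller: pick an algorithm from the plan state (assumed: cost skew)."""
    costs = list(jobs.values())
    skewed = max(costs) > 2 * min(costs)
    return least_loaded if skewed else round_robin


def schedule(jobs, machines):
    """Execution engine: run the selected algorithm, return job -> machine."""
    return select_algorithm(jobs)(jobs, machines)
```

Here `jobs` maps a job name to an estimated cost, and the returned dictionary plays the part of the indication of which server executes each job.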
Abstract:
A system includes reception of a first write request from a client, the request including a first key and one or more first (stream, payload) pairs associated with the first key; copying of the first key and the (stream, payload) pairs to a first buffer of a volatile memory; storage of data of the first buffer in one or more blocks of a raw block non-volatile memory device; providing of the first buffer to a stream store server; reception of the first buffer at the stream store server; adding of the first key and the (stream, payload) pairs to a second buffer of the volatile memory, in key order; storage of the data of the second buffer in a filesystem storage device, according to stream; and transmission of an indication of the durability of the key to the tail store server.
Abstract:
A plus-minus-one array in which adjacent entries vary by no more than positive one and no less than negative one is accessed. A range minimum query directory tree including blocks and subblocks of the plus-minus-one array is determined. Blocks are contained in the plus-minus-one array and subblocks are contained in the blocks. A data structure characterizing positions of minimum elements within the range minimum query directory tree is generated. The characterization includes positions of minimums within each subblock, between subblocks in a respective block, within each block, and between blocks. The data structure is stored. Related apparatus, systems, techniques and articles are also described.
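A reduced version of the block idea can be sketched as below. It is an assumption-laden simplification, not the claimed structure: it drops the subblock level and its lookup tables, scans inside partial blocks, and uses a sparse table (a swapped-in, standard technique) for minimums between blocks. The name `BlockRMQ` and the parameter `block_size` are hypothetical.

```python
class BlockRMQ:
    """Range minimum positions via per-block minima plus a sparse table."""

    def __init__(self, a, block_size=4):
        if any(abs(x - y) > 1 for x, y in zip(a, a[1:])):
            raise ValueError("adjacent entries must differ by at most one")
        self.a, self.b = a, block_size
        n = len(a)
        # Position of the minimum within each block.
        mins = [self._scan(s, min(s + block_size, n) - 1)
                for s in range(0, n, block_size)]
        # table[k][i]: position of the minimum over blocks i .. i + 2**k - 1.
        self.table = [mins]
        k = 1
        while (1 << k) <= len(mins):
            prev, half = self.table[-1], 1 << (k - 1)
            self.table.append([self._better(prev[i], prev[i + half])
                               for i in range(len(mins) - (1 << k) + 1)])
            k += 1

    def _better(self, i, j):
        # Prefer the left argument on ties so the leftmost minimum wins.
        return i if self.a[i] <= self.a[j] else j

    def _scan(self, lo, hi):
        return min(range(lo, hi + 1), key=self.a.__getitem__)

    def query(self, lo, hi):
        """Leftmost position of the minimum of a[lo..hi], inclusive."""
        bl, bh = lo // self.b, hi // self.b
        if bl == bh:
            return self._scan(lo, hi)
        best = self._scan(lo, (bl + 1) * self.b - 1)
        first, last = bl + 1, bh - 1
        if first <= last:  # whole blocks strictly between the two ends
            k = (last - first + 1).bit_length() - 1
            mid = self._better(self.table[k][first],
                               self.table[k][last - (1 << k) + 1])
            best = self._better(best, mid)
        return self._better(best, self._scan(bh * self.b, hi))
```

The plus-minus-one property is only validated here; the full technique exploits it to precompute answers for every possible subblock shape, which this sketch does not attempt.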
Abstract:
A method includes a primary storage unit receiving a first write request including a first key and a first value; persisting the first value in a first non-volatile memory in association with the first key; broadcasting the first write request and a first set of globally-durable keys to secondary storage units; receiving, from each of the secondary storage units, an acknowledgement of the first write request and a first set of locally-durable keys, each of the first sets of locally-durable keys including the first key; the primary storage unit receiving a second write request including a second key and a second value; persisting the second value in the first non-volatile memory in association with the second key; and broadcasting the second write request and a second set of globally-durable keys to the secondary storage units, the second set of globally-durable keys including the first key. A system is also disclosed.
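The exchange of durable-key sets can be simulated in a single process. This is a minimal sketch under stated assumptions: dictionaries stand in for non-volatile memory, method calls stand in for the broadcast and the acknowledgements, and the rule that a key becomes globally durable once it is durable on the primary and on every secondary is an assumption, since the abstract does not state the criterion. The names `Primary`, `Secondary`, `on_write`, and `known_global` are hypothetical.

```python
class Secondary:
    def __init__(self):
        self.store = {}            # stands in for local non-volatile memory
        self.known_global = set()  # globally-durable keys learned so far

    def on_write(self, key, value, globally_durable):
        self.store[key] = value
        self.known_global |= globally_durable
        # The acknowledgement carries this unit's locally-durable keys.
        return set(self.store)


class Primary:
    def __init__(self, secondaries):
        self.store = {}
        self.secondaries = secondaries
        self.globally_durable = set()

    def write(self, key, value):
        self.store[key] = value               # persist before broadcasting
        sent = set(self.globally_durable)     # piggybacked on the broadcast
        acks = [s.on_write(key, value, sent) for s in self.secondaries]
        # Assumed rule: durable everywhere means globally durable.
        self.globally_durable = set(self.store).intersection(*acks)
        return sent
```

With two secondaries, the first `write` broadcasts an empty set and the second broadcasts a set containing the first key, which mirrors the sequence in the abstract: durability of one write is announced alongside the next one.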
Abstract:
Disclosed herein are system, method, and computer program product embodiments for quorum-based replication of data records. In one embodiment, a read request for reading a record is received, from a user node, on a replica node of a cluster of replica nodes. The record is then determined not to be committed on the replica node. In response to the determination, an update message indicating whether the number of replica nodes on which the record is durable exceeds a threshold is received on the replica node. In response to the number of replica nodes exceeding the threshold, a value of the record on the replica node is transmitted to the user node.
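The read path might look like the following sketch. The names are hypothetical, the update message is modeled as a callable that returns the durable-replica count, and returning `None` when the threshold is not exceeded is an assumption, since the abstract does not say what happens in that case.

```python
class Replica:
    def __init__(self, threshold):
        self.threshold = threshold
        self.records = {}  # key -> (value, committed flag)

    def apply_write(self, key, value):
        # New writes start out uncommitted on this replica.
        self.records[key] = (value, False)

    def read(self, key, receive_update):
        value, committed = self.records[key]
        if committed:
            return value
        # Record is not committed: consult the update message.
        durable_count = receive_update(key)
        if durable_count > self.threshold:
            return value
        return None  # assumed behavior when the quorum is not reached
```

For example, with `threshold=1` a record durable on two replicas is returned to the user node, while one durable on a single replica is not.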
Abstract:
Systems and methods include a set of delta copies received from cluster node replicas of a replica set and stored on a main data storage on the cloud. A cloud storage service internally replicates the data from the delta copies and provides fault tolerance and high availability against storage failures. All cluster node replicas participate in a delta copies merge. Each replica writes its deltas to an independent location in a shared storage on the cloud. The delta merge then includes the deltas from all replicas when building a new main storage, which ensures that the data from all replicas are included in the delta merge.
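The merge step can be sketched as below. This is an illustration under assumptions: shared cloud storage is modeled as a dictionary with one independent list per replica, and each delta entry is assumed to carry a sequence number so that conflicting updates can be ordered, a detail the abstract does not specify. The function name `delta_merge` is hypothetical.

```python
def delta_merge(main, shared_storage):
    """Build a new main storage from the old one plus every replica's deltas.

    main:           dict of key -> value (current main storage)
    shared_storage: dict of replica id -> list of (seq, key, value) deltas,
                    one independent location per replica
    """
    # Gather the deltas of all replicas, not just those of one node.
    entries = [e for deltas in shared_storage.values() for e in deltas]
    new_main = dict(main)  # the old main storage is left untouched
    for _seq, key, value in sorted(entries, key=lambda e: e[0]):
        new_main[key] = value  # later sequence numbers win
    return new_main
```

Since every replica's location is read, a delta written by only one replica still reaches the new main storage.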