Abstract:
The present technology proposes techniques for generating globally coherent timestamps. This technology may allow distributed systems to causally order transactions without incurring various types of communication delays inherent in explicit synchronization. By globally deploying a number of time masters that are based on various types of time references, the time masters may serve as primary time references. Through an interactive interface, the techniques may track, calculate and record data relative to each time master thus providing the distributed systems with causal timestamps.
Abstract:
The present technology proposes techniques for generating globally coherent timestamps. This technology may allow distributed systems to causally order transactions without incurring various types of communication delays inherent in explicit synchronization. By globally deploying a number of time masters that are based on various types of time references, the time masters may serve as primary time references. Through an interactive interface, the techniques may track, calculate and record data relative to each time master thus providing the distributed systems with causal timestamps.
Abstract:
A system, computer-readable storage medium storing at least one program, and a computer-implemented method for identifying a storage group in a distributed storage system into which data is to be stored is presented. A data structure including information relating to storage groups in a distributed storage system is maintained, where a respective entry in the data structure for a respective storage group includes placement metrics for the respective storage group. A request to identify a storage group into which data is to be stored is received from a computer system. The data structure is used to determine an identifier for a storage group whose placement metrics satisfy a selection criterion. The identifier for the storage group whose placement metrics satisfy the selection criterion is returned to the computer system.
Abstract:
Paxos transactions are pipelined in a distributed database formed by a plurality of replica servers. A leader server is selected by consensus of the replicas, and receives a lock on leadership for an epoch. The leader gets Paxos log numbers for the current epoch, which are greater than the numbers allocated in previous epochs. The leader receives database write requests, and assigns a Paxos number to each request. The leader constructs a proposed transaction for each request, which includes the assigned Paxos number and incorporates the request. The leader transmits the proposed transactions to the replicas. Two or more write requests that access distinct objects in the database can proceed simultaneously. The leader commits a proposed transaction to the database after receiving a plurality of confirmations for the proposed transaction from the replicas. After all the Paxos numbers have been assigned, inter-epoch tasks are performed before beginning a subsequent epoch.
Abstract:
The subject matter described herein provides techniques to ensure that queries of a distributed database observe a consistent read of the database without locking or logging. In this regard, next-write timestamps uniquely identify a set of write transactions whose updates can be observed by reads. By publishing the next-write timestamps from within an extendable time lease and tracking a “safe timestamp,” the database queries can be executed without logging read operations or blocking future write transactions, and clients issuing the queries at the “safe timestamp” observe a consistent view of the database as it exists on or before that timestamp. Aspects of this disclosure also provide for extensions, done cheaply and without the need for logging, to the range of timestamps at which read transactions can be executed.
Abstract:
In a distributed system where a client's call to commit a transaction occurs outside the transaction's lock-hold interval, computation of timestamp information for the transaction is moved to a client library, while ensuring that no conflicting reads or writes are performed between a time of the computation and acquiring all locks for the transaction. The transaction is committed in phases, with each phase being initiated by the client library. Timestamp information is added to the locks to ensure that timestamps are generated during lock-hold intervals. An increased number of network messages is thereby overlapped with a commit wait period in which a write in a distributed database is delayed in time to ensure concurrency in the database.
Abstract:
The various embodiments described herein include methods, devices, and systems for reading and writing data from a database table. In one aspect, a method of reading and writing data from a database table, includes: (1) initiating a write transaction to write data to a first non-key column of a row of the database table, the database table having a plurality of rows, each row comprising a primary key and a plurality of non-key columns; (2) locking the first non-key column; and (3) in accordance with a determination that the second non-key column is not locked, initiating a read transaction to read data from the second non-key column, where initiation of the read transaction occurs prior to completion of the write transaction.
Abstract:
A system, computer-readable storage medium storing at least one program, and a computer-implemented method for identifying a storage group in a distributed storage system into which data is to be stored is presented. A data structure including information relating to storage groups in a distributed storage system is maintained, where a respective entry in the data structure for a respective storage group includes placement metrics for the respective storage group. A request to identify a storage group into which data is to be stored is received from a computer system. The data structure is used to determine an identifier for a storage group whose placement metrics satisfy a selection criterion. The identifier for the storage group whose placement metrics satisfy the selection criterion is returned to the computer system.
Abstract:
A system, computer-readable storage medium storing at least one program, and a computer-implemented method for performing operations on target servers is presented. A request including an operation is received. A set of target servers associated with the operation is identified. The following request processing operations are performed until a predetermined termination condition has been satisfied: a target server in the set of target servers to which the request has not been issued and whose health metrics satisfy health criteria is identified, the request to perform the operation is issued to the target server, and when the request to perform the operation fails at the target server, health metrics for the target server are updated to indicate that the request to perform the operation failed at the target server and health check operation is scheduled to be performed with respect to the target server.
Abstract:
A method reads and writes data from a database table. Each row in the table has a primary key and multiple non-key columns. Each non-key column has one or more column values, and each column value has an associated timestamp that identifies when the column value was stored. The timestamps associated with the column values in each non-key column provide a unique order for the column values. A read transaction is initiated to read from a first non-key column of a first row. A write transaction is in progress that is updating a second non-key column of the first row, where the second non-key column is distinct from the first non-key column. The write transaction holds a lock on the second non-key column of the first row. The method concurrently reads the data from the first non-key column and writes a new column value to the second non-key column.