摘要:
A system that implements a scaleable data storage service may maintain tables in a non-relational data store on behalf of service clients. Each table may include multiple items. Each item may include one or more attributes, each containing a name-value pair. The system may provide an API through which clients can query tables maintained by the service. Items may be partitioned and indexed in a table according to a simple or composite primary key contained in all items in the table. A composite primary key may include a hash key attribute, and a range key attribute. The range key attribute may be usable to order items having the same hash key attribute value, and to partition them dependent on a range of range key attribute values. A query request may specify a logical or mathematical expression dependent on range key attribute values and may be directed to multiple partitions.
摘要:
A system that implements a scalable data storage service may maintain tables in a data store on behalf of storage service clients. The service may maintain table data in multiple replicas of partitions that are stored on respective computing nodes in the system. In response to detecting an anomaly in the system, detecting a change in data volume on a partition or service request traffic directed to a partition, or receiving a service request from a client to split a partition, the data storage service may create additional copies of a partition replica using a physical copy mechanism. The data storage service may issue a split command defined in an API for the data store to divide the original and additional replicas into multiple replica groups, and to configure each replica group to maintain a respective portion of the table data that was stored in the partition before the split.
摘要:
A system that implements a data storage service may maintain tables in a data store on behalf of clients. The service may maintain table data in multiple replicas of partitions of the data that are stored on respective computing nodes in the system. In response to detecting a failure or fault condition, or receiving a service request from a client to move or copy a partition replica, the data store may copy a partition replica to another computing node using a physical copy mechanism. The physical copy mechanism may copy table data from physical storage locations in which it is stored to physical storage locations allocated to a destination replica on the other computing node. During copying, service requests to modify table data may be logged and applied to the replica being copied. A catch-up operation may be performed to apply modification requests received during copying to the destination replica.
摘要:
A system that implements a scalable data storage service may maintain tables in a non-relational data store on behalf of clients. The system may provide a Web services interface through which service requests are received, and an API usable to request that a table be created, deleted, or described; that an item be stored, retrieved, deleted, or its attributes modified; or that a table be queried (or scanned) with filtered items and/or their attributes returned. An asynchronous workflow may be invoked to create or delete a table. Items stored in tables may be partitioned and indexed using a simple or composite primary key. The system may not impose pre-defined limits on table size, and may employ a flexible schema. The service may provide a best-effort or committed throughput model. The system may automatically scale and/or re-partition tables in response to detecting workload changes, node failures, or other conditions or anomalies.
摘要:
A system that implements a scalable data storage service may maintain tables in a non-relational data store on behalf of clients. The system may provide a Web services interface through which service requests are received, and an API usable to request that a table be created, deleted, or described; that an item be stored, retrieved, deleted, or its attributes modified; or that a table be queried (or scanned) with filtered items and/or their attributes returned. An asynchronous workflow may be invoked to create or delete a table. Items stored in tables may be partitioned and indexed using a simple or composite primary key. The system may not impose pre-defined limits on table size, and may employ a flexible schema. The service may provide a best-effort or committed throughput model. The system may automatically scale and/or re-partition tables in response to detecting workload changes, node failures, or other conditions or anomalies.
摘要:
Techniques for producing a gentle reduction in throughput in a distributed service when a node of the service encounters a very large backlog of requests and/or when a previously offline node of the service is brought back online. These techniques may utilize multiple different algorithms to determine an amount of work that the distributed service is able to accept at any given time, rather than a single algorithm.
摘要:
A system that implements a data storage service may store data on behalf of storage service clients. The system may maintain data in multiple replicas of partitions that are stored on respective computing nodes in the system. A master replica for a replica group may increment a membership version indicator for the group, and may propagate metadata (including the membership version indicator) indicating a membership change for the group to other members of the group. Propagating the metadata may include sending a log record containing the metadata to the other replicas to be appended to their respective logs. Once the membership change becomes durable, it may be committed. A replica attempting to become the master of a replica group may determine that another replica in the group has observed a more recent membership version, in which case logs may be synchronized or snipped, or the attempt may be abandoned.
摘要:
A system that implements a data storage service may store data on behalf of storage service clients. The system may maintain data in multiple replicas of various partitions that are stored on respective computing nodes in the system. The system may employ a single master failover protocol, usable when a replica attempts to become the master replica for a replica group of which it is a member. Attempting to become the master replica may include acquiring a lock associated with the replica group, and gathering state information from the other replicas in the group. The state information may indicate whether another replica supports the attempt (in which case it is included in a failover quorum) or stores more recent data or metadata than the replica attempting to become the master (in which case synchronization may be required). If the failover quorum includes enough replicas, the replica may become the master.
摘要:
A system that implements a data storage service may store data on behalf of storage service clients. The system may maintain data in multiple replicas of partitions that are stored on respective computing nodes in the system. The system may split a data partition into two new partitions, and may split the replica group that stored the original partitions into two new replica groups, each storing one of the new partitions. To split the replica group, the master replica may propagate membership changes to the other members of the replica group for adding members to the original replica group and for splitting the expanded replica group into two new replica groups. Subsequent to the split, replicas may attempt to become the master for the original replica group or for a new replica group. If an attempt to become master replica for the original replica group succeeds, the split may fail.
摘要:
A system that provides services to clients may receive and service requests, various ones of which may require different amounts of work. The system may determine whether it is operating in an overloaded or underloaded state based on a current work throughput rate, a target work throughput rate, a maximum request rate, or an actual request rate, and may dynamically adjust the maximum request rate in response. For example, if the maximum request rate is being exceeded, the maximum request rate may be raised or lowered, dependent on the current work throughput rate. If the target or committed work throughput rate is being exceeded, but the maximum request rate is not being exceeded, a lower maximum request rate may be proposed. Adjustments to the maximum request rate may be made using multiple incremental adjustments. Service request tokens may be added to a leaky token bucket at the maximum request rate.