Abstract:
The various embodiments described herein include methods, systems, and devices for managing load in a distributed storage system. In one aspect, a method is performed at a first instance server in the distributed storage system, the first instance server having memory and at least one processor coupled to the memory. The method includes: (i) issuing a first plurality of requests to a second instance server; (ii) obtaining one or more messages from the second instance server in response to the first plurality of requests, the messages indicating a utilization rate of the second instance server; (iii) determining a transaction rate limit for the second instance server based on the utilization rate of the second instance server; and (iv) issuing a second plurality of requests to the second instance server, where the second plurality of requests are issued at a rate no greater than the transaction rate limit.
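Steps (iii) and (iv) can be illustrated with a minimal sketch. The abstract does not say how the limit is derived from the utilization rate, so the multiplicative rule below, the function name `next_rate_limit`, and the `target`, `floor`, and `ceiling` parameters are all assumptions made for illustration.

```python
def next_rate_limit(current_limit, utilization, target=0.8,
                    floor=1.0, ceiling=10_000.0):
    """Return a new transaction rate limit (requests/second) for a remote server.

    Hypothetical policy: scale the current limit by target / utilization, so a
    server reporting utilization above the target receives fewer requests and
    one below the target receives more. The result is clamped to
    [floor, ceiling].
    """
    if utilization <= 0:
        # No measurable load reported; allow the maximum rate.
        return ceiling
    proposed = current_limit * (target / utilization)
    return max(floor, min(ceiling, proposed))
```

Under these assumptions, a server reporting full utilization has its limit cut to 80% of the current value, while one reporting 40% utilization has its limit doubled.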
Abstract:
Managing placement of object replicas is performed at a first instance of a distributed storage system. One or more journals are opened for storage of object chunks. Each journal is associated with a single placement policy. A first object is received comprising at least a first object chunk. The first object is associated with a first placement policy. The first object chunk is stored in a first journal whose associated placement policy matches the first placement policy. The first journal stores only object chunks for objects whose placement policies match the first placement policy. For the first journal, the receiving and storing operations are repeated for multiple objects whose associated placement policies match the first placement policy, until a first termination condition occurs. Then, the first journal is closed. Subsequently, the first journal is replicated to a second instance of the distributed storage system according to the first placement policy.
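The journal lifecycle described above can be sketched as follows. This is an illustration only: the class names, the use of a chunk count as the termination condition, and the `closed_journals` list standing in for the replication queue are assumptions, not details from the abstract.

```python
class Journal:
    """Holds chunks for exactly one placement policy."""

    def __init__(self, policy, max_chunks):
        self.policy = policy
        self.max_chunks = max_chunks
        self.chunks = []
        self.closed = False

    def append(self, chunk):
        if self.closed:
            raise ValueError("journal is closed")
        self.chunks.append(chunk)
        # Hypothetical termination condition: a fixed chunk count.
        if len(self.chunks) >= self.max_chunks:
            self.closed = True


class JournalStore:
    """Routes each chunk to the open journal whose policy matches."""

    def __init__(self, max_chunks=3):
        self.max_chunks = max_chunks
        self.open_journals = {}    # placement policy -> open Journal
        self.closed_journals = []  # closed journals awaiting replication

    def store(self, policy, chunk):
        journal = self.open_journals.get(policy)
        if journal is None:
            journal = Journal(policy, self.max_chunks)
            self.open_journals[policy] = journal
        journal.append(chunk)
        if journal.closed:
            # A closed journal is replicated as a unit under its one policy.
            self.closed_journals.append(self.open_journals.pop(policy))
        return journal
```

Because every chunk in a journal shares one policy, the closed journal can be replicated as a single unit without inspecting individual objects.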
Abstract:
A method is performed by a device of a group of devices in a distributed data replication system. The method includes storing an index of objects in the distributed data replication system, the index being replicated while the objects are stored locally by the devices of the group. The method also includes conducting a scan of at least a portion of the index and identifying, based on the scan, one or more redundant replicas of at least one of the objects. The method further includes de-duplicating the redundant replicas and updating the index to reflect their status.
Abstract:
A method allocates object replicas in a distributed storage system. The method identifies a plurality of objects in the distributed storage system. Each object has an associated storage policy that specifies a target number of object replicas stored at distinct instances of the distributed storage system. The method identifies an object of the plurality of objects whose number of object replicas exceeds the target number of object replicas specified by the storage policy associated with the object. The method selects a first replica of the object for removal based on last access times for replicas of the object, and transmits a request to a first instance of the distributed storage system that stores the first replica. The request instructs the first instance to remove the first replica of the object.
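The selection step can be sketched as below. The abstract says only that the choice is based on last access times; picking the least recently accessed replica is an assumption, as are the function name and the dictionary layout.

```python
def select_replica_for_removal(last_access, target_count):
    """Pick the instance whose replica should be removed, or None.

    last_access maps an instance name to the last access time (for example a
    Unix timestamp) of the replica stored there. Hypothetical rule: if the
    object has more replicas than its policy's target, remove the replica that
    was accessed least recently.
    """
    if len(last_access) <= target_count:
        return None  # at or below target; nothing to remove
    return min(last_access, key=last_access.get)
```

The returned instance name identifies where the removal request would be sent.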
Abstract:
Systems and methods for controlling access to relationship information in a social network are described. One described method comprises receiving a request from an observer for relationship information indicating at least a first relationship between a target in a social network and a second entity in the social network, identifying at least a first privacy rule for the first relationship, and outputting at least part of the relationship information to the observer if the first privacy rule is satisfied.
Abstract:
A distributed storage system has multiple instances. There is a plurality of local instances, and at least some of the local instances are at physically distinct geographic locations. Each local instance is configured to store data for a non-empty set of blobs in a plurality of data stores having a plurality of distinct data store types. In addition, each local instance stores metadata for the respective set of blobs in a metadata store distinct from the data stores. There is also a plurality of global instances. Each global instance is configured to store data for zero or more blobs in zero or more data stores and store metadata for all blobs stored at any local or global instance. The system selects one global instance to run a replication module that replicates blobs between instances according to blob policies. Some systems also include dynamic replication based on user needs.
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for throttling data probabilistically. One of the methods includes receiving, from a client device for a particular entity, a request to process data, determining a size of data to be processed, providing, to a throttler system, a bandwidth assignment request indicating the particular entity and the size of data to be processed, receiving, from the throttler system, a bandwidth assignment for the particular entity to use when serving the request, and probabilistically determining whether to currently serve the request based on the bandwidth assignment, the size of the data to be processed, and an accrued quantity of tokens for the particular entity.
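A minimal sketch of the probabilistic decision, under stated assumptions: the abstract does not give the formula, so serving with probability min(1, tokens / size) and accruing tokens at the assigned bandwidth up to a cap are hypothetical choices, as are both function names.

```python
import random


def accrue(tokens, bandwidth, elapsed_seconds, cap):
    """Add tokens at the assigned bandwidth (tokens/second), up to a cap."""
    return min(cap, tokens + bandwidth * elapsed_seconds)


def should_serve(size, tokens, rng=random.random):
    """Decide probabilistically whether to serve a request now.

    Hypothetical rule: always serve when the entity has at least `size`
    tokens; otherwise serve with probability tokens / size. `rng` returns a
    float in [0, 1) and can be replaced for deterministic testing.
    """
    if size <= 0:
        return True
    return rng() < min(1.0, tokens / size)
```

With this rule, an entity that has accrued a quarter of the tokens a request needs is served about one time in four, rather than being rejected outright.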
Abstract:
A distributed storage system has a plurality of instances. A computer system simulates the state of the distributed storage system. The system obtains a current state of the distributed storage system and replication policies for objects in the distributed storage system. Each replication policy specifies criteria for placing copies of the relevant objects among the plurality of instances. The system receives proposed modifications to the state of the distributed storage system and simulates the state of the distributed storage system over time based on the current state of the distributed storage system, current statistical trends in the state of the distributed storage system, the replication policies for the objects in the distributed storage system, and the proposed modifications to the state of the distributed storage system. One or more reports are generated relating to time evolution of the state of the distributed storage system based on the simulation.
Abstract:
Managing consistency of object replicas is performed at a first instance of a distributed storage system. The first instance performs garbage collection on a shard that includes a first plurality of object chunks, thereby removing a second plurality of object chunks from the shard. This leaves a third plurality of object chunks in the shard, where the first plurality of object chunks is the union of the second and third pluralities of object chunks. The first instance sends a first list of identifiers to a second instance of the distributed storage system. The second instance has a replica of the shard. The first list of identifiers specifies the object chunks in the third plurality of object chunks. The second instance removes all object chunks from the replica of the shard that are not included in the first list.
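The exchange between the two instances can be sketched with plain dictionaries. The function names, the `is_live` predicate, and the representation of a shard as a mapping from chunk identifier to bytes are assumptions for illustration.

```python
def garbage_collect(shard, is_live):
    """Remove dead chunks from the shard in place.

    Returns the sorted identifiers of the surviving chunks (the "third
    plurality"), which is the list sent to the other instance.
    """
    for chunk_id in [c for c in shard if not is_live(c)]:
        del shard[chunk_id]
    return sorted(shard)


def apply_keep_list(replica, keep_ids):
    """On the second instance, drop every chunk not named in keep_ids."""
    keep = set(keep_ids)
    for chunk_id in [c for c in replica if c not in keep]:
        del replica[chunk_id]
```

Sending the list of survivors rather than the list of removed chunks means the replica converges to the same contents even if it holds chunks the first instance never knew about.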
Abstract:
In one implementation, groups of objects may be maintained, each group including one or more objects that are to be replicated at one or more storage clusters. The objects may be assigned to the groups based on replication choices, where at least some of the objects are assigned to multiple ones of the groups. A priority value may be determined and associated with each of the groups, the priority value of a particular group being determined based on priority values associated with objects within the particular group. The objects may be selected for replication in a replication order based on the priority values of the groups, and replication of the selected objects may be initiated.
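One way to derive a replication order from group priorities is sketched below. The abstract does not say how a group's priority is computed from its objects; taking the maximum object priority, ordering objects within a group by their own priority, and skipping objects already selected through another group are all assumptions.

```python
def replication_order(groups, object_priority):
    """Return objects in the order they would be replicated.

    groups maps a group name to a non-empty list of object names; an object
    may appear in several groups. object_priority maps an object name to a
    number, with higher values replicated first.
    """
    def group_priority(members):
        # Hypothetical rule: a group is as urgent as its most urgent object.
        return max(object_priority[o] for o in members)

    ranked = sorted(groups, key=lambda g: group_priority(groups[g]),
                    reverse=True)
    order, seen = [], set()
    for group in ranked:
        members = sorted(groups[group], key=lambda o: object_priority[o],
                         reverse=True)
        for obj in members:
            if obj not in seen:  # object may belong to multiple groups
                seen.add(obj)
                order.append(obj)
    return order
```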