Abstract:
A distributed storage system has a plurality of instances. A computer system simulates the state of the distributed storage system. The system obtains a current state of the distributed storage system and replication policies for objects in the distributed storage system. Each replication policy specifies criteria for placing copies of the relevant objects among the plurality of instances. The system receives proposed modifications to the state of the distributed storage system and simulates the state of the distributed storage system over time based on the current state of the distributed storage system, current statistical trends in the state of the distributed storage system, the replication policies for the objects in the distributed storage system, and the proposed modifications to the state of the distributed storage system. One or more reports are generated relating to time evolution of the state of the distributed storage system based on the simulation.
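The simulation loop described above can be illustrated with a much-reduced sketch. It assumes the "state" is just per-instance bytes used and capacity, the "statistical trend" is a fixed daily growth, and a "proposed modification" is a capacity change on a given day. Replication policies are left out entirely, and every name here (`simulate`, `daily_growth`, `capacity_changes`) is hypothetical rather than taken from the abstract.

```python
def simulate(used, capacity, daily_growth, capacity_changes, days):
    """Project usage forward and report when each instance first overflows.

    used, capacity, daily_growth: dicts keyed by instance name.
    capacity_changes: {day: {instance: delta}}, the proposed modifications.
    Returns a list of (day, instance) pairs in the order they occur.
    """
    used = dict(used)          # copy so the caller's current state is untouched
    capacity = dict(capacity)
    report, flagged = [], set()
    for day in range(1, days + 1):
        # Apply any proposed modification scheduled for this day.
        for instance, delta in capacity_changes.get(day, {}).items():
            capacity[instance] = capacity.get(instance, 0) + delta
        # Extrapolate the current trend by one day.
        for instance in used:
            used[instance] += daily_growth.get(instance, 0)
            if used[instance] > capacity[instance] and instance not in flagged:
                flagged.add(instance)
                report.append((day, instance))
    return report


state = {"a": 80, "b": 50}
limits = {"a": 100, "b": 100}
trend = {"a": 5, "b": 10}
baseline = simulate(state, limits, trend, {}, days=6)
with_upgrade = simulate(state, limits, trend, {3: {"a": 50}}, days=6)
```

Comparing `baseline` with `with_upgrade` shows the kind of report the abstract refers to: the same trend is projected with and without the proposed change.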
Abstract:
An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are extracted from the document collection. Documents are then indexed according to their included phrases, using phrase posting lists. The phrase posting lists are stored in a cluster of index servers. The phrase posting lists can be tiered into groups, and sharded into partitions. Phrases in a query are identified based on possible phrasifications. A query schedule is created from the phrases, and then optimized to reduce query processing and communication costs. The execution of the query schedule is managed to further reduce or eliminate query processing operations at various ones of the index servers.
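Two of the ideas above, enumerating the possible phrasifications of a query and sharding a posting list into partitions, can be sketched as follows. This is an illustration under stated assumptions: single words are always treated as valid phrases, shards are chosen by document ID modulo the shard count, and the function names are hypothetical. The abstract does not specify either choice.

```python
def phrasifications(words, known_phrases):
    """Yield every split of `words` into consecutive phrases.

    A multi-word span is allowed only if it appears in `known_phrases`;
    single words are always allowed (an assumption of this sketch).
    """
    if not words:
        yield []
        return
    for end in range(1, len(words) + 1):
        head = " ".join(words[:end])
        if end == 1 or head in known_phrases:
            for rest in phrasifications(words[end:], known_phrases):
                yield [head] + rest


def shard_posting_list(doc_ids, num_shards):
    """Partition one phrase's posting list across shards by doc ID modulo."""
    shards = [[] for _ in range(num_shards)]
    for doc_id in sorted(doc_ids):
        shards[doc_id % num_shards].append(doc_id)
    return shards


candidates = list(phrasifications("new york pizza".split(), {"new york"}))
partitions = shard_posting_list([5, 1, 4, 2, 3], num_shards=2)
```

Each candidate phrasification would then be turned into a query schedule; the cost-based optimization of that schedule is outside the scope of this sketch.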
Abstract:
In one aspect, a method is provided, including the following method operations: receiving a request to generate a first post data item for display at a first location, the first post data item including a reference to a content item located at a second location; determining, based on the reference to the content item, a content identifier associated with the content item; associating the content identifier with the first post data item; retrieving one or more post data items based on the content identifier; and displaying the one or more post data items at the second location, the one or more post data items including the first post data item.
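The operations above can be sketched under the assumption that the reference is a URL and that the content identifier is a hash of a canonicalized form of it. The canonicalization rule (lowercase host, no fragment, no query string, no trailing slash) and the names `content_id`, `add_post`, and `posts_for` are hypothetical; the abstract does not say how the identifier is derived.

```python
import hashlib
from urllib.parse import urlsplit


def content_id(reference):
    """Map a reference (assumed to be a URL) to a stable content identifier."""
    parts = urlsplit(reference)
    canonical = parts.netloc.lower() + parts.path.rstrip("/")
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def add_post(index, post_id, reference):
    """Associate a post with the identifier of the content it references."""
    index.setdefault(content_id(reference), []).append(post_id)


def posts_for(index, reference):
    """Retrieve every post associated with the referenced content item."""
    return list(index.get(content_id(reference), []))


index = {}
add_post(index, "p1", "https://Example.com/video/42/#t=10")
add_post(index, "p2", "https://example.com/video/42")
related = posts_for(index, "https://example.com/video/42")
```

Both references resolve to the same identifier, so posts made elsewhere can be gathered and shown alongside the content item itself.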
Abstract:
In one aspect, a method is provided, including the following method operations: receiving a request to generate a post data item at a first location, the post data item including a reference to a content item located at a second location; accessing a notification tag associated with the content item, the notification tag identifying a destination for notification; and sending a notification to the destination, the notification identifying the post data item.
Abstract:
A method replicates data between instances of a distributed database. The method tracks changes to the distributed database at a first instance by storing deltas. Each delta includes a row identifier that identifies a row having a base value, and a sequence identifier that specifies an order in which the delta is applied to the base value to compute a current value for the row. The method identifies a set of deltas to send to a second instance based in part on an egress map at the first instance, wherein the egress map specifies which combinations of row identifier and sequence identifier have been acknowledged as received at the second instance. The method then transmits the identified set of deltas to the second instance. After receiving acknowledgement that the deltas in the identified set of deltas have been incorporated into the second instance, the egress map is updated accordingly.
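A minimal sketch of the delta and egress-map bookkeeping described above follows. It assumes each delta carries an integer increment as its payload and that the egress map is a set of acknowledged (row identifier, sequence identifier) pairs per peer. The class and method names are hypothetical, and the network transport is omitted.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Delta:
    row_id: str
    seq: int      # order in which the delta is applied to the base value
    change: int   # hypothetical payload: an integer increment


@dataclass
class Instance:
    deltas: list = field(default_factory=list)
    # egress_map[peer] holds the (row_id, seq) pairs that peer has acknowledged
    egress_map: dict = field(default_factory=dict)

    def record(self, delta):
        """Track a change to the database by storing its delta."""
        self.deltas.append(delta)

    def pending_for(self, peer):
        """Identify the deltas the peer has not yet acknowledged."""
        acked = self.egress_map.get(peer, set())
        unsent = [d for d in self.deltas if (d.row_id, d.seq) not in acked]
        return sorted(unsent, key=lambda d: (d.row_id, d.seq))

    def acknowledge(self, peer, sent):
        """Update the egress map once the peer confirms incorporation."""
        acked = self.egress_map.setdefault(peer, set())
        acked.update((d.row_id, d.seq) for d in sent)


def current_value(base, deltas, row_id):
    """Apply a row's deltas to its base value in sequence order."""
    ordered = sorted((d for d in deltas if d.row_id == row_id), key=lambda d: d.seq)
    return base + sum(d.change for d in ordered)


first = Instance()
first.record(Delta("row1", 1, 5))
first.record(Delta("row1", 2, -2))
batch = first.pending_for("second")   # would be transmitted to the second instance
first.acknowledge("second", batch)    # called only after the peer acknowledges
first.record(Delta("row1", 3, 10))
remaining = first.pending_for("second")
```

Because the egress map is updated only after acknowledgement, a lost transmission simply leaves the deltas pending, and they are selected again on the next pass.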
Abstract:
A request to generate a first post data item for display at a first location may be received. The first post data item may include a reference to a content item. A content identifier may be determined for the first post data item based on the reference to the content item. One or more additional post data items may be retrieved based on the content identifier where each of the one or more additional post data items includes another reference to the content item. The first post data item and the one or more additional post data items may be provided based on whether an author of the first post data item or one or more additional authors of the one or more additional post data items are in a social graph of a viewer.
Abstract:
In one aspect, a method includes: receiving an indication of a request from a user to view a stream associated with the user; generating a request for one or more items visible to the user for display within the stream, the request including a search query identifying search criteria including one or more tokens, the one or more tokens including at least a user token identifying the user; receiving one or more items in response to the request, the one or more items including at least one of the one or more tokens and further being visible to the user; and providing the one or more items for display to the user within the stream in response to the request. Other aspects can be embodied in corresponding systems and apparatus, including computer program products.
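The token-matching step above can be sketched as a filter over stored items. The item layout (a dict with `id`, `tokens`, and `visible_to`) and the token format `user:<name>` are assumptions made for illustration only.

```python
def stream_items(items, query_tokens, user):
    """Return IDs of items that share a query token and are visible to the user."""
    return [
        item["id"]
        for item in items
        if query_tokens & item["tokens"] and user in item["visible_to"]
    ]


items = [
    {"id": 1, "tokens": {"user:ann", "topic:go"}, "visible_to": {"ann", "bob"}},
    {"id": 2, "tokens": {"user:ann"}, "visible_to": {"bob"}},
    {"id": 3, "tokens": {"topic:rust"}, "visible_to": {"ann"}},
]
# The query always carries at least the requesting user's own token.
stream = stream_items(items, {"user:ann"}, "ann")
```

Item 2 matches the token but is not visible to the user, and item 3 is visible but carries no matching token, so only item 1 reaches the stream.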
Abstract:
A method allocates object replicas in a distributed storage system. The method identifies a plurality of objects in the distributed storage system. Each object has an associated storage policy that specifies a target number of object replicas stored at distinct instances of the distributed storage system. The method identifies an object of the plurality of objects whose number of object replicas exceeds the target number of object replicas specified by the storage policy associated with the object. The method selects a first replica of the object for removal based on last access times for replicas of the object, and transmits a request to a first instance of the distributed storage system that stores the first replica. The request instructs the first instance to remove the first replica of the object.
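The selection step above can be sketched as follows. The abstract says only that removal is "based on last access times"; this sketch assumes the least recently accessed replicas are removed first. The names `Replica` and `select_removals` are hypothetical, and the returned pairs stand in for the removal requests sent to each instance.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    instance: str
    last_access: float  # e.g. seconds since the epoch


def select_removals(objects):
    """Pick surplus replicas to remove.

    objects: {object_id: (target_replicas, [Replica, ...])}
    Returns (object_id, instance) pairs, one per removal request.
    """
    requests = []
    for object_id, (target, replicas) in objects.items():
        excess = len(replicas) - target
        if excess <= 0:
            continue  # at or below the policy's target; nothing to remove
        # Assumption: the replica accessed longest ago is removed first.
        stalest_first = sorted(replicas, key=lambda r: r.last_access)
        requests.extend((object_id, r.instance) for r in stalest_first[:excess])
    return requests


objects = {
    "a": (2, [Replica("us", 300.0), Replica("eu", 100.0), Replica("asia", 200.0)]),
    "b": (1, [Replica("us", 50.0)]),
}
removals = select_removals(objects)
```

Object "a" has three replicas against a target of two, so one request is produced for the instance holding the stalest copy; object "b" already meets its target.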
Abstract:
The disclosure includes a system and method for providing a customized stream of content to a user. The system includes: an item sourcer for gathering one or more content items from one or more content sources; a behavior indicator module and scorer for determining one or more behavior scores for the one or more content items; a content indicator module and scorer for determining one or more content scores for the one or more content items; a score combiner for aggregating the one or more behavior scores and the one or more content scores to generate one or more item scores for the one or more content items; a content diversifier for determining one or more diverse items from the one or more content items; and a stream generator for generating a customized stream of content for the user from the one or more diverse items.
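The score combiner, content diversifier, and stream generator above can be sketched as a small pipeline. The abstract does not say how scores are aggregated or how diversity is determined, so this sketch assumes a weighted average for the combiner and a per-topic cap for the diversifier. All function names and the `topic` field are hypothetical.

```python
def combine(behavior_score, content_score, weight=0.5):
    """Aggregate the two scores; a weighted average is an assumption."""
    return weight * behavior_score + (1 - weight) * content_score


def diversify(scored_items, per_topic=1):
    """Keep the best-scoring items, allowing at most `per_topic` per topic.

    scored_items: iterable of (item_id, topic, item_score).
    """
    kept, seen = [], {}
    for item_id, topic, _score in sorted(scored_items, key=lambda t: t[2], reverse=True):
        if seen.get(topic, 0) < per_topic:
            kept.append(item_id)
            seen[topic] = seen.get(topic, 0) + 1
    return kept


def generate_stream(items, per_topic=1):
    """items: iterable of (item_id, topic, behavior_score, content_score)."""
    scored = [(i, topic, combine(b, c)) for i, topic, b, c in items]
    return diversify(scored, per_topic)


items = [
    ("a", "sports", 0.9, 0.7),
    ("b", "sports", 0.6, 0.8),
    ("c", "music", 0.4, 0.6),
    ("d", "news", 0.2, 0.1),
]
stream = generate_stream(items)
```

With a cap of one item per topic, the second sports item is dropped even though it outscores the music and news items, which is the trade-off a diversifier makes.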