Abstract:
Resolving conflicting graph mutations in a distributed computing system. Graph data for at least a partition of a graph is stored in a worker system of a distributed computing system. The graph represents relationships among a set of tangible items that model a real-world condition having an associated problem. A plurality of conflicting mutation requests are received to mutate the graph. A conflict between the mutation requests is resolved with a conflict resolution function that lacks direct access to the graph data. The graph data is updated responsive to a result generated by resolving the conflict using the conflict resolution function.
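As an illustration of a resolver that "lacks direct access to the graph data", here is a minimal sketch. It assumes the graph is a plain dict of weighted edges, and it assumes a hypothetical resolution policy in which a removal beats any addition and, otherwise, the largest value wins. The names `MutationRequest`, `resolve`, and `apply_mutations` are illustrative only. The key property is that `resolve` sees only the conflicting requests, never the graph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MutationRequest:
    kind: str          # "add_edge" or "remove_edge" (hypothetical request kinds)
    source: str
    target: str
    value: float = 0.0


def resolve(requests):
    """Hypothetical resolver: receives only the conflicting requests, not the graph."""
    removals = [r for r in requests if r.kind == "remove_edge"]
    if removals:
        return removals[0]                        # assumed policy: removal wins
    return max(requests, key=lambda r: r.value)   # otherwise keep the largest value


def apply_mutations(graph, requests):
    """Group requests per edge, resolve conflicts, then update the graph data."""
    by_edge = {}
    for r in requests:
        by_edge.setdefault((r.source, r.target), []).append(r)
    for edge, group in by_edge.items():
        winner = resolve(group) if len(group) > 1 else group[0]
        if winner.kind == "remove_edge":
            graph.pop(edge, None)
        else:
            graph[edge] = winner.value
    return graph
```

Keeping the resolver a pure function of the requests means it can run anywhere, for example on a different worker than the one holding the partition.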
Abstract:
Data are maintained in a distributed computing system that describe a directed graph representing relationships among items. The directed graph has a plurality of vertices representing the items and edges whose values represent distances between the items represented by the vertices each edge connects. A multiple reference point algorithm is executed in parallel for a plurality of the vertices in the directed graph over a series of synchronized iterations to determine shortest distances between the vertices and a source vertex. After the algorithm is executed on the vertices, value pairs associated with the vertices are aggregated. The aggregated value pairs indicate shortest distances from the respective vertices to the source vertex and are output.
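The synchronized-iteration pattern described above can be sketched as a Pregel-style shortest-path computation. This is not the multiple reference point algorithm itself, whose details the abstract does not give. It is a simpler stand-in that assumes non-negative edge weights and computes distances *from* each vertex *to* the source by sending messages backward along incoming edges. Each loop pass is one superstep, and the output is a list of (vertex, distance) value pairs.

```python
import math
from collections import defaultdict


def shortest_distances_to_source(edges, source):
    """edges: list of (u, v, w) for directed edges u -> v with non-negative weight w."""
    incoming = defaultdict(list)   # v -> [(u, w)] for every edge u -> v
    vertices = {source}
    for u, v, w in edges:
        incoming[v].append((u, w))
        vertices.update((u, v))
    dist = {v: math.inf for v in vertices}

    messages = {source: [0]}       # superstep 0: the source is 0 away from itself
    while messages:
        next_messages = defaultdict(list)
        # All vertices holding messages compute within the same superstep.
        for v, candidates in messages.items():
            best = min(candidates)
            if best < dist[v]:
                dist[v] = best
                # Tell each predecessor u how far it is from the source via v.
                for u, w in incoming[v]:
                    next_messages[u].append(best + w)
        messages = next_messages   # barrier: delivered in the next superstep

    # Aggregate the (vertex, distance) pairs as the final output.
    return sorted(dist.items())
```

Vertices that cannot reach the source keep a distance of `math.inf`.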
Abstract:
Data are received at a worker system in a distributed computing system that describe a graph representing relationships among a set of items. The graph models a condition having an associated problem. The graph has graph components having associated data fields. The received data are stored in a backup table, and the relationships are analyzed to identify a solution to the problem. As part of the analysis, a new value for the data field associated with a graph component is identified and compared with an existing value of the data field, and the data field is modified. The modified data field is stored in a delta table representing a change to the backup table.
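The backup-table/delta-table split described above can be sketched as follows. The sketch assumes both tables are in-memory dicts keyed by a data-field identifier. It also assumes a hypothetical comparison rule that keeps the smaller value, as a shortest-path computation would. The class and method names are illustrative.

```python
class BackupDeltaStore:
    """Hypothetical store: received data in a backup table, changes in a delta table."""

    def __init__(self, received):
        self.backup = dict(received)  # values exactly as received; never modified
        self.delta = {}               # only the fields changed during analysis

    def get(self, key):
        # The current value is the delta entry if one exists, else the backup entry.
        return self.delta.get(key, self.backup[key])

    def update_if_better(self, key, new_value):
        """Compare the new value with the existing one; record a change in the delta."""
        existing = self.get(key)
        if new_value < existing:      # assumed rule: keep the minimum
            self.delta[key] = new_value
            return True
        return False
```

Because the backup table is untouched, persisting only `delta` is enough to describe what changed since the data were received.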
Abstract:
A value is distributed in a distributed computing system having a master system in communication with a plurality of worker systems. Partitions of a graph are assigned to the worker systems. The graph represents relationships among a set of tangible items that model a real-world condition having an associated problem. Configuration information describing the configuration of the distributed computing system is determined. Based on the configuration information, a distribution scheme is selected for distributing a value from the master system to the plurality of worker systems. The value is distributed from the master system to the worker systems according to the selected distribution scheme. The worker systems are configured to use the value to produce an output representing a solution to the problem.
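As a sketch of selecting a distribution scheme from configuration information, the following code uses the number of workers as the only configuration input. The cutoff `direct_limit` and the tree `fanout` are hypothetical parameters, not values from the source. Small clusters get a direct send from the master. Larger ones get a tree in which each worker forwards the value to up to `fanout` others.

```python
def choose_scheme(num_workers, direct_limit=16):
    # Hypothetical rule: direct sends are cheap enough for small clusters.
    return "direct" if num_workers <= direct_limit else "tree"


def distribution_plan(num_workers, fanout=4, direct_limit=16):
    """Return (scheme, [(sender, receiver), ...]); sender is "master" or a worker index."""
    scheme = choose_scheme(num_workers, direct_limit)
    if scheme == "direct":
        return scheme, [("master", w) for w in range(num_workers)]
    # Tree: the master sends to worker 0; worker i forwards to
    # workers fanout*i + 1 .. fanout*i + fanout (when they exist).
    plan = [("master", 0)]
    for i in range(num_workers):
        for k in range(1, fanout + 1):
            child = fanout * i + k
            if child < num_workers:
                plan.append((i, child))
    return scheme, plan
```

In the tree scheme every worker receives the value exactly once, and the master's outbound traffic stays constant regardless of cluster size.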
Abstract:
Executing a confined recovery in a distributed system having a plurality of worker systems, including a worker system that failed at a current superstep. The confined recovery includes determining the states of the partitions of the worker systems during the supersteps preceding the current superstep and, responsive to determining those states, determining a recovery initiation superstep, preceding the current superstep, for which all messages are available. Additionally, a recovery set of partitions is determined for which messages in supersteps after the recovery initiation superstep are not available. The worker systems having the partitions in the recovery set are instructed to execute a defined function for those partitions, starting at the recovery initiation superstep, to recover the exchanged messages that were lost.
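The two determinations in the confined recovery can be sketched under a simplified, hypothetical model. In this model, `available[p]` is the set of supersteps whose outgoing messages from partition `p` are still available, either logged or checkpointed. The recovery initiation superstep is the latest earlier superstep for which every partition's messages are available. The recovery set contains the partitions that are missing messages for any later superstep.

```python
def plan_confined_recovery(current, available):
    """available[p]: supersteps whose outgoing messages from partition p survive (assumed model)."""
    partitions = list(available)
    # Recovery initiation superstep: latest superstep before `current`
    # for which every partition's messages are available.
    candidates = [s for s in range(current)
                  if all(s in available[p] for p in partitions)]
    if not candidates:
        raise ValueError("no superstep with complete messages; full restart needed")
    start = max(candidates)
    # Recovery set: partitions missing messages for some superstep after `start`.
    recovery = {p for p in partitions
                if any(s not in available[p] for s in range(start + 1, current))}
    return start, recovery
```

Only partitions in the recovery set re-execute from `start`. The others replay their surviving messages instead of recomputing them, which is what keeps the recovery confined.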
Abstract:
Data are maintained in a distributed computing system that describe a directed graph representing relationships among a set of items. The directed graph models a condition having an associated problem. The directed graph has graph components having associated data fields. The relationships are analyzed to identify a solution to the problem. As part of the analysis, a new value for the data field associated with a graph component is identified responsive to an operation performed during the analysis. The new value is compared with an existing value of the data field, and the data field is modified. The modification may include inserting the new value into an overflow vector of data, and replacing the existing value in the data field with exception information identifying the location of the new value. An exception flag associated with the data field is set to indicate that the exception information is being used.
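The overflow-vector scheme described above can be sketched as follows. The sketch assumes fields are fixed-width slots with a hypothetical unsigned 16-bit inline capacity. A value that does not fit is appended to the overflow vector, the field is overwritten with the value's index (the "exception information"), and the field's exception flag is set. For brevity, overflow slots left behind when a field later returns to an inline value are not reclaimed.

```python
class OverflowFields:
    """Fixed-width data fields with an overflow vector and per-field exception flags."""

    MAX_INLINE = 2**16 - 1  # hypothetical inline capacity (unsigned 16-bit)

    def __init__(self, n):
        self.fields = [0] * n         # inline values, or overflow indices when flagged
        self.exception = [False] * n  # True when fields[i] holds exception information
        self.overflow = []            # values too large to store inline

    def get(self, i):
        if self.exception[i]:
            return self.overflow[self.fields[i]]
        return self.fields[i]

    def set(self, i, new_value):
        """Compare with the existing value; modify the field only if it differs."""
        if new_value == self.get(i):
            return False
        if 0 <= new_value <= self.MAX_INLINE:
            self.fields[i] = new_value
            self.exception[i] = False
        else:
            self.overflow.append(new_value)
            self.fields[i] = len(self.overflow) - 1  # location of the new value
            self.exception[i] = True
        return True
```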
Abstract:
Instructing a plurality of worker systems in a distributed computing system to perform a checkpoint. Instructing the worker systems includes receiving timing messages from the worker systems and determining, based on the received timing messages, a common checkpoint time indicating an estimate of how long the worker systems will take to write data to persistent storage for a subsequent checkpoint. The common checkpoint time is used to determine a checkpoint threshold, and whether to perform the checkpoint is determined based on that threshold. Responsive to determining to perform the checkpoint, messages are transmitted to the worker systems instructing them to perform the checkpoint.
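One way to turn a common checkpoint time into a threshold is sketched below. It takes the slowest worker's reported write time as the common checkpoint time. It then swaps in Young's approximation for the checkpoint interval, `sqrt(2 * C * MTBF)`. The abstract does not say which formula is used, and the mean time between failures is an assumed input.

```python
import math


def common_checkpoint_time(timing_messages):
    # A coordinated checkpoint finishes only when the slowest worker finishes.
    return max(timing_messages.values())


def should_checkpoint(timing_messages, seconds_since_last, mean_time_between_failures):
    """Return (decision, threshold) using Young's approximation as the threshold rule."""
    c = common_checkpoint_time(timing_messages)
    threshold = math.sqrt(2 * c * mean_time_between_failures)
    return seconds_since_last >= threshold, threshold
```

If the decision is `True`, the master would then send each worker its checkpoint instruction.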
Abstract:
Data are maintained in a distributed computing system that describe a graph. The graph represents relationships among items. The graph has a plurality of vertices that represent the items and a plurality of edges connecting the vertices. At least one vertex of the plurality of vertices includes a set of label values, each indicating that vertex's strength of association with a label from a set of labels. The set of labels describes possible characteristics of the item represented by that vertex. At least one edge of the plurality of edges includes a set of label weights that influence the label values traversing that edge. A label propagation algorithm is executed in parallel for a plurality of the vertices in the graph over a series of synchronized iterations to propagate labels through the graph.
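Here is a minimal sketch of label propagation with per-edge label weights. It assumes a hypothetical update rule. In each synchronized iteration, every vertex adds the label values arriving on its in-edges, each scaled by that edge's weight for the label (defaulting to 1.0), to its own values. It then normalizes the result to sum to 1. Every edge endpoint must appear in `labels`.

```python
def propagate_labels(labels, edges, iterations):
    """labels: {vertex: {label: value}}; edges: [(src, dst, {label: weight})]."""
    current = {v: dict(vals) for v, vals in labels.items()}
    for _ in range(iterations):
        # Collect weighted label values sent along each edge in this iteration.
        incoming = {v: {} for v in current}
        for src, dst, weights in edges:
            for label, value in current[src].items():
                w = weights.get(label, 1.0)  # the edge scales each label independently
                incoming[dst][label] = incoming[dst].get(label, 0.0) + w * value
        new = {}
        for v, own in current.items():
            combined = dict(own)
            for label, value in incoming[v].items():
                combined[label] = combined.get(label, 0.0) + value
            total = sum(combined.values())
            new[v] = {l: x / total for l, x in combined.items()} if total > 0 else combined
        current = new  # synchronized: all vertices switch to new values together
    return current
```

A label weight below 1.0 on an edge dampens that label as it crosses the edge.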
Abstract:
An apparatus and method for vectorizing detected saturation and clipping operations in serial code loops of a source program are described. In one embodiment, the method includes analyzing source program code to identify code that uses conditional constructs to perform saturation/clipping operations. Once the analysis is complete, the identified code is vectorized so that the identified saturation/clipping operations are implemented with single instruction, multiple data (SIMD) saturation/clipping instructions. Accordingly, in embodiments of the present invention, conditional statements used to implement saturation arithmetic, as well as clipping of data values such as pixel values in graphics applications, are replaced with SIMD saturation arithmetic and clipping instructions.
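Here is the transformation in miniature. The first function is the serial loop with a conditional that a vectorizer would detect as an unsigned 8-bit saturating add. The second is a whole-array equivalent written with NumPy, which stands in here for SIMD code. The sketch does not claim that NumPy emits any particular saturation instruction. A compiler targeting x86 SSE2 could map the same pattern to a packed unsigned saturating add such as `PADDUSB`.

```python
import numpy as np


def saturating_add_scalar(a, b):
    # Serial loop with a conditional construct: the pattern to be vectorized.
    out = []
    for x, y in zip(a, b):
        s = x + y
        if s > 255:
            s = 255  # saturate at the uint8 maximum
        out.append(s)
    return out


def saturating_add_vectorized(a, b):
    # Whole-array form: widen to int16 so the sum cannot wrap, then clip to [0, 255].
    wide = np.asarray(a, dtype=np.int16) + np.asarray(b, dtype=np.int16)
    return np.clip(wide, 0, 255).astype(np.uint8)
```

Both functions agree element by element on 8-bit inputs. Clipping pixel values to an arbitrary range `[lo, hi]` follows the same shape, using `np.clip(x, lo, hi)`.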