Abstract:
Systems and methods offer an efficient approach to computing similarity rankings in bipartite graphs. An example system includes at least one processor and memory storing a bipartite graph having a first set and a second set of nodes, with nodes in the first set being connected to nodes in the second set by edges. The memory also stores instructions that, when executed by the at least one processor, cause the system to assign each node in the second set to one of a plurality of categories and, for each of the plurality of categories, generate a subgraph. The subgraph comprises a subset of nodes in the first set and edges linking the nodes in the subset, where the nodes in the subset are selected based on connection to a node in the second set that is assigned to the category. The system uses the subgraph to respond to queries.
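The per-category subgraph construction described above can be sketched as follows. This is a minimal illustration, not the patented method: the abstract does not say how edges between first-set nodes are defined or how queries are answered, so the sketch assumes two first-set nodes are linked when they share a second-set neighbor in the category, weights that link by the number of shared neighbors, and ranks by that weight. The names `build_category_subgraphs` and `most_similar` and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_category_subgraphs(edges, category_of):
    """Build one subgraph of first-set nodes per category.

    edges: iterable of (first_node, second_node) pairs of the bipartite graph.
    category_of: dict mapping each second-set node to its category.
    """
    # Collect the first-set neighbors of every second-set node.
    neighbors = defaultdict(set)
    for first, second in edges:
        neighbors[second].add(first)

    subgraphs = defaultdict(lambda: {"nodes": set(), "edges": defaultdict(int)})
    for second, firsts in neighbors.items():
        sub = subgraphs[category_of[second]]
        # First-set nodes are selected through a second-set node in this category.
        sub["nodes"].update(firsts)
        # Assumption: link first-set nodes that share this second-set neighbor,
        # counting shared neighbors as the edge weight.
        for u, v in combinations(sorted(firsts), 2):
            sub["edges"][(u, v)] += 1
    return subgraphs


def most_similar(subgraph, node, k=3):
    """Answer a query from the subgraph alone: rank neighbors by edge weight."""
    scores = {}
    for (u, v), weight in subgraph["edges"].items():
        if u == node:
            scores[v] = weight
        elif v == node:
            scores[u] = weight
    # Higher weight first; ties broken by node name for a stable order.
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]


# Hypothetical data: first set a1..a3, second set q1..q3.
edges = [("a1", "q1"), ("a2", "q1"), ("a2", "q2"),
         ("a3", "q2"), ("a1", "q3"), ("a3", "q3")]
category_of = {"q1": "shoes", "q2": "shoes", "q3": "travel"}
subgraphs = build_category_subgraphs(edges, category_of)
print(most_similar(subgraphs["shoes"], "a2"))
```

Because each query touches only the subgraph for one category, the full bipartite graph does not need to be consulted at query time.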
Abstract:
A system and method for determining matching pairs between social networks is disclosed. The system comprises a matching module that includes an account retrieval engine, a candidate pairing module, a match determination module, a social network engine, a personalizing engine, and a graphical user interface engine. The candidate pairing module generates candidate pairs of accounts from different social networks that may represent the same user. The match pairing module generates scores for the pairs. The match determination module determines a subset of the pairs that most likely represent the same users.
Abstract:
Systems and methods for reducing the time and cost of calculating connected components in a distributed graph are disclosed. One method includes reducing a quantity of map-reduce rounds used to determine a cluster assignment for a node in a large distributed graph by alternating between two hashing functions in the map stage of a map-reduce round and storing the cluster assignment for the node in a memory. Another method includes reducing a quantity of messages sent during map-reduce rounds by performing a predetermined quantity of rounds to generate, for each node, a set of potential cluster assignments, generating a data structure in memory to store a mapping between each node and its potential cluster assignment, and using the data structure during remaining map-reduce rounds, wherein the remaining map-reduce rounds do not send messages between nodes. The method can also include storing the cluster assignment for the node in a memory.
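To make the notions of a "round" and a "cluster assignment" concrete, the sketch below simulates a plain baseline in a single process: in each round every node emits its current label to its neighbors (map stage) and keeps the smallest label it receives (reduce stage). It does not reproduce the alternating hashing functions or the in-memory mapping described above, since the abstract does not specify them; it only shows the kind of round count those methods aim to reduce. The function name and the sample graph are hypothetical, and the adjacency map is assumed to be symmetric.

```python
from collections import defaultdict


def connected_components(adjacency):
    """Min-label propagation in synchronous rounds.

    adjacency: dict node -> list of neighbors (assumed symmetric, every node a key).
    Returns (label per node, number of rounds executed).
    """
    label = {node: node for node in adjacency}
    rounds = 0
    while True:
        # Map stage: each node sends its current label to itself and its neighbors.
        inbox = defaultdict(list)
        for node, nbrs in adjacency.items():
            inbox[node].append(label[node])
            for nbr in nbrs:
                inbox[nbr].append(label[node])
        # Reduce stage: each node keeps the smallest label it received.
        new_label = {node: min(values) for node, values in inbox.items()}
        rounds += 1
        if new_label == label:
            # No label changed, so the cluster assignment is final.
            return label, rounds
        label = new_label


# Hypothetical graph: a path 1-2-3 and a separate edge 4-5.
graph = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
labels, rounds = connected_components(graph)
print(labels, rounds)
```

In this baseline the number of rounds grows with the diameter of the largest component, which is the cost the disclosed methods target.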
Abstract:
Systems and methods for sending asynchronous messages include receiving, using at least one processor, at a node in a distributed graph, a message with a first value and determining, at the node, that the first value replaces a current value for the node. In response to determining that the first value replaces the current value, the method also includes setting a status of the node to active and sending messages including the first value to neighboring nodes. The method may also include receiving the messages addressed to the neighboring nodes at a priority queue. The priority queue propagates messages in an intelligently asynchronous manner, and when the priority queue propagates the messages to the neighboring nodes, the status of the node is set to inactive. The first value may be a cluster identifier or a shortest path identifier.
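The message flow described above can be sketched in a single process with Python's `heapq` as the priority queue. Several details are assumptions, not taken from the abstract: the value is a cluster identifier, a smaller identifier replaces a larger one, the queue delivers smaller values first, and a node counts as inactive once all of its queued messages have been propagated. The function name and sample graph are hypothetical.

```python
import heapq


def propagate_cluster_ids(adjacency):
    """Asynchronously spread the smallest node id through each component.

    adjacency: dict node -> list of neighbors (assumed symmetric).
    Returns (final value per node, final status per node).
    """
    value = {node: node for node in adjacency}         # current value per node
    status = {node: "inactive" for node in adjacency}
    pending = {node: 0 for node in adjacency}           # queued messages per sender
    queue = []                                          # entries: (value, sender, target)

    def send(node):
        # The node's messages go to the priority queue, not straight to neighbors.
        for nbr in adjacency[node]:
            heapq.heappush(queue, (value[node], node, nbr))
            pending[node] += 1
        if pending[node]:
            status[node] = "active"

    for node in adjacency:
        send(node)

    while queue:
        # Assumption: the queue propagates the smallest value first.
        first_value, sender, target = heapq.heappop(queue)
        pending[sender] -= 1
        if pending[sender] == 0:
            status[sender] = "inactive"   # all of the sender's messages are out
        # The receiving node decides whether the new value replaces its own.
        if first_value < value[target]:
            value[target] = first_value
            send(target)
    return value, status


# Hypothetical graph: a path 3-7-9 and an isolated node 4.
values, statuses = propagate_cluster_ids({3: [7], 7: [3, 9], 9: [7], 4: []})
print(values)
```

Ordering the queue by value means the most useful update reaches a node first, so later messages carrying larger identifiers are dropped without triggering further sends.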
Abstract:
The disclosure includes a system and method for generating weighted clustering coefficients for a social network graph. The system includes a processor and a memory storing instructions that, when executed, cause the system to: receive social graph data associated with a social network, the social graph data including nodes, edges that connect the nodes, and weights associated with the edges in a social graph; determine a first probability that an edge exists in the social graph based on the weights; determine a second probability that a first node forms a triangle with two neighbor nodes; and compute a weighted clustering coefficient for the first node based on the first and second probabilities.
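One way to read the two probabilities is sketched below. The abstract does not give formulas, so everything here is an assumption: each weight is taken to lie in [0, 1] and is used directly as the probability that the edge exists, edges are treated as independent, the triangle probability is the product of the three edge probabilities, and the coefficient is the ratio of expected triangles to expected neighbor pairs at the node. The function names and sample weights are hypothetical.

```python
from itertools import combinations


def edge_probability(weights, u, v):
    """First probability: chance that edge (u, v) exists.

    Assumption: weights are already scaled to [0, 1]; missing edges have probability 0.
    """
    return weights.get(frozenset((u, v)), 0.0)


def weighted_clustering_coefficient(weights, node):
    """Expected triangles at `node` divided by expected neighbor pairs at `node`."""
    neighbors = sorted({n for edge in weights if node in edge for n in edge if n != node})
    expected_triangles = 0.0
    expected_pairs = 0.0
    for u, w in combinations(neighbors, 2):
        # Probability that both edges from `node` to the pair exist.
        pair = edge_probability(weights, node, u) * edge_probability(weights, node, w)
        expected_pairs += pair
        # Second probability: the pair is also connected, closing the triangle.
        expected_triangles += pair * edge_probability(weights, u, w)
    return expected_triangles / expected_pairs if expected_pairs else 0.0


# Hypothetical weighted social graph.
weights = {
    frozenset(("a", "b")): 1.0,
    frozenset(("a", "c")): 0.5,
    frozenset(("b", "c")): 0.5,
    frozenset(("a", "d")): 1.0,
}
print(weighted_clustering_coefficient(weights, "a"))  # 0.125
```

With all weights equal to 1 this reduces to the ordinary unweighted clustering coefficient, which is a useful sanity check for any alternative weighting.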