Abstract:
A distributed data clustering system has an integrator and at least two computing units. Each computing unit is loaded with common global parameter values and its own local data set, and generates local sufficient statistics from that data set and the global parameter values. The integrator uses the local sufficient statistics of all the computing units to update the global parameter values.
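The exchange between computing units and integrator can be illustrated with a minimal sketch. It assumes k-means as the clustering method, which the abstract does not specify: the global parameters are taken to be the cluster centers, and the local sufficient statistics are per-cluster counts and coordinate sums. All function names are hypothetical.

```python
def local_sufficient_stats(data, centers):
    """Computing unit: per-cluster counts and coordinate sums for one local data set."""
    k, dim = len(centers), len(centers[0])
    counts = [0] * k
    sums = [[0.0] * dim for _ in range(k)]
    for x in data:
        # Assign the point to its nearest center (squared Euclidean distance).
        j = min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centers[c])))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += x[d]
    return counts, sums


def integrate(all_stats, centers):
    """Integrator: combine the local statistics and update the global centers."""
    k, dim = len(centers), len(centers[0])
    total_counts = [0] * k
    total_sums = [[0.0] * dim for _ in range(k)]
    for counts, sums in all_stats:
        for j in range(k):
            total_counts[j] += counts[j]
            for d in range(dim):
                total_sums[j][d] += sums[j][d]
    # A cluster that received no points keeps its previous center.
    return [
        tuple(total_sums[j][d] / total_counts[j] for d in range(dim))
        if total_counts[j] else centers[j]
        for j in range(k)
    ]


def distributed_kmeans(local_sets, centers, iterations=10):
    """Alternate local statistics and global integration (assumed k-means)."""
    for _ in range(iterations):
        stats = [local_sufficient_stats(data, centers) for data in local_sets]
        centers = integrate(stats, centers)
    return centers
```

Only the counts and sums travel to the integrator in this sketch; the raw local data never leaves its computing unit.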
Abstract:
A nested chain of densest subgraphs is derived by a computer from a given graph that has multiple vertices and edges. The two ends of each edge are assigned respective incident weights, and each vertex is given a vertex weight. The computer carries out a weight balancing process that iterates over the edges, adjusting the incident weights of each edge and the vertex weights of the two vertices connected by that edge so as to reduce the difference between those two vertex weights. After the balancing, the vertex weights are put in an ordered sequence according to their values, and a nested chain of densest subgraphs is derived from the ordered sequence.
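The balancing step can be sketched as follows. The sketch makes several assumptions that the abstract does not state: every edge has unit weight split between its two ends, a vertex weight is the sum of the incident weights at that vertex, and the chain is read off by cutting the sorted sequence wherever consecutive vertex weights differ by more than a tolerance. The names `balance_weights` and `nested_chain` are hypothetical.

```python
def balance_weights(n, edges, sweeps=200):
    """Iteratively shift each edge's incident weights toward its lighter end."""
    # inc[i] = [share at u, share at v]; assumed to sum to 1 for every edge.
    inc = [[0.5, 0.5] for _ in edges]
    weight = [0.0] * n
    for u, v in edges:
        weight[u] += 0.5
        weight[v] += 0.5
    for _ in range(sweeps):
        for i, (u, v) in enumerate(edges):
            # Amount that would equalize the two vertex weights.
            delta = (weight[u] - weight[v]) / 2
            # Incident weights must stay non-negative, so clamp the transfer.
            delta = max(-inc[i][1], min(inc[i][0], delta))
            inc[i][0] -= delta
            inc[i][1] += delta
            weight[u] -= delta
            weight[v] += delta
    return weight


def nested_chain(weight, tol=1e-6):
    """Cut the descending weight sequence at gaps larger than tol (assumed rule)."""
    order = sorted(range(len(weight)), key=lambda v: -weight[v])
    chain, prefix = [], []
    for pos, v in enumerate(order):
        prefix.append(v)
        is_last = pos == len(order) - 1
        if is_last or weight[v] - weight[order[pos + 1]] > tol:
            chain.append(sorted(prefix))  # each entry contains the previous one
    return chain


# Complete graph on vertices 0..3 plus a pendant vertex 4 attached to vertex 3.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
chain = nested_chain(balance_weights(5, edges))
```

For this small example the first set in the chain is the four-vertex complete subgraph, whose density (6 edges over 4 vertices) exceeds that of the whole graph (7 edges over 5 vertices).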
Abstract:
A method is disclosed that includes computing, using at least one uniformly fine-grain data parallel computing unit, a mean-square error regression within a regression clustering algorithm. The mean-square error regression is represented in the form of at least one summation of a vector-vector multiplication. A computer program product and a computer system are also disclosed.
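One way to read "a summation of a vector-vector multiplication" is through the normal equations of least squares, where both sides are sums of per-sample products. The sketch below rests on that assumption; the abstract does not give the exact form. It accumulates the sum of outer products x xᵀ and the sum of x·y, then solves the small linear system. Each term of the sums is independent of the others, which is what would make the accumulation suitable for data-parallel hardware. The helper names are hypothetical.

```python
def accumulate(xs, ys):
    """Sum of outer products x x^T and sum of products x * y over all samples."""
    d = len(xs[0])
    A = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    for x, y in zip(xs, ys):
        for i in range(d):
            b[i] += x[i] * y
            for j in range(d):
                A[i][j] += x[i] * x[j]
    return A, b


def solve(A, b):
    """Solve A w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def least_squares(xs, ys):
    """Minimize the mean-square error of the linear model w . x against y."""
    A, b = accumulate(xs, ys)
    return solve(A, b)


# Fit y = w0 + w1 * t using the feature vector (1, t).
xs = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
ys = [1.0, 3.0, 5.0]
w = least_squares(xs, ys)
```

In a regression clustering setting, a fit of this kind would be repeated for each cluster's regression function; that outer loop is omitted here.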
Abstract:
One embodiment is a method that uses MapReduce and Relation Valued Functions (RVFs) with parallel processing to search a database and obtain search results.
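As a rough illustration only, the sketch below treats a relation valued function as a function that takes a relation (a list of tuples) and returns a relation, and applies one such function per data partition in parallel before a second one merges the results. The partitioning scheme, the keyword-search task, and every name in the code are assumptions made for the example, not details from the abstract.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_rvf(partition, term):
    """Relation in, relation out: (doc_id, hit_count) for rows containing term."""
    result = []
    for doc_id, text in partition:
        hits = text.lower().split().count(term)
        if hits:
            result.append((doc_id, hits))
    return result


def reduce_rvf(relations):
    """Merge the mapped relations into one relation ranked by hit count."""
    totals = defaultdict(int)
    for relation in relations:
        for doc_id, hits in relation:
            totals[doc_id] += hits
    # Highest hit count first; ties broken by ascending doc_id.
    return sorted(totals.items(), key=lambda row: (-row[1], row[0]))


def parallel_search(partitions, term):
    """Run the map step on every partition in parallel, then reduce."""
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(lambda p: map_rvf(p, term), partitions))
    return reduce_rvf(mapped)


partitions = [
    [(1, "graph flow graph"), (2, "thread barrier")],
    [(3, "dense graph")],
]
results = parallel_search(partitions, "graph")
```

Because both steps consume and produce relations, their outputs could in principle be composed with further relational operators, which is the property the sketch is meant to highlight.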
Abstract:
A representation of a flow network having vertices connected by arcs is provided. The vertices include a first set of vertices that provide flow to a second set of vertices over arcs connecting the first set and second set of vertices. A balancing procedure in the network is performed that includes redistributing flows on arcs incident on the second set of vertices. The balancing procedure includes selecting a batch of the vertices in the second set, and redistributing flows on arcs incident on the selected batch of vertices. The selecting and redistributing are repeated for other batches of vertices in the second set.
Abstract:
Densest subgraphs of a graph are determined. The graph includes vertices and edges interconnecting the vertices. Each edge connects two of the vertices and has a weight. The vertices and the edges form subgraphs from which the densest subgraphs are determined as those subgraphs having densities greater than a threshold. Clusters at levels of a hierarchy are determined based on the densest subgraphs. Each cluster includes a set of the vertices and a set of the edges of the graph. Each level corresponds to a different density of the clusters. The hierarchy is ordered from a most-dense level of the clusters to a least-dense level of the clusters.
Abstract:
To coordinate tasks executed by a plurality of threads, each of which includes plural task sections, a call to a mark primitive is provided to mark a first point after a first of the task sections. A call to a second primitive is also provided to indicate that a second of the task sections is not allowed to begin until every one of the threads has reached the first point.
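The two primitives resemble a split-phase barrier, and a minimal single-use sketch of that idea follows. The class name `SplitBarrier` and the method names `mark` and `wait_all` are hypothetical; the abstract names only a "mark primitive" and a "second primitive".

```python
import threading


class SplitBarrier:
    """Single-use barrier whose arrival and waiting steps are separate calls."""

    def __init__(self, n_threads):
        self._n = n_threads
        self._arrived = 0
        self._cond = threading.Condition()

    def mark(self):
        """Record that the calling thread has reached the marked point (non-blocking)."""
        with self._cond:
            self._arrived += 1
            if self._arrived == self._n:
                self._cond.notify_all()

    def wait_all(self):
        """Block until every thread has called mark()."""
        with self._cond:
            self._cond.wait_for(lambda: self._arrived == self._n)


def run(n=4):
    barrier = SplitBarrier(n)
    log = []

    def worker(i):
        log.append(("section1", i))  # first task section
        barrier.mark()               # first point reached
        # Work that does not depend on other threads could run here.
        barrier.wait_all()           # second section must not start earlier
        log.append(("section2", i))  # second task section

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Separating the two calls lets a thread overlap independent work with the time other threads need to reach the marked point, which a single combined barrier call would not allow.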
Abstract:
The current application discloses a database management system that provides multiple-input, multiple-output-per-input user-defined-function-based operations. The database management system comprises at least one processor and electronic memory; a database-query processor, executed on a computer processor controlled by computer instructions stored in a computer-readable memory, that makes multiple calls to a multiple-input, multiple-output-per-input user-defined function, transmitting a next input in each call; and the user-defined function itself, likewise executed on a computer processor controlled by computer instructions stored in a computer-readable memory, which uses three different memory buffers, whose contents are maintained for three different time periods, to compute and return to the database-query processor multiple outputs in response to at least one of the multiple inputs.
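The calling pattern can be pictured with a small sketch. The abstract does not say what the three time periods are, so the lifetimes used here (the whole query, one window of inputs, and a single call) are assumptions, as are the windowed-mean task and all names.

```python
class WindowUdf:
    """Hypothetical multiple-input, multiple-output-per-input function."""

    def __init__(self, window):
        self.window = window
        self.query_buf = {"inputs_seen": 0}  # assumed lifetime: the whole query
        self.batch_buf = []                  # assumed lifetime: one window of inputs

    def call(self, value):
        call_buf = []                        # assumed lifetime: this call only
        self.query_buf["inputs_seen"] += 1
        self.batch_buf.append(value)
        if len(self.batch_buf) == self.window:
            # The input that completes a window triggers several outputs at once.
            mean = sum(self.batch_buf) / self.window
            call_buf = [(v, v - mean) for v in self.batch_buf]
            self.batch_buf = []
        return call_buf


def query_processor(inputs, window):
    """Feed one input per call and collect whatever each call returns."""
    udf = WindowUdf(window)
    outputs = []
    for value in inputs:
        outputs.extend(udf.call(value))
    return outputs, udf.query_buf["inputs_seen"]


outputs, seen = query_processor([1.0, 3.0, 2.0, 4.0, 5.0], window=2)
```

Here most calls return nothing and some return several rows, so the number of outputs is decoupled from the number of inputs; the final input stays buffered because its window never fills.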