Abstract:
A method, system, and computer program product for managing and querying a graph. The method includes the steps of: receiving a graph; partitioning the graph into homogeneous blocks; compressing the homogeneous blocks; and storing the compressed homogeneous blocks in files, wherein at least one of the steps is carried out using a computer device.
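The partition, compress, and store steps named in the abstract above can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not the claimed method: node ids are assumed to be integers already ordered so that similar nodes are contiguous, blocks are assumed to be square tiles of the adjacency matrix, and the names `partition_into_blocks`, `compress_block`, `decompress_block`, and `store_blocks` are hypothetical. Compression uses the standard-library `zlib` as a stand-in for whatever codec an implementation would choose.

```python
import os
import zlib


def partition_into_blocks(edges, block_size):
    """Group edges by the (row_block, col_block) tile of the adjacency matrix.

    Assumes integer node ids that were already reordered so that
    similar nodes sit next to each other (hypothetical preprocessing).
    """
    blocks = {}
    for src, dst in edges:
        key = (src // block_size, dst // block_size)
        # Store coordinates relative to the tile's top-left corner.
        blocks.setdefault(key, []).append((src % block_size, dst % block_size))
    return blocks


def compress_block(cells):
    """Serialize one block's cells as text and compress it with zlib."""
    payload = ";".join(f"{r},{c}" for r, c in sorted(cells)).encode("ascii")
    return zlib.compress(payload)


def decompress_block(blob):
    """Inverse of compress_block: return the sorted list of cells."""
    text = zlib.decompress(blob).decode("ascii")
    if not text:
        return []
    return [tuple(int(x) for x in pair.split(",")) for pair in text.split(";")]


def store_blocks(blocks, directory):
    """Write each compressed block to its own file; return the file paths."""
    paths = {}
    for (i, j), cells in blocks.items():
        path = os.path.join(directory, f"block_{i}_{j}.bin")
        with open(path, "wb") as fh:
            fh.write(compress_block(cells))
        paths[(i, j)] = path
    return paths
```

With one file per block, a query that touches only a few nodes needs to read and decompress only the tiles that cover those node ids rather than the whole graph.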
Abstract:
Access is obtained to a parallel corpus including a problem corpus and a solution corpus. A first plurality of topics are mined from the problem corpus and a second plurality of topics are mined from the solution corpus. A transition probability from the first plurality of topics to the second plurality of topics is determined, to identify a most appropriate one of the topics from the solution corpus for a given one of the topics from the problem corpus.
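As a rough illustration of the transition-probability step in the abstract above, the following Python sketch estimates P(solution topic | problem topic) by counting co-occurrences over aligned problem/solution pairs. It assumes topic mining has already assigned one topic label to each side of every pair; the counting estimator and the names `transition_probabilities` and `best_solution_topic` are hypothetical and are not taken from the source.

```python
from collections import Counter, defaultdict


def transition_probabilities(pairs):
    """Estimate P(solution_topic | problem_topic) from aligned topic pairs.

    `pairs` is an iterable of (problem_topic, solution_topic) labels,
    one per aligned document pair in the parallel corpus (assumed input).
    """
    counts = defaultdict(Counter)
    for problem_topic, solution_topic in pairs:
        counts[problem_topic][solution_topic] += 1
    probs = {}
    for problem_topic, row in counts.items():
        total = sum(row.values())
        probs[problem_topic] = {s: n / total for s, n in row.items()}
    return probs


def best_solution_topic(probs, problem_topic):
    """Return the solution topic with the highest transition probability."""
    row = probs[problem_topic]
    return max(row, key=row.get)
```

For example, if a problem topic co-occurs with one solution topic in two of its three aligned pairs, that solution topic receives probability 2/3 and is returned by `best_solution_topic`.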
Abstract:
Common sub-process patterns in a plurality of deployed process models may be discovered, and performance measures associated with the sub-process patterns may be computed based on runtime events of the deployed process models. Positive or negative performance patterns among sub-process patterns may be identified and used for creating new process models or improving existing process models.
Abstract:
The invention provides a method and system for visualizing a data set. The method comprises: dividing the data set into a plurality of information layers based on different information dimensions; and visually processing each of the information layers according to its information dimension, in order to present a respective view of each layer. By presenting different overviews of the data set from different information dimensions, the invention ensures that comprehensive information about the data set is presented to a data set analyst while preventing distortion of the presented contents and visual clutter.
Abstract:
Computer-implemented methods, systems, and articles of manufacture for determining the importance of a data item. A method includes: (a) receiving a node graph; (b) approximating a number of neighbor nodes of a node; and (c) calculating an average shortest path length from the node to the remaining nodes using the approximation from step (b), where this calculation indicates the importance of a data item represented by the node. Another method includes: (a) receiving a node graph; (b) building a decomposed line graph of the node graph; (c) calculating, in the decomposed line graph, stationary probabilities of the edges incident to a node of the node graph; and (d) calculating a summation of the stationary probabilities of the incident edges associated with the node, where the summation indicates the importance of a data item represented by the node. Both methods have at least one step carried out using a computer device.
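To make the quantity in the first method concrete, here is a small Python sketch that computes a node's average shortest path length to the nodes it can reach. Note the substitution: the abstract describes an approximation based on neighbor counts, whereas this sketch computes the exact value with a breadth-first search on an unweighted graph. The adjacency-dict input and the name `average_shortest_path_length` are assumptions made for illustration.

```python
from collections import deque


def average_shortest_path_length(adjacency, source):
    """Exact average hop distance from `source` to every reachable node.

    `adjacency` maps each node to an iterable of its neighbors
    (unweighted graph, assumed input format). A lower value suggests a
    more central, and in that sense more important, node.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    reachable = len(dist) - 1  # exclude the source itself
    if reachable == 0:
        return float("inf")  # isolated node: no paths to average
    return sum(dist.values()) / reachable
```

On a three-node path graph a-b-c, the middle node b scores 1.0 and the end node a scores 1.5, so b ranks as more central. The exact search costs one traversal per node, which is the cost an approximation would aim to reduce on large graphs.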
Abstract:
In an exemplary embodiment, some of the main aspects of the present invention are the following: (i) Data model: We introduce tensor streams to deal with large collections of multi-aspect streams; and (ii) Algorithmic framework: We propose window-based tensor analysis (WTA) to effectively extract core patterns from tensor streams. The tensor representation is related to the data cube in On-Line Analytical Processing (OLAP). However, the present invention focuses on constructing simple summaries for each window, rather than merely organizing the data to produce simple aggregates along each aspect or combination of aspects.
Abstract:
Methods and apparatus are provided for multi-faceted visualization of rich text corpora. A data set comprising a plurality of entities, facets and relations is visualized by generating a visualization of a plurality of the facets in the data set, wherein the visualization indicates connections along the plurality of the facets in a single view using multi-faceted edges. The entities are instances of a particular concept; the facets are classes of entities; and the relations are connections between pairs of the entities. A compound node comprises a representation of a primary entity, surrounded by representations of one or more secondary entities connected by one or more internal relations. The external relations can be represented as edges connecting two facet nodes from different compound nodes, and the number of edge crossings can be reduced by adjusting the position order of the facet nodes. The compound nodes can optionally be rotated based on, for example, a global spring force model to reduce an average length of one or more of the edges and/or to allow edge bundling.
Abstract:
A system and method for modeling a content-based network. The method includes finding single-mode clusters from among network (sender and recipient) and content dimensions represented as a tensor data structure. The method allows for derivation of useful cross-mode clusters (interpretable patterns) that reveal key relationships among user communities and keyword concepts for presentation to users in a meaningful and intuitive way. Additionally, the derivation of useful cross-mode clusters is facilitated by constructing a reduced low-dimensional representation of the content-based network. Moreover, the invention may be enhanced for modeling and analyzing the time evolution of social communication networks and the content related to such networks. To this end, a set of non-overlapping or possibly overlapping time-based windows is constructed, and the analysis is performed at each successive time interval.
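The time-window construction mentioned at the end of the abstract above can be sketched in Python as follows. This covers only the windowing step, not the tensor clustering. The half-open window convention, the message tuple layout `(timestamp, sender, recipient, keyword)`, and the names `time_windows` and `messages_per_window` are assumptions made for illustration.

```python
def time_windows(start, end, width, step):
    """Build half-open windows [lo, hi) covering [start, end).

    step == width gives non-overlapping windows;
    step < width gives overlapping windows.
    """
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    windows = []
    lo = start
    while lo < end:
        # Clip the final window so it does not run past `end`.
        windows.append((lo, min(lo + width, end)))
        lo += step
    return windows


def messages_per_window(messages, windows):
    """Assign messages to every window whose interval contains them.

    Each message is a tuple whose first element is its timestamp
    (assumed layout: timestamp, sender, recipient, keyword).
    """
    return [[m for m in messages if lo <= m[0] < hi] for lo, hi in windows]
```

Each per-window list could then be turned into its own sender-by-recipient-by-keyword tensor and analyzed independently, with overlapping windows smoothing changes between successive intervals.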