Method and system for managing and querying large graphs
    11.
    Granted invention patent
    Status: Lapsed

    Publication No.: US08645339B2

    Publication date: 2014-02-04

    Application No.: US13294598

    Filing date: 2011-11-11

    IPC classification: G06F17/30 G06F15/16

    CPC classification: G06F17/30533 G06F17/30958

    Abstract: A method, system and computer program product for managing and querying a graph. The method includes the steps of: receiving a graph; partitioning the graph into homogeneous blocks; compressing the homogeneous blocks; and storing the compressed homogeneous blocks in files, where at least one of the steps is carried out using a computer device.
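
The four steps in this abstract (receive, partition, compress, store) can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not say how homogeneous blocks are found, so the sketch assumes a node-to-group mapping (`group_of`) is already given, treats each pair of groups as one block, and uses zlib as a stand-in compressor. All function names and the file naming scheme are hypothetical.

```python
import os
import tempfile
import zlib
from collections import defaultdict


def partition_into_blocks(edges, group_of):
    """Group edges by (source group, target group); each pair is one block.

    Assumption: `group_of` already places similar nodes in the same group,
    so that each block is roughly homogeneous (mostly dense or mostly empty).
    """
    blocks = defaultdict(list)
    for src, dst in edges:
        blocks[(group_of[src], group_of[dst])].append((src, dst))
    return blocks


def compress_block(block_edges):
    """Serialize a block as sorted 'src,dst' lines and compress with zlib."""
    text = "\n".join(f"{s},{d}" for s, d in sorted(block_edges))
    return zlib.compress(text.encode("utf-8"))


def store_blocks(blocks, directory):
    """Write one compressed file per block and return a block -> path map."""
    paths = {}
    for (gi, gj), block_edges in blocks.items():
        path = os.path.join(directory, f"block_{gi}_{gj}.z")
        with open(path, "wb") as fh:
            fh.write(compress_block(block_edges))
        paths[(gi, gj)] = path
    return paths


def load_block(path):
    """Read a stored block back as a list of (src, dst) integer pairs."""
    with open(path, "rb") as fh:
        text = zlib.decompress(fh.read()).decode("utf-8")
    return [tuple(int(x) for x in line.split(",")) for line in text.splitlines()]


if __name__ == "__main__":
    edges = [(0, 1), (1, 0), (0, 2), (2, 3), (3, 2)]
    group_of = {0: 0, 1: 0, 2: 1, 3: 1}  # hypothetical grouping
    with tempfile.TemporaryDirectory() as directory:
        paths = store_blocks(partition_into_blocks(edges, group_of), directory)
        for key in sorted(paths):
            print(key, load_block(paths[key]))
```

Storing one file per block means a query that touches only some node groups needs to read and decompress only the matching files, which is the kind of benefit a block layout is usually chosen for.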

    ANALYZING PARALLEL TOPICS FROM CORRELATED DOCUMENTS
    12.
    Invention patent application
    Status: Under examination (published)

    Publication No.: US20110202484A1

    Publication date: 2011-08-18

    Application No.: US12708053

    Filing date: 2010-02-18

    IPC classification: G06F15/18 G06N5/02

    CPC classification: G06N7/005

    Abstract: Access is obtained to a parallel corpus including a problem corpus and a solution corpus. A first plurality of topics are mined from the problem corpus and a second plurality of topics are mined from the solution corpus. A transition probability from the first plurality of topics to the second plurality of topics is determined, to identify a most appropriate one of the topics from the solution corpus for a given one of the topics from the problem corpus.
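
One simple way to picture the transition step is a count-based estimate of P(solution topic | problem topic) over paired documents. The sketch below assumes topics have already been mined and that each problem/solution document pair carries one hard topic label on each side; the abstract does not specify the estimator, so the counting scheme and all names here are illustrative assumptions.

```python
from collections import Counter, defaultdict


def transition_probabilities(pairs):
    """Estimate P(solution topic | problem topic) from co-occurrence counts.

    `pairs` is an iterable of (problem_topic, solution_topic) labels, one per
    paired document in the parallel corpus (hypothetical input format).
    """
    counts = defaultdict(Counter)
    for problem_topic, solution_topic in pairs:
        counts[problem_topic][solution_topic] += 1
    probs = {}
    for problem_topic, row in counts.items():
        total = sum(row.values())
        probs[problem_topic] = {s: n / total for s, n in row.items()}
    return probs


def best_solution_topic(probs, problem_topic):
    """Return the solution topic with the highest transition probability."""
    row = probs[problem_topic]
    return max(row, key=row.get)


if __name__ == "__main__":
    pairs = [
        ("disk", "replace"),
        ("disk", "replace"),
        ("disk", "reboot"),
        ("net", "reboot"),
    ]
    probs = transition_probabilities(pairs)
    print(best_solution_topic(probs, "disk"))
```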

    Method and system for visualization of data set
    14.
    Granted invention patent
    Status: In force

    Publication No.: US09087117B2

    Publication date: 2015-07-21

    Application No.: US12917469

    Filing date: 2010-11-01

    IPC classification: G06F17/30

    CPC classification: G06F17/30601 G06F17/30572

    Abstract: The invention provides a method and system for visualization of a data set. The method comprises: dividing the data set into a plurality of information layers based on different information dimensions; and visually processing the plurality of information layers based on different information dimensions, respectively, in order to present respective views of the plurality of information layers. By presenting different overviews of the data set from different information dimensions, the invention ensures that comprehensive information about the data set is presented to a data set analyst while distortion of the presented contents and visual clutter are prevented.

    Determining the importance of data items and their characteristics using centrality measures
    15.
    Granted invention patent
    Status: In force

    Publication No.: US08818918B2

    Publication date: 2014-08-26

    Application No.: US13096220

    Filing date: 2011-04-28

    IPC classification: G06F17/30 G06F15/18

    CPC classification: G06N5/003

    Abstract: Computer-implemented methods, systems, and articles of manufacture for determining the importance of a data item. A method includes: (a) receiving a node graph; (b) approximating a number of neighbor nodes of a node; and (c) calculating an average shortest path length of the node to the remaining nodes using the approximation step, where this calculation demonstrates the importance of a data item represented by the node. Another method includes: (a) receiving a node graph; (b) building a decomposed line graph of the node graph; (c) calculating stationary probabilities of incident edges of a node graph node in the decomposed line graph; and (d) calculating a summation of the stationary probabilities of the incident edges associated with the node, where the summation demonstrates the importance of a data item represented by the node. Both methods have at least one step carried out using a computer device.
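
For intuition about the first method, the quantity being estimated is a node's average shortest path length to the other nodes. The sketch below computes it exactly with a breadth-first search on a small unweighted graph. This is a deliberate simplification: the abstract describes approximating neighbor counts rather than running an exact search, and the function name and adjacency-dict input are assumptions made for illustration.

```python
from collections import deque


def average_shortest_path_length(adjacency, source):
    """Exact average hop distance from `source` to every reachable node.

    `adjacency` maps each node to an iterable of its neighbors.
    Unreachable nodes are ignored; an isolated source yields infinity.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    others = [d for node, d in dist.items() if node != source]
    if not others:
        return float("inf")
    return sum(others) / len(others)


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 - 3: inner nodes are closer to everything else.
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    for node in path:
        print(node, average_shortest_path_length(path, node))
```

A lower average indicates a more central node, which is how this measure can serve as an importance score for the data item the node represents.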

    Systems and methods for simultaneous summarization of data cube streams
    16.
    Granted invention patent
    Status: Lapsed

    Publication No.: US07505876B2

    Publication date: 2009-03-17

    Application No.: US11620679

    Filing date: 2007-01-07

    IPC classification: G06F15/00 G06F17/30

    Abstract: In an exemplary embodiment, some of the main aspects of the present invention are the following: (i) Data model: We introduce tensor streams to deal with large collections of multi-aspect streams; and (ii) Algorithmic framework: We propose window-based tensor analysis (WTA) to effectively extract core patterns from tensor streams. The tensor representation is related to the data cube in On-Line Analytical Processing (OLAP). However, the present invention focuses on constructing simple summaries for each window, rather than merely organizing the data to produce simple aggregates along each aspect or combination of aspects.

    SYSTEMS AND METHODS FOR SIMULTANEOUS SUMMARIZATION OF DATA CUBE STREAMS
    17.
    Invention patent application
    Status: Lapsed

    Publication No.: US20080168375A1

    Publication date: 2008-07-10

    Application No.: US11620679

    Filing date: 2007-01-07

    IPC classification: G06F3/048

    Abstract: In an exemplary embodiment, some of the main aspects of the present invention are the following: (i) Data model: We introduce tensor streams to deal with large collections of multi-aspect streams; and (ii) Algorithmic framework: We propose window-based tensor analysis (WTA) to effectively extract core patterns from tensor streams. The tensor representation is related to the data cube in On-Line Analytical Processing (OLAP). However, the present invention focuses on constructing simple summaries for each window, rather than merely organizing the data to produce simple aggregates along each aspect or combination of aspects.

    Multi-faceted visualization of rich text corpora
    18.
    Granted invention patent
    Status: In force

    Publication No.: US09390194B2

    Publication date: 2016-07-12

    Application No.: US12872794

    Filing date: 2010-08-31

    IPC classification: G06F17/30

    CPC classification: G06F17/30941 G06F17/30716

    Abstract: Methods and apparatus are provided for multi-faceted visualization of rich text corpora. A data set comprising a plurality of entities, facets and relations is visualized by generating a visualization of a plurality of the facets in the data set, wherein the visualization indicates connections along the plurality of the facets in a single view using multi-faceted edges. The entities are instances of a particular concept, the facets are classes of entities and the relations are connections between pairs of the entities. A compound node comprises a representation of a primary entity, surrounded by representations of one or more secondary entities connected by one or more external relations. The internal relations can be represented as edges connecting two facet nodes from different compound nodes and a number of crossings of the edges can be reduced by adjusting a position order of facet nodes. The compound nodes can optionally be rotated based on, for example, a global spring force model to reduce an average length of one or more of the edges and/or to allow edge bundling.
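
The idea of reducing edge crossings by adjusting the position order of nodes can be shown on a toy case with two ordered rows of nodes joined by straight edges. The sketch below counts crossings for a given ordering and brute-forces the best ordering of one row. It is only an illustration: the abstract does not describe the layout geometry or the search strategy, the two-row setup and all names are assumptions, and brute force is practical only for a handful of nodes.

```python
from itertools import combinations, permutations


def count_crossings(left_order, right_order, edges):
    """Count crossing pairs among edges drawn between two ordered rows.

    Each edge is (left_node, right_node). Two edges cross when their
    endpoints appear in opposite relative order on the two rows.
    """
    left_pos = {node: i for i, node in enumerate(left_order)}
    right_pos = {node: i for i, node in enumerate(right_order)}
    crossings = 0
    for (a, b), (c, d) in combinations(edges, 2):
        if (left_pos[a] - left_pos[c]) * (right_pos[b] - right_pos[d]) < 0:
            crossings += 1
    return crossings


def best_right_order(left_order, right_order, edges):
    """Try every ordering of the right row and keep the one with fewest crossings."""
    return min(
        permutations(right_order),
        key=lambda order: count_crossings(left_order, order, edges),
    )


if __name__ == "__main__":
    left = ["a", "b"]
    right = ["x", "y"]
    edges = [("a", "y"), ("b", "x")]
    print(count_crossings(left, right, edges))
    print(best_right_order(left, right, edges))
```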

    Content-based and time-evolving social network analysis
    20.
    Granted invention patent
    Status: In force

    Publication No.: US08204988B2

    Publication date: 2012-06-19

    Application No.: US12552812

    Filing date: 2009-09-02

    IPC classification: G06F15/173

    Abstract: System and method for modeling a content-based network. The method includes finding single mode clusters from among network (sender and recipient) and content dimensions represented as a tensor data structure. The method allows for derivation of useful cross-mode clusters (interpretable patterns) that reveal key relationships among user communities and keyword concepts for presentation to users in a meaningful and intuitive way. Additionally, the derivation of useful cross-mode clusters is facilitated by constructing a reduced low-dimensional representation of the content-based network. Moreover, the invention may be enhanced for modeling and analyzing the time evolution of social communication networks and the content related to such networks. To this end, a set of non-overlapping or possibly overlapping time-based windows is constructed and the analysis is performed at each successive time interval.
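
The time-window construction mentioned at the end of this abstract can be sketched as follows. The sketch covers only the windowing and the bucketing of messages, not the tensor clustering itself. The half-open `[lo, hi)` convention, the `width`/`step` parameters, and the `(timestamp, sender, recipient, keyword)` message tuple are assumptions chosen for illustration.

```python
def time_windows(start, end, width, step):
    """Build half-open [lo, hi) windows covering [start, end).

    step == width gives non-overlapping windows; step < width gives
    overlapping ones. The last windows are clipped at `end`.
    """
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    windows = []
    lo = start
    while lo < end:
        windows.append((lo, min(lo + width, end)))
        lo += step
    return windows


def group_by_window(messages, windows):
    """Return, per window, the messages whose timestamp falls inside it.

    Each message is a (timestamp, sender, recipient, keyword) tuple
    (hypothetical format). With overlapping windows a message can
    appear in more than one group.
    """
    return [[m for m in messages if lo <= m[0] < hi] for lo, hi in windows]


if __name__ == "__main__":
    messages = [
        (1, "ann", "bob", "budget"),
        (4, "bob", "cat", "budget"),
        (7, "cat", "ann", "launch"),
    ]
    for window, group in zip(time_windows(0, 10, 5, 5), group_by_window(messages, time_windows(0, 10, 5, 5))):
        print(window, group)
```

Each per-window group could then be turned into a sender × recipient × keyword count tensor and analyzed separately, so that changes between successive windows show how communities and topics evolve.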
