RECURSIVE AGGLOMERATIVE CLUSTERING OF TIME-STRUCTURED COMMUNICATIONS

    Publication No.: US20180329989A1

    Publication date: 2018-11-15

    Application No.: US15972952

    Filing date: 2018-05-07

    Applicant: Findo, Inc.

    IPC classification: G06F17/30

    Abstract: An example method of document clustering comprises: representing each document of a plurality of documents by a vector comprising a first plurality of real values, wherein each real value of the first plurality of real values reflects a first frequency-based metric of a term comprised by the document; partitioning the plurality of documents into a first set of document clusters based on distances between vectors representing the documents; representing each document cluster of the first set of document clusters by a vector comprising a second plurality of real values, wherein each real value of the second plurality of real values reflects a second frequency-based metric of a term comprised by the document cluster; and partitioning the first set of document clusters into a second set of document clusters based on distances between vectors representing the document clusters of the first set of document clusters.
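
The two-level partitioning described in this abstract can be illustrated with a minimal sketch. It is not the claimed method: it assumes TF-IDF as the frequency-based metric at both levels, cosine distance between vectors, and a simple single-link threshold grouping as the partitioning step. The names `tfidf`, `partition`, `two_level_clusters` and the thresholds `t1` and `t2` are hypothetical.

```python
import math
from collections import Counter

def tfidf(bags):
    """Map each bag of terms to a sparse TF-IDF vector (dict: term -> weight)."""
    n = len(bags)
    df = Counter(term for bag in bags for term in set(bag))
    vectors = []
    for bag in bags:
        tf = Counter(bag)
        # Smoothed IDF (an assumption) keeps weights positive for common terms.
        vectors.append({t: (c / len(bag)) * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors

def cosine_distance(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return 1.0 - dot / norm if norm else 1.0

def partition(vectors, threshold):
    """Single-link grouping: items closer than `threshold` share a cluster."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine_distance(vectors[i], vectors[j]) < threshold:
                parent[find(j)] = find(i)
    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

def two_level_clusters(docs, t1, t2):
    """Cluster documents, then cluster the resulting clusters."""
    bags = [d.lower().split() for d in docs]
    first = partition(tfidf(bags), t1)
    # Each first-level cluster is re-vectorized from the terms of its members.
    cluster_bags = [[term for i in c for term in bags[i]] for c in first]
    second = partition(tfidf(cluster_bags), t2)
    return first, [[i for c in group for i in first[c]] for group in second]

docs = ["cat dog pet", "cat dog pet", "cat fish pet",
        "stock bond market", "stock bond market"]
first, second = two_level_clusters(docs, t1=0.1, t2=0.6)
```

With a tight first threshold only the identical documents are grouped; the looser second threshold then merges the two pet-related clusters while leaving the finance cluster separate.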

UNSTRUCTURED VOLUMETRIC GRID SIMPLIFICATION USING SUB-VOLUME CLUSTERING

    Publication No.: US20180096059A1

    Publication date: 2018-04-05

    Application No.: US15282278

    Filing date: 2016-09-30

    IPC classification: G06F17/30

    Abstract: A method, apparatus, and program product cluster a plurality of cells of an input unstructured volumetric grid representative of a subsurface volume into a plurality of clusters, simplify a boundary of each cluster, and generate an output unstructured volumetric grid representing at least a portion of the input unstructured volumetric grid by generating in the output unstructured volumetric grid a respective cell for each of the plurality of clusters. The resulting output grid may be used to facilitate the generation of visualizations and/or numerical simulations.

ADJUSTMENT OF DOCUMENT RELATIONSHIP GRAPHS
    Patent application

    Publication No.: US20180039620A1

    Publication date: 2018-02-08

    Application No.: US15472977

    Filing date: 2017-03-29

    Applicant: Quid, Inc.

    IPC classification: G06F17/27 G06F17/30

    Abstract: Provided is a process of modifying semantic similarity graphs representative of pair-wise similarity between documents in a corpus, the process comprising: obtaining a semantic similarity graph that comprises more than 500 nodes and more than 1000 weighted edges, each node representing a document of a corpus, and each edge weight indicating an amount of similarity between a pair of documents corresponding to the respective nodes connected by the respective edge; obtaining an n-gram indicating that edge weights affected by the n-gram are to be increased or decreased; expanding the n-gram to produce a set of expansion n-grams; and adjusting edge weights of edges between pairs of documents in which members of the expanded n-gram set co-occur.
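
The edge-weight adjustment step can be sketched on a toy graph. This is an illustration under stated assumptions, not the claimed process: the expansion set is supplied by a hypothetical lookup table `expansions` (the abstract does not say how expansion is performed), "co-occur" is read as "both documents of the pair contain at least one member of the expanded set", and the adjustment is a multiplicative `factor`. The function names are hypothetical.

```python
def contains_ngram(tokens, ngram):
    """True if the token list contains the n-gram as a contiguous run."""
    n = len(ngram)
    return any(tuple(tokens[k:k + n]) == ngram
               for k in range(len(tokens) - n + 1))

def adjust_edges(docs, edges, ngram, expansions, factor):
    """Scale the weight of every edge whose two documents both match.

    docs       -- list of document strings (node i is docs[i])
    edges      -- dict mapping (i, j) node pairs to similarity weights
    ngram      -- tuple of terms chosen by the user
    expansions -- hypothetical table: n-gram -> related n-grams
    factor     -- above 1 increases matching weights, below 1 decreases them
    """
    expanded = {ngram, *expansions.get(ngram, ())}
    tokens = [d.lower().split() for d in docs]
    hits = {i for i, toks in enumerate(tokens)
            if any(contains_ngram(toks, g) for g in expanded)}
    return {(i, j): w * factor if i in hits and j in hits else w
            for (i, j), w in edges.items()}

docs = ["solar panel cost falls", "new solar panel factory",
        "photovoltaic module prices", "football season opens"]
edges = {(0, 1): 0.5, (0, 2): 0.4, (1, 3): 0.1}
expansions = {("solar", "panel"): [("photovoltaic", "module")]}
adjusted = adjust_edges(docs, edges, ("solar", "panel"), expansions, factor=2.0)
```

Here the edge between documents 0 and 2 is strengthened only because the expansion adds "photovoltaic module"; the edge to the unrelated document 3 is left unchanged.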

    Stochastic document clustering using rare features

    Publication No.: US09754023B2

    Publication date: 2017-09-05

    Application No.: US15018242

    Filing date: 2016-02-08

    Inventor: Joshua Powers

    IPC classification: G06F17/30

    Abstract: Systems, methods, and apparatus for clustering resources using rare features are provided. For example, an environment includes an extraction module, an index module, and a cluster module. The extraction module identifies a set of resources and extracts a plurality of features from the resources. The plurality of features may be rare features. The index module identifies and generates a rare features index. The cluster module identifies at least two resources that share rare features, creates one or more clusters based on the identified at least two resources, and associates resources that share similar features with the one or more clusters. Resources that do not share similar features are not associated with the one or more clusters. Identifying at least two resources that share rare features is based at least upon a threshold.
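
The extract, index, and cluster flow in this abstract can be sketched as follows. The sketch makes several assumptions that the abstract does not state: features are lowercase whitespace-separated tokens, a feature is "rare" when it occurs in at most `max_df` resources, and the threshold is a minimum number of shared rare features (`min_shared`). The stochastic aspect named in the title is not modeled, and all names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def rare_feature_clusters(resources, max_df, min_shared):
    """Group resources that share at least `min_shared` rare features."""
    # Extraction: one feature set per resource.
    features = [set(r.lower().split()) for r in resources]

    # Indexing: feature -> ids of resources containing it, rare ones only.
    index = defaultdict(set)
    for rid, feats in enumerate(features):
        for f in feats:
            index[f].add(rid)
    rare_index = {f: ids for f, ids in index.items() if len(ids) <= max_df}

    # Count shared rare features for each pair of resources.
    shared = defaultdict(int)
    for ids in rare_index.values():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1

    # Clustering: union pairs that meet the threshold.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), count in shared.items():
        if count >= min_shared:
            parent[find(b)] = find(a)

    clusters = defaultdict(list)
    for x in parent:
        clusters[find(x)].append(x)
    # Resources that never met the threshold appear in no cluster.
    return sorted(sorted(c) for c in clusters.values())

resources = ["error in zlib inflate stream", "zlib inflate stream crashed",
             "error in network", "network error timeout", "the cat sat"]
strict = rare_feature_clusters(resources, max_df=2, min_shared=2)
loose = rare_feature_clusters(resources, max_df=2, min_shared=1)
```

Raising `min_shared` leaves more resources unclustered, which mirrors the abstract's statement that resources without shared features are not associated with any cluster.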

    Document classification and characterization

    Publication No.: US09703863B2

    Publication date: 2017-07-11

    Application No.: US13794446

    Filing date: 2013-03-11

    Applicant: DiscoverReady LLC

    IPC classification: G06F17/30 G06Q10/00

    Abstract: Data is received that characterizes each of a plurality of documents within a document set. Based on this data, the plurality of documents are grouped into a plurality of stacks using one or more grouping algorithms. A prime document is identified for each stack that includes attributes representative of the entire stack. Subsequently, data that characterizes documents for each stack, including at least the identified prime document, is provided to at least one human reviewer. User-generated input that categorizes each provided document is later received from the human reviewer, and data characterizing the user-generated input can then be provided. Related apparatus, systems, techniques and articles are also described.

    Cluster labeling system for documents comprising unstructured text data

    Publication No.: US09672279B1

    Publication date: 2017-06-06

    Application No.: US14501431

    Filing date: 2014-09-30

    Applicant: EMC Corporation

    Abstract: An apparatus comprises a processing platform configured to implement a cluster labeling system for documents comprising unstructured text data. The cluster labeling system comprises a clustering module and a visualization module. The clustering module implements a topic model generator and is configured to assign each of the documents to one or more of a plurality of clusters based at least in part on one or more topics identified from the unstructured text data using at least one topic model provided by the topic model generator. The visualization module comprises multiple view generators configured to generate respective distinct visualizations of a selected one of the clusters. The multiple view generators include at least a bigram view generator configured to provide a visualization of a plurality of term pairs from the selected cluster, and a summarization view generator configured to provide a visualization of representative term sequences from the selected cluster.
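
The data behind a bigram view could be as simple as a ranked list of adjacent term pairs from the selected cluster. The sketch below assumes whitespace tokenization and raw counts as the ranking; the abstract does not specify either, and `top_bigrams` is a hypothetical name.

```python
from collections import Counter

def top_bigrams(cluster_docs, k):
    """Return the k most frequent adjacent term pairs in a cluster's documents."""
    counts = Counter()
    for doc in cluster_docs:
        tokens = doc.lower().split()
        # Pair each token with its successor within the same document.
        counts.update(zip(tokens, tokens[1:]))
    return counts.most_common(k)

cluster = ["disk failure on node", "disk failure after reboot", "node reboot"]
best = top_bigrams(cluster, 1)
```

A visualization layer would then render these pairs, for example as a bar chart or a term graph.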