-
Publication No.: US20180329989A1
Publication date: 2018-11-15
Application No.: US15972952
Filing date: 2018-05-07
Applicant: Findo, Inc.
IPC classification: G06F17/30
CPC classification: G06F17/30713, G06F17/30011, G06F17/3069
Abstract: An example method of document clustering comprises: representing each document of a plurality of documents by a vector comprising a first plurality of real values, wherein each real value of the first plurality of real values reflects a first frequency-based metric of a term comprised by the document; partitioning the plurality of documents into a first set of document clusters based on distances between vectors representing the documents; representing each document cluster of the first set of document clusters by a vector comprising a second plurality of real values, wherein each real value of the second plurality of real values reflects a second frequency-based metric of a term comprised by the document cluster; and partitioning the first set of document clusters into a second set of document clusters based on distances between vectors representing the document clusters of the first set of document clusters.
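The two-stage partitioning in this abstract can be illustrated with a minimal sketch. It is not the patented method: it assumes smoothed TF-IDF as the frequency-based metric at both levels, cosine distance as the vector distance, and a simple greedy threshold partition. The function names and the thresholds `first_max` and `second_max` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(token_lists):
    """Return one sparse TF-IDF vector (term -> weight) per token list."""
    n = len(token_lists)
    # Document frequency: in how many token lists each term occurs.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine_distance(a, b):
    """Cosine distance between two sparse vectors (1.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


def partition(vectors, max_distance):
    """Greedy partition (an assumption): join the first cluster whose seed is close enough."""
    clusters = []
    for i, v in enumerate(vectors):
        for cluster in clusters:
            if cosine_distance(v, vectors[cluster[0]]) <= max_distance:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def two_level_clusters(docs, first_max=0.6, second_max=0.9):
    """Cluster documents, then cluster the resulting clusters."""
    tokens = [d.lower().split() for d in docs]
    first = partition(tfidf_vectors(tokens), first_max)
    # Represent each first-level cluster by the pooled terms of its documents.
    pooled = [[t for i in cluster for t in tokens[i]] for cluster in first]
    second = partition(tfidf_vectors(pooled), second_max)
    # Each second-level cluster is a list of first-level clusters.
    return first, [[first[j] for j in group] for group in second]


if __name__ == "__main__":
    first, second = two_level_clusters(["apple banana", "apple banana", "car engine"])
    print(first)
    print(second)
```

The second pass reuses the same vectorizer on pooled cluster text, so the "second frequency-based metric" here is simply TF-IDF computed over clusters instead of documents; the abstract does not say which metrics or which partitioning algorithm the application actually uses.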
-
Publication No.: US20180096059A1
Publication date: 2018-04-05
Application No.: US15282278
Filing date: 2016-09-30
Inventors: Milan Frank, Anton Alenov
IPC classification: G06F17/30
CPC classification: G06F17/30707, G06F17/30713, G06Q50/06
Abstract: A method, apparatus, and program product cluster a plurality of cells of an input unstructured volumetric grid representative of a subsurface volume into a plurality of clusters, simplify a boundary of each cluster, and generate an output unstructured volumetric grid representing at least a portion of the input unstructured volumetric grid by generating in the output unstructured volumetric grid a respective cell for each of the plurality of clusters. The resulting output grid may be used to facilitate the generation of visualizations and/or numerical simulations.
-
Publication No.: US09904727B2
Publication date: 2018-02-27
Application No.: US15289846
Filing date: 2016-10-10
Inventors: Riham Hassan Abdel-Moneim Mansour, Ahmed Ael Mohamed Abdel Kader Ashour, Hesham Saad Mohamed Abdelwahab El Baz
CPC classification: G06F17/30663, G06F17/30011, G06F17/30613, G06F17/30713, G06F17/30716, G06K9/00, G06K9/00483, G06K9/6215, G06K9/68
Abstract: A system for retrieving/identifying a document comprising text stored in a document repository is described. A memory stores a graphical structure comprising a first plurality of nodes each representing a person, and a second plurality of nodes each representing a document in the document repository, the nodes being connected by edges according to automatically observed interactions between the represented people and documents. At least some of the nodes have one or more annotations each denoting a topic. A node relatedness calculator computes distances between nodes of the graphical structure using the topic annotations. An input receives an identifier of a user who is represented by one of the first plurality of nodes. An identifier/retriever identifies one or more documents from the document repository by using the identifier and using the computed distances between nodes.
-
Publication No.: US20180039620A1
Publication date: 2018-02-08
Application No.: US15472977
Filing date: 2017-03-29
Applicant: Quid, Inc.
CPC classification: G06F17/2785, G06F17/2775, G06F17/30011, G06F17/30705, G06F17/30713, G06F17/30958
Abstract: Provided is a process of modifying semantic similarity graphs representative of pair-wise similarity between documents in a corpus, the process comprising: obtaining a semantic similarity graph that comprises more than 500 nodes and more than 1000 weighted edges, each node representing a document of a corpus, and each edge weight indicating an amount of similarity between a pair of documents corresponding to the respective nodes connected by the respective edge; obtaining an n-gram indicating that edge weights affected by the n-gram are to be increased or decreased; expanding the n-gram to produce a set of expansion n-grams; and adjusting edge weights of edges between pairs of documents in which members of the expanded n-gram set co-occur.
-
Publication No.: US20180032330A9
Publication date: 2018-02-01
Application No.: US15069633
Filing date: 2016-03-14
Applicant: Wipro Limited
CPC classification: G06F8/77, G06F8/20, G06F8/60, G06F17/3069, G06F17/30707, G06F17/30713, G06N5/00, G06Q10/0631, G06Q10/101
Abstract: A system and method for classifying and resolving software production incident tickets includes receiving an incident ticket, extracting a plurality of keywords from the incident ticket, and deriving a query vector corresponding to the incident ticket based on the plurality of keywords. The system and method further comprise classifying the incident ticket into at least one of a positive mechanization incident ticket and a negative mechanization incident ticket based on a comparison of the query vector and a plurality of vectors derived from a plurality of past incident tickets. The plurality of vectors are derived based on a plurality of keywords and their corresponding occurrences in the plurality of past incident tickets.
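The comparison of a query vector against vectors from past tickets might look like the following minimal sketch. It assumes keyword occurrence counts as the vector components, cosine similarity as the comparison, and a nearest-neighbor decision; the stopword list, the labels, and all function names are hypothetical and are not taken from the application.

```python
import math
from collections import Counter

# Hypothetical stopword list used for keyword extraction.
STOPWORDS = frozenset({"the", "a", "an", "is", "on", "to", "of", "and"})


def keyword_vector(text):
    """Map a ticket's text to keyword occurrence counts."""
    return Counter(w for w in text.lower().split() if w not in STOPWORDS)


def cosine_similarity(a, b):
    """Cosine similarity of two count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


def classify_ticket(ticket, past_tickets):
    """Return the label and score of the most similar past ticket.

    past_tickets is a list of (text, label) pairs.
    """
    query = keyword_vector(ticket)
    best_label, best_score = None, -1.0
    for text, label in past_tickets:
        score = cosine_similarity(query, keyword_vector(text))
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


if __name__ == "__main__":
    history = [
        ("disk full on server restart cleanup job", "positive"),
        ("customer requests new feature design review", "negative"),
    ]
    print(classify_ticket("server disk full", history))
```

Here "positive" and "negative" stand in for the positive and negative mechanization classes named in the abstract; a production system would likely weight keywords and aggregate over several neighbors rather than trust a single closest ticket.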
-
Publication No.: US09754023B2
Publication date: 2017-09-05
Application No.: US15018242
Filing date: 2016-02-08
Applicant: Securboration, Inc.
Inventor: Joshua Powers
IPC classification: G06F17/30
CPC classification: G06F17/3071, G06F17/30705, G06F17/30713
Abstract: Systems, methods, and apparatus for clustering resources using rare features are provided. For example, an environment includes an extraction module, an index module, and a cluster module. The extraction module identifies a set of resources and extracts a plurality of features from the resources. The plurality of features may be rare features. The index module identifies and generates a rare features index. The cluster module identifies at least two resources that share rare features, creates one or more clusters based on the identified at least two resources, and associates resources that share similar features with the one or more clusters. Resources that do not share similar features are not associated with the one or more clusters. Identifying at least two resources that share rare features is based at least upon a threshold.
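The index-then-cluster flow described in this abstract can be sketched as follows. This is an illustrative reading, not the patented implementation: it assumes a feature is "rare" when it occurs in at least two but at most `max_resources` resources, and that two resources are linked when they share at least `min_shared` rare features. Both parameters and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def rare_feature_index(resources, max_resources=2):
    """Map each rare feature to the set of resources that contain it."""
    index = defaultdict(set)
    for name, features in resources.items():
        for feature in features:
            index[feature].add(name)
    # Keep features that are shared (>= 2 resources) but still rare.
    return {f: names for f, names in index.items()
            if 2 <= len(names) <= max_resources}


def cluster_by_rare_features(resources, max_resources=2, min_shared=1):
    """Group resources that share at least min_shared rare features."""
    index = rare_feature_index(resources, max_resources)
    shared = defaultdict(int)
    for names in index.values():
        for a, b in combinations(sorted(names), 2):
            shared[(a, b)] += 1

    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), count in shared.items():
        if count >= min_shared:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    # Resources that share no rare feature never enter `parent`,
    # so they are left out of every cluster.
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    docs = {
        "a": {"the", "zeta", "x1"},
        "b": {"the", "zeta"},
        "c": {"the", "omega"},
        "d": {"the"},
    }
    print(cluster_by_rare_features(docs))
```

The sketch stops after creating the seed clusters; the abstract's later step of associating further resources that share similar features with those clusters is not shown.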
-
Publication No.: US09703863B2
Publication date: 2017-07-11
Application No.: US13794446
Filing date: 2013-03-11
Applicant: DiscoverReady LLC
Inventors: Stephen John Barsony, Yerachmiel Tzvi Messing, David Matthew Shub, James Kenneth Wagner, Jr.
CPC classification: G06F17/30713, G06F17/3071, G06Q10/00
Abstract: Data is received that characterizes each of a plurality of documents within a document set. Based on this data, the plurality of documents are grouped into a plurality of stacks using one or more grouping algorithms. A prime document is identified for each stack that includes attributes representative of the entire stack. Subsequently, data characterizing documents for each stack, including at least the identified prime document, is provided to at least one human reviewer. User-generated input from the human reviewer that categorizes each provided document is later received, and data characterizing the user-generated input can then be provided. Related apparatus, systems, techniques and articles are also described.
-
Publication No.: US09674572B2
Publication date: 2017-06-06
Application No.: US13453072
Filing date: 2012-04-23
IPC classification: H04N5/445, G06F3/00, G06F17/30, H04N21/434, H04N5/50, H04N7/16, H04N21/442, H04N21/45, H04N21/466, H04N21/482, H04N21/8405, H04N21/475
CPC classification: H04N21/434, G06F17/30598, G06F17/30713, H04N5/44543, H04N5/50, H04N7/163, H04N21/4345, H04N21/44222, H04N21/4532, H04N21/466, H04N21/4755, H04N21/482, H04N21/8405
Abstract: An information processing device includes an extraction section for extracting, from program information, a genre feature word that is a keyword representing a genre feature. An identification section identifies a channel by genre based on the genre feature word extracted from the program information associated with a program to be broadcast on the channel. The processing device further includes a display processing section for providing control over the channel to be displayed after genre classification.
-
Publication No.: US09672279B1
Publication date: 2017-06-06
Application No.: US14501431
Filing date: 2014-09-30
Applicant: EMC Corporation
CPC classification: G06F17/30713, G06F17/30011, G06F17/30705, G06N7/00
Abstract: An apparatus comprises a processing platform configured to implement a cluster labeling system for documents comprising unstructured text data. The cluster labeling system comprises a clustering module and a visualization module. The clustering module implements a topic model generator and is configured to assign each of the documents to one or more of a plurality of clusters based at least in part on one or more topics identified from the unstructured text data using at least one topic model provided by the topic model generator. The visualization module comprises multiple view generators configured to generate respective distinct visualizations of a selected one of the clusters. The multiple view generators include at least a bigram view generator configured to provide a visualization of a plurality of term pairs from the selected cluster, and a summarization view generator configured to provide a visualization of representative term sequences from the selected cluster.
-
Publication No.: US09659088B2
Publication date: 2017-05-23
Application No.: US14245516
Filing date: 2014-04-04
Applicant: FUJI XEROX CO., LTD.
CPC classification: G06F17/30713, G06F17/2715, G06F17/30598, G06F17/30705, G06F17/3071, G06F17/30722
Abstract: A non-transitory computer readable medium stores a program causing a computer to execute a process for information processing, the process including: calculating a feature amount of each of document contents to which common attribute information is added; and generating distribution map information by plotting each of the document contents in a feature amount space on the basis of the calculated feature amount.