-
Publication No.: US11663254B2
Publication date: 2023-05-30
Application No.: US15418763
Filing date: 2017-01-29
Inventors: Jack G. Conrad, Michael J. Bender
IPC classification: G06F16/33, G06F16/93, G06F16/338, G06F16/35, G06F17/22, G06F17/27, G06F40/194, G06F40/295
CPC classification: G06F16/334, G06F16/338, G06F16/355, G06F16/358, G06F16/93, G06F40/194, G06F40/295
Abstract: The present invention provides a seeded news event clustering and retrieval system configured to first create a candidate data set of documents, second create a set of initial clusters based on nearness or duplicate similarity status, and third create an aggregate cluster by merging initial clusters with seed documents. The invention generates top-level clusters for news events based on an editorially supplied topical label or “seed” component and generates sub-topic-focused clusters based on an algorithm. The system uses an agglomerative clustering algorithm to gather and structure documents into distinct result sets. Decisions on whether to merge related documents or clusters are made according to similarity of evidence derived from two distinct sources: one relying on a digital signature based on the unstructured text in the document, the other based on the presence of named entity tags that have been assigned to the document by an event or named entity tagger such as the Thomson Reuters Calais engine/web service.
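The merge decision described in this abstract draws on two evidence sources: a text-based signature and named entity tags. The following is a minimal sketch of that idea, not the patented method. It assumes Jaccard overlap of word shingles as a stand-in for the digital signature, Jaccard overlap of entity tag sets for the second source, and an either-source-suffices rule; the function names, the thresholds, and the way the two sources are combined are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def shingles(text, k=3):
    """Set of k-word shingles; a simple stand-in for a text signature (assumption)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def should_merge(doc_a, doc_b, text_threshold=0.5, entity_threshold=0.5):
    """Merge when either evidence source is similar enough.

    The thresholds and the OR combination are illustrative assumptions;
    the abstract does not specify how the two sources are combined.
    """
    text_sim = jaccard(shingles(doc_a["text"]), shingles(doc_b["text"]))
    entity_sim = jaccard(doc_a["entities"], doc_b["entities"])
    return text_sim >= text_threshold or entity_sim >= entity_threshold


# Hypothetical documents: different wording, same entity tags.
a = {"text": "storm hits the coast overnight", "entities": {"Florida", "NOAA"}}
b = {"text": "markets rally after rate cut", "entities": {"Florida", "NOAA"}}
c = {"text": "markets rally after rate cut", "entities": {"Fed"}}
```

Here `should_merge(a, b)` succeeds on entity evidence alone, while `should_merge(a, c)` fails on both sources. An agglomerative loop would call such a predicate repeatedly on candidate cluster pairs.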
-
Publication No.: US11580763B2
Publication date: 2023-02-14
Application No.: US16876617
Filing date: 2020-05-18
Inventors: Khaled Ammar, Brian Zubert, Sakif Hossain Khan
IPC classification: G06V30/414, G06K9/62, G06V10/75, G06V30/416
Abstract: In some aspects, a method includes performing optical character recognition (OCR) based on data corresponding to a document to generate text data, detecting one or more bounded regions from the data based on a predetermined boundary rule set, and matching one or more portions of the text data to the one or more bounded regions to generate matched text data. Each bounded region of the one or more bounded regions encloses a corresponding block of text. The method also includes extracting features from the matched text data to generate a plurality of feature vectors and providing the plurality of feature vectors to a trained machine-learning classifier to generate one or more labels associated with the one or more bounded regions. The method further includes outputting metadata indicating a hierarchical layout associated with the document based on the one or more labels and the matched text data.
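The step of matching OCR text to bounded regions can be pictured with a small sketch. It assumes each OCR word arrives with the coordinates of its center point and each region is an axis-aligned rectangle; the data layout and the function name are hypothetical and are not taken from the patent.

```python
def assign_words_to_regions(words, regions):
    """Group OCR words by the first region that contains their center.

    words:   list of (text, x, y) tuples, where (x, y) is the word center
    regions: list of (left, top, right, bottom) rectangles
    Returns a dict mapping region index to the joined text inside it.
    Words that fall outside every region are dropped (an assumption).
    """
    matched = {i: [] for i in range(len(regions))}
    for text, x, y in words:
        for i, (left, top, right, bottom) in enumerate(regions):
            if left <= x <= right and top <= y <= bottom:
                matched[i].append(text)
                break
    return {i: " ".join(ws) for i, ws in matched.items()}


# Hypothetical page: a heading box above a body box.
regions = [(0, 0, 100, 20), (0, 30, 100, 80)]
words = [("Title", 50, 10), ("Body", 10, 40), ("text", 40, 40), ("stray", 200, 200)]
matched_text = assign_words_to_regions(words, regions)
```

The matched text per region would then feed feature extraction and the classifier that labels each region.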
-
Publication No.: US11568503B2
Publication date: 2023-01-31
Application No.: US16446423
Filing date: 2019-06-19
Inventors: Thomas Vacek, Dezhao Song, Tim Nugent, Conner Cowling, Ronald Teo, Frank Schilder
Abstract: The present disclosure relates to systems and methods for analyzing and extracting data related to a structured proceeding, and for identifying, based on the analysis, at least one outcome associated with the structured proceeding. Embodiments provide for receiving data associated with a structured proceeding involving at least one party, the data including at least one docket entry, and analyzing, by an outcome location detector, the data to identify one or more docket entries in the at least one docket entry that are likely to include evidence of an outcome. Embodiments further include analyzing, by an outcome detector, the one or more docket entries determined to be likely to include evidence of an outcome to determine outcomes. The outcomes include at least one of a final outcome and at least one party outcome. The final outcome is associated with the structured proceeding overall, and the at least one party outcome is associated with a party of the at least one party that may have been terminated early.
-
Publication No.: US20220164397A1
Publication date: 2022-05-26
Application No.: US17534017
Filing date: 2021-11-23
Inventors: Rogelio Escalona, Xiao Xiao, Nathan Harris, Mahesh Ramachandran, Paul Cifarelli, Chad Longo, Katherine Kent, Yelena Altman Shapiro, Laura McCurdy, Andrew Petrosie, Mathew Lawrence, Joseph Santoru, Bob Rhodes, John Kennedy, Spencer Torene
IPC classification: G06F16/93
Abstract: Aspects of the present disclosure provide systems, methods, apparatus, and computer-readable storage media that support relevance-based analysis and filtering of documents and media for one or more enterprises. Aspects disclosed herein leverage custom-built taxonomies, natural language processing (NLP), and machine learning (ML) for identifying and extracting features from highly relevant documents. The extracted features are vectorized and then filtered based on entities (e.g., enterprises, organizations, individuals, etc.) and compliance-based risks (e.g., illegal or non-compliant activities) that are highly relevant to a particular client. The filtered feature vectors are used to identify and highlight relevant information in the corresponding documents, enabling decision making to resolve compliance-related risks. The aspects described herein generate fewer false positive or otherwise less relevant results than conventional document screening applications or manual techniques.
-
Publication No.: US20220092453A1
Publication date: 2022-03-24
Application No.: US17484881
Filing date: 2021-09-24
Inventors: Nadja Herger, Nina Stamenova Hristozova, Milda Norkute, Leszek Michalak, Stavroula Skylaki, Daniele Giofré, Andrew Timothy Mulder
Abstract: Methods and systems for providing mechanisms for presenting artificial intelligence (AI) explainability metrics associated with model-based results are provided. In embodiments, a model is applied to a source document to generate a summary. An attention score is determined for each token of a plurality of tokens of the source document. The attention score for a token indicates a level of relevance of the token to the model-based summary. The tokens are aligned to at least one word of a plurality of words included in the source document, and the attention scores of the tokens aligned to each word are combined to generate an overall attention score for each word of the source document. At least one word of the source document is displayed with an indication of the overall attention score associated with the at least one word.
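Combining token-level attention scores into a per-word score can be sketched as follows. The abstract does not say how the scores are combined, so summation is used here as an assumption, with the combining function left swappable; the function name and the sample tokens, scores, and alignment are hypothetical.

```python
from collections import defaultdict


def word_attention_scores(token_scores, token_to_word, combine=sum):
    """Combine per-token attention scores into one score per word.

    token_scores:  list of floats, one per token
    token_to_word: list of ints, the word index each token is aligned to
    combine:       function reducing a list of scores (sum is an assumption)
    """
    if len(token_scores) != len(token_to_word):
        raise ValueError("token_scores and token_to_word must have equal length")
    grouped = defaultdict(list)
    for score, word_index in zip(token_scores, token_to_word):
        grouped[word_index].append(score)
    return {word_index: combine(scores) for word_index, scores in grouped.items()}


# Hypothetical subword tokens "un", "##lawful", "act" aligned to words 0, 0, 1.
scores = [0.25, 0.5, 0.125]
alignment = [0, 0, 1]
per_word = word_attention_scores(scores, alignment)
```

With these inputs, word 0 receives 0.75 and word 1 receives 0.125. Passing `combine=max` would instead keep the strongest token score per word, which may suit a highlight-style display.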
-
Publication No.: US20220083560A1
Publication date: 2022-03-17
Application No.: US17531693
Filing date: 2021-11-19
Inventors: Domingo Huh, Julian Brooke, Elnaz Davoodi, Jack G. Conrad
IPC classification: G06F16/2457, G06F16/93, G06F16/38
Abstract: The present disclosure relates to methods and systems for providing context aware searching using concept markers. Embodiments provide concept markers configured to facilitate the identification and refinement of relevant content associated with a query with a high degree of precision. In embodiments, in response to a user query, documents and concept markers relevant to the query are determined. The identified documents are associated with the concept markers and are ranked based on the quality of the association. Upon a user selecting at least one of the concept markers, the search results are refined in response. The refining includes re-ranking the documents based on a combination of the original query and the selected concept marker. The suggested concept markers are similarly re-ranked. As such, the techniques disclosed herein provide for a high precision identification of relevant content as well as high precision refining of the search results.
-
Publication No.: US20220019609A1
Publication date: 2022-01-20
Application No.: US17375657
Filing date: 2021-07-14
Inventors: Cecil Lee Quartey, Isaac Kriegman
IPC classification: G06F16/35, G06F16/383, G06N20/00
Abstract: Computer implemented methods for categorizing documents are provided that include: receiving a document having a plurality of headnotes and metadata associated with the document, wherein the plurality of headnotes each comprise a segment of text that summarizes at least a portion of the document; predicting using at least a first machine learning model, for at least a first of the plurality of headnotes, a statute pertaining to the first headnote, wherein the predicted statute has associated therewith a taxonomy of topics; predicting using the first machine learning model, a topic from the taxonomy of topics associated with the statute to which the first headnote pertains; and associating the first headnote with the predicted topic.
-
Publication No.: US11222027B2
Publication date: 2022-01-11
Application No.: US16181729
Filing date: 2018-11-06
Inventors: Domingo Huh, Julian Brooke, Elnaz Davoodi, Jack G. Conrad
IPC classification: G06F7/00, G06F16/2457, G06F16/93, G06F16/38
Abstract: The present disclosure relates to methods and systems for providing context aware searching using concept markers. Embodiments provide concept markers configured to facilitate the identification and refinement of relevant content associated with a query with a high degree of precision. In embodiments, in response to a user query, documents and concept markers relevant to the query are determined. The identified documents are associated with the concept markers and are ranked based on the quality of the association. Upon a user selecting at least one of the concept markers, the search results are refined in response. The refining includes re-ranking the documents based on a combination of the original query and the selected concept marker. The suggested concept markers are similarly re-ranked. The disclosed techniques provide for a high precision identification of relevant content as well as high precision refining of the search results.
-
Publication No.: US20210382878A1
Publication date: 2021-12-09
Application No.: US17403858
Filing date: 2021-08-16
IPC classification: G06F16/242, G06N5/04, G06F16/248
Abstract: The present disclosure relates to systems and methods for generating contextually, grammatically, and conversationally correct answers to input questions. Embodiments provide for linguistic and syntactic structure analysis of a submitted question in order to determine whether the submitted question may be answered by at least one headnote. The question is then further analyzed to determine more details about the intent and context of the question. A federated search process, based on the linguistic and syntactic structure analysis and the additional analysis of the question, is used to identify candidate question-answer pairs from a corpus of previously created headnotes. Machine learning models are used to analyze the candidate question-answer pairs, additional rules are applied to rank the candidate answers, and dynamic thresholds are applied to identify the best potential answers to provide to a user as a response to the submitted question.
-
Publication No.: US20210319177A1
Publication date: 2021-10-14
Application No.: US17156546
Filing date: 2021-01-23
Inventors: Richard Anthony Pito
IPC classification: G06F40/258, G06F40/279
Abstract: The present disclosure is directed towards systems and methods for extracting structure and headers from a body of text. This computational extraction is based on the visual and logical similarities between portions of text. Structure is derived from a programmatic and methodical computation of similarities between header pairs.