System and engine for seeded clustering of news events

    Publication No.: US11663254B2

    Publication date: 2023-05-30

    Application No.: US15418763

    Filing date: 2017-01-29

    Abstract: The present invention provides a seeded news event clustering and retrieval system configured to first create a candidate data set of documents, second create a set of initial clusters based on nearness or duplicate similarity status, and third create an aggregate cluster by merging initial clusters with seed documents. The invention generates top-level clusters for news events based on an editorially supplied topical label or “seed” component and generates sub-topic-focused clusters algorithmically. The system uses an agglomerative clustering algorithm to gather and structure documents into distinct result sets. Decisions on whether to merge related documents or clusters are made according to similarity of evidence derived from two distinct sources: one relies on a digital signature based on the unstructured text in the document, and the other on the presence of named entity tags that have been assigned to the document by an event or named entity tagger such as the Thomson Reuters Calais engine/web service.
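
    The three stages described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: word shingles stand in for the text-based digital signature, the equal weighting of the two evidence sources, both thresholds, the average-link merge rule, and all names (Doc, seeded_clusters, the sample documents) are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    text: str
    entities: frozenset = field(default_factory=frozenset)


def shingles(text, k=3):
    """Word k-shingles: a simple stand-in for a text-based digital signature."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(d1, d2, text_weight=0.5):
    """Blend the two evidence sources: text signature and named-entity tags."""
    text_sim = jaccard(shingles(d1.text), shingles(d2.text))
    entity_sim = jaccard(d1.entities, d2.entities)
    return text_weight * text_sim + (1 - text_weight) * entity_sim


def cluster_similarity(c1, c2):
    """Average-link similarity between two clusters (lists of Doc)."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


def agglomerate(clusters, threshold):
    """Merge the most similar pair repeatedly until no pair reaches the threshold."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        best, best_pair = threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cluster_similarity(clusters[i], clusters[j])
                if s >= best:
                    best, best_pair = s, (i, j)
        if best_pair is None:
            break
        i, j = best_pair
        clusters[i].extend(clusters.pop(j))
    return clusters


def seeded_clusters(seeds, candidates, dup_threshold=0.6, seed_threshold=0.2):
    """Build initial near-duplicate clusters, then attach each to its closest seed."""
    initial = agglomerate([[d] for d in candidates], dup_threshold)
    result = {label: [seed_doc] for label, seed_doc in seeds.items()}
    for cluster in initial:
        score, label = max(
            (cluster_similarity([seed_doc], cluster), label)
            for label, seed_doc in seeds.items()
        )
        if score >= seed_threshold:  # clusters far from every seed are left out
            result[label].extend(cluster)
    return result


# Hypothetical sample data.
seeds = {
    "quake": Doc("s1", "earthquake strikes coastal city", frozenset({"Japan", "earthquake"})),
    "merger": Doc("s2", "companies announce merger deal", frozenset({"AcmeCorp", "merger"})),
}
candidates = [
    Doc("d1", "Strong earthquake strikes coastal city in Japan overnight",
        frozenset({"Japan", "earthquake"})),
    Doc("d2", "Strong earthquake strikes coastal city in Japan overnight officials say",
        frozenset({"Japan", "earthquake"})),
    Doc("d3", "AcmeCorp and rival companies announce merger deal today",
        frozenset({"AcmeCorp", "merger"})),
    Doc("d4", "Local bakery wins pastry award", frozenset({"bakery"})),
]
result = seeded_clusters(seeds, candidates)
clusters_by_seed = {label: [d.doc_id for d in docs] for label, docs in result.items()}
print(clusters_by_seed)
```

    In this sample, the two near-duplicate earthquake stories merge into one initial cluster before being attached to the "quake" seed, and the unrelated bakery story stays unassigned because it falls below the seed threshold.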

    Representative document hierarchy generation

    Publication No.: US11580763B2

    Publication date: 2023-02-14

    Application No.: US16876617

    Filing date: 2020-05-18

    Abstract: In some aspects, a method includes performing optical character recognition (OCR) based on data corresponding to a document to generate text data, detecting one or more bounded regions from the data based on a predetermined boundary rule set, and matching one or more portions of the text data to the one or more bounded regions to generate matched text data. Each bounded region of the one or more bounded regions encloses a corresponding block of text. The method also includes extracting features from the matched text data to generate a plurality of feature vectors and providing the plurality of feature vectors to a trained machine-learning classifier to generate one or more labels associated with the one or more bounded regions. The method further includes outputting metadata indicating a hierarchical layout associated with the document based on the one or more labels and the matched text data.
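
    The matching and feature-extraction steps can be sketched as follows, assuming OCR output arrives as words with bounding boxes and that regions have already been detected. The center-point containment rule, the three features, and all names (Box, Word, match_words_to_regions, features) are hypothetical; the OCR engine, the boundary rule set, and the trained classifier are outside the scope of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x, y):
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


@dataclass(frozen=True)
class Word:
    text: str
    box: Box


def match_words_to_regions(words, regions):
    """Assign each OCR word to the first region that contains its center point."""
    matched = {i: [] for i in range(len(regions))}
    for w in words:
        cx = (w.box.x0 + w.box.x1) / 2
        cy = (w.box.y0 + w.box.y1) / 2
        for i, region in enumerate(regions):
            if region.contains(cx, cy):
                matched[i].append(w)
                break  # words outside every region are simply dropped
    # Reading order: top-to-bottom, then left-to-right.
    return {
        i: " ".join(w.text for w in sorted(ws, key=lambda w: (w.box.y0, w.box.x0)))
        for i, ws in matched.items()
    }


def features(text, region, page_height):
    """Hypothetical hand-picked feature vector for one bounded region."""
    tokens = text.split()
    upper = sum(1 for t in tokens if t.isupper())
    return [
        len(tokens),                             # block length in words
        upper / len(tokens) if tokens else 0.0,  # share of all-caps words
        region.y0 / page_height,                 # relative vertical position
    ]


# Hypothetical page: a heading region, a body region, and a stray page number.
regions = [Box(0, 0, 100, 20), Box(0, 30, 100, 80)]
words = [
    Word("REPORT", Box(45, 5, 80, 15)),
    Word("ANNUAL", Box(10, 5, 40, 15)),
    Word("Revenue", Box(10, 35, 40, 45)),
    Word("grew.", Box(45, 35, 70, 45)),
    Word("7", Box(95, 90, 99, 98)),
]
matched_text = match_words_to_regions(words, regions)
heading_features = features(matched_text[0], regions[0], page_height=100)
print(matched_text, heading_features)
```

    Feature vectors like heading_features would then be passed to a trained classifier to label each region (for example as a heading or body text), and the labels would drive the hierarchical layout metadata.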

    Systems and methods for determining structured proceeding outcomes

    Publication No.: US11568503B2

    Publication date: 2023-01-31

    Application No.: US16446423

    Filing date: 2019-06-19

    IPC classification: G06Q50/18 G06N3/08

    Abstract: The present disclosure relates to systems and methods for analyzing and extracting data related to a structured proceeding, and for identifying, based on the analysis, at least one outcome associated with the structured proceeding. Embodiments provide for receiving data associated with a structured proceeding involving at least one party, the data including at least one docket entry, and analyzing, by an outcome location detector, the data to identify one or more docket entries in the at least one docket entry that are likely to include evidence of an outcome. Embodiments further include analyzing, by an outcome detector, the one or more docket entries determined to be likely to include evidence of an outcome to determine outcomes. The outcomes include at least one of a final outcome and at least one party outcome. The final outcome is associated with the structured proceeding overall, and the at least one party outcome is associated with a party of the at least one party that may have been terminated early.

    SYSTEMS AND METHODS FOR ANALYSIS EXPLAINABILITY

    Publication No.: US20220092453A1

    Publication date: 2022-03-24

    Application No.: US17484881

    Filing date: 2021-09-24

    IPC classification: G06N5/04 G06F16/23

    Abstract: Methods and systems for providing mechanisms for presenting artificial intelligence (AI) explainability metrics associated with model-based results are provided. In embodiments, a model is applied to a source document to generate a summary. An attention score is determined for each token of a plurality of tokens of the source document. The attention score for a token indicates a level of relevance of the token to the model-based summary. The tokens are aligned to at least one word of a plurality of words included in the source document, and the attention scores of the tokens aligned to each word are combined to generate an overall attention score for each word of the source document. At least one word of the source document is displayed with an indication of the overall attention score associated with the at least one word.
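
    The token-to-word alignment and score combination can be sketched as below. The sketch assumes WordPiece-style subword tokens, where a "##" prefix marks a continuation of the previous word; the abstract does not name a tokenizer or a combination rule, so the prefix convention, the default of summing, the sample scores, and the function name word_attention are all illustrative assumptions.

```python
def word_attention(tokens, token_scores, combine=sum):
    """Align subword tokens to words and combine their attention scores."""
    words, grouped = [], []
    for token, score in zip(tokens, token_scores):
        if token.startswith("##") and words:
            # Continuation piece: extend the current word and collect its score.
            words[-1] += token[2:]
            grouped[-1].append(score)
        else:
            words.append(token)
            grouped.append([score])
    return [(word, combine(scores)) for word, scores in zip(words, grouped)]


# Hypothetical tokens and per-token attention scores.
tokens = ["the", "court", "dis", "##miss", "##ed", "the", "claim"]
token_scores = [0.05, 0.20, 0.30, 0.25, 0.10, 0.02, 0.08]

summed = word_attention(tokens, token_scores)
peaked = word_attention(tokens, token_scores, combine=max)

# A plain-text stand-in for the display step: each word with its overall score.
print(" ".join(f"{word}[{score:.2f}]" for word, score in summed))
```

    Summing favors words split into many pieces, while taking the maximum (the peaked variant) does not; which rule suits a given display is a design choice the sketch leaves open.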

    SYSTEM AND METHODS FOR CONTEXT AWARE SEARCHING

    Publication No.: US20220083560A1

    Publication date: 2022-03-17

    Application No.: US17531693

    Filing date: 2021-11-19

    Abstract: The present disclosure relates to methods and systems for providing context aware searching using concept markers. Embodiments provide concept markers configured to facilitate the identification and refinement of relevant content associated with a query with a high degree of precision. In embodiments, in response to a user query, documents and concept markers relevant to the query are determined. The identified documents are associated with the concept markers and are ranked based on the quality of the association. Upon a user selecting at least one of the concept markers, the search results are refined in response. The refining includes re-ranking the documents based on a combination of the original query and the selected concept marker. The suggested concept markers are similarly re-ranked. As such, the techniques disclosed herein provide for a high precision identification of relevant content as well as high precision refining of the search results.
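
    The refinement step can be sketched under simple assumptions: each document carries a relevance score for the original query and an association strength for each concept marker, and the two are blended linearly once a marker is selected. The abstract does not specify how the combination is computed, so the linear blend, the alpha weight, the sample scores, and the names doc_score, rank_documents, and rank_markers are hypothetical.

```python
def doc_score(doc, selected=None, alpha=0.5):
    """Blend query relevance with the document's association to the selected marker."""
    if selected is None:
        return doc["query_score"]
    return (1 - alpha) * doc["query_score"] + alpha * doc["markers"].get(selected, 0.0)


def rank_documents(docs, selected=None, alpha=0.5):
    ranked = sorted(docs, key=lambda d: doc_score(d, selected, alpha), reverse=True)
    return [d["id"] for d in ranked]


def rank_markers(docs, selected=None, alpha=0.5):
    """Re-rank the remaining markers by their association with highly scored documents."""
    totals = {}
    for d in docs:
        for marker, strength in d["markers"].items():
            if marker != selected:
                totals[marker] = totals.get(marker, 0.0) + doc_score(d, selected, alpha) * strength
    return sorted(totals, key=totals.get, reverse=True)


# Hypothetical search results for one query.
docs = [
    {"id": "A", "query_score": 0.9, "markers": {"negligence": 0.1, "damages": 0.8}},
    {"id": "B", "query_score": 0.7, "markers": {"negligence": 0.9}},
    {"id": "C", "query_score": 0.5, "markers": {"negligence": 0.6, "duty of care": 0.9}},
]

initial_docs = rank_documents(docs)
initial_markers = rank_markers(docs)
refined_docs = rank_documents(docs, selected="negligence")
refined_markers = rank_markers(docs, selected="negligence")
print(initial_docs, refined_docs, refined_markers)
```

    Selecting "negligence" moves document B to the top and promotes "duty of care" above "damages" among the suggested markers, which mirrors the re-ranking of both documents and markers described in the abstract.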

    SYSTEMS AND METHODS FOR THE AUTOMATIC CATEGORIZATION OF TEXT

    Publication No.: US20220019609A1

    Publication date: 2022-01-20

    Application No.: US17375657

    Filing date: 2021-07-14

    Abstract: Computer-implemented methods for categorizing documents are provided that include: receiving a document having a plurality of headnotes and metadata associated with the document, wherein the plurality of headnotes each comprise a segment of text that summarizes at least a portion of the document; predicting, using at least a first machine learning model, for at least a first of the plurality of headnotes, a statute pertaining to the first headnote, wherein the predicted statute has associated therewith a taxonomy of topics; predicting, using the first machine learning model, a topic from the taxonomy of topics associated with the statute to which the first headnote pertains; and associating the first headnote with the predicted topic.
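
    The two-stage structure of the method (first a statute, then a topic restricted to that statute's taxonomy) can be sketched as below. A keyword-overlap scorer stands in for the machine learning model purely so the example runs; the statutes, topics, keywords, and function names are all hypothetical and do not come from the patent.

```python
import re

# Hypothetical statutes, each with its own taxonomy of topics and topic keywords.
TAXONOMY = {
    "Hypothetical Wage Act": {
        "overtime": {"overtime", "hours", "week"},
        "minimum wage": {"minimum", "wage", "hourly"},
    },
    "Hypothetical Lease Act": {
        "eviction": {"eviction", "notice", "tenant"},
        "deposits": {"deposit", "refund", "tenant"},
    },
}


def overlap(text, keywords):
    """Stand-in for a model score: number of keywords present in the text."""
    return len(set(re.findall(r"[a-z]+", text.lower())) & keywords)


def predict_statute(headnote):
    """Stage 1: pick the statute whose combined topic keywords best match."""
    def score(statute):
        all_keywords = set().union(*TAXONOMY[statute].values())
        return overlap(headnote, all_keywords)
    return max(TAXONOMY, key=score)


def predict_topic(headnote, statute):
    """Stage 2: pick a topic only from the predicted statute's taxonomy."""
    topics = TAXONOMY[statute]
    return max(topics, key=lambda topic: overlap(headnote, topics[topic]))


def categorize(headnotes):
    """Associate each headnote with its predicted statute and topic."""
    results = {}
    for headnote in headnotes:
        statute = predict_statute(headnote)
        results[headnote] = (statute, predict_topic(headnote, statute))
    return results


headnotes = [
    "Landlord must give the tenant written notice before eviction.",
    "Employees working over forty hours in a week are owed overtime pay.",
]
assignments = categorize(headnotes)
for headnote, (statute, topic) in assignments.items():
    print(f"{statute} / {topic}: {headnote}")
```

    Restricting the second prediction to the first prediction's taxonomy keeps the topic label space small, at the cost of propagating any error made when choosing the statute.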

    System and methods for context aware searching

    Publication No.: US11222027B2

    Publication date: 2022-01-11

    Application No.: US16181729

    Filing date: 2018-11-06

    Abstract: The present disclosure relates to methods and systems for providing context aware searching using concept markers. Embodiments provide concept markers configured to facilitate the identification and refinement of relevant content associated with a query with a high degree of precision. In embodiments, in response to a user query, documents and concept markers relevant to the query are determined. The identified documents are associated with the concept markers and are ranked based on the quality of the association. Upon a user selecting at least one of the concept markers, the search results are refined in response. The refining includes re-ranking the documents based on a combination of the original query and the selected concept marker. The suggested concept markers are similarly re-ranked. The disclosed techniques provide for a high precision identification of relevant content as well as high precision refining of the search results.

    SYSTEMS AND METHODS FOR GENERATING A CONTEXTUALLY AND CONVERSATIONALLY CORRECT RESPONSE TO A QUERY

    Publication No.: US20210382878A1

    Publication date: 2021-12-09

    Application No.: US17403858

    Filing date: 2021-08-16

    Abstract: The present disclosure relates to systems and methods for generating contextually, grammatically, and conversationally correct answers to input questions. Embodiments provide for linguistic and syntactic structure analysis of a submitted question in order to determine whether the submitted question may be answered by at least one headnote. The question is then further analyzed to determine more details about the intent and context of the question. A federated search process, based on the linguistic and syntactic structure analysis and the additional analysis of the question, is used to identify candidate question-answer pairs from a corpus of previously created headnotes. Machine learning models are used to analyze the candidate question-answer pairs, additional rules are applied to rank the candidate answers, and dynamic thresholds are applied to identify the best potential answers to provide to a user as a response to the submitted question.
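
    The final selection step can be sketched as a threshold that adapts to the score of the top-ranked candidate. The abstract does not say how its dynamic thresholds are computed, so the rule used here (keep candidates scoring at least a fixed floor and at least a fraction of the best score), the parameter values, and the name select_answers are assumptions for illustration only.

```python
def select_answers(candidates, floor=0.3, ratio=0.8, limit=3):
    """Keep ranked candidates whose score clears a cutoff tied to the top score.

    candidates: list of (answer_id, score) pairs, e.g. from a ranking model.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    if not ranked or ranked[0][1] < floor:
        return []  # nothing is confident enough to show
    cutoff = max(floor, ratio * ranked[0][1])
    return [c for c in ranked if c[1] >= cutoff][:limit]


# Hypothetical candidate scores for three different questions.
strong = select_answers([("a3", 0.55), ("a1", 0.92), ("a4", 0.20), ("a2", 0.80)])
weak = select_answers([("b1", 0.25)])
borderline = select_answers([("c1", 0.35), ("c2", 0.31), ("c3", 0.29)])
print(strong, weak, borderline)
```

    With a confident top answer the cutoff rises and only close competitors survive; with weak candidates the floor takes over, and the function may return nothing rather than a poor answer.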