Generating rules for automated text annotation

    Publication No.: US10902198B2

    Publication date: 2021-01-26

    Application No.: US16203903

    Filing date: 2018-11-29

    Abstract: Natural language text and annotated text can be received. The annotated text can specify at least one anchor and at least one trigger contained in the natural language text and indicate a correspondence between the anchor and the trigger. The natural language text, the annotated text, and at least one parse tree generated from the natural language text can be processed. Based on the processing, at least one natural language processing rule can be generated and output. The natural language processing rule can be configured to be executed by a processor to process other natural language text.

    DE-IDENTIFICATION OF ELECTRONIC MEDICAL RECORDS FOR CONTINUOUS DATA DEVELOPMENT

    Publication No.: US20200175203A1

    Publication date: 2020-06-04

    Application No.: US16205423

    Filing date: 2018-11-30

    IPC classification: G06F21/62 G16H10/60 G06F16/81

    Abstract: A method for de-identifying protected health information (PHI) associated with electronic medical records (EMRs) based on a common analysis structure (CAS) is provided. The method may include detecting a system event associated with a system comprising the EMRs. The method may further include, in response to detecting the system event, detecting a first CAS associated with the EMRs. The method may further include extracting first CAS data associated with the first CAS, wherein the first CAS data comprises unstructured data associated with the EMRs and normalized annotations based on CAS objects that are associated with the unstructured data. The method may further include obfuscating the unstructured data associated with the first CAS. The method may also include generating a second CAS comprising the obfuscated unstructured data and a copied version of the normalized annotations, wherein the copied version of the normalized annotations is correlated with the obfuscated unstructured data.
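The obfuscate-and-copy step described in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the `Annotation` dataclass is a hypothetical stand-in for CAS annotation objects, the `PHI_LABELS` set and the bracketed placeholder format are assumptions, and annotations are assumed not to overlap. The sketch replaces PHI spans in the text and shifts the offsets of the copied annotations so that they still line up with the obfuscated text.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Annotation:
    """Hypothetical stand-in for a CAS annotation: character offsets plus a label."""
    begin: int
    end: int
    label: str


# Assumption: which labels count as PHI is a configuration choice.
PHI_LABELS = frozenset({"NAME", "DATE"})


def obfuscate(text: str, annotations: List[Annotation]) -> Tuple[str, List[Annotation]]:
    """Return obfuscated text and copied annotations with re-correlated offsets.

    Assumes the annotations do not overlap.
    """
    pieces = []
    copied = []
    cursor = 0  # position in the original text
    shift = 0   # cumulative length difference introduced so far
    for ann in sorted(annotations, key=lambda a: a.begin):
        pieces.append(text[cursor:ann.begin])
        span = text[ann.begin:ann.end]
        if ann.label in PHI_LABELS:
            span = f"[{ann.label}]"  # replace PHI with a placeholder
        new_begin = ann.begin + shift
        copied.append(Annotation(new_begin, new_begin + len(span), ann.label))
        pieces.append(span)
        shift += len(span) - (ann.end - ann.begin)
        cursor = ann.end
    pieces.append(text[cursor:])
    return "".join(pieces), copied


record = "John Smith has asthma."
notes = [Annotation(0, 10, "NAME"), Annotation(15, 21, "DIAGNOSIS")]
clean_text, clean_notes = obfuscate(record, notes)
```

Here the name is replaced by a placeholder while the diagnosis annotation is carried over with offsets adjusted to the shorter text.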

    Relation extraction using Q and A

    Publication No.: US10540440B2

    Publication date: 2020-01-21

    Application No.: US15613469

    Filing date: 2017-06-05

    Abstract: Embodiments of the present invention disclose a method, a computer program product, and a computer system for extracting natural language relations between entities. A computer receives a configuration for associating one or more natural language questions with one or more entities and identifies the one or more entities annotated within a document. The computer answers the natural language questions associated with the identified one or more entities based on context surrounding the identified one or more entities. The computer may further transmit the natural language questions associated with the identified one or more entities and the surrounding context to a question and answer service, then receive answers to the natural language questions from the question and answer service. The computer may further determine whether the received answers correctly describe the relation between the identified one or more entities and other entities within the extracted surrounding context.

    CLASSIFYING TEXT TO DETERMINE A GOAL TYPE USED TO SELECT MACHINE LEARNING ALGORITHM OUTCOMES

    Publication No.: US20190318263A1

    Publication date: 2019-10-17

    Application No.: US16453221

    Filing date: 2019-06-26

    IPC classification: G06N20/00 G06F17/27 G06F16/33

    Abstract: Provided are a computer program product, system, and method for classifying text to determine a goal type used to select machine learning algorithm outcomes. Natural language processing of text is performed to determine features in the text and their relationships. A classifier classifies the text based on the relationships and features to determine a goal type. The determined features and relationships from the text are inputted into a plurality of different machine learning algorithms to generate outcomes. For each of the machine learning algorithms, a determination is made of performance measurements resulting from the machine learning algorithms generating the outcomes. A determination is made of at least one machine learning algorithm having performance measurements that are highly correlated to the determined goal type. An outcome is determined from at least one of the outcomes.

    Personalized approach to handling hypotheticals in text

    Publication No.: US10360301B2

    Publication date: 2019-07-23

    Application No.: US15289224

    Filing date: 2016-10-10

    IPC classification: G06F17/27 G16H10/00

    Abstract: Mechanisms receive natural language content and analyze the natural language content to generate a parse tree data structure. The mechanisms process the parse tree data structure to identify one or more instances of candidate hypothetical spans in the natural language content. Hypothetical spans are terms or phrases indicative of a hypothetical statement. The mechanisms calculate, for each candidate hypothetical span, a confidence score value indicative of a confidence that the candidate hypothetical span is an actual hypothetical span based on a personalized hypothetical dictionary data structure associated with a source of the natural language content. The mechanisms perform an operation based on the natural language content. The operation is performed with portions of the natural language content corresponding to the one or more identified instances of actual hypothetical spans being given different relative weights within portions of the natural language content than other portions of the natural language content.
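As a rough illustration of the personalized-dictionary idea in this abstract, the following Python sketch scores sentences for hypothetical cues and down-weights those that pass a threshold. It makes several assumptions that are not in the source: cue phrases are matched with regular expressions instead of a parse tree, and the dictionaries `GENERIC_CUES` and `PERSONAL_CUES`, the source name `dr_lee`, the threshold, and the weights are all invented for the example.

```python
import re
from typing import List, Tuple

# Hypothetical cue dictionaries: phrase -> confidence that it marks a hypothetical.
GENERIC_CUES = {"if": 0.6, "discussed": 0.5, "would consider": 0.7}
# Hypothetical per-source overrides (for example, one clinician's writing habits).
PERSONAL_CUES = {"dr_lee": {"discussed": 0.9, "if": 0.3}}


def sentence_weights(text: str, source: str, threshold: float = 0.5,
                     hypothetical_weight: float = 0.2) -> List[Tuple[str, float]]:
    """Give each sentence weight 1.0, or a reduced weight if it looks hypothetical."""
    # Personalized entries override the generic ones for this source.
    cues = {**GENERIC_CUES, **PERSONAL_CUES.get(source, {})}
    weights = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        lowered = sentence.lower()
        confidence = max(
            (score for cue, score in cues.items()
             if re.search(rf"\b{re.escape(cue)}\b", lowered)),
            default=0.0,
        )
        weight = hypothetical_weight if confidence >= threshold else 1.0
        weights.append((sentence, weight))
    return weights


note = "If pain persists, start ibuprofen."
by_lee = sentence_weights(note, "dr_lee")
by_other = sentence_weights(note, "unknown_source")
```

The same sentence receives a different weight depending on its source, because the hypothetical `dr_lee` dictionary lowers the confidence attached to the cue "if".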

    Determining context using weighted parsing scoring

    Publication No.: US10275456B2

    Publication date: 2019-04-30

    Application No.: US15623613

    Filing date: 2017-06-15

    IPC classification: G06F17/20 G06F17/27 G06N5/04

    Abstract: According to one embodiment, a method, computer system, and computer program product for natural language processing is provided. The present invention may include detecting natural language entities, and running parsing algorithms on the natural language entities to determine the relationship between said natural language entities. The present invention may further comprise assigning, by the parsing algorithms, initial scores to detected natural language entities based on the relationship between said natural language entities; choosing a final score for a plurality of natural language entities; and comparing the final score against a threshold to determine whether the natural language entities are within the same context.
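The score-then-threshold flow in this abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented method: the two toy "parsers", their weights, the choice of the maximum weighted score as the final score, and the threshold value are all hypothetical.

```python
import re
from typing import List


def _tokens(text: str) -> List[str]:
    """Lowercase word tokens, punctuation removed."""
    return re.findall(r"\w+", text.lower())


def proximity_parser(a: str, b: str, text: str) -> float:
    """Toy parser: score falls off with token distance between the entities."""
    tokens = _tokens(text)
    if a not in tokens or b not in tokens:
        return 0.0
    distance = abs(tokens.index(a) - tokens.index(b))
    return 1.0 / max(distance, 1)


def clause_parser(a: str, b: str, text: str) -> float:
    """Toy parser: 1.0 if both entities share a comma-delimited clause."""
    for clause in text.split(","):
        tokens = _tokens(clause)
        if a in tokens and b in tokens:
            return 1.0
    return 0.0


# Assumption: each parser carries a fixed weight reflecting how much it is trusted.
PARSERS = {"proximity": (proximity_parser, 0.6), "clause": (clause_parser, 0.9)}


def same_context(a: str, b: str, text: str, threshold: float = 0.5) -> bool:
    """Weight each parser's initial score, take the best as final, and threshold it."""
    weighted = [weight * parser(a, b, text) for parser, weight in PARSERS.values()]
    return max(weighted) >= threshold


sentence = "The patient denies fever, but reports a cough"
negated = same_context("denies", "fever", sentence)
unrelated = same_context("fever", "cough", sentence)
```

In this example "denies" and "fever" fall in the same context, while "fever" and "cough" do not, since they sit in different clauses and are four tokens apart.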

    Verification of Clinical Hypothetical Statements Based on Dynamic Cluster Analysis

    Publication No.: US20180096103A1

    Publication date: 2018-04-05

    Application No.: US15283893

    Filing date: 2016-10-03

    IPC classification: G06F19/00

    CPC classification: G16H10/60 G16H50/20

    Abstract: A mechanism is provided in a data processing system comprising at least one processor and at least one memory, the at least one memory comprising instructions which are executed by the at least one processor and configure the processor to implement a medical treatment recommendation system. The medical treatment recommendation system receives a first patient electronic medical record (EMR) corresponding to a first patient. The medical treatment recommendation system analyzes the first patient EMR to identify a span of content in the first patient EMR that is a candidate hypothetical statement within the patient EMR. The medical treatment recommendation system verifies whether or not the candidate hypothetical statement is an actual hypothetical statement based on an analysis of a corpus of other content. The medical treatment recommendation system controls an operation of the medical treatment recommendation system with regard to the span of content based on results of the verifying. The controlling causes the medical treatment recommendation system to ignore the span of content in response to the results of the verifying indicating the candidate hypothetical statement to be an actual hypothetical statement. The medical treatment recommendation system generates a treatment recommendation based on the operation of the medical treatment recommendation system with regard to the span of content. The medical treatment recommendation system outputs the treatment recommendation for use in treating the first patient.

    Gap identification in corpora
    Granted invention patent

    Publication No.: US10740365B2

    Publication date: 2020-08-11

    Application No.: US15622762

    Filing date: 2017-06-14

    Abstract: Embodiments of the present invention disclose a method, a computer program product, and a computer system for identifying information gaps in corpora. A computer receives a document and extracts keywords from the document while filtering trivial keywords. The computer identifies and extracts top keywords detailed by the document using a topic modelling approach before determining whether the extracted top keywords exceed a threshold use frequency. Based on determining that the top keywords exceed the threshold use frequency, the computer determines whether the top keywords have a relation to other entities within the document and, if so, whether the top keywords are defined within the document. Based on determining that the top keywords are not defined in the document, the computer adds the top keywords to a list and defines the top keywords.
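A minimal Python sketch of the frequency and definition checks in this abstract is shown below. It simplifies heavily and should be read as an assumption-laden illustration: plain word counts stand in for the topic modelling approach, the relation-to-other-entities check is omitted, and the stopword list, the definition patterns, and the function name `find_undefined_keywords` are hypothetical.

```python
import re
from collections import Counter
from typing import List

# Assumption: a tiny stopword list stands in for the "trivial keyword" filter.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "for", "it", "by"}


def find_undefined_keywords(document: str, threshold: int = 1, top_n: int = 5) -> List[str]:
    """Return frequent keywords that the document never defines."""
    sentences = [s.strip().lower() for s in re.split(r"[.!?]", document) if s.strip()]
    words = [w for s in sentences for w in re.findall(r"[a-z]+", s) if w not in STOPWORDS]
    gaps = []
    # Word counts are a stand-in for topic modelling in this sketch.
    for keyword, count in Counter(words).most_common(top_n):
        if count <= threshold:
            continue  # does not exceed the use-frequency threshold
        pattern = rf"\b{re.escape(keyword)} (is|are|means|refers to)\b"
        defined = any(re.search(pattern, s) for s in sentences)
        if not defined:
            gaps.append(keyword)
    return gaps


text = ("Metformin is a drug for diabetes. Metformin lowers glucose. "
        "High glucose harms nerves. Glucose rises after meals.")
undefined = find_undefined_keywords(text)
```

In the sample text "metformin" is frequent but defined, so only "glucose" is reported as a gap.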

    IDENTIFYING AND PRIORITIZING CANDIDATE ANSWER GAPS WITHIN A CORPUS

    Publication No.: US20200183962A1

    Publication date: 2020-06-11

    Application No.: US16212676

    Filing date: 2018-12-06

    Abstract: Methods and apparatus, including computer program products, are provided that implement and use techniques for identifying candidate answer gaps within a corpus of a question and answer system. An original question posed to the question and answer system is analyzed to identify an object and a semantic type for the question. Concepts having a same or similar semantic type are retrieved from an ontology or dictionary. For at least one retrieved concept, one or more altered questions are created by replacing the object of the original question with a preferred term of the retrieved concept. The one or more altered questions are submitted to the question and answer system. The answers to the altered questions are analyzed to identify gaps within the corpus of the question and answer system.
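The question-alteration step in this abstract can be sketched in Python as follows. The toy `ONTOLOGY`, the explicit `obj` argument (the abstract derives the object by analyzing the question), and the `qa_system` callable are all assumptions made for illustration; an empty answer list is treated as a gap, which is only one possible way to analyze the answers.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy ontology: concept -> (semantic type, preferred term).
ONTOLOGY: Dict[str, Tuple[str, str]] = {
    "aspirin": ("drug", "aspirin"),
    "ibuprofen": ("drug", "ibuprofen"),
    "tylenol": ("drug", "acetaminophen"),
    "asthma": ("disease", "asthma"),
}


def altered_questions(question: str, obj: str) -> List[str]:
    """Swap the question's object for preferred terms of same-type concepts."""
    semantic_type = ONTOLOGY[obj][0]
    return [
        question.replace(obj, preferred)
        for concept, (concept_type, preferred) in ONTOLOGY.items()
        if concept_type == semantic_type and concept != obj
    ]


def find_answer_gaps(question: str, obj: str,
                     qa_system: Callable[[str], List[str]]) -> List[str]:
    """Return the altered questions for which the QA system has no answer."""
    return [q for q in altered_questions(question, obj) if not qa_system(q)]


def toy_qa(question: str) -> List[str]:
    """Hypothetical QA system whose corpus only covers ibuprofen."""
    return ["nausea"] if "ibuprofen" in question else []


gaps = find_answer_gaps("What are the side effects of aspirin?", "aspirin", toy_qa)
```

With this toy corpus the altered question about acetaminophen goes unanswered, which flags it as a candidate gap.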