Systems and methods for relation inference

    Publication No.: US10650305B2

    Publication date: 2020-05-12

    Application No.: US15205798

    Filing date: 2016-07-08

    Applicant: Baidu USA, LLC

    Abstract: Presented are relation inference methods and systems that use deep learning techniques for data mining documents to discover a relation between terms of interest in a given field covering a specific topic. For example, in the healthcare domain, various embodiments of the present disclosure provide for a relation inference system that mines large-scale medical documents in a free-text database to extract symptom and disease terms and generates relation information that aids in disease diagnosis. In embodiments, this is accomplished by training and using an RNN, such as an LSTM, a Gated Recurrent Unit (GRU), etc., that takes advantage of a term dictionary to examine co-occurrences of terms of interest within documents to discover correlations between the terms. The correlation may then be used to predict the statistically most probable terms (e.g., a disease) related to a given search term (e.g., a symptom).

    Hierarchical multi-task term embedding learning for synonym prediction

    Publication No.: US11580415B2

    Publication date: 2023-02-14

    Application No.: US16506291

    Filing date: 2019-07-09

    Applicant: Baidu USA, LLC

    Abstract: Due to the high variability of language use in real life, manual construction of semantic resources to cover all synonyms is prohibitively expensive and may result in limited coverage. Described herein are systems and methods that automate the process of synonymy resource development, including both formal entities and noisy descriptions from end-users. Embodiments of a multi-task model with hierarchical task relationship are presented that learn more representative entity/term embeddings and apply them to synonym prediction. In model embodiments, a skip-gram word embedding model is extended by introducing an auxiliary task, "neighboring word/term semantic type prediction", and hierarchically organizing the tasks based on task complexity. In one or more embodiments, existing term-term synonymous knowledge is integrated into the word embedding learning framework. Embeddings trained from the multi-task model embodiments yield significant improvement for entity semantic relatedness evaluation, neighboring word/term semantic type prediction, and synonym prediction compared with baselines.

    Question generation systems and methods for automating diagnosis

    Publication No.: US11194860B2

    Publication date: 2021-12-07

    Application No.: US15207445

    Filing date: 2016-07-11

    Applicant: Baidu USA, LLC

    Abstract: Systems and methods are disclosed for question generation to obtain more related medical information based on observed symptoms from a patient. In embodiments, possible diseases associated with the observed symptoms are generated by querying a knowledge graph. In embodiments, candidate symptoms associated with the possible diseases are also identified and are combined with the observed symptoms to obtain combined symptom sets. In embodiments, discriminative scores for the candidate symptom sets are determined and candidate symptoms with top discriminative scores are selected. In embodiments, these selected candidate symptoms may be checked for conflicts with observed symptoms and removed from further consideration if a conflict exists. In embodiments, one or more questions may be generated based on the remaining selected candidate symptoms to aid in collecting information about the patient. In embodiments, the process may be repeated with the updated observed symptoms.
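    The steps in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the knowledge graph is assumed to be a plain dictionary from disease to symptom set, the function name generate_questions and the conflicts set are hypothetical, and the discriminative score is a stand-in (how evenly a candidate symptom splits the possible diseases), since the abstract does not define the score.

```python
def generate_questions(observed, disease_symptoms, conflicts, top_k=2):
    """Suggest follow-up questions for a set of observed symptoms.

    disease_symptoms: dict mapping disease -> set of symptoms (toy knowledge graph).
    conflicts: set of frozenset({a, b}) pairs assumed to be mutually exclusive.
    """
    observed = set(observed)

    # Step 1: possible diseases share at least one symptom with the observations.
    possible = {d for d, syms in disease_symptoms.items() if syms & observed}
    if not possible:
        return []

    # Step 2: candidate symptoms belong to possible diseases but are not yet observed.
    candidates = set().union(*(disease_symptoms[d] for d in possible)) - observed

    # Step 3 (assumed scoring): a symptom present in about half of the possible
    # diseases separates them best, so score by the size of the smaller side.
    def score(symptom):
        with_symptom = sum(1 for d in possible if symptom in disease_symptoms[d])
        return min(with_symptom, len(possible) - with_symptom)

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(candidates, key=lambda s: (-score(s), s))[:top_k]

    # Step 4: drop selected candidates that conflict with an observed symptom.
    kept = [s for s in ranked
            if not any(frozenset((s, o)) in conflicts for o in observed)]

    # Step 5: turn the remaining candidates into questions.
    return [f"Do you have {s}?" for s in kept]
```

    Calling the function again with the patient's answers added to the observed set mirrors the repetition described in the last sentence of the abstract.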

    Systems and methods for deep skip-gram network based text classification

    Publication No.: US11494615B2

    Publication date: 2022-11-08

    Application No.: US16368440

    Filing date: 2019-03-28

    Applicant: Baidu USA, LLC

    IPC classification: G06N3/04 G06K9/62 G06F17/18

    Abstract: Described herein are embodiments for systems and methods to incorporate skip-gram convolution to extract non-consecutive local n-gram patterns for comprehensive information for varying text expressions. In one or more embodiments, one or more recurrent neural networks are employed to extract long-range features from the local level to the sequential and global levels via a chain-like architecture. Comprehensive experiments on large-scale datasets widely used for the text classification task were conducted to demonstrate the effectiveness of the presented deep skip-gram network embodiments. Performance evaluation on various datasets demonstrates that embodiments of the skip-gram network are powerful for general text classification tasks. The skip-gram models are robust and may generalize well on different datasets, even without tuning the hyper-parameters for a specific dataset.
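    To make "non-consecutive local n-gram patterns" concrete, the following Python sketch enumerates the token patterns that a skip-gram window covers. It is only an illustration of the pattern space, not the convolution or the network described in the abstract; the function name skip_grams and the max_skip parameter are assumptions introduced here.

```python
from itertools import combinations


def skip_grams(tokens, n=2, max_skip=1):
    """Return every n-token subsequence that skips at most max_skip tokens in total.

    With max_skip=0 this reduces to ordinary consecutive n-grams.
    """
    grams = []
    for start in range(len(tokens)):
        # Positions that may follow `start` while keeping total skips <= max_skip.
        window = range(start + 1, min(len(tokens), start + n + max_skip))
        for rest in combinations(window, n - 1):
            grams.append(tuple(tokens[i] for i in (start, *rest)))
    return grams
```

    For the tokens "not", "very", "good" with n=2 and max_skip=1, the result includes the pair ("not", "good"), which a consecutive bigram extractor would miss.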

    SYSTEMS AND METHODS FOR ESTIMATING HEALTHCARE RESOURCE DEMAND

    Publication No.: US20180039735A1

    Publication date: 2018-02-08

    Application No.: US15226249

    Filing date: 2016-08-02

    Applicant: Baidu USA, LLC

    IPC classification: G06F19/00 G06Q10/06

    Abstract: Presented are systems and methods that allow healthcare providers and governments to infer demand for healthcare resources to ensure effective and timely healthcare services to patients by reducing healthcare supply shortages, emergencies, and healthcare costs. In embodiments, this is accomplished by gathering data from a number of sources to generate labeled records from which entity features and relationships between entities are extracted, correlated, and/or combined with other external healthcare data. In embodiments, this information is used to train a model that predicts healthcare resource demands given a set of input conditions or factors.

    Systems and methods for estimating healthcare resource demand

    Publication No.: US11195128B2

    Publication date: 2021-12-07

    Application No.: US15226249

    Filing date: 2016-08-02

    Applicant: Baidu USA, LLC

    Abstract: Presented are systems and methods that allow healthcare providers and governments to infer demand for healthcare resources to ensure effective and timely healthcare services to patients by reducing healthcare supply shortages, emergencies, and healthcare costs. In embodiments, this is accomplished by gathering data from a number of sources to generate labeled records from which entity features and relationships between entities are extracted, correlated, and/or combined with other external healthcare data. In embodiments, this information is used to train a model that predicts healthcare resource demands given a set of input conditions or factors.

    Systems and methods for homogeneous entity grouping

    Publication No.: US10372743B2

    Publication date: 2019-08-06

    Application No.: US15215492

    Filing date: 2016-07-20

    Applicant: Baidu USA, LLC

    IPC classification: G06F17/30 G06F16/35 G06F16/33

    Abstract: Systems and methods are disclosed to identify entities that have a similar meaning, and may, in embodiments, be grouped into entity groups for knowledge base construction. In embodiments, the entity relations of similarity or non-similarity for an entity pair are predicted as a binary relationship. In embodiments, the prediction may be based upon a similarity score between the entities and the entity features, which features are constructed using an entity feature or representation model. In embodiments, the prediction may be an iterative process involving minimal human checking and updates to existing knowledge. In embodiments, one or more entity groups are formed using graph search from the predicted entity pairs. In embodiments, a group centroid entity may be selected to represent each group based on one or more factors, such as its generality or popularity.
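    The grouping step can be sketched in Python as a connected-components search over the entity pairs predicted to be similar. This is an assumption-laden illustration rather than the patented procedure: breadth-first search is one possible form of "graph search", the function name group_entities is hypothetical, and the centroid is chosen by a single made-up popularity count.

```python
from collections import defaultdict, deque


def group_entities(similar_pairs, popularity):
    """Form entity groups from predicted-similar pairs and pick a centroid per group.

    similar_pairs: iterable of (entity_a, entity_b) predicted to be similar.
    popularity: dict mapping entity -> count (assumed centroid criterion).
    """
    # Build an undirected graph whose edges are the predicted-similar pairs.
    graph = defaultdict(set)
    for a, b in similar_pairs:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    groups = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        queue = deque([start])
        seen.add(start)
        members = []
        while queue:
            node = queue.popleft()
            members.append(node)
            for neighbor in sorted(graph[node]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        members.sort()
        # The most popular member represents the group; ties go to the first name.
        centroid = max(members, key=lambda e: popularity.get(e, 0))
        groups.append((centroid, members))
    return groups
```

    Because groups are connected components, two entities land in the same group even when they were never directly predicted as a pair, as long as a chain of similar pairs links them.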

    SYSTEMS AND METHODS FOR RELATION INFERENCE
    Patent application

    Publication No.: US20180012121A1

    Publication date: 2018-01-11

    Application No.: US15205798

    Filing date: 2016-07-08

    Applicant: Baidu USA, LLC

    IPC classification: G06N3/04 G06N3/08 G06F17/30

    Abstract: Presented are relation inference methods and systems that use deep learning techniques for data mining documents to discover a relation between terms of interest in a given field covering a specific topic. For example, in the healthcare domain, various embodiments of the present disclosure provide for a relation inference system that mines large-scale medical documents in a free-text database to extract symptom and disease terms and generates relation information that aids in disease diagnosis. In embodiments, this is accomplished by training and using an RNN, such as an LSTM, a Gated Recurrent Unit (GRU), etc., that takes advantage of a term dictionary to examine co-occurrences of terms of interest within documents to discover correlations between the terms. The correlation may then be used to predict the statistically most probable terms (e.g., a disease) related to a given search term (e.g., a symptom).