-
Publication No.: US11748613B2
Publication date: 2023-09-05
Application No.: US16409148
Filing date: 2019-05-10
Applicant: Baidu USA, LLC
Inventors: Dingcheng Li, Jingyuan Zhang, Ping Li
IPC classification: G06F16/93, G06F16/35, G06F40/205, G06F40/30, G06N3/08, G06N3/04, G06N3/044, G06N3/045
CPC classification: G06N3/08, G06F16/353, G06F16/93, G06F40/205, G06F40/30, G06N3/04, G06N3/044, G06N3/045
Abstract: Described herein are embodiments for a deep level-wise extreme multi-label learning and classification (XMLC) framework to facilitate the semantic indexing of literature. In one or more embodiments, the Deep Level-wise XMLC framework comprises two sequential modules: a deep level-wise multi-label learning module and a hierarchical pointer generation module. In one or more embodiments, the first module decomposes terms of a domain ontology into multiple levels and builds a special convolutional neural network for each level with category-dependent dynamic max-pooling and macro F-measure based weight tuning. In one or more embodiments, the second module merges the level-wise outputs into a final summarized semantic indexing. The effectiveness of Deep Level-wise XMLC framework embodiments is demonstrated by comparing them with several state-of-the-art methods of automatic labeling on various datasets.
-
Publication No.: US11727243B2
Publication date: 2023-08-15
Application No.: US16262618
Filing date: 2019-01-30
Applicant: Baidu USA, LLC
Inventors: Jingyuan Zhang, Dingcheng Li, Ping Li, Xiao Huang
IPC classification: G06N3/00, G06F16/901, G06N3/08, G06F16/2452, G06N3/04, G06N3/006, G06N3/042, G06N3/044
CPC classification: G06N3/006, G06F16/24522, G06F16/9024, G06N3/042, G06N3/044, G06N3/08
Abstract: Described herein are embodiments for question answering over a knowledge graph (KG) using a Knowledge Embedding based Question Answering (KEQA) framework. Instead of inferring an input question's head entity and predicate directly, KEQA embodiments target jointly recovering the question's head entity, predicate, and tail entity representations in the KG embedding spaces. In embodiments, a joint distance metric incorporating various loss terms is used to measure distances of a predicted fact to all candidate facts. In embodiments, the fact with the minimum distance is returned as the answer. Embodiments of a joint training strategy are also disclosed for better performance. Performance evaluation on various datasets demonstrates the effectiveness of the disclosed systems and methods using the KEQA framework.
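The minimum-distance selection step described in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the function names `l2` and `closest_fact`, the use of a weighted sum of Euclidean distances over head, predicate, and tail embeddings, and the toy vectors. The abstract only states that a joint metric with several terms is used; it does not give its form.

```python
import math

def l2(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def closest_fact(pred_head, pred_rel, pred_tail, candidates,
                 weights=(1.0, 1.0, 1.0)):
    """Return the candidate fact with the smallest joint distance.

    candidates: list of (name, head_vec, rel_vec, tail_vec) tuples.
    weights: hypothetical per-term weights (not taken from the source).
    """
    w_h, w_r, w_t = weights

    def joint(candidate):
        _, head, rel, tail = candidate
        return (w_h * l2(pred_head, head)
                + w_r * l2(pred_rel, rel)
                + w_t * l2(pred_tail, tail))

    return min(candidates, key=joint)

# Toy candidate facts with 2-dimensional embeddings (made-up values).
candidates = [
    ("fact_a", [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]),
    ("fact_b", [0.0, 0.0], [1.0, 1.0], [2.0, 0.0]),
]
best = closest_fact([0.9, 0.1], [0.1, 0.9], [1.0, 1.0], candidates)
print(best[0])
```

Here the predicted representations lie close to `fact_a` in all three embedding spaces, so that candidate is returned.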
-
Publication No.: US11636355B2
Publication date: 2023-04-25
Application No.: US16427225
Filing date: 2019-05-30
Applicant: Baidu USA, LLC
Inventors: Dingcheng Li, Jingyuan Zhang, Ping Li, Siamak Zamani Dadaneh
IPC classification: G06F40/289, G06N5/04, G06N20/00, G06F40/20
Abstract: Leveraging domain knowledge is an effective strategy for enhancing the quality of inferred low-dimensional representations of documents by topic models. Presented herein are embodiments of a Bayesian nonparametric model that employ knowledge graph (KG) embedding in the context of topic modeling for extracting more coherent topics; embodiments of the model may be referred to as topic modeling with knowledge graph embedding (TMKGE). TMKGE embodiments are hierarchical Dirichlet process (HDP)-based models that flexibly borrow information from a KG to improve the interpretability of topics. Also, embodiments of a new, efficient online variational inference method based on a stick-breaking construction of HDP were developed for TMKGE models, making TMKGE suitable for large document corpora and KGs. Experiments on datasets illustrate the superior performance of TMKGE in terms of topic coherence and document classification accuracy, compared to state-of-the-art topic modeling methods.
-
Publication No.: US11886446B2
Publication date: 2024-01-30
Application No.: US17575650
Filing date: 2022-01-14
Applicant: Baidu USA, LLC
Inventors: Hongliang Fei, Puxuan Yu, Ping Li
IPC classification: G06F17/00, G06F7/00, G06F16/2457, G06F40/58, G06F40/284
CPC classification: G06F16/24578, G06F40/284, G06F40/58
Abstract: Existing research on cross-lingual retrieval cannot take good advantage of large-scale pretrained language models, such as multilingual BERT and XLM. The absence of cross-lingual passage-level relevance data for finetuning and the lack of query-document style pretraining are some of the key factors behind this issue. Accordingly, embodiments of two novel retrieval-oriented pretraining tasks are presented herein to further pretrain cross-lingual language models for downstream retrieval tasks, such as cross-lingual ad-hoc retrieval (CLIR) and cross-lingual question answering (CLQA). In one or more embodiments, distant supervision data was constructed from multilingual texts using section alignment to support retrieval-oriented language model pretraining. In one or more embodiments, directly finetuning language models on part of an evaluation collection was performed by making Transformers capable of accepting longer sequences. Experiments show that model embodiments significantly improve upon general multilingual language models in at least the cross-lingual retrieval setting and the cross-lingual transfer setting.
-
Publication No.: US11748567B2
Publication date: 2023-09-05
Application No.: US16926525
Filing date: 2020-07-10
Applicant: Baidu USA, LLC
Inventors: Dingcheng Li, Shaogang Ren, Ping Li
IPC classification: G06F40/10, G06F40/284, G06F40/211, G06F40/30, G06N3/08
CPC classification: G06F40/284, G06F40/211, G06F40/30, G06N3/08
Abstract: Described herein are embodiments of a framework named total correlation variational autoencoder (TC_VAE) to disentangle syntax and semantics by making use of total correlation penalties of KL divergences. One or more Kullback-Leibler (KL) divergence terms in a loss for a variational autoencoder are decomposed so that generated hidden variables may be separated. Embodiments of the TC_VAE framework were examined on semantic similarity tasks and syntactic similarity tasks. Experimental results show that better disentanglement between syntactic and semantic representations has been achieved compared with state-of-the-art (SOTA) results on the same data sets in similar settings.
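For orientation, the standard total-correlation decomposition of a VAE's KL term is sketched below. It is a well-known identity from the disentanglement literature, shown here only as background; the abstract does not state which terms the patent decomposes or how they are weighted. The symbols are assumptions: q(z|x) is the encoder, q(z) is the aggregate posterior obtained by averaging q(z|x) over the data distribution p(x), and the prior is assumed to factorize as p(z) = ∏_j p(z_j).

```latex
\mathbb{E}_{p(x)}\!\left[\mathrm{KL}\big(q(z \mid x)\,\|\,p(z)\big)\right]
  = \underbrace{I_q(x; z)}_{\text{mutual information}}
  + \underbrace{\mathrm{KL}\Big(q(z)\,\Big\|\,\prod_j q(z_j)\Big)}_{\text{total correlation}}
  + \underbrace{\sum_j \mathrm{KL}\big(q(z_j)\,\|\,p(z_j)\big)}_{\text{dimension-wise KL}}
```

Penalizing the total correlation term more heavily encourages statistically independent latent dimensions, which is the usual motivation for separating latent variable groups such as syntactic and semantic ones.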
-
Publication No.: US11354506B2
Publication date: 2022-06-07
Application No.: US16526614
Filing date: 2019-07-30
Applicant: Baidu USA, LLC
Inventors: Hongliang Fei, Zeyu Dai, Ping Li
Abstract: Previous neural network models that perform named entity recognition (NER) typically treat the input sentences as a linear sequence of words but ignore rich structural information, such as the coreference relations among non-adjacent words, phrases, or entities. Presented herein are novel approaches to learn coreference-aware word representations for the NER task. In one or more embodiments, a “CNN-BiLSTM-CRF” neural architecture is modified to include a coreference layer component on top of the BiLSTM layer to incorporate coreferential relations. Also, in one or more embodiments, a coreference regularization is added during training to ensure that the coreferential entities share similar representations and consistent predictions within the same coreference cluster. A model embodiment achieved new state-of-the-art performance when tested.
-
Publication No.: US12056133B2
Publication date: 2024-08-06
Application No.: US17555316
Filing date: 2021-12-17
Applicant: Baidu USA, LLC
Inventors: Shulong Tan, Weijie Zhao, Ping Li
IPC classification: G06F16/2457, G06F16/901
CPC classification: G06F16/24578, G06F16/9024
Abstract: Presented are systems and methods that construct BipartitE Graph INdices (BEGIN) embodiments for fast neural ranking. BEGIN embodiments comprise two types of nodes: sampled queries and base or searching objects. In one or more embodiments, edges connecting these nodes are constructed by using a neural network ranking measure. These embodiments extend traditional search-on-graph methods and lend themselves to fast neural ranking. Experimental results demonstrate the effectiveness and efficiency of such embodiments.
-
Publication No.: US12050646B2
Publication date: 2024-07-30
Application No.: US17408146
Filing date: 2021-08-20
Applicant: Baidu USA, LLC
Inventors: Shulong Tan, Zhaozhuo Xu, Weijie Zhao, Zhixin Zhou, Ping Li
IPC classification: G06F16/901, G06F16/22
CPC classification: G06F16/9024, G06F16/2272
Abstract: Incremental proximity graph maintenance (IPGM) systems and methods for online approximate nearest neighbor (ANN) search support both online vertex deletion and insertion on proximity graphs. In various embodiments, updating a proximity graph comprises receiving a workload that represents a set of vertices in the proximity graph, each vertex being associated with a type of operation such as a query, insertion, or deletion. For a query or an insertion, a search may be executed on the graph to obtain a set of top-K vertices for each vertex. In the case of a deletion, a vertex may be deleted from the proximity graph, and a local or global reconnection update method may be used to reconstruct at least a portion of the proximity graph.
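A minimal sketch of what a local reconnection after a deletion could look like is given below. The reconnection rule is an assumption chosen for illustration (each in-neighbor of the deleted vertex inherits its out-neighbors); the abstract does not specify the rule, and the function name `delete_vertex_local` and the dict-of-sets graph representation are hypothetical.

```python
def delete_vertex_local(graph, v):
    """Remove v from a directed proximity graph and patch its neighborhood.

    graph: dict mapping each vertex to the set of its out-neighbors;
    it is modified in place and also returned.
    Illustrative rule (an assumption): every vertex that pointed to v is
    rewired to v's former out-neighbors, without creating self-loops.
    """
    out_neighbors = graph.pop(v)
    for u, edges in graph.items():
        if v in edges:
            edges.discard(v)
            edges.update(out_neighbors - {u})
    return graph

# Toy graph: a -> b, b -> {c, d}, c -> a.
graph = {"a": {"b"}, "b": {"c", "d"}, "c": {"a"}, "d": set()}
delete_vertex_local(graph, "b")
print(graph)
```

After deleting `b`, vertex `a` points directly to `c` and `d`, so the vertices formerly reachable through `b` stay reachable. A practical index would typically also cap each vertex's out-degree after rewiring, which this sketch omits.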
-
Publication No.: US11989233B2
Publication date: 2024-05-21
Application No.: US17033791
Filing date: 2020-09-27
Applicant: Baidu USA, LLC
Inventors: Shulong Tan, Zhixin Zhou, Zhaozhuo Xu, Ping Li
IPC classification: G06F16/901, G06F16/903, G06F17/16
CPC classification: G06F16/9024, G06F16/90335, G06F17/16
Abstract: Presented herein are embodiments of a fast search-on-graph methodology for Maximum Inner Product Search (MIPS). This optimization problem is challenging since traditional Approximate Nearest Neighbor (ANN) search methods may not perform efficiently under a nonmetric similarity measure. Embodiments herein are based on the property that a Möbius/Möbius-like transformation introduces an isomorphism between a subgraph of the ℓ2-Delaunay graph and the Delaunay graph for inner product. Based on this observation, embodiments of a novel graph indexing and searching methodology are presented to find the optimal solution with the largest inner product with the query. Experiments show significant improvements compared to existing methods.
-
Publication No.: US11914669B2
Publication date: 2024-02-27
Application No.: US17095548
Filing date: 2020-11-11
Applicant: Baidu USA, LLC
Inventors: Weijie Zhao, Shulong Tan, Ping Li
IPC classification: G06F17/10, G06F9/30, G06F9/38, G06F9/48, G06F18/2323, G06F18/2413
CPC classification: G06F17/10, G06F9/3009, G06F9/3887, G06F9/4881, G06F18/2323, G06F18/24147
Abstract: Approximate nearest neighbor (ANN) searching is a fundamental problem in computer science with numerous applications in areas such as machine learning and data mining. For typical graph-based ANN methods, the searching method is executed iteratively, and the execution dependency prohibits graphics processing unit (GPU)/GPU-type processor adaptations. Presented herein are embodiments of a novel framework that decouples the search-on-graph methodology into stages in order to parallelize the performance-crucial distance computation. Furthermore, in one or more embodiments, to obtain better parallelism on GPU-type components, also disclosed are novel ANN-specific optimization methods that eliminate dynamic memory allocations and trade computations for less memory consumption. Embodiments were empirically compared against other methods, and the results confirm their effectiveness.