-
1.
Publication number: US20210326751A1
Publication date: 2021-10-21
Application number: US16882296
Filing date: 2020-05-22
Inventors: Xiaodong Liu, Hao Cheng, Yu Wang, Jianfeng Gao, Weizhu Chen, Pengcheng He, Hoifung Poon
IPC classification: G06N20/00, G06N3/08, G06F40/284, G06K9/62
Abstract: This document relates to training of machine learning models. One example method involves providing a machine learning model having one or more mapping layers. The one or more mapping layers can include at least a first mapping layer configured to map components of pretraining examples into first representations in a space. The example method also includes performing a pretraining stage on the one or more mapping layers using the pretraining examples. The pretraining stage can include adding noise to the first representations of the components of the pretraining examples to obtain noise-adjusted first representations. The pretraining stage can also include performing a self-supervised learning process to pretrain the one or more mapping layers using at least the first representations of the training data items and the noise-adjusted first representations of the training data items.
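The pretraining procedure in this abstract pairs clean and noise-adjusted representations during self-supervised learning. The snippet below is a minimal, hypothetical sketch of one such training step in PyTorch; the Gaussian perturbation, the consistency term, and the module names (embedding_layer, encoder, mlm_head) are assumptions, since the abstract leaves the noise mechanism and exact objective unspecified.

```python
import torch
import torch.nn.functional as F

def noise_adjusted_pretraining_step(embedding_layer, encoder, mlm_head,
                                    token_ids, mask_labels, noise_std=1e-3):
    """One pretraining step pairing clean and noise-adjusted embeddings.

    embedding_layer, encoder, and mlm_head are hypothetical stand-ins for the
    patent's mapping layers and a masked-token prediction head.
    """
    # First representations of the pretraining example's components (tokens).
    clean = embedding_layer(token_ids)

    # Noise-adjusted first representations; here a simple Gaussian perturbation.
    noisy = clean + noise_std * torch.randn_like(clean)

    logits_clean = mlm_head(encoder(clean))
    logits_noisy = mlm_head(encoder(noisy))

    # Self-supervised (masked-token) loss on the clean branch.
    mlm_loss = F.cross_entropy(
        logits_clean.view(-1, logits_clean.size(-1)),
        mask_labels.view(-1),
        ignore_index=-100,
    )

    # Consistency term encouraging the noisy branch to agree with the clean one.
    consistency = F.kl_div(
        F.log_softmax(logits_noisy, dim=-1),
        F.softmax(logits_clean, dim=-1),
        reduction="batchmean",
    )
    return mlm_loss + consistency
```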
-
2.
Publication number: US20170032035A1
Publication date: 2017-02-02
Application number: US14811808
Filing date: 2015-07-28
Inventors: Jianfeng Gao, Li Deng, Xiaodong He, Ye-Yi Wang, Kevin Duh, Xiaodong Liu
CPC classification: G06N3/08, G06F17/3069, G06F17/30867
Abstract: A system may comprise one or more processors and memory storing instructions that, when executed by the one or more processors, configure the one or more processors to perform a number of operations or tasks, such as receiving a query or a document, and mapping the query or the document into a lower-dimensional representation by applying at least one operational layer that is shared by at least two disparate tasks.
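This abstract describes mapping a query or a document into a lower-dimensional representation through an operational layer shared across tasks. Below is a minimal sketch of that layering idea in PyTorch; the bag-of-words input, the dimensions, and the two example tasks are illustrative assumptions rather than details from the patent.

```python
import torch.nn as nn

class SharedMapper(nn.Module):
    """Sketch of an operational layer shared by two disparate tasks."""

    def __init__(self, vocab_dim=30000, hidden_dim=300, low_dim=128,
                 num_query_classes=10):
        super().__init__()
        # Shared layers map a query or a document into a lower-dimensional space.
        self.shared = nn.Sequential(
            nn.Linear(vocab_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, low_dim),
            nn.Tanh(),
        )
        # Task-specific head operating on the shared representation.
        self.query_classifier = nn.Linear(low_dim, num_query_classes)

    def forward(self, bow_vector, task):
        rep = self.shared(bow_vector)          # lower-dimensional representation
        if task == "classification":
            return self.query_classifier(rep)  # e.g. query domain classification
        return rep                             # e.g. used for query-document ranking
```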
-
3.
Publication number: US20190377792A1
Publication date: 2019-12-12
Application number: US16022001
Filing date: 2018-06-28
Inventors: Minjia Zhang, Xiaodong Liu, Wenhan Wang, Jianfeng Gao, Yuxiong He
Abstract: Systems, methods, and computer-executable instructions for approximating a softmax layer are disclosed. A small world graph that includes a plurality of nodes is constructed for a vocabulary of a natural language model. A context vector is transformed. The small world graph is searched using the transformed context vector to identify top-K hypotheses. A distance from the context vector to each of the top-K hypotheses is determined. The distance is transformed back to the original inner product space. A softmax distribution is computed for the softmax layer over the inner product space of the top-K hypotheses. The softmax layer is useful for determining a next word in speech recognition or machine translation.
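The abstract outlines approximating the softmax by searching a small-world graph over the vocabulary and converting the retrieved distances back to inner products. The sketch below uses hnswlib (an HNSW small-world-graph library) together with a standard maximum-inner-product-to-nearest-neighbor padding trick; both choices, and all sizes, are assumptions for illustration, not the patent's specific transformation.

```python
import hnswlib
import numpy as np

vocab_size, dim, k = 50000, 256, 100
output_embeddings = np.random.randn(vocab_size, dim).astype(np.float32)

# Transform word vectors so that L2 search corresponds to inner-product ranking:
# pad each vector with a column that equalizes norms (a common MIPS reduction).
norms = np.linalg.norm(output_embeddings, axis=1)
max_norm = norms.max()
extra = np.sqrt(np.maximum(max_norm**2 - norms**2, 0.0))[:, None]
index_vectors = np.hstack([output_embeddings, extra])

# Build the small-world graph over the padded vocabulary vectors.
index = hnswlib.Index(space="l2", dim=dim + 1)
index.init_index(max_elements=vocab_size, ef_construction=200, M=16)
index.add_items(index_vectors, np.arange(vocab_size))
index.set_ef(200)

def approximate_softmax(context_vector):
    # Transformed context vector: zero in the padding dimension.
    query = np.append(context_vector, 0.0).astype(np.float32)[None, :]
    labels, sq_dists = index.knn_query(query, k=k)
    labels, sq_dists = labels[0], sq_dists[0]
    # Transform squared L2 distances back to the original inner-product space:
    # ||q - x||^2 = ||q||^2 + max_norm^2 - 2 <q, w> for the padded vectors.
    inner = (np.dot(context_vector, context_vector) + max_norm**2 - sq_dists) / 2.0
    probs = np.exp(inner - inner.max())
    probs /= probs.sum()
    return labels, probs  # distribution restricted to the top-K hypotheses
```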
-
4.
Publication number: US10089576B2
Publication date: 2018-10-02
Application number: US14811808
Filing date: 2015-07-28
Inventors: Jianfeng Gao, Li Deng, Xiaodong He, Ye-Yi Wang, Kevin Duh, Xiaodong Liu
Abstract: A system may comprise one or more processors and memory storing instructions that, when executed by the one or more processors, configure the one or more processors to perform a number of operations or tasks, such as receiving a query or a document, and mapping the query or the document into a lower-dimensional representation by applying at least one operational layer that is shared by at least two disparate tasks.
-
5.
Publication number: US12008459B2
Publication date: 2024-06-11
Application number: US16443440
Filing date: 2019-06-17
Inventors: Weizhu Chen, Pengcheng He, Xiaodong Liu, Jianfeng Gao
Abstract: This document relates to architectures and training procedures for multi-task machine learning models, such as neural networks. One example method involves providing a multi-task machine learning model having one or more shared layers and two or more task-specific layers. The method can also involve performing a pretraining stage on the one or more shared layers using one or more unsupervised prediction tasks. The method can also involve performing a tuning stage on the one or more shared layers and the two or more task-specific layers using respective task-specific objectives.
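The shared-plus-task-specific layering described here can be sketched as a module with one shared encoder and multiple heads. The code below is a hypothetical PyTorch illustration; the Transformer encoder, the two example tasks, and all sizes are assumptions. The pretraining stage would train the shared layers with an unsupervised objective, and the tuning stage would then train the whole module with the task-specific heads attached.

```python
import torch.nn as nn

class MultiTaskModel(nn.Module):
    """Sketch of one or more shared layers plus task-specific layers."""

    def __init__(self, hidden=768, num_classes=3):
        super().__init__()
        # Shared layers: pretrained first with an unsupervised prediction task,
        # then tuned jointly with the task-specific heads.
        self.shared = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden, nhead=12, batch_first=True),
            num_layers=4,
        )
        # Task-specific layers, one per downstream objective.
        self.classifier_head = nn.Linear(hidden, num_classes)
        self.similarity_head = nn.Linear(hidden, 1)

    def forward(self, embeddings, task):
        pooled = self.shared(embeddings)[:, 0]      # first-token representation
        if task == "classification":
            return self.classifier_head(pooled)     # task-specific objective A
        return self.similarity_head(pooled)         # task-specific objective B
```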
-
6.
Publication number: US20240013055A1
Publication date: 2024-01-11
Application number: US18373051
Filing date: 2023-09-26
Inventors: Xiaodong Liu, Hao Cheng, Yu Wang, Jianfeng Gao, Weizhu Chen, Pengcheng He, Hoifung Poon
CPC classification: G06N3/084, G06N20/00, G06N3/08, G06N3/088, G06V10/82, G06F18/24, G06V10/7784, G06F40/284
Abstract: This document relates to training of machine learning models. One example method involves providing a machine learning model having one or more mapping layers. The one or more mapping layers can include at least a first mapping layer configured to map components of pretraining examples into first representations in a space. The example method also includes performing a pretraining stage on the one or more mapping layers using the pretraining examples. The pretraining stage can include adding noise to the first representations of the components of the pretraining examples to obtain noise-adjusted first representations. The pretraining stage can also include performing a self-supervised learning process to pretrain the one or more mapping layers using at least the first representations of the training data items and the noise-adjusted first representations of the training data items.
-
7.
Publication number: US11526679B2
Publication date: 2022-12-13
Application number: US16910508
Filing date: 2020-06-24
Inventors: Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen
Abstract: Systems and methods are provided for facilitating the building and use of natural language understanding models. The systems and methods identify a plurality of tokens and use them to generate one or more pre-trained natural language models using a transformer. The transformer disentangles the content embedding and positional embedding in the computation of its attention matrix. Systems and methods are also provided to facilitate self-training of the pre-trained natural language model by utilizing multi-step decoding to better reconstruct masked tokens and improve pre-training convergence.
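The key architectural point in this abstract is computing the attention matrix from separate content and position embeddings. The sketch below is a minimal illustration of a three-term disentangled score (content-to-content, content-to-position, position-to-content); the tensor shapes, projection matrices, and scaling choice are assumptions for illustration, not details taken from the patent.

```python
import torch

def disentangled_attention_scores(content, rel_pos, Wq_c, Wk_c, Wq_r, Wk_r):
    """Attention scores that keep content and position embeddings disentangled.

    content: (seq, dim) token-content embeddings.
    rel_pos: (seq, seq, dim) relative-position embeddings for each query/key pair.
    Wq_c, Wk_c, Wq_r, Wk_r: (dim, dim) projection matrices (illustrative).
    """
    q_c, k_c = content @ Wq_c, content @ Wk_c   # content projections (seq, dim)
    q_r, k_r = rel_pos @ Wq_r, rel_pos @ Wk_r   # position projections (seq, seq, dim)

    c2c = q_c @ k_c.T                            # content-to-content term
    c2p = torch.einsum("id,ijd->ij", q_c, k_r)   # content attends to key position
    p2c = torch.einsum("ijd,jd->ij", q_r, k_c)   # query position meets key content

    d = content.size(-1)
    return (c2c + c2p + p2c) / (3 * d) ** 0.5    # scale for the three terms
```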
-
8.
Publication number: US12061876B2
Publication date: 2024-08-13
Application number: US18078530
Filing date: 2022-12-09
Inventors: Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen
Abstract: Systems and methods are provided for facilitating the building and use of natural language understanding models. The systems and methods identify a plurality of tokens and use them to generate one or more pre-trained natural language models using a transformer. The transformer disentangles the content embedding and positional embedding in the computation of its attention matrix. Systems and methods are also provided to facilitate self-training of the pre-trained natural language model by utilizing multi-step decoding to better reconstruct masked tokens and improve pre-training convergence.
-
9.
Publication number: US11803758B2
Publication date: 2023-10-31
Application number: US16882296
Filing date: 2020-05-22
Inventors: Xiaodong Liu, Hao Cheng, Yu Wang, Jianfeng Gao, Weizhu Chen, Pengcheng He, Hoifung Poon
IPC classification: G06N3/084, G06N20/00, G06N3/08, G06N3/088, G06V10/82, G06F18/24, G06V10/778, G06F40/284
CPC classification: G06N3/084, G06F18/24, G06N3/08, G06N3/088, G06N20/00, G06V10/7784, G06V10/82, G06F40/284, G06T2207/20081
Abstract: This document relates to training of machine learning models. One example method involves providing a machine learning model having one or more mapping layers. The one or more mapping layers can include at least a first mapping layer configured to map components of pretraining examples into first representations in a space. The example method also includes performing a pretraining stage on the one or more mapping layers using the pretraining examples. The pretraining stage can include adding noise to the first representations of the components of the pretraining examples to obtain noise-adjusted first representations. The pretraining stage can also include performing a self-supervised learning process to pretrain the one or more mapping layers using at least the first representations of the training data items and the noise-adjusted first representations of the training data items.
-
10.
Publication number: US11676001B2
Publication date: 2023-06-13
Application number: US17093426
Filing date: 2020-11-09
Inventors: Jian Jiao, Xiaodong Liu, Ruofei Zhang, Jianfeng Gao
IPC classification: G06N3/045
CPC classification: G06N3/045
Abstract: Knowledge graphs can greatly improve the quality of content recommendation systems. There is a broad variety of knowledge graphs in the domain, including clicked user-ad graphs, clicked query-ad graphs, keyword-display URL graphs, etc. A hierarchical Transformer model learns entity embeddings in knowledge graphs. The model consists of two different Transformer blocks, where the bottom block generates relation-dependent embeddings for the source entity and its neighbors, and the top block aggregates the outputs from the bottom block to produce the target entity embedding. To balance the information from contextual entities and the source entity itself, a masked entity model (MEM) task is combined with a link prediction task in model training.
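The two-block structure described above (a bottom block producing relation-dependent source/neighbor embeddings and a top block aggregating them) can be sketched as follows. This is a hypothetical PyTorch illustration only; the embedding sizes, the use of nn.TransformerEncoder, and the mean-pooling aggregation are assumptions, and the MEM and link-prediction training objectives are omitted.

```python
import torch
import torch.nn as nn

class HierarchicalKGEncoder(nn.Module):
    """Sketch of a two-block hierarchical Transformer for entity embeddings."""

    def __init__(self, num_entities, num_relations, dim=256, heads=4):
        super().__init__()
        self.entity_emb = nn.Embedding(num_entities, dim)
        self.relation_emb = nn.Embedding(num_relations, dim)
        # Bottom block: relation-dependent embeddings for the source entity
        # and each of its neighbors.
        self.bottom_block = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True),
            num_layers=2,
        )
        # Top block: aggregates the bottom-block outputs into the target embedding.
        self.top_block = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True),
            num_layers=2,
        )

    def forward(self, source_id, neighbor_ids, relation_ids):
        # source_id: scalar tensor; neighbor_ids, relation_ids: 1-D tensors.
        # Bottom block sees one (source, relation, neighbor) triple per row.
        src = self.entity_emb(source_id).expand(neighbor_ids.size(0), -1)
        triples = torch.stack(
            [src, self.relation_emb(relation_ids), self.entity_emb(neighbor_ids)],
            dim=1,
        )                                                   # (num_neighbors, 3, dim)
        rel_dependent = self.bottom_block(triples)[:, 0]    # (num_neighbors, dim)

        # Top block aggregates the relation-dependent embeddings.
        aggregated = self.top_block(rel_dependent.unsqueeze(0))
        return aggregated.mean(dim=1)                       # target embedding (1, dim)
```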
-