TRANSFORMER-BASED ENCODING INCORPORATING METADATA

Invention patent application

US20220358288A1 TRANSFORMER-BASED ENCODING INCORPORATING METADATA (granted)

Patent title: TRANSFORMER-BASED ENCODING INCORPORATING METADATA
Application number: US17308575
Filing date: 2021-05-05
Publication number: US20220358288A1
Publication date: 2022-11-10
Inventors: Hui Wan, Xiaodong Cui, Luis A. Lastras-Montano
Applicant: International Business Machines Corporation
Applicant address: US NY Armonk
Patentee: International Business Machines Corporation
Current patentee: International Business Machines Corporation
Current patentee address: US NY Armonk
Main classification: G06F40/284
IPC classifications: G06F40/284; G06F40/205; G06F40/237; G06F40/30; G06F40/42; G06K9/66

Abstract:

From metadata of a corpus of natural language text documents, a relativity matrix is constructed, a row-column intersection in the relativity matrix corresponding to a relationship between two instances of a type of metadata. An encoder model is trained, generating a trained encoder model, to compute an embedding corresponding to a token of a natural language text document within the corpus and the relativity matrix, the encoder model comprising a first encoder layer, the first encoder layer comprising a token embedding portion, a relativity embedding portion, a token self-attention portion, a metadata self-attention portion, and a fusion portion, the training comprising adjusting a set of parameters of the encoder model.
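The relativity matrix described above can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration only: the abstract does not say which type of metadata is used or which relationships are encoded. The sketch assumes the metadata instances are document sections arranged in a parent/child outline, and it uses hypothetical relation codes (SAME, PARENT, CHILD, SIBLING, UNRELATED) and a hypothetical helper, build_relativity_matrix.

```python
from typing import Dict, List, Optional

# Hypothetical relation codes; the abstract does not name specific relations.
SAME, PARENT, CHILD, SIBLING, UNRELATED = 0, 1, 2, 3, 4


def build_relativity_matrix(parents: Dict[str, Optional[str]]) -> List[List[int]]:
    """Return a matrix whose [i][j] entry encodes how section i relates to section j.

    `parents` maps each section id to its parent section id (None for a root).
    Row and column order follows the insertion order of `parents`.
    """
    ids = list(parents)
    matrix = []
    for a in ids:
        row = []
        for b in ids:
            if a == b:
                row.append(SAME)
            elif parents[b] == a:
                row.append(PARENT)      # a is the parent of b
            elif parents[a] == b:
                row.append(CHILD)       # a is a child of b
            elif parents[a] is not None and parents[a] == parents[b]:
                row.append(SIBLING)     # a and b share a parent
            else:
                row.append(UNRELATED)
        matrix.append(row)
    return matrix


# Toy outline (assumed data): one root with two subsections, plus a separate root.
outline = {
    "intro": None,
    "intro.scope": "intro",
    "intro.terms": "intro",
    "appendix": None,
}
relativity = build_relativity_matrix(outline)
for section_id, row in zip(outline, relativity):
    print(f"{section_id:12s} {row}")
```

In a setup like this, each token would carry the id of the section it belongs to, so the relation between any two tokens' metadata can be looked up at the corresponding row-column intersection. How the patented encoder embeds and fuses those values is not specified in the abstract and is not modeled here.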

Published/granted documents

US11893346B2 Transformer-based encoding incorporating metadata Publication/grant date: 2024-02-06

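IPC classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models, see G06N)
G06F40/00	Handling natural language data (speech analysis or synthesis, speech recognition, see G10L)
G06F40/20	.Natural language analysis (semantic analysis of natural language, see G06F40/30)
G06F40/279	..Recognition of textual entities
G06F40/284	...Lexical analysis, e.g. tokenisation or collocates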