SYSTEMS AND METHODS FOR MULTI-SCALE PRE-TRAINING WITH DENSELY CONNECTED TRANSFORMER
Abstract:
Embodiments described herein propose a densely connected Transformer architecture in which each Transformer layer takes advantage of all preceding layers. Specifically, the input to each Transformer layer comes from the outputs of all of its preceding layers, and the output of each layer is incorporated into all of its subsequent layers, so that an L-layer Transformer network has L(L+1)/2 connections. This dense connectivity allows the linguistic information learned by lower layers to be propagated directly to all upper layers and encourages feature reuse throughout the network. Each layer is thus directly optimized by the loss function, in the fashion of implicit deep supervision.
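The following is a minimal sketch of the densely connected stack described above, written in PyTorch. The aggregation scheme used here (concatenating all preceding layer outputs and projecting back to the model dimension) is an illustrative assumption; the abstract does not specify how the preceding outputs are combined. Counting one connection per (earlier output, consuming layer) pair, layer i receives i inputs, giving 1 + 2 + ... + L = L(L+1)/2 connections in total.

```python
import torch
import torch.nn as nn

class DenselyConnectedTransformer(nn.Module):
    """Each layer consumes the outputs of all preceding layers (sketch)."""

    def __init__(self, num_layers: int, d_model: int, nhead: int):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            for _ in range(num_layers)
        )
        # Assumed fusion scheme: layer i sees the embedding plus the outputs
        # of all i preceding layers, concatenated and projected to d_model.
        self.fuse = nn.ModuleList(
            nn.Linear((i + 1) * d_model, d_model) for i in range(num_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outputs = [x]  # the embedding output feeds every layer
        for layer, fuse in zip(self.layers, self.fuse):
            fused = fuse(torch.cat(outputs, dim=-1))  # combine all prior outputs
            outputs.append(layer(fused))  # new output feeds all later layers
        return outputs[-1]

model = DenselyConnectedTransformer(num_layers=4, d_model=64, nhead=4)
h = model(torch.randn(2, 10, 64))  # (batch, seq_len, d_model)
print(h.shape)  # torch.Size([2, 10, 64])
```

Because every intermediate output contributes directly to the final layer's input, gradients from the loss reach each layer through a short path, which is the "implicit deep supervision" effect the abstract describes.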