Invention Application
- Patent Title: SYSTEMS AND METHODS FOR MULTI-SCALE PRE-TRAINING WITH DENSELY CONNECTED TRANSFORMER
- Application No.: US17080478
- Application Date: 2020-10-26
- Publication No.: US20220129626A1
- Publication Date: 2022-04-28
- Inventor: Linqing Liu, Caiming Xiong
- Applicant: salesforce.com, inc.
- Applicant Address: US CA San Francisco
- Assignee: salesforce.com, inc.
- Current Assignee: salesforce.com, inc.
- Current Assignee Address: US CA San Francisco
- Main IPC: G06F40/20
- IPC: G06F40/20 ; G06N3/04

Abstract:
Embodiments described herein propose a densely connected Transformer architecture in which each Transformer layer takes advantage of all previous layers. Specifically, the input to each Transformer layer comes from the outputs of all its preceding layers, and the output of each layer is incorporated into all its subsequent layers. An L-layer Transformer network therefore has L(L+1)/2 connections. This dense connectivity allows the linguistic information learned by lower layers to be propagated directly to all upper layers and encourages feature reuse throughout the network. Each layer is thus directly optimized from the loss function in the fashion of implicit deep supervision.
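The connectivity pattern described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration (not the patented implementation): `dense_transformer_sketch` replaces a real Transformer layer with a toy averaging function, but it shows how each layer consumes the concatenated outputs of all preceding layers, and how the connection count sums to L(L+1)/2.

```python
def dense_transformer_sketch(x0, num_layers):
    """Toy illustration of dense layer connectivity.

    x0: initial hidden state, a list of floats.
    num_layers: number of stacked layers (L).
    Returns the final layer's output and the total connection count.
    """
    outputs = [x0]          # outputs of all layers seen so far (incl. input)
    connections = 0
    for _ in range(num_layers):
        # Each layer's input is formed from the outputs of ALL its
        # preceding layers (here: simple concatenation).
        inp = [v for h in outputs for v in h]
        connections += len(outputs)  # one connection per preceding layer
        # Stand-in for a Transformer layer: average the concatenated
        # input back down to the original hidden size.
        h = [sum(inp) / len(inp)] * len(x0)
        outputs.append(h)
    return outputs[-1], connections

out, n_conn = dense_transformer_sketch([1.0, 2.0], num_layers=4)
# For L = 4 layers: 1 + 2 + 3 + 4 = 10 = L(L+1)/2 connections.
```

The connection count grows quadratically because layer i receives i incoming connections, giving a total of 1 + 2 + ... + L = L(L+1)/2 for an L-layer stack.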
Public/Granted literature
- US11941356B2 Systems and methods for multi-scale pre-training with densely connected transformer (Grant Date: 2024-03-26)