Training giant neural networks using pipeline parallelism

    Publication No.: US11232356B2

    Publication Date: 2022-01-25

    Application No.: US16989787

    Filing Date: 2020-08-10

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training giant neural networks. One of the methods includes obtaining data specifying a partitioning of the neural network into N composite layers that form a sequence of composite layers, wherein each composite layer comprises a distinct plurality of layers from the multiple network layers of the neural network; obtaining data assigning each of the N composite layers to one or more computing devices from a set of N computing devices; partitioning a mini-batch of training examples into a plurality of micro-batches; and training the neural network, comprising: performing a forward pass through the neural network until output activations have been computed for each micro-batch for a final composite layer in the sequence, and performing a backward pass through the neural network until output gradients have been computed for each micro-batch for the first composite layer in the sequence.
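
    As a rough illustration of the micro-batch pipelining this abstract describes, the following single-process Python/NumPy sketch splits a mini-batch into micro-batches, runs a forward pass through a sequence of composite layers ("stages") until the final stage has outputs for every micro-batch, and then runs the backward pass in reverse stage order. The Stage class, the toy dense/ReLU layers, and the squared-error loss are illustrative assumptions, and the loop is sequential rather than modeling the multi-device overlap that pipeline parallelism actually provides.

```python
import numpy as np

class Stage:
    """One composite layer: a dense+ReLU layer standing in for a group of layers."""
    def __init__(self, in_dim, out_dim, rng):
        self.w = rng.normal(scale=0.1, size=(in_dim, out_dim))
        self.cache = {}                      # activations saved per micro-batch
        self.grad_w = np.zeros_like(self.w)  # gradient accumulated over micro-batches

    def forward(self, x, mb):
        self.cache[mb] = x
        return np.maximum(x @ self.w, 0.0)           # ReLU(x W)

    def backward(self, grad_out, mb):
        x = self.cache.pop(mb)
        grad_pre = grad_out * ((x @ self.w) > 0)     # ReLU gradient
        self.grad_w += x.T @ grad_pre                # accumulate across micro-batches
        return grad_pre @ self.w.T                   # gradient passed to the previous stage

    def apply(self, lr=1e-2):
        self.w -= lr * self.grad_w
        self.grad_w[:] = 0.0

def pipeline_step(stages, mini_batch, targets, num_micro_batches):
    micro_xs = np.array_split(mini_batch, num_micro_batches)
    micro_ys = np.array_split(targets, num_micro_batches)

    # Forward pass until the final stage has outputs for every micro-batch.
    outputs = []
    for mb, x in enumerate(micro_xs):
        for stage in stages:
            x = stage.forward(x, mb)
        outputs.append(x)

    # Backward pass until gradients reach the first stage for every micro-batch.
    total = 0.0
    for mb, (out, y) in enumerate(zip(outputs, micro_ys)):
        total += float(np.mean((out - y) ** 2))
        grad = 2.0 * (out - y) / out.size            # gradient of the mean-squared error
        for stage in reversed(stages):
            grad = stage.backward(grad, mb)

    for stage in stages:                             # one weight update per mini-batch
        stage.apply()
    return total / num_micro_batches

rng = np.random.default_rng(0)
stages = [Stage(8, 8, rng) for _ in range(4)]        # N = 4 composite layers
x, y = rng.normal(size=(32, 8)), rng.normal(size=(32, 8))
print(pipeline_step(stages, x, y, num_micro_batches=4))
```

    Accumulating gradients across micro-batches and applying a single update per mini-batch mirrors the synchronous update a GPipe-style schedule performs after its forward and backward passes complete.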

    Neural machine translation systems
    66.
    Invention Grant

    Publication No.: US11113480B2

    Publication Date: 2021-09-07

    Application No.: US16336870

    Filing Date: 2017-09-25

    Applicant: GOOGLE LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for neural machine translation. One of the systems includes an encoder neural network comprising: an input forward long short-term memory (LSTM) layer configured to process each input token in the input sequence in a forward order to generate a respective forward representation of each input token, an input backward LSTM layer configured to process each input token in a backward order to generate a respective backward representation of each input token, and a plurality of hidden LSTM layers configured to process a respective combined representation of each of the input tokens in the forward order to generate a respective encoded representation of each of the input tokens; and a decoder subsystem configured to receive the respective encoded representations and to process the encoded representations to generate an output sequence.
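
    A compact PyTorch sketch of the encoder shape described above, assuming nn.Embedding/nn.LSTM layers, a hidden size of 64, and two hidden layers purely for illustration (the abstract does not specify these): the first forward and backward LSTM layers read the tokens in opposite orders, their per-token outputs are concatenated into a combined representation, and a stack of hidden LSTM layers processes that combination in the forward order.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        # First-layer LSTMs process the tokens in forward and backward order.
        self.fwd = nn.LSTM(dim, dim, batch_first=True)
        self.bwd = nn.LSTM(dim, dim, batch_first=True)
        # Hidden LSTM layers run in forward order over the combined representations.
        self.hidden = nn.LSTM(2 * dim, dim, num_layers=2, batch_first=True)

    def forward(self, tokens):                        # tokens: (batch, seq_len)
        x = self.embed(tokens)
        f, _ = self.fwd(x)                            # forward representations
        b, _ = self.bwd(torch.flip(x, dims=[1]))      # process tokens in backward order
        b = torch.flip(b, dims=[1])                   # re-align with token positions
        combined = torch.cat([f, b], dim=-1)          # combined representation per token
        encoded, _ = self.hidden(combined)            # encoded representation per token
        return encoded

enc = Encoder(vocab_size=1000, dim=64)
encoded = enc(torch.randint(0, 1000, (2, 7)))         # shape (2, 7, 64)
```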

    MINIMUM WORD ERROR RATE TRAINING FOR ATTENTION-BASED SEQUENCE-TO-SEQUENCE MODELS

    Publication No.: US20200043483A1

    Publication Date: 2020-02-06

    Application No.: US16529252

    Filing Date: 2019-08-01

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable storage media, for speech recognition using attention-based sequence-to-sequence models. In some implementations, audio data indicating acoustic characteristics of an utterance is received. A sequence of feature vectors indicative of the acoustic characteristics of the utterance is generated. The sequence of feature vectors is processed using a speech recognition model that has been trained using a loss function that uses N-best lists of decoded hypotheses, the speech recognition model including an encoder, an attention module, and a decoder. The encoder and decoder each include one or more recurrent neural network layers. A sequence of output vectors representing distributions over a predetermined set of linguistic units is obtained. A transcription for the utterance is obtained based on the sequence of output vectors. Data indicating the transcription of the utterance is provided.
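
    One common way to realize a loss over N-best lists of decoded hypotheses is an expected word-error objective: renormalize the model's probabilities over the N hypotheses and weight each hypothesis by its mean-subtracted word-error count. The PyTorch sketch below shows that computation under the assumption that hypothesis log-probabilities and word-error counts are already available; the mwer_loss name and the numbers are illustrative, not taken from the filing.

```python
import torch

def mwer_loss(log_probs, word_errors):
    """log_probs: (N,) model log P(hypothesis | audio) for the N-best hypotheses.
    word_errors: (N,) word errors of each hypothesis against the reference."""
    # Renormalize probability mass over the N-best list only.
    probs = torch.softmax(log_probs, dim=0)
    # Subtracting the mean error is a common variance-reduction baseline.
    relative_errors = word_errors - word_errors.mean()
    # Expected (relative) word error; gradients flow through the probabilities.
    return torch.sum(probs * relative_errors)

log_probs = torch.tensor([-2.1, -2.3, -3.0, -4.2], requires_grad=True)
word_errors = torch.tensor([1.0, 0.0, 2.0, 3.0])
loss = mwer_loss(log_probs, word_errors)
loss.backward()   # the gradient shifts probability toward lower-error hypotheses
```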

    NEURAL MACHINE TRANSLATION SYSTEMS
    69.
    Invention Application

    Publication No.: US20200034435A1

    Publication Date: 2020-01-30

    Application No.: US16336870

    Filing Date: 2017-09-25

    Applicant: GOOGLE LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for neural machine translation. One of the systems includes an encoder neural network comprising: an input forward long short-term memory (LSTM) layer configured to process each input token in the input sequence in a forward order to generate a respective forward representation of each input token, an input backward LSTM layer configured to process each input token in a backward order to generate a respective backward representation of each input token, and a plurality of hidden LSTM layers configured to process a respective combined representation of each of the input tokens in the forward order to generate a respective encoded representation of each of the input tokens; and a decoder subsystem configured to receive the respective encoded representations and to process the encoded representations to generate an output sequence.
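
    To complement the encoder sketch given under the granted counterpart above, here is an equally rough PyTorch sketch of a decoder subsystem that consumes the per-token encoded representations and emits an output sequence by greedy decoding. The single attention layer, the LSTM cell, and the greedy argmax loop are illustrative assumptions; the abstract only states that the decoder receives and processes the encoded representations to generate the output sequence.

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Greedy LSTM decoder over the encoder's per-token encoded representations."""
    def __init__(self, vocab_size, dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.cell = nn.LSTM(2 * dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, encoded, bos_id, max_len=20):
        batch = encoded.size(0)
        token = torch.full((batch, 1), bos_id, dtype=torch.long)
        state, outputs = None, []
        for _ in range(max_len):
            emb = self.embed(token)                       # (batch, 1, dim)
            ctx, _ = self.attn(emb, encoded, encoded)     # attend over the encodings
            step, state = self.cell(torch.cat([emb, ctx], dim=-1), state)
            logits = self.out(step)                       # (batch, 1, vocab)
            token = logits.argmax(-1)                     # greedy next token
            outputs.append(token)
        return torch.cat(outputs, dim=1)                  # (batch, max_len)

dec = Decoder(vocab_size=1000, dim=64)
ids = dec(torch.randn(2, 7, 64), bos_id=1)                # (2, 20) greedy output ids
```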
