Attention-based decoder-only sequence transduction neural networks

    公开(公告)号:US12271817B2

    公开(公告)日:2025-04-08

    申请号:US18403966

    申请日:2024-01-04

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating an output sequence from an input sequence. One of the methods includes, at each of a plurality of generation time steps: generating a combined sequence for the generation time step that includes the input sequence followed by the output tokens that have already been generated as of the generation time step; processing the combined sequence using a self-attention decoder neural network to generate a time step output that defines a score distribution over a set of possible output tokens; and selecting, using the time step output, an output token from the set of possible output tokens as the next output token in the output sequence.

    Distributing tensor computations across computing devices

    公开(公告)号:US12265903B2

    公开(公告)日:2025-04-01

    申请号:US17063034

    申请日:2020-10-05

    Applicant: Google LLC

    Inventor: Noam M. Shazeer

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for distributing tensor computations across computing devices. One of the methods includes: receiving specification data that specifies a distribution of tensor computations among a plurality of computing devices, wherein each tensor computation (i) is defined to receive, as input, one or more respective input tensors each having one or more respective input dimensions, (ii) is defined to generate, as output, one or more respective output tensors each having one or more respective output dimensions, or both, wherein the specification data specifies a respective layout for each input and output tensor that assigns each dimension of the input or output tensor to one or more of the plurality of computing devices; assigning, based on the layouts for the input and output tensors, respective device-local operations to each of the computing devices; and causing the tensor computations to be executed.

    Mixture of experts neural networks
    35.
    发明授权

    公开(公告)号:US11790214B2

    公开(公告)日:2023-10-17

    申请号:US16879187

    申请日:2020-05-20

    Applicant: Google LLC

    CPC classification number: G06N3/045 G06N3/08

    Abstract: A system includes a neural network that includes a Mixture of Experts (MoE) subnetwork between a first neural network layer and a second neural network layer. The MoE subnetwork includes multiple expert neural networks. Each expert neural network is configured to process a first layer output generated by the first neural network layer to generate a respective expert output. The MoE subnetwork further includes a gating subsystem that selects, based on the first layer output, one or more of the expert neural networks and determine a respective weight for each selected expert neural network, provides the first layer output as input to each of the selected expert neural networks, combines the expert outputs generated by the selected expert neural networks in accordance with the weights for the selected expert neural networks to generate an MoE output, and provides the MoE output as input to the second neural network layer.

    ATTENTION-BASED IMAGE GENERATION NEURAL NETWORKS

    公开(公告)号:US20230076971A1

    公开(公告)日:2023-03-09

    申请号:US17867242

    申请日:2022-07-18

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating an output image. In one aspect, one of the methods includes generating the output image intensity value by intensity value according to a generation order of pixel—color channel pairs from the output image, comprising, for each particular generation order position in the generation order: generating a current output image representation of a current output image, processing the current output image representation using a decoder neural network to generate a probability distribution over possible intensity values for the pixel—color channel pair at the particular generation order position, wherein the decoder neural network includes one or more local masked self-attention sub-layers; and selecting an intensity value for the pixel—color channel pair at the particular generation order position using the probability distribution.

    Attention-based decoder-only sequence transduction neural networks

    公开(公告)号:US11556786B2

    公开(公告)日:2023-01-17

    申请号:US16759690

    申请日:2018-10-29

    Applicant: GOOGLE LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating an output sequence from an input sequence. One of the methods includes, at each of a plurality of generation time steps: generating a combined sequence for the generation time step that includes the input sequence followed by the output tokens that have already been generated as of the generation time step; processing the combined sequence using a self-attention decoder neural network to generate a time step output that defines a score distribution over a set of possible output tokens; and selecting, using the time step output, an output token from the set of possible output tokens as the next output token in the output sequence.

    SPEECH RECOGNITION WITH ATTENTION-BASED RECURRENT NEURAL NETWORKS

    公开(公告)号:US20220028375A1

    公开(公告)日:2022-01-27

    申请号:US17450235

    申请日:2021-10-07

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing an utterance, and the input acoustic sequence comprising a respective acoustic feature representation at each of a first number of time steps; processing the input acoustic sequence using a first neural network to convert the input acoustic sequence into an alternative representation for the input acoustic sequence; processing the alternative representation for the input acoustic sequence using an attention-based Recurrent Neural Network (RNN) to generate, for each position in an output sequence order, a set of substring scores that includes a respective substring score for each substring in a set of substrings; and generating a sequence of substrings that represent a transcription of the utterance.

    Attention-based sequence transduction neural networks

    公开(公告)号:US11113602B2

    公开(公告)日:2021-09-07

    申请号:US16932422

    申请日:2020-07-17

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating an output sequence from an input sequence. In one aspect, one of the systems includes an encoder neural network configured to receive the input sequence and generate encoded representations of the network inputs, the encoder neural network comprising a sequence of one or more encoder subnetworks, each encoder subnetwork configured to receive a respective encoder subnetwork input for each of the input positions and to generate a respective subnetwork output for each of the input positions, and each encoder subnetwork comprising: an encoder self-attention sub-layer that is configured to receive the subnetwork input for each of the input positions and, for each particular input position in the input order: apply an attention mechanism over the encoder subnetwork inputs using one or more queries derived from the encoder subnetwork input at the particular input position.

    DISTRIBUTING TENSOR COMPUTATIONS ACROSS COMPUTING DEVICES

    公开(公告)号:US20200042875A1

    公开(公告)日:2020-02-06

    申请号:US16532381

    申请日:2019-08-05

    Applicant: Google LLC

    Inventor: Noam M. Shazeer

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for distributing tensor computations across computing devices. One of the methods includes: receiving specification data that specifies a distribution of tensor computations among a plurality of computing devices, wherein each tensor computation (i) is defined to receive, as input, one or more respective input tensors each having one or more respective input dimensions, (ii) is defined to generate, as output, one or more respective output tensors each having one or more respective output dimensions, or both, wherein the specification data specifies a respective layout for each input and output tensor that assigns each dimension of the input or output tensor to one or more of the plurality of computing devices; assigning, based on the layouts for the input and output tensors, respective device-local operations to each of the computing devices; and causing the tensor computations to be executed.

Patent Agency Ranking