VERY DEEP CONVOLUTIONAL NEURAL NETWORKS FOR END-TO-END SPEECH RECOGNITION

    公开(公告)号:US20190236451A1

    公开(公告)日:2019-08-01

    申请号:US16380101

    申请日:2019-04-10

    Applicant: Google LLC

    Abstract: A speech recognition neural network system includes an encoder neural network and a decoder neural network. The encoder neural network generates an encoded sequence from an input acoustic sequence that represents an utterance. The input acoustic sequence includes a respective acoustic feature representation at each of a plurality of input time steps, the encoded sequence includes a respective encoded representation at each of a plurality of time reduced time steps, and the number of time reduced time steps is less than the number of input time steps. The encoder neural network includes a time reduction subnetwork, a convolutional LSTM subnetwork, and a network in network subnetwork. The decoder neural network receives the encoded sequence and processes the encoded sequence to generate, for each position in an output sequence order, a set of substring scores that includes a respective substring score for each substring in a set of substrings.

    Sequence modeling using imputation
    34.
    发明授权

    公开(公告)号:US12242818B2

    公开(公告)日:2025-03-04

    申请号:US17797872

    申请日:2021-02-08

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for sequence modeling. One of the methods includes receiving an input sequence having a plurality of input positions; determining a plurality of blocks of consecutive input positions; processing the input sequence using a neural network to generate a latent alignment, comprising, at each of a plurality of input time steps: receiving a partial latent alignment from a previous input time step; selecting an input position in each block, wherein the token at the selected input position of the partial latent alignment in each block is a mask token; and processing the partial latent alignment and the input sequence using the neural network to generate a new latent alignment, wherein the new latent alignment comprises, at the selected input position in each block, an output token or a blank token; and generating, using the latent alignment, an output sequence.

    GENERATING OUTPUT SEQUENCES FROM INPUT SEQUENCES USING NEURAL NETWORKS

    公开(公告)号:US20220138531A1

    公开(公告)日:2022-05-05

    申请号:US17575394

    申请日:2022-01-13

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output sequences from input sequences. One of the methods includes obtaining an input sequence having a first number of inputs arranged according to an input order; processing each input in the input sequence using an encoder recurrent neural network to generate a respective encoder hidden state for each input in the input sequence; and generating an output sequence having a second number of outputs arranged according to an output order, each output in the output sequence being selected from the inputs in the input sequence, comprising, for each position in the output order: generating a softmax output for the position using the encoder hidden states that is a pointer into the input sequence; and selecting an input from the input sequence as the output at the position using the softmax output.

    Generating output sequences from input sequences using neural networks

    公开(公告)号:US11227206B1

    公开(公告)日:2022-01-18

    申请号:US16552495

    申请日:2019-08-27

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output sequences from input sequences. One of the methods includes obtaining an input sequence having a first number of inputs arranged according to an input order; processing each input in the input sequence using an encoder recurrent neural network to generate a respective encoder hidden state for each input in the input sequence; and generating an output sequence having a second number of outputs arranged according to an output order, each output in the output sequence being selected from the inputs in the input sequence, comprising, for each position in the output order: generating a softmax output for the position using the encoder hidden states that is a pointer into the input sequence; and selecting an input from the input sequence as the output at the position using the softmax output.

    Processing text sequences using neural networks

    公开(公告)号:US11182566B2

    公开(公告)日:2021-11-23

    申请号:US16338174

    申请日:2017-10-03

    Applicant: Google LLC

    Abstract: A computer-implemented method for training a neural network that is configured to generate a score distribution over a set of multiple output positions. The neural network is configured to process a network input to generate a respective score distribution for each of a plurality of output positions including a respective score for each token in a predetermined set of tokens that includes n-grams of multiple different sizes. Example methods described herein provide trained neural networks which produce results with improved accuracy compared to the state of the art, e.g. translations that are more accurate compared to the state of the art, or more accurate speech recognition compared to the state of the art.

Patent Agency Ranking