Generating audio using neural networks

    Publication (Announcement) No.: US10803884B2

    Publication (Announcement) Date: 2020-10-13

    Application No.: US16390549

    Application Date: 2019-04-22

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating an output sequence of audio data that comprises a respective audio sample at each of a plurality of time steps. One of the methods includes, for each of the time steps: providing a current sequence of audio data as input to a convolutional subnetwork, wherein the current sequence comprises the respective audio sample at each time step that precedes the time step in the output sequence, and wherein the convolutional subnetwork is configured to process the current sequence of audio data to generate an alternative representation for the time step; and providing the alternative representation for the time step as input to an output layer, wherein the output layer is configured to: process the alternative representation to generate an output that defines a score distribution over a plurality of possible audio samples for the time step.
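The abstract describes an autoregressive loop. At each time step, the samples generated so far pass through a convolutional subnetwork, and an output layer turns the resulting representation into a score distribution over the possible sample values. The following is a minimal Python sketch under loudly stated assumptions: the sizes NUM_LEVELS, RECEPTIVE_FIELD and HIDDEN are hypothetical, random fixed weights stand in for a trained network, a single causal filter bank with tanh replaces a real convolutional stack, and the output layer is assumed to be a softmax. None of these names or choices come from the patent.

```python
import math
import random

NUM_LEVELS = 8        # hypothetical: number of possible quantized audio sample values
RECEPTIVE_FIELD = 4   # hypothetical: how many preceding samples the toy subnetwork sees
HIDDEN = 6            # hypothetical: size of the "alternative representation"

rng = random.Random(0)
# Hypothetical fixed weights standing in for a trained network.
conv_w = [[rng.uniform(-1, 1) for _ in range(RECEPTIVE_FIELD)] for _ in range(HIDDEN)]
out_w = [[rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(NUM_LEVELS)]


def conv_subnetwork(current_sequence):
    """Toy causal subnetwork: sees only samples that precede the current step."""
    # Left-pad with sample value 0 so the window always has RECEPTIVE_FIELD entries.
    window = ([0] * RECEPTIVE_FIELD + list(current_sequence))[-RECEPTIVE_FIELD:]
    # Scale sample indices from [0, NUM_LEVELS - 1] to [-1, 1].
    x = [2 * s / (NUM_LEVELS - 1) - 1 for s in window]
    # Each hidden unit is one filter over the window followed by tanh.
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in conv_w]


def output_layer(representation):
    """Assumed softmax output layer: a score distribution over NUM_LEVELS values."""
    logits = [sum(w * h for w, h in zip(row, representation)) for row in out_w]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate(num_steps, seed=1):
    """Generate num_steps samples, one time step at a time."""
    sampler = random.Random(seed)
    output = []
    for _ in range(num_steps):
        # The current sequence is every sample generated before this time step.
        scores = output_layer(conv_subnetwork(output))
        # Draw the next sample from the score distribution.
        sample = sampler.choices(range(NUM_LEVELS), weights=scores)[0]
        output.append(sample)
    return output
```

Calling generate(16) returns 16 integer sample indices in range(NUM_LEVELS); a real system would map each index back to an amplitude. Sampling from the distribution rather than always taking the highest score is a design choice here, not something the abstract specifies.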

    Generating audio using neural networks

    Publication (Announcement) No.: US10304477B2

    Publication (Announcement) Date: 2019-05-28

    Application No.: US16030742

    Application Date: 2018-07-09

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating an output sequence of audio data that comprises a respective audio sample at each of a plurality of time steps. One of the methods includes, for each of the time steps: providing a current sequence of audio data as input to a convolutional subnetwork, wherein the current sequence comprises the respective audio sample at each time step that precedes the time step in the output sequence, and wherein the convolutional subnetwork is configured to process the current sequence of audio data to generate an alternative representation for the time step; and providing the alternative representation for the time step as input to an output layer, wherein the output layer is configured to: process the alternative representation to generate an output that defines a score distribution over a plurality of possible audio samples for the time step.

    Speech recognition using convolutional neural networks

    Publication (Announcement) No.: US20190108833A1

    Publication (Announcement) Date: 2019-04-11

    Application No.: US16209661

    Application Date: 2018-12-04

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing speech recognition by generating a neural network output from an audio data input sequence, where the neural network output characterizes words spoken in the audio data input sequence. One of the methods includes, for each of the audio data inputs, providing a current audio data input sequence that comprises the audio data input and the audio data inputs preceding the audio data input in the audio data input sequence to a convolutional subnetwork comprising a plurality of dilated convolutional neural network layers, wherein the convolutional subnetwork is configured to, for each of the plurality of audio data inputs: receive the current audio data input sequence for the audio data input, and process the current audio data input sequence to generate an alternative representation for the audio data input.
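The key property in this abstract is that the representation for each audio data input depends only on that input and the inputs preceding it, computed by a stack of dilated convolutional layers. The minimal Python sketch below illustrates that property under assumptions not taken from the patent: a single scalar channel, kernel size 2, hypothetical fixed weights (0.5, 0.5), dilations doubling as 1, 2, 4, 8, and a ReLU between layers. It shows how the receptive field grows with dilation while staying causal.

```python
def dilated_causal_conv(x, weights, dilation):
    """y[t] = sum_k weights[k] * x[t - k * dilation], treating x before t = 0 as 0."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # causal: never reads a future input
                acc += w * x[idx]
        y.append(acc)
    return y


def conv_stack(x, dilations=(1, 2, 4, 8), weights=(0.5, 0.5)):
    """Hypothetical stack of dilated causal layers with ReLU after each layer."""
    h = list(x)
    for d in dilations:
        h = [max(0.0, v) for v in dilated_causal_conv(h, weights, d)]
    return h


def receptive_field(dilations, kernel_size=2):
    """Number of inputs (current plus preceding) that one output can depend on."""
    return 1 + sum(d * (kernel_size - 1) for d in dilations)


impulse = [1.0] + [0.0] * 31
out = conv_stack(impulse)
# An impulse at t = 0 reaches exactly the first receptive_field(...) outputs.
print([t for t, v in enumerate(out) if v != 0.0])
```

With four layers and doubling dilations, one output sees 16 inputs, whereas four undilated layers with kernel size 2 would see only 5. That exponential growth in context for a linear number of layers is the usual motivation for dilation.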

    Generating discrete latent representations of input data items

    Publication (Announcement) No.: US20240354566A1

    Publication (Announcement) Date: 2024-10-24

    Application No.: US18623952

    Application Date: 2024-04-01

    CPC classification number: G06N3/08 G06N3/04

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating discrete latent representations of input data items. One of the methods includes receiving an input data item; providing the input data item as input to an encoder neural network to obtain an encoder output for the input data item; and generating a discrete latent representation of the input data item from the encoder output, comprising: for each of the latent variables, determining, from a set of latent embedding vectors in the memory, a latent embedding vector that is nearest to the encoded vector for the latent variable.
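The final step of this abstract is a nearest-neighbour lookup: each latent variable's encoded vector is replaced by the closest latent embedding vector stored in memory, and the discrete latent representation is the resulting sequence of choices. A minimal Python sketch follows. It assumes Euclidean (squared) distance, since the abstract says only "nearest", and it takes the encoder output as given, so the encoder network itself is omitted. The function names and the example codebook are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(encoder_output, codebook):
    """For each latent variable's encoded vector, pick the nearest embedding.

    encoder_output: one encoded vector per latent variable (hypothetical layout).
    codebook: the set of latent embedding vectors held in memory.
    Returns (indices, embeddings); the indices form the discrete representation.
    """
    indices = []
    for vec in encoder_output:
        best = min(range(len(codebook)), key=lambda k: squared_distance(vec, codebook[k]))
        indices.append(best)
    return indices, [codebook[k] for k in indices]


codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
encoder_output = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7], [0.6, 0.6]]
indices, embeddings = quantize(encoder_output, codebook)
print(indices)  # [0, 1, 2, 1]
```

Returning both the indices and the selected embeddings reflects the two ways such a representation is typically used: the indices are compact discrete codes, while the embeddings can be fed to a downstream decoder.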

    Generating discrete latent representations of input data items

    Publication (Announcement) No.: US11948075B2

    Publication (Announcement) Date: 2024-04-02

    Application No.: US16620815

    Application Date: 2018-06-11

    CPC classification number: G06N3/08 G06N3/04

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating discrete latent representations of input data items. One of the methods includes receiving an input data item; providing the input data item as input to an encoder neural network to obtain an encoder output for the input data item; and generating a discrete latent representation of the input data item from the encoder output, comprising: for each of the latent variables, determining, from a set of latent embedding vectors in the memory, a latent embedding vector that is nearest to the encoded vector for the latent variable.
