Sample-efficient adaptive text-to-speech

    Publication No.: US11355097B2

    Publication date: 2022-06-07

    Application No.: US17061437

    Filing date: 2020-10-01

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating an adaptive audio-generation model. One of the methods includes generating an adaptive audio-generation model including learning a plurality of embedding vectors and parameter values of a neural network using training data comprising first text and audio data representing a plurality of different individual speakers speaking portions of the first text, wherein the plurality of embedding vectors represent respective voice characteristics of the plurality of different individual speakers. The adaptive audio-generation model is adapted for a new individual speaker using adaptation data comprising second text and audio data representing the new individual speaker speaking portions of the second text, the new individual speaker being different from each of the plurality of individual speakers, wherein adapting the audio-generation model includes learning a new embedding vector for the new individual speaker.
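The adaptation step in this abstract, learning a new embedding vector for an unseen speaker, can be illustrated with a toy model. The sketch below is not the patented system: it assumes a linear "audio" model, assumes the shared weights stay frozen during adaptation (the abstract only says that adaptation includes learning a new embedding), and uses hypothetical names (`adapt_new_speaker`, `w_text`, `w_emb`).

```python
import numpy as np

def adapt_new_speaker(w_text, w_emb, text_feats, audio_targets, steps=200, lr=0.1):
    """Fit only a new speaker embedding; w_text and w_emb are never updated (assumption)."""
    emb = np.zeros(w_emb.shape[1])  # new speaker embedding, initialised at zero
    for _ in range(steps):
        # Toy linear model: output = W_text @ text + W_emb @ embedding
        pred = text_feats @ w_text.T + emb @ w_emb.T   # shape (n, out_dim)
        err = pred - audio_targets
        # Gradient of the mean squared error with respect to the embedding only
        grad = 2.0 * err.mean(axis=0) @ w_emb
        emb -= lr * grad
    return emb

# Toy adaptation data: targets produced by a known "true" embedding.
rng = np.random.default_rng(0)
w_text = rng.normal(size=(4, 3))   # stands in for parameters learned from many speakers
w_emb = np.eye(4)[:, :2]           # orthonormal columns keep the fixed step size stable
text_feats = rng.normal(size=(8, 3))
true_emb = np.array([0.7, -1.2])
audio_targets = text_feats @ w_text.T + true_emb @ w_emb.T

learned = adapt_new_speaker(w_text, w_emb, text_feats, audio_targets)
```

Because only the embedding is optimised, the number of free parameters during adaptation is tiny, which is one plausible reading of "sample-efficient" in the title.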

    Generating video frames using neural networks

    Publication No.: US11144782B2

    Publication date: 2021-10-12

    Application No.: US16338338

    Filing date: 2017-09-29

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating video frames using neural networks. One of the methods includes processing a sequence of video frames using an encoder neural network to generate an encoded representation; and generating a predicted next frame pixel by pixel according to a pixel order and a channel order, comprising: for each color channel of each pixel, providing as input to a decoder neural network (i) the encoded representation, (ii) color values for any pixels before the pixel in the pixel order, and (iii) color values for the pixel for any color channels before the color channel in the channel order, wherein the decoder neural network is configured to generate an output defining a score distribution over a plurality of possible color values, and determining the color value for the color channel of the pixel by sampling from the score distribution.
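The generation loop described in this abstract (pixel order, then channel order, sampling each color value from a score distribution) can be sketched as follows. The decoder here, `toy_decoder`, is a hypothetical stand-in and not a neural network; raster-scan pixel order, 256 possible color values, and a softmax over the scores are all assumptions made for illustration.

```python
import numpy as np

def toy_decoder(encoded, frame, row, col, ch, num_values):
    """Hypothetical decoder: scores depend only on the encoding and already-generated values."""
    context = frame[:row].sum() + frame[row, :col].sum() + frame[row, col, :ch].sum()
    centre = (encoded.sum() + context) % num_values
    return -np.abs(np.arange(num_values) - centre)  # unnormalised scores

def sample_next_frame(encoded, height, width, channels=3, num_values=256,
                      decoder=toy_decoder, seed=0):
    rng = np.random.default_rng(seed)
    frame = np.zeros((height, width, channels), dtype=np.int64)
    for row in range(height):              # pixel order: raster scan (assumption)
        for col in range(width):
            for ch in range(channels):     # channel order: 0, 1, 2
                scores = decoder(encoded, frame, row, col, ch, num_values)
                probs = np.exp(scores - scores.max())
                probs /= probs.sum()       # softmax turns scores into a distribution
                frame[row, col, ch] = rng.choice(num_values, p=probs)
    return frame

frame = sample_next_frame(np.ones(4), height=2, width=3)
```

Each call to the decoder sees the encoded representation, all earlier pixels, and the earlier channels of the current pixel, which mirrors inputs (i) to (iii) in the abstract.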

    Committed information rate variational autoencoders

    Publication No.: US10671889B2

    Publication date: 2020-06-02

    Application No.: US16586014

    Filing date: 2019-09-27

    Abstract: A variational autoencoder (VAE) neural network system, comprising an encoder neural network to encode an input data item to define a posterior distribution for a set of latent variables, and a decoder neural network to generate an output data item representing values of a set of latent variables sampled from the posterior distribution. The system is configured for training with an objective function including a term dependent on a difference between the posterior distribution and a prior distribution. The prior and posterior distributions are arranged so that they cannot be matched to one another. The VAE system may be used for compressing and decompressing data.
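One simple way to see why a posterior and prior that "cannot be matched" keep the divergence term away from zero is a one-dimensional Gaussian example. The construction below is an illustrative assumption, not necessarily the one claimed in the patent: the prior is a standard normal, and the posterior is a Gaussian whose standard deviation is fixed to a value other than 1, so the KL divergence has a positive minimum regardless of the mean.

```python
import numpy as np

def gaussian_kl(mu, sigma):
    """KL( N(mu, sigma^2) || N(0, 1) ) in nats, per latent dimension."""
    return 0.5 * (sigma**2 + mu**2 - 1.0 - np.log(sigma**2))

def committed_rate(sigma):
    """Smallest achievable KL when the posterior std is fixed at sigma (reached at mu = 0)."""
    return gaussian_kl(0.0, sigma)

# With sigma fixed at 0.5 the KL term can never fall below this floor.
floor = committed_rate(0.5)
kl_over_means = gaussian_kl(np.linspace(-3.0, 3.0, 61), 0.5)
```

In this toy setting the floor plays the role of a minimum information rate: the encoder cannot collapse its posterior onto the prior, so the latent variables always carry some information about the input.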

    Committed information rate variational autoencoders

    Publication No.: US20200104640A1

    Publication date: 2020-04-02

    Application No.: US16586014

    Filing date: 2019-09-27

    Abstract: A variational autoencoder (VAE) neural network system, comprising an encoder neural network to encode an input data item to define a posterior distribution for a set of latent variables, and a decoder neural network to generate an output data item representing values of a set of latent variables sampled from the posterior distribution. The system is configured for training with an objective function including a term dependent on a difference between the posterior distribution and a prior distribution. The prior and posterior distributions are arranged so that they cannot be matched to one another. The VAE system may be used for compressing and decompressing data.
