END-TO-END SPEECH WAVEFORM GENERATION THROUGH DATA DENSITY GRADIENT ESTIMATION

    Publication No.: US20230252974A1

    Publication date: 2023-08-10

    Application No.: US18010438

    Filing date: 2021-09-02

    Applicant: Google LLC

    IPC classification: G10L13/08 G10L21/0208

    CPC classification: G10L13/08 G10L21/0208

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating waveforms conditioned on phoneme sequences. In one aspect, a method comprises: obtaining a phoneme sequence; processing the phoneme sequence using an encoder neural network to generate a hidden representation of the phoneme sequence; generating, from the hidden representation, a conditioning input; initializing a current waveform output; and generating a final waveform output that defines an utterance of the phoneme sequence by a speaker by updating the current waveform output at each of a plurality of iterations, wherein each iteration corresponds to a respective noise level, and wherein the updating comprises, at each iteration: processing (i) the current waveform output and (ii) the conditioning input using a noise estimation neural network to generate a noise output; and updating the current waveform output using the noise output and the noise level for the iteration.
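
    The iterative update in this abstract can be illustrated with a minimal sketch. The abstract only says that each iteration uses the noise output and the noise level; the specific update rule below (a standard denoising-diffusion step), the Gaussian initialization, the `betas` schedule, and every name in the code are assumptions made for illustration, and the noise estimation network is replaced by a caller-supplied function.

```python
import math
import random


def refine_waveform(noise_estimator, conditioning, num_samples, betas, rng):
    """Iteratively refine a waveform, one step per noise level.

    noise_estimator(waveform, conditioning, noise_level) stands in for the
    noise estimation neural network (hypothetical signature).
    betas is an assumed noise schedule with values in (0, 1).
    """
    # Assumption: the current waveform output is initialized from Gaussian noise.
    y = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]

    alphas = [1.0 - b for b in betas]
    alpha_bars = []
    running = 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)

    # Walk from the highest noise level down to the lowest.
    for n in reversed(range(len(betas))):
        noise_level = math.sqrt(alpha_bars[n])
        eps = noise_estimator(y, conditioning, noise_level)
        coef = betas[n] / math.sqrt(1.0 - alpha_bars[n])
        # Remove the estimated noise, scaled by this iteration's noise level.
        y = [(yi - coef * ei) / math.sqrt(alphas[n]) for yi, ei in zip(y, eps)]
        if n > 0:
            # Assumed stochastic step: re-inject a smaller amount of noise.
            sigma = math.sqrt(
                betas[n] * (1.0 - alpha_bars[n - 1]) / (1.0 - alpha_bars[n])
            )
            y = [yi + sigma * rng.gauss(0.0, 1.0) for yi in y]
    return y
```

    In a real system `conditioning` would come from the encoder's hidden representation of the phoneme sequence; here it is simply passed through to the stand-in estimator.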

    Training policy neural networks using path consistency learning

    Publication No.: US11429844B2

    Publication date: 2022-08-30

    Application No.: US16904785

    Filing date: 2020-06-18

    Applicant: Google LLC

    IPC classification: G06N3/04 G06N3/08

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a policy neural network used to select actions to be performed by a reinforcement learning agent interacting with an environment. In one aspect, a method includes obtaining path data defining a path through the environment traversed by the agent. A consistency error is determined for the path from a combined reward, first and last soft-max state values, and a path likelihood. A value update for the current values of the policy neural network parameters is determined from at least the consistency error. The value update is used to adjust the current values of the policy neural network parameters.
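
    As a rough illustration of how the four quantities named in this abstract could combine, the sketch below follows the commonly published form of a path consistency error. The discount factor `gamma`, the entropy temperature `tau`, and the exact way the terms are weighted are assumptions; the abstract itself does not give a formula.

```python
def path_consistency_error(v_first, v_last, rewards, log_probs,
                           gamma=0.99, tau=0.1):
    """Consistency error for one path of length d = len(rewards).

    v_first, v_last: soft-max state values at the first and last state.
    rewards[j]:      reward received at step j of the path.
    log_probs[j]:    log-probability the policy assigned to the action at step j.
    gamma, tau:      assumed discount factor and entropy temperature.
    """
    d = len(rewards)
    # Combined (discounted) reward collected along the path.
    combined_reward = sum(gamma ** j * r for j, r in enumerate(rewards))
    # Discounted log-likelihood of the path under the current policy.
    path_log_likelihood = sum(gamma ** j * lp for j, lp in enumerate(log_probs))
    return (-v_first
            + gamma ** d * v_last
            + combined_reward
            - tau * path_log_likelihood)
```

    A training step would then typically minimize the squared error with respect to the policy (and value) parameters, so that the error is driven toward zero on sampled paths.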

    Systems and methods for contrastive learning of visual representations

    Publication No.: US11354778B2

    Publication date: 2022-06-07

    Application No.: US16847163

    Filing date: 2020-04-13

    Applicant: Google LLC

    Abstract: Provided are systems and methods for contrastive learning of visual representations. In particular, the present disclosure provides systems and methods that leverage particular data augmentation schemes and a learnable nonlinear transformation between the representation and the contrastive loss to provide improved visual representations. In contrast to certain existing techniques, the contrastive self-supervised learning algorithms described herein do not require specialized architectures or a memory bank. Some example implementations of the proposed approaches can be referred to as a simple framework for contrastive learning of representations or “SimCLR.” Further example aspects are described below and provide the following benefits and insights.
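
    The abstract does not spell out the contrastive loss. One widely used choice in this setting is a normalized, temperature-scaled cross-entropy over pairs of augmented views, sketched below in plain Python. The function names, the temperature value, and the assumption that the inputs are already the outputs of the nonlinear projection are all illustrative; augmentation and the encoder are omitted.

```python
import math


def _normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def nt_xent_loss(z_a, z_b, temperature=0.5):
    """Contrastive loss over a batch of paired views.

    z_a[i] and z_b[i] are projections of two augmented views of example i;
    every other vector in the batch acts as a negative for that pair.
    """
    z = [_normalize(v) for v in z_a + z_b]
    n = len(z_a)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same example
        sims = [
            sum(p * q for p, q in zip(z[i], z[k])) / temperature
            for k in range(2 * n)
        ]
        # Softmax denominator over all other vectors (self-similarity excluded).
        denom = sum(math.exp(s) for k, s in enumerate(sims) if k != i)
        total += math.log(denom) - sims[positive]
    return total / (2 * n)
```

    Because negatives come from the rest of the batch, a sketch like this needs no memory bank, which is consistent with the claim in the abstract.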

    Device placement optimization with reinforcement learning

    Publication No.: US10692003B2

    Publication date: 2020-06-23

    Application No.: US16445330

    Filing date: 2019-06-19

    Applicant: Google LLC

    Abstract: A method for determining a placement for machine learning model operations across multiple hardware devices is described. The method includes receiving data specifying a machine learning model to be placed for distributed processing on multiple hardware devices; generating, from the data, a sequence of operation embeddings, each operation embedding in the sequence characterizing respective operations necessary to perform the processing of the machine learning model; processing the sequence of operation embeddings using a placement recurrent neural network in accordance with first values of a plurality of network parameters of the placement recurrent neural network to generate a network output that defines a placement of the operations characterized by the operation embeddings in the sequence across the plurality of devices; and scheduling the machine learning model for processing by the multiple hardware devices by placing the operations on the multiple devices according to the placement defined by the network output.
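
    The last two steps of this abstract (turning a network output into a placement, then scheduling by device) can be sketched as follows. The placement recurrent neural network is not modeled; its output is assumed to be one row of per-device scores for each operation, and picking the highest-scoring device is an assumed decoding rule. All names are hypothetical.

```python
def placement_from_scores(op_names, scores):
    """Map each operation to the device with the highest score.

    scores[i][d] is an assumed network output: the score for placing
    operation i on device d.
    """
    placement = {}
    for name, row in zip(op_names, scores):
        placement[name] = max(range(len(row)), key=row.__getitem__)
    return placement


def group_by_device(placement, num_devices):
    """Build a per-device schedule: the list of operations placed on each device."""
    schedule = [[] for _ in range(num_devices)]
    for name, device in placement.items():
        schedule[device].append(name)
    return schedule
```

    In a reinforcement learning setup, a placement would more likely be sampled from these scores and the measured run time used as a reward signal; the deterministic choice here keeps the example small.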

    Training sequence generation neural networks using quality scores

    Publication No.: US10540585B2

    Publication date: 2020-01-21

    Application No.: US16421406

    Filing date: 2019-05-23

    Applicant: Google LLC

    IPC classification: G06N3/08 G10L25/30 G10L15/16

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a sequence generation neural network. One of the methods includes obtaining a batch of training examples; for each of the training examples: processing the training network input in the training example using the neural network to generate an output sequence; for each particular output position in the output sequence: identifying a prefix that includes the system outputs at positions before the particular output position in the output sequence, for each possible system output in the vocabulary, determining a highest quality score that can be assigned to any candidate output sequence that includes the prefix followed by the possible system output, and determining an update to the current values of the network parameters that increases a likelihood that the neural network generates a system output at the position that has a high quality score.
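
    To make the per-position computation concrete, the sketch below assumes the quality score of a candidate sequence is its negative edit distance to a target sequence, and that candidates may be of any length. Under those assumptions, the best score reachable after a given prefix and next token equals the negative of the smallest edit distance between that extended prefix and any prefix of the target. The function names and the choice of edit distance are illustrative, not taken from the abstract.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute or match
        prev = cur
    return prev[-1]


def best_scores_per_token(prefix, target, vocab):
    """Highest achievable quality score for each possible next token.

    Score = negative edit distance to `target` (an assumption). The best
    completion simply copies the rest of the target, so only the extended
    prefix needs to be compared against each prefix of the target.
    """
    scores = {}
    for token in vocab:
        extended = list(prefix) + [token]
        best = min(edit_distance(extended, target[:j])
                   for j in range(len(target) + 1))
        scores[token] = -best
    return scores


def target_distribution(scores):
    """Uniform training target over the tokens that reach the best score."""
    top = max(scores.values())
    winners = [t for t, s in scores.items() if s == top]
    return {t: (1.0 / len(winners) if t in winners else 0.0) for t in scores}
```

    Training against such a target distribution pushes probability toward the tokens with the highest attainable score at each position, which matches the update described in the abstract.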

    SEQUENCE MODELING USING IMPUTATION

    Publication No.: US20230075716A1

    Publication date: 2023-03-09

    Application No.: US17797872

    Filing date: 2021-02-08

    Applicant: Google LLC

    IPC classification: G06F40/47 G06F40/284

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for sequence modeling. One of the methods includes receiving an input sequence having a plurality of input positions; determining a plurality of blocks of consecutive input positions; processing the input sequence using a neural network to generate a latent alignment, comprising, at each of a plurality of input time steps: receiving a partial latent alignment from a previous input time step; selecting an input position in each block, wherein the token at the selected input position of the partial latent alignment in each block is a mask token; and processing the partial latent alignment and the input sequence using the neural network to generate a new latent alignment, wherein the new latent alignment comprises, at the selected input position in each block, an output token or a blank token; and generating, using the latent alignment, an output sequence.
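
    The block-wise decoding loop in this abstract can be sketched with a stand-in for the neural network. The `predict` callback, the rule of filling the most confident masked position in each block, and the final step of merging repeated tokens and dropping blanks are assumptions; the abstract only says that one masked position per block is selected and that the output is generated from the latent alignment.

```python
MASK = "<mask>"
BLANK = "<blank>"


def imputation_decode(inputs, block_size, predict):
    """Fill a fully masked alignment, one position per block per step.

    predict(alignment, inputs) stands in for the neural network and returns
    a (token, confidence) pair for every input position (hypothetical API).
    """
    n = len(inputs)
    alignment = [MASK] * n
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]
    # Each block holds at most block_size positions, so this many steps suffice.
    for _ in range(block_size):
        proposals = predict(alignment, inputs)
        new_alignment = list(alignment)
        for block in blocks:
            masked = [p for p in block if alignment[p] == MASK]
            if not masked:
                continue
            # Assumed selection rule: commit the most confident masked position.
            chosen = max(masked, key=lambda p: proposals[p][1])
            new_alignment[chosen] = proposals[chosen][0]
        alignment = new_alignment
    return alignment


def collapse(alignment):
    """Assumed post-processing: merge adjacent repeats, then drop blanks."""
    output = []
    previous = None
    for token in alignment:
        if token != BLANK and token != previous:
            output.append(token)
        previous = token
    return output
```

    With a block size of B, the loop finishes in B steps regardless of the input length, which is the practical appeal of selecting one position in every block at once.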

    NEURAL MACHINE TRANSLATION SYSTEMS
    Document type: Invention application

    Publication No.: US20210390271A1

    Publication date: 2021-12-16

    Application No.: US17459111

    Filing date: 2021-08-27

    Applicant: Google LLC

    IPC classification: G06F40/58 G06F40/44 G06N3/04

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for neural machine translation. The method comprises obtaining a first sequence of words in a source language, generating a modified sequence of words in the source language by inserting a word boundary symbol only at the beginning of each word in the first sequence of words and not at the end of each word, dividing the modified sequence of words into wordpieces using a wordpiece model, generating, from the wordpieces, an input sequence of input tokens for a neural machine translation system; and generating an output sequence of words using the neural machine translation system based on the input sequence of input tokens.
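
    The preprocessing described in this abstract can be sketched as follows. The underscore used as the word boundary symbol, the toy vocabulary, and the greedy longest-match segmentation are assumptions standing in for a trained wordpiece model; only the rule of marking the beginning of each word, and not the end, comes from the abstract.

```python
BOUNDARY = "_"  # assumed word boundary symbol


def mark_word_starts(words):
    """Insert the boundary symbol at the beginning of each word only."""
    return [BOUNDARY + word for word in words]


def to_wordpieces(marked_word, vocab):
    """Greedy longest-match split; unknown spans fall back to single characters."""
    pieces = []
    start = 0
    while start < len(marked_word):
        end = len(marked_word)
        while end > start + 1 and marked_word[start:end] not in vocab:
            end -= 1
        pieces.append(marked_word[start:end])
        start = end
    return pieces


def tokenize(words, vocab):
    """Mark word starts, then split every marked word into wordpieces."""
    tokens = []
    for marked in mark_word_starts(words):
        tokens.extend(to_wordpieces(marked, vocab))
    return tokens


def detokenize(tokens):
    """Recover the words: concatenate pieces and split at boundary symbols."""
    return "".join(tokens).replace(BOUNDARY, " ").split()
```

    Because the boundary symbol appears only at word starts, the original word sequence can be recovered from the wordpieces without ambiguity, as `detokenize` shows.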