GENERATING VIDEOS USING DIFFUSION MODELS
    Invention publication

    Publication number: US20240338936A1

    Publication date: 2024-10-10

    Application number: US18296938

    Application date: 2023-04-06

    Applicant: Google LLC

    CPC classification number: G06V10/82 G06V10/771 H04N7/0117 H04N7/013

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating an output video conditioned on an input. In one aspect, a method comprises receiving the input; initializing a current intermediate representation; generating an output video by updating the current intermediate representation at each of a plurality of iterations, wherein the updating comprises, at each iteration: processing an intermediate input for the iteration comprising the current intermediate representation using a diffusion model that is configured to process the intermediate input to generate a noise output; and updating the current intermediate representation using the noise output for the iteration.
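The iterative update described in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not something the abstract specifies: the linear noise schedule, the deterministic DDIM-style update rule, the flat list of pixel values standing in for a video, and `toy_noise_model`, a hypothetical stand-in for a trained diffusion model that simply treats the conditioning input as the clean video.

```python
import math
import random

def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fractions for an assumed linear schedule; index 0 means no noise."""
    alpha_bars = [1.0]
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        alpha_bars.append(alpha_bars[-1] * (1.0 - beta))
    return alpha_bars

def toy_noise_model(x, alpha_bar, conditioning):
    """Hypothetical stand-in for the diffusion model (a real one is a neural network).

    It assumes the clean video equals the conditioning input and returns the
    noise that this assumption implies for the current representation x.
    """
    scale = math.sqrt(1.0 - alpha_bar)
    return [(xi - math.sqrt(alpha_bar) * ci) / scale for xi, ci in zip(x, conditioning)]

def generate_video(conditioning, num_steps=50, seed=0):
    rng = random.Random(seed)
    alpha_bars = make_alpha_bars(num_steps)
    # Initialize the current intermediate representation with Gaussian noise.
    x = [rng.gauss(0.0, 1.0) for _ in conditioning]
    for t in range(num_steps, 0, -1):
        a_t, a_prev = alpha_bars[t], alpha_bars[t - 1]
        # Process the current representation to obtain a noise output.
        noise = toy_noise_model(x, a_t, conditioning)
        # Update the representation using the noise output (assumed DDIM-style step).
        x0_hat = [(xi - math.sqrt(1.0 - a_t) * ni) / math.sqrt(a_t)
                  for xi, ni in zip(x, noise)]
        x = [math.sqrt(a_prev) * x0 + math.sqrt(1.0 - a_prev) * ni
             for x0, ni in zip(x0_hat, noise)]
    return x

# Four "pixels" stand in for a flattened frames x height x width video tensor.
video = generate_video([0.2, 0.5, 0.8, 0.1])
```

Because the toy model already knows the clean signal, the loop recovers the conditioning values; with a trained network the same loop would instead denoise toward a sample consistent with the input.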

    GENERATING NEURAL NETWORK OUTPUTS USING INSERTION COMMANDS

    Publication number: US20240028893A1

    Publication date: 2024-01-25

    Application number: US18321696

    Application date: 2023-05-22

    Applicant: Google LLC

    CPC classification number: G06N3/08 G06F40/237 G06N3/04 G06N3/084

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing sequence modeling tasks using insertions. One of the methods includes receiving a system input that includes one or more source elements from a source sequence and zero or more target elements from a target sequence, wherein each source element is selected from a vocabulary of source elements and wherein each target element is selected from a vocabulary of target elements; generating a partial concatenated sequence that includes the one or more source elements from the source sequence and the zero or more target elements from the target sequence, wherein the source and target elements are arranged in the partial concatenated sequence according to a combined order; and generating a final concatenated sequence that includes a finalized source sequence and a finalized target sequence, wherein the finalized target sequence includes one or more target elements.
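A rough sketch of the flow in this abstract, under several assumptions that the abstract does not confirm: the combined order is taken to be "all source elements, then all target elements", only the target side receives insertions, and `toy_insertion_command` is a hypothetical stand-in for the neural network, hard-coded for a toy task in which the target is the upper-cased source.

```python
def toy_insertion_command(sequence, num_source):
    """Hypothetical stand-in for the model: returns (index, element), or None when final."""
    source, target = sequence[:num_source], sequence[num_source:]
    reference = [s.upper() for s in source]  # toy task, assumed for illustration
    for i, wanted in enumerate(reference):
        if i >= len(target) or target[i] != wanted:
            # Insert the missing element at this position of the concatenated sequence.
            return num_source + i, wanted
    return None

def generate(source, partial_target=()):
    # Partial concatenated sequence: source elements followed by known target elements.
    sequence = list(source) + list(partial_target)
    num_source = len(source)
    while True:
        command = toy_insertion_command(sequence, num_source)
        if command is None:
            break
        index, element = command
        sequence.insert(index, element)
    # Final concatenated sequence, split back into its two finalized parts.
    return sequence[:num_source], sequence[num_source:]

final_source, final_target = generate(["a", "b", "c"], ["B"])
```

Starting from the single known target element `"B"`, the loop inserts `"A"` before it and `"C"` after it, so the finalized target is `["A", "B", "C"]`.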

    Speech recognition with attention-based recurrent neural networks

    Publication number: US11151985B2

    Publication date: 2021-10-19

    Application number: US16713298

    Application date: 2019-12-13

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing an utterance, and the input acoustic sequence comprising a respective acoustic feature representation at each of a first number of time steps, processing the input acoustic sequence using a first neural network to convert the input acoustic sequence into an alternative representation for the input acoustic sequence, processing the alternative representation for the input acoustic sequence using an attention-based Recurrent Neural Network (RNN) to generate, for each position in an output sequence order, a set of substring scores that includes a respective substring score for each substring in a set of substrings; and generating a sequence of substrings that represent a transcription of the utterance.
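The pipeline in this abstract (encode the acoustic sequence, then let an attention-based recurrent decoder score substrings at each output position) can be sketched as follows. This is a toy under stated assumptions, not the patented model: the substring vocabulary, the frame-averaging encoder, the `tanh` recurrence, the random weights, and the greedy selection are all hypothetical choices made for illustration, so the resulting transcript is meaningless.

```python
import math
import random

SUBSTRINGS = ["<eos>", "a", "b", "ab", "ba"]  # hypothetical substring vocabulary

def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def encode(frames):
    """Toy stand-in for the first network: averages adjacent frames, halving the time steps."""
    pairs = [frames[i:i + 2] for i in range(0, len(frames), 2)]
    return [[sum(column) / len(pair) for column in zip(*pair)] for pair in pairs]

def decode(encoded, embeddings, max_len=8):
    """Toy attention-based recurrent decoder with greedy substring selection."""
    dim = len(encoded[0])
    state = [0.0] * dim
    previous = [0.0] * dim  # embedding of the previously emitted substring
    transcript, score_sets = [], []
    for _ in range(max_len):
        # Attention: weight each encoded time step by its similarity to the state.
        weights = softmax([dot(state, h) for h in encoded])
        context = [sum(w * h[d] for w, h in zip(weights, encoded)) for d in range(dim)]
        # Recurrent update from the old state, the context, and the previous output.
        state = [math.tanh(s + c + p) for s, c, p in zip(state, context, previous)]
        # One score per substring for this output position.
        scores = softmax([dot(e, state) for e in embeddings])
        score_sets.append(scores)
        best = max(range(len(SUBSTRINGS)), key=lambda k: scores[k])
        if SUBSTRINGS[best] == "<eos>":
            break
        transcript.append(SUBSTRINGS[best])
        previous = embeddings[best]
    return transcript, score_sets

rng = random.Random(0)
frames = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(10)]  # fake acoustic features
embeddings = [[rng.gauss(0, 1) for _ in range(4)] for _ in SUBSTRINGS]
transcript, score_sets = decode(encode(frames), embeddings)
```

A practical decoder would more likely search over several candidate substring sequences (for example with beam search) rather than commit greedily at each position.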
