LEARNING ENVIRONMENT REPRESENTATIONS FOR AGENT CONTROL USING PREDICTIONS OF BOOTSTRAPPED LATENTS

    Publication number: US20230083486A1

    Publication date: 2023-03-16

    Application number: US17797886

    Application date: 2021-02-08

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an environment representation neural network of a reinforcement learning system that controls an agent to perform a given task. In one aspect, the method includes: receiving a current observation input and a future observation input; generating, from the future observation input, a future latent representation of the future state of the environment; processing the current observation input, using the environment representation neural network, to generate a current internal representation of the current state of the environment; generating, from the current internal representation, a predicted future latent representation; evaluating an objective function measuring a difference between the future latent representation and the predicted future latent representation; and determining, based on a determined gradient of the objective function, an update to the current values of the environment representation parameters.
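    The training step described in this abstract can be sketched in a few lines. This is only an illustrative sketch, not the claimed implementation: the abstract does not specify any architecture, so the linear maps `W` (environment representation network), `P` (predictor), and `T` (a fixed target encoder that produces the future latent), the squared-error objective, the learning rate, and all function names are assumptions made for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def training_step(W, P, T, obs, future_obs, lr=0.1):
    """One gradient step on the representation weights W (hypothetical setup).

    W: representation network weights (assumed linear), shape d_z x d_obs
    P: predictor weights (assumed linear, held fixed here), shape d_l x d_z
    T: target encoder weights (assumed linear, held fixed), shape d_l x d_obs
    Returns (loss, updated W).
    """
    # Future latent representation, generated from the future observation.
    target = matvec(T, future_obs)
    # Current internal representation of the current state.
    z = matvec(W, obs)
    # Predicted future latent representation.
    pred = matvec(P, z)

    # Objective: half the squared difference between prediction and target.
    err = [p - t for p, t in zip(pred, target)]
    loss = 0.5 * sum(e * e for e in err)

    # Gradient of the loss w.r.t. z is P^T err; w.r.t. W it is outer(grad_z, obs).
    grad_z = [sum(P[i][j] * err[i] for i in range(len(P))) for j in range(len(z))]
    new_W = [
        [W[j][k] - lr * grad_z[j] * obs[k] for k in range(len(obs))]
        for j in range(len(W))
    ]
    return loss, new_W
```

    Calling `training_step` repeatedly on the same observation pair should drive the loss down, since only `W` is updated and the target is treated as a constant.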

    DISCRETE TOKEN PROCESSING USING DIFFUSION MODELS

    Publication number: US20240119261A1

    Publication date: 2024-04-11

    Application number: US18374447

    Application date: 2023-09-28

    CPC classification number: G06N3/045

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating an output sequence of discrete tokens using a diffusion model. In one aspect, a method includes generating, by using the diffusion model, a final latent representation of the output sequence of discrete tokens that includes a determined value for each of a plurality of latent variables; applying a de-embedding matrix to the final latent representation of the output sequence of discrete tokens to generate a de-embedded final latent representation that includes, for each of the plurality of latent variables, a respective numeric score for each discrete token in a vocabulary of multiple discrete tokens; selecting, for each of the plurality of latent variables, the discrete token in the vocabulary that has the highest numeric score; and generating the output sequence of discrete tokens that includes the selected discrete tokens.
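    The decoding steps in this abstract (de-embedding, then picking the highest-scoring token per latent variable) can be sketched as follows. This is a minimal sketch under stated assumptions: each latent variable is assumed to be a vector of length d, the de-embedding matrix is assumed to have shape d x V, and the function name `decode_tokens` is hypothetical. The diffusion model that produces the final latents is not modeled here.

```python
def decode_tokens(final_latents, de_embedding, vocabulary):
    """Map final latent vectors to discrete tokens (illustrative only).

    final_latents: list of latent vectors, each of length d (assumed layout)
    de_embedding:  d x V matrix given as a list of d rows of length V
    vocabulary:    list of V discrete tokens
    """
    tokens = []
    for latent in final_latents:
        # De-embedded scores: one numeric score per vocabulary token.
        scores = [
            sum(latent[k] * de_embedding[k][v] for k in range(len(latent)))
            for v in range(len(vocabulary))
        ]
        # Select the token with the highest score (first one on ties).
        best = max(range(len(vocabulary)), key=scores.__getitem__)
        tokens.append(vocabulary[best])
    return tokens
```

    For example, with a two-dimensional latent space and a three-token vocabulary, `decode_tokens([[1.0, 0.0], [0.0, 1.0]], [[0.1, 0.9, 0.0], [0.7, 0.2, 0.1]], ["a", "b", "c"])` selects one token per latent by taking the arg max over each row of scores.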
