Unsupervised detection of intermediate reinforcement learning goals

    Publication No.: US12106200B2

    Publication date: 2024-10-01

    Application No.: US18168000

    Filing date: 2023-02-13

    Applicant: Google LLC

    Inventor: Pierre Sermanet

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting intermediate reinforcement learning goals. One of the methods includes obtaining a plurality of demonstration sequences, each of the demonstration sequences being a sequence of images of an environment while a respective instance of a reinforcement learning task is being performed; for each demonstration sequence, processing each image in the demonstration sequence through an image processing neural network to determine feature values for a respective set of features for the image; determining, from the demonstration sequences, a partitioning of the reinforcement learning task into a plurality of subtasks, wherein each image in each demonstration sequence is assigned to a respective subtask of the plurality of subtasks; and determining, from the feature values for the images in the demonstration sequences, a respective set of discriminative features for each of the plurality of subtasks.
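
The partitioning and feature-selection steps in this abstract can be illustrated with a minimal sketch. The abstract does not say how the partitioning is computed or how features are scored, so everything here is an assumption: `partition_uniform` simply splits each demonstration into equal-length chunks, and `discriminative_features` ranks features by the gap between their mean inside and outside a subtask, divided by the combined spread. Both function names and the scoring rule are hypothetical, and each subtask must receive at least one frame.

```python
from statistics import mean, pstdev


def partition_uniform(num_frames, num_subtasks):
    """Assign each frame index to a subtask by cutting the sequence into equal chunks.

    Assumption: a uniform split stands in for whatever partitioning the method uses.
    """
    return [min(i * num_subtasks // num_frames, num_subtasks - 1)
            for i in range(num_frames)]


def discriminative_features(demos, num_subtasks, top_k):
    """Pick the top_k most discriminative feature indices for each subtask.

    demos: list of demonstration sequences; each sequence is a list of
    per-image feature vectors (lists of floats of equal length).
    """
    num_features = len(demos[0][0])

    # Pool (subtask label, feature vector) pairs across all demonstrations.
    labeled = []
    for seq in demos:
        labels = partition_uniform(len(seq), num_subtasks)
        labeled.extend(zip(labels, seq))

    selected = {}
    for subtask in range(num_subtasks):
        scores = []
        for f in range(num_features):
            inside = [vec[f] for lab, vec in labeled if lab == subtask]
            outside = [vec[f] for lab, vec in labeled if lab != subtask]
            # Hypothetical score: mean separation normalized by total spread.
            spread = pstdev(inside) + pstdev(outside) + 1e-8
            scores.append(abs(mean(inside) - mean(outside)) / spread)
        ranked = sorted(range(num_features), key=lambda f: scores[f], reverse=True)
        selected[subtask] = ranked[:top_k]
    return selected


# Toy data: feature 0 jumps between the two halves, feature 1 is uninformative.
demos = [
    [[0.0, 0.5], [0.1, 0.4], [1.0, 0.5], [0.9, 0.4]],
    [[0.0, 0.4], [0.1, 0.5], [0.9, 0.5], [1.0, 0.4]],
]
selected = discriminative_features(demos, num_subtasks=2, top_k=1)
print(selected)
```

In practice the feature vectors would come from the image processing neural network; here they are hand-written toy values so the sketch runs without any model.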

    MIRROR LOSS NEURAL NETWORKS
    Type: Patent application

    Publication No.: US20230020615A1

    Publication date: 2023-01-19

    Application No.: US17893454

    Filing date: 2022-08-23

    Applicant: Google LLC

    Inventor: Pierre Sermanet

    Abstract: This description relates to a neural network that has multiple network parameters and is configured to receive an input observation characterizing a state of an environment and to process the input observation to generate a numeric embedding of the state of the environment. The neural network can be used to control a robotic agent. The network can be trained using a method comprising: obtaining a first observation captured by a first modality; obtaining a second observation that is co-occurring with the first observation and that is captured by a second, different modality; obtaining a third observation captured by the first modality that is not co-occurring with the first observation; determining a gradient of a triplet loss that uses the first observation, the second observation, and the third observation; and updating current values of the network parameters using the gradient of the triplet loss.
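
The training step in this abstract (form a triplet, take the gradient of a triplet loss, update the parameters) can be sketched as follows. This is only an illustration under stated assumptions: the embedding is a single linear map rather than a deep network, all observations are plain vectors of the same size, the loss uses squared Euclidean distance with a margin of 0.2, and the gradient is estimated by finite differences. None of these choices comes from the abstract, and all function names are hypothetical.

```python
def embed(weights, obs):
    """Linear embedding (assumption): one output per weight row."""
    return [sum(w * x for w, x in zip(row, obs)) for row in weights]


def sq_dist(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(weights, anchor, positive, negative, margin=0.2):
    """Hinge triplet loss: pull anchor/positive together, push anchor/negative apart.

    anchor:   first observation (first modality)
    positive: co-occurring observation (second modality)
    negative: non-co-occurring observation (first modality)
    """
    a, p, n = (embed(weights, o) for o in (anchor, positive, negative))
    return max(0.0, sq_dist(a, p) - sq_dist(a, n) + margin)


def numerical_gradient(weights, anchor, positive, negative, eps=1e-6):
    """Central finite-difference estimate of d(loss)/d(weights)."""
    grad = [[0.0] * len(row) for row in weights]
    for i, row in enumerate(weights):
        for j in range(len(row)):
            original = row[j]
            row[j] = original + eps
            up = triplet_loss(weights, anchor, positive, negative)
            row[j] = original - eps
            down = triplet_loss(weights, anchor, positive, negative)
            row[j] = original  # restore the parameter
            grad[i][j] = (up - down) / (2 * eps)
    return grad


def sgd_step(weights, grad, lr=0.05):
    """One plain gradient-descent update of the network parameters."""
    return [[w - lr * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(weights, grad)]


weights = [[1.0, 0.0], [0.0, 1.0]]
anchor, positive, negative = [1.0, 0.0], [0.0, 1.0], [1.0, 0.1]

loss_before = triplet_loss(weights, anchor, positive, negative)
grad = numerical_gradient(weights, anchor, positive, negative)
weights = sgd_step(weights, grad)
loss_after = triplet_loss(weights, anchor, positive, negative)
print(loss_before, loss_after)
```

A real implementation would use automatic differentiation instead of finite differences; the numerical gradient is used here only to keep the sketch dependency-free.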

    Unsupervised detection of intermediate reinforcement learning goals

    Publication No.: US11580360B2

    Publication date: 2023-02-14

    Application No.: US16347651

    Filing date: 2017-11-06

    Applicant: Google LLC

    Inventor: Pierre Sermanet

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting intermediate reinforcement learning goals. One of the methods includes obtaining a plurality of demonstration sequences, each of the demonstration sequences being a sequence of images of an environment while a respective instance of a reinforcement learning task is being performed; for each demonstration sequence, processing each image in the demonstration sequence through an image processing neural network to determine feature values for a respective set of features for the image; determining, from the demonstration sequences, a partitioning of the reinforcement learning task into a plurality of subtasks, wherein each image in each demonstration sequence is assigned to a respective subtask of the plurality of subtasks; and determining, from the feature values for the images in the demonstration sequences, a respective set of discriminative features for each of the plurality of subtasks.

    Mirror loss neural networks
    Type: Granted patent

    Publication No.: US11453121B2

    Publication date: 2022-09-27

    Application No.: US16468987

    Filing date: 2018-03-19

    Applicant: Google LLC

    Inventor: Pierre Sermanet

    Abstract: This description relates to a neural network that has multiple network parameters and is configured to receive an input observation characterizing a state of an environment and to process the input observation to generate a numeric embedding of the state of the environment. The neural network can be used to control a robotic agent. The network can be trained using a method comprising: obtaining a first observation captured by a first modality; obtaining a second observation that is co-occurring with the first observation and that is captured by a second, different modality; obtaining a third observation captured by the first modality that is not co-occurring with the first observation; determining a gradient of a triplet loss that uses the first observation, the second observation, and the third observation; and updating current values of the network parameters using the gradient of the triplet loss.

    CONTROLLING AGENTS USING LATENT PLANS

    Publication No.: US20220076099A1

    Publication date: 2022-03-10

    Application No.: US17432366

    Filing date: 2020-02-19

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for controlling an agent. One of the methods includes controlling the agent using a policy neural network that processes a policy input that includes (i) a current observation, (ii) a goal observation, and (iii) a selected latent plan to generate a current action output that defines an action to be performed in response to the current observation.
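
The three-part policy input described in this abstract can be pictured with a small sketch. It assumes, purely for illustration, that the current observation, goal observation, and latent plan are flat vectors that are concatenated, and that a single linear layer stands in for the policy neural network. The `PolicyInput` class and `policy_action` function are hypothetical names, not taken from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PolicyInput:
    """The three components of the policy input named in the abstract."""
    current_observation: List[float]
    goal_observation: List[float]
    latent_plan: List[float]

    def as_vector(self) -> List[float]:
        # Assumption: the components are combined by simple concatenation.
        return self.current_observation + self.goal_observation + self.latent_plan


def policy_action(weights: List[List[float]], bias: List[float],
                  policy_input: PolicyInput) -> List[float]:
    """Stand-in policy: one linear layer mapping the policy input to an action."""
    x = policy_input.as_vector()
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


policy_input = PolicyInput(
    current_observation=[1.0, 2.0],
    goal_observation=[3.0],
    latent_plan=[0.5, -0.5],
)
weights = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0, 1.0],
]
bias = [0.0, 1.0]
action = policy_action(weights, bias, policy_input)
print(action)
```

Keeping the latent plan as a separate field makes it easy to hold the plan fixed across several steps while the current observation changes, which is one plausible way to use such an input.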
