METHODS AND SYSTEMS FOR SUPPORT POLICY LEARNING

    Publication No.: US20210357782A1

    Publication date: 2021-11-18

    Application No.: US16875741

    Application date: 2020-05-15

    IPC classification: G06N5/04 G06N20/00 G06N3/00

    Abstract: Methods and systems are described for support policy learning in an agent of a robot. A general value function (GVF) is learned for a main policy, where the GVF represents future performance of the agent executing the main policy for a given state of the environment. A master policy selects an action based on a predicted accumulated success value received from the GVF. When the predicted accumulated success value is an acceptable value, the action selected by the master policy is execution of the main policy. When the predicted accumulated success value is not an acceptable value, the action selected by the master policy causes a support policy to be learned. The support policy generates a support action which, when performed, causes the robot to transition to a new state where the predicted accumulated success value has an acceptable value.
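
The selection rule in this abstract can be sketched roughly as follows. This is only an illustration, not the patented implementation: the scalar `threshold` standing in for "an acceptable value", the function names `select_policy` and `step`, and the toy value table are all assumptions, and the learning of the support policy itself is not shown.

```python
from typing import Callable, Hashable, Tuple

State = Hashable


def select_policy(predicted_success: float, threshold: float) -> str:
    """Master-policy decision: run the main policy only when the predicted
    accumulated success value is acceptable (assumed here: >= threshold)."""
    return "main" if predicted_success >= threshold else "support"


def step(
    state: State,
    gvf: Callable[[State], float],
    main_policy: Callable[[State], str],
    support_policy: Callable[[State], str],
    threshold: float = 0.5,  # hypothetical acceptability cutoff
) -> Tuple[str, str]:
    """Return which policy the master policy chose and the action it produced."""
    choice = select_policy(gvf(state), threshold)
    policy = main_policy if choice == "main" else support_policy
    return choice, policy(state)


# Toy GVF predictions per state (hypothetical numbers).
toy_gvf = {0: 0.2, 1: 0.4, 2: 0.8}

low = step(0, toy_gvf.get, lambda s: "drive", lambda s: "move_right")
high = step(2, toy_gvf.get, lambda s: "drive", lambda s: "move_right")
```

In this toy setup, state 0 has a low predicted value, so the support action is chosen to move the agent toward a state where the main policy is expected to succeed; state 2 is handled by the main policy directly.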

    METHOD AND SYSTEM FOR CONTROLLING SAFETY OF EGO AND SOCIAL OBJECTS

    Publication No.: US20200276988A1

    Publication date: 2020-09-03

    Application No.: US16803386

    Application date: 2020-02-27

    IPC classification: B60W60/00 G06N3/08

    Abstract: A method or system for controlling safety of both an ego vehicle and social objects in an environment of the ego vehicle, comprising: receiving data representative of at least one social object and determining a current state of the ego vehicle based on sensor data; predicting, based on the current state, an ego safety value corresponding to the ego vehicle for each possible behavior action in a set of possible behavior actions; predicting, based on the current state, a social safety value corresponding to the at least one social object in the environment of the ego vehicle for each possible behavior action; and selecting a next behavior action for the ego vehicle based on the ego safety values, the social safety values, and one or more target objectives for the ego vehicle.
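
One way the final selection step might look is sketched below. The abstract does not say how the safety values and target objectives are combined, so everything here is an assumption: the minimum-safety thresholds `ego_min` and `social_min`, the per-action `objective` score, the fallback rule, and the example numbers are all hypothetical.

```python
from typing import Dict


def select_behavior(
    ego_safety: Dict[str, float],
    social_safety: Dict[str, float],
    objective: Dict[str, float],
    ego_min: float = 0.9,     # hypothetical minimum acceptable ego safety
    social_min: float = 0.9,  # hypothetical minimum acceptable social safety
) -> str:
    """Pick the next behavior action from per-action predictions.

    Assumed rule: keep only actions whose ego and social safety values both
    meet their minimums, then choose the one with the best objective score.
    If no action qualifies, fall back to the action whose worse safety
    value is highest.
    """
    safe = [
        a for a in ego_safety
        if ego_safety[a] >= ego_min and social_safety[a] >= social_min
    ]
    if safe:
        return max(safe, key=lambda a: objective[a])
    return max(ego_safety, key=lambda a: min(ego_safety[a], social_safety[a]))


# Hypothetical predictions for three behavior actions in the current state.
ego = {"keep_lane": 0.95, "change_left": 0.97, "accelerate": 0.60}
social = {"keep_lane": 0.92, "change_left": 0.70, "accelerate": 0.80}
goal = {"keep_lane": 0.5, "change_left": 0.8, "accelerate": 0.9}

chosen = select_behavior(ego, social, goal)
fallback = select_behavior(ego, social, goal, ego_min=0.99, social_min=0.99)
```

Here `change_left` scores well on the objective and on ego safety but is rejected because its predicted social safety is too low, which is the kind of trade-off the abstract describes.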

    SYSTEMS AND METHODS FOR LEARNING REUSABLE OPTIONS TO TRANSFER KNOWLEDGE BETWEEN TASKS

    Publication No.: US20210387330A1

    Publication date: 2021-12-16

    Application No.: US16900291

    Application date: 2020-06-12

    IPC classification: B25J9/16 G06N3/08

    Abstract: A robot includes an RL agent that is configured to learn a policy that maximizes the cumulative reward of a task and to determine one or more features that are minimally correlated with each other. The features are then used as pseudo-rewards, called feature rewards, where each feature reward corresponds to an option policy, or skill, that the RL agent learns to maximize. In an example, the RL agent is configured to select the most relevant features from which to learn respective option policies. The RL agent is configured to, for each of the selected features, learn the respective option policy that maximizes the respective feature reward. Using the learned option policies, the RL agent is configured to learn a new (second) policy for a new (second) task that can choose from any of the learned option policies or actions available to the RL agent.
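
The idea of keeping features that are minimally correlated with each other can be illustrated with a small sketch. This is not the method claimed in the application: it assumes the candidate features have already been recorded as numeric columns, uses Pearson correlation as the correlation measure, and applies a simple greedy rule. The names `pearson`, `select_features`, and the sample data are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences.
    Returns 0.0 for a constant sequence (a simplification for this sketch)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def select_features(columns: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Greedily pick up to k feature names, starting from the first column and
    repeatedly adding the feature whose strongest absolute correlation with
    the already selected features is smallest."""
    names = list(columns)
    selected = [names[0]]
    while len(selected) < min(k, len(names)):
        remaining = [n for n in names if n not in selected]
        best = min(
            remaining,
            key=lambda n: max(
                abs(pearson(columns[n], columns[s])) for s in selected
            ),
        )
        selected.append(best)
    return selected


# Hypothetical feature traces: "x_scaled" duplicates "x"; "y" is uncorrelated with "x".
features = {
    "x": [1.0, 2.0, 3.0, 4.0],
    "x_scaled": [2.0, 4.0, 6.0, 8.0],
    "y": [1.0, -1.0, -1.0, 1.0],
}
picked = select_features(features, 2)
```

In the scheme the abstract describes, each selected feature would then serve as a feature reward for learning its own option policy; that learning step is outside the scope of this sketch.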