-
Publication No.: US20210357782A1
Publication Date: 2021-11-18
Application No.: US16875741
Filing Date: 2020-05-15
Applicants: Daniel Mark GRAVES, Jun JIN, Jun LUO
Inventors: Daniel Mark GRAVES, Jun JIN, Jun LUO
Abstract: Methods and systems are described for support policy learning in an agent of a robot. A general value function (GVF) is learned for a main policy, where the GVF represents the future performance of the agent executing the main policy for a given state of the environment. A master policy selects an action based on the predicted accumulated success value received from the GVF. When the predicted accumulated success value is acceptable, the action selected by the master policy is execution of the main policy. When the predicted accumulated success value is not acceptable, the master policy's action causes a support policy to be learned. The support policy generates a support action that causes the robot to transition to a new state where the predicted accumulated success value is acceptable.
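The abstract's master-policy decision loop can be sketched roughly as follows. This is an illustrative assumption, not the patented implementation: the GVF, policies, and the "acceptable" threshold are all hypothetical stand-ins.

```python
# Hypothetical sketch of the master-policy selection described in the
# abstract: a learned GVF predicts the accumulated success value of the
# main policy from the current state; the master policy either executes
# the main policy or falls back to a support policy.

ACCEPTABLE_THRESHOLD = 0.8  # assumed cutoff for an "acceptable" value


def master_policy(state, gvf, main_policy, support_policy):
    """Run the main policy if the GVF's predicted accumulated success
    value is acceptable; otherwise ask the support policy for an action
    intended to move the robot toward a better state."""
    predicted_success = gvf(state)
    if predicted_success >= ACCEPTABLE_THRESHOLD:
        return main_policy(state)
    return support_policy(state)


# Toy stand-ins: the GVF just echoes the (scalar) state as the prediction.
gvf = lambda s: s
main_policy = lambda s: "main_action"
support_policy = lambda s: "support_action"

print(master_policy(0.9, gvf, main_policy, support_policy))  # main_action
print(master_policy(0.3, gvf, main_policy, support_policy))  # support_action
```

The key design point from the abstract is that the GVF acts as a learned predictor of the main policy's future performance, so the switch between policies is driven by prediction rather than by observed failure.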
-
Publication No.: US20210004006A1
Publication Date: 2021-01-07
Application No.: US16921523
Filing Date: 2020-07-06
Applicant: Daniel Mark GRAVES
Inventor: Daniel Mark GRAVES
IPC Classification: G05D1/02, B60W30/095, B60W50/00
Abstract: Methods and systems for predictive control of an autonomous vehicle are described. Predictions of lane centeredness and road angle are generated based on data collected by sensors on the autonomous vehicle and are combined to determine a state of the vehicle that is then used to generate vehicle actions for steering control and speed control of the autonomous vehicle.
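One way the two predictions might be combined into a state and mapped to steering and speed commands is sketched below. The gains, the proportional control law, and all names are assumptions for illustration only; the abstract does not specify how the combination is performed.

```python
# Minimal sketch (not the patented method) of turning predicted lane
# centeredness and road angle into steering and speed commands.

def control_from_predictions(lane_centeredness, road_angle,
                             k_center=0.5, k_angle=1.0, base_speed=20.0):
    """lane_centeredness: 0 at lane center, +/-1 at the lane edges.
    road_angle: heading error in radians. Returns (steering, speed)."""
    state = (lane_centeredness, road_angle)  # combined vehicle state
    # Steer against both the lateral offset and the heading error.
    steering = -(k_center * state[0] + k_angle * state[1])
    # Slow down as the combined prediction error grows.
    speed = base_speed * max(0.0, 1.0 - abs(state[0]) - abs(state[1]))
    return steering, speed


steering, speed = control_from_predictions(0.2, 0.1)
print(steering, speed)  # -0.2 14.0
```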
-
Publication No.: US20200276988A1
Publication Date: 2020-09-03
Application No.: US16803386
Filing Date: 2020-02-27
Applicant: Daniel Mark GRAVES
Inventor: Daniel Mark GRAVES
Abstract: A method or system for controlling safety of both an ego vehicle and social objects in an environment of the ego vehicle, comprising: receiving data representative of at least one social object and determining a current state of the ego vehicle based on sensor data; predicting, for each possible behavior action in a set of possible behavior actions, an ego safety value corresponding to the ego vehicle based on the current state; predicting, for each possible behavior action, a social safety value corresponding to the at least one social object in the environment of the ego vehicle based on the current state; and selecting a next behavior action for the ego vehicle based on the ego safety values, the social safety values, and one or more target objectives for the ego vehicle.
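The selection step in the abstract can be illustrated with a simple weighted-score sketch. The linear weighting and every name below are assumptions; the patent does not state how the three quantities are combined.

```python
# Illustrative sketch: for each candidate behavior action, combine a
# predicted ego safety value, a predicted social safety value, and a
# target-objective score, then select the highest-scoring action.

def select_behavior_action(actions, ego_safety, social_safety, objective,
                           w_ego=1.0, w_social=1.0, w_obj=0.5):
    """Each dict maps a behavior action to its predicted value."""
    def score(a):
        return (w_ego * ego_safety[a]
                + w_social * social_safety[a]
                + w_obj * objective[a])
    return max(actions, key=score)


actions = ["keep_lane", "change_left", "brake"]
ego_safety = {"keep_lane": 0.9, "change_left": 0.6, "brake": 0.95}
social_safety = {"keep_lane": 0.8, "change_left": 0.5, "brake": 0.9}
objective = {"keep_lane": 0.7, "change_left": 0.9, "brake": 0.1}

print(select_behavior_action(actions, ego_safety, social_safety, objective))
# keep_lane
```

Note the structural point from the abstract: safety is predicted separately for the ego vehicle and for the surrounding social objects, so an action that is safe for the ego vehicle but risky for others can still be rejected.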
-
Publication No.: US20210387330A1
Publication Date: 2021-12-16
Application No.: US16900291
Filing Date: 2020-06-12
Abstract: A robot includes an RL agent that is configured to learn a policy to maximize the cumulative reward of a task and to determine one or more features that are minimally correlated with each other. The features are then used as pseudo-rewards, called feature rewards, where each feature reward corresponds to an option policy, or skill, that the RL agent learns to maximize. In an example, the RL agent is configured to select the most relevant features from which to learn respective option policies. The RL agent is configured to, for each of the selected features, learn the respective option policy that maximizes the respective feature reward. Using the learned option policies, the RL agent is configured to learn a new (second) policy for a new (second) task that can choose from any of the learned option policies or actions available to the RL agent.
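One step from this abstract, choosing features that are minimally correlated with each other so each can serve as a feature reward for its own option policy, can be sketched as a greedy filter on pairwise correlation. The greedy strategy, the threshold, and the toy feature traces are all illustrative assumptions.

```python
# Toy sketch (assumptions throughout): greedily keep features whose
# absolute Pearson correlation with every already-kept feature stays
# below a threshold, so each kept feature can act as a pseudo-reward
# ("feature reward") for a separate option policy.

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def select_decorrelated_features(features, max_corr=0.5):
    """features: dict of name -> observed values over time."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < max_corr for k in kept):
            kept.append(name)
    return kept


features = {
    "speed":    [1.0, 2.0, 3.0, 4.0],
    "distance": [1.1, 2.1, 2.9, 4.2],   # nearly identical to speed -> dropped
    "heading":  [0.5, -0.5, 0.5, -0.5],  # weakly correlated -> kept
}
print(select_decorrelated_features(features))  # ['speed', 'heading']
```

Each surviving feature would then be treated as its own reward signal, with one option policy trained to maximize it, before the second-task policy learns to choose among those options and the primitive actions.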