Solving sparse reward tasks using self-balancing shaped rewards

    Publication No.: US11620572B2

    Publication date: 2023-04-04

    Application No.: US16545279

    Filing date: 2019-08-20

    Abstract: Approaches for using self-balancing shaped rewards include randomly selecting a start state and a goal state, traversing first and second trajectories for moving from the start state toward the goal state, where a first terminal state of the first trajectory is closer to the goal state than a second terminal state of the second trajectory, updating rewards for the first and second trajectories using a self-balancing reward function based on the terminal state of the other trajectory, determining a gradient for the goal-oriented task module, and updating one or more parameters of the goal-oriented task module based on the gradient. The second trajectory contributes to the determination of the gradient, and the first trajectory contributes to the determination of the gradient when the first terminal state is within a first threshold distance of the second terminal state or the first terminal state is within a second threshold distance of the goal state.
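The selection rule in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact reward formula, so the form used here (negative distance to the goal plus distance to the other trajectory's terminal state) is an assumption made for illustration. The names `self_balancing_reward` and `balance_pair`, the Euclidean distance metric, and the threshold values are also hypothetical. The sketch covers only reward relabeling and the choice of which trajectories contribute to the gradient, not the rollouts or the parameter update.

```python
import math


def self_balancing_reward(terminal, goal, sibling_terminal):
    """Assumed illustrative form, not taken from the patent text:
    penalize distance to the goal, reward distance from the terminal
    state of the other (sibling) trajectory."""
    return -math.dist(terminal, goal) + math.dist(terminal, sibling_terminal)


def balance_pair(terminal_a, terminal_b, goal, sibling_threshold, goal_threshold):
    # Order the two rollouts so that "first" ends closer to the goal.
    if math.dist(terminal_a, goal) <= math.dist(terminal_b, goal):
        first, second = terminal_a, terminal_b
    else:
        first, second = terminal_b, terminal_a

    # Each trajectory's reward depends on the other's terminal state.
    rewards = {
        "first": self_balancing_reward(first, goal, second),
        "second": self_balancing_reward(second, goal, first),
    }

    # The farther (second) trajectory always contributes to the gradient.
    contributors = ["second"]
    # The closer (first) trajectory contributes only if it ended near
    # the second terminal state or near the goal.
    if (math.dist(first, second) <= sibling_threshold
            or math.dist(first, goal) <= goal_threshold):
        contributors.append("first")
    return rewards, contributors


rewards, contributors = balance_pair(
    (1.0, 0.0), (3.0, 0.0), goal=(0.0, 0.0),
    sibling_threshold=0.5, goal_threshold=0.5,
)
```

In this usage example the closer rollout ends at distance 1 from the goal and distance 2 from its sibling, so neither threshold is met and only the farther rollout would feed the gradient. Raising `goal_threshold` to 1.0 would admit both.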
