METHODS FOR TRAINING AN ARTIFICIAL INTELLIGENT AGENT WITH CURRICULUM AND SKILLS

    Publication No.: US20230237370A1

    Publication Date: 2023-07-27

    Application No.: US17650295

    Application Date: 2022-02-08

    IPC Classification: G06N20/00

    CPC Classification: G06N20/00

    Abstract: A method for training an agent uses a mixture of scenarios designed to teach specific skills helpful in a larger domain, such as mixing general racing scenarios with very specific tactical racing scenarios. Aspects of the method can include one or more of the following: (1) training the agent to excel at time trials by having one or more cars spread out on the track; (2) running the agent in various racing scenarios with a variable number of opponents starting in different configurations around the track; (3) varying the opponents by using game-provided agents, agents trained according to aspects of the present invention, or agents controlled to follow specific driving lines; (4) setting up specific short scenarios with opponents in various racing situations, each with specific success criteria; and (5) maintaining a dynamic curriculum based on how the agent performs on a variety of evaluation scenarios.
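
    The abstract above describes a dynamic curriculum over a mixture of racing scenarios but gives no implementation. Below is a minimal, hypothetical Python sketch of one way such a curriculum could behave: the scenario names, the target pass rate, and the sampling heuristic (weighting scenarios the agent has not yet mastered) are all illustrative assumptions, not the patented method.

        import random

        # Hypothetical scenario names; per the abstract these would mix general
        # racing with specific tactical situations, each with success criteria.
        SCENARIOS = ["time_trial", "grid_start_4_opponents", "overtake_on_straight",
                     "defend_final_corner", "slipstream_pass"]

        class DynamicCurriculum:
            def __init__(self, scenarios, target_pass_rate=0.8):
                self.scenarios = scenarios
                self.target = target_pass_rate
                self.pass_rate = {s: 0.0 for s in scenarios}  # filled in by evaluation runs

            def update(self, eval_results):
                # eval_results: scenario -> fraction of evaluation runs meeting its success criteria
                self.pass_rate.update(eval_results)

            def sample(self):
                # Weight scenarios the agent has not yet mastered more heavily.
                weights = [max(self.target - self.pass_rate[s], 0.05) for s in self.scenarios]
                return random.choices(self.scenarios, weights=weights, k=1)[0]

        curriculum = DynamicCurriculum(SCENARIOS)
        curriculum.update({"time_trial": 0.95, "overtake_on_straight": 0.4})
        next_scenario = curriculum.sample()  # tactical scenarios the agent still fails are favored

    In this sketch, evaluation results feed back into the sampling weights, so scenarios the agent still fails are drawn more often than already-mastered time trials.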

    METHODS AND SYSTEMS TO ADAPT PID COEFFICIENTS THROUGH REINFORCEMENT LEARNING

    Publication No.: US20220365493A1

    Publication Date: 2022-11-17

    Application No.: US17314351

    Application Date: 2021-05-07

    IPC Classification: G05B13/02 G06N20/00 G05B6/02

    Abstract: Systems and methods are used to adapt the coefficients of a proportional-integral-derivative (PID) controller through reinforcement learning. The approach can include an outer loop of reinforcement learning, in which the PID coefficients are tuned to changes in the environment, and an inner loop of PID control for quickly reacting to changing errors. The outer loop can learn and adapt as the environment changes and can be configured to run only at a predetermined frequency, i.e., after a given number of steps. The outer loop can use summary statistics about the error terms, along with any other information sensed about the environment, to calculate an observation. This observation can be used to compute the next action, for example, by feeding it into a neural network representing the policy. The resulting action is the set of PID coefficients together with tunable parameters of components such as filters.
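
    As a rough illustration of the two-loop structure described above, the hypothetical Python sketch below runs an inner PID loop every step and an outer loop at a fixed period that turns summary statistics of the error into an observation and maps it to new coefficients. The policy() function, the 100-step period, the simulated error signal, and the single low-pass filter coefficient are assumptions standing in for the learned policy and the real plant.

        import numpy as np

        def pid_step(error, integral, prev_error, kp, ki, kd, dt=0.01):
            # Inner loop: standard PID law reacting to the instantaneous error.
            integral += error * dt
            derivative = (error - prev_error) / dt
            control = kp * error + ki * integral + kd * derivative
            return control, integral

        def policy(observation):
            # Placeholder for a learned policy network; it maps the observation
            # to an action = (kp, ki, kd, low-pass filter coefficient).
            return 1.0, 0.1, 0.05, 0.9

        kp, ki, kd, alpha = 1.0, 0.1, 0.05, 0.9
        integral, prev_error, filtered = 0.0, 0.0, 0.0
        errors = []

        for step in range(1000):
            raw_error = np.sin(step * 0.01)            # stand-in for a sensed error signal
            filtered = alpha * filtered + (1 - alpha) * raw_error  # tunable filter
            control, integral = pid_step(filtered, integral, prev_error, kp, ki, kd)
            prev_error = filtered
            errors.append(filtered)

            if step % 100 == 99:                       # outer loop at a fixed period
                obs = np.array([np.mean(errors), np.std(errors), np.max(np.abs(errors))])
                kp, ki, kd, alpha = policy(obs)        # action = new coefficients
                errors.clear()

    The design point illustrated is the separation of timescales: the PID loop runs every step, while the reinforcement-learning policy only intervenes periodically with a batch observation summarizing recent error behavior.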

    TASK PRIORITIZED EXPERIENCE REPLAY ALGORITHM FOR REINFORCEMENT LEARNING

    Publication No.: US20220101064A1

    Publication Date: 2022-03-31

    Application No.: US17036913

    Application Date: 2020-09-29

    IPC Classification: G06K9/62 G06N20/00

    Abstract: A task prioritized experience replay (TaPER) algorithm enables simultaneous off-policy learning of multiple RL tasks. The algorithm can prioritize samples that were part of fixed-length episodes that led to the achievement of tasks, enabling the agent to quickly learn task policies by bootstrapping over its early successes. TaPER can also improve performance on all tasks simultaneously, which is a desirable characteristic for multi-task RL. Unlike conventional ER algorithms, which are applied to single-task RL settings or which require rewards to be binary or abundant, or goals to be provided as a parameterized specification, TaPER poses no such restrictions and supports arbitrary reward and task specifications.
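
    The abstract does not specify an implementation, but the hypothetical Python sketch below captures the core idea of prioritizing replay samples drawn from episodes that achieved a task. The buffer capacity, the fixed success bonus, and the tuple format of transitions are illustrative assumptions.

        import random
        from collections import deque

        class TaskPrioritizedReplay:
            def __init__(self, capacity=100_000, success_bonus=4.0):
                # Each entry stores (transition, episode_succeeded).
                self.buffer = deque(maxlen=capacity)
                self.success_bonus = success_bonus

            def add_episode(self, transitions, achieved_task):
                for t in transitions:
                    self.buffer.append((t, achieved_task))

            def sample(self, batch_size):
                # Transitions from successful episodes get a higher sampling weight,
                # so early successes are replayed (bootstrapped over) more often.
                weights = [self.success_bonus if ok else 1.0 for _, ok in self.buffer]
                picks = random.choices(list(self.buffer), weights=weights, k=batch_size)
                return [t for t, _ in picks]

        buf = TaskPrioritizedReplay()
        buf.add_episode([("s0", "a0", 0.0, "s1"), ("s1", "a1", 1.0, "s2")], achieved_task=True)
        buf.add_episode([("s0", "a2", 0.0, "s3")], achieved_task=False)
        batch = buf.sample(batch_size=4)  # biased toward the successful episode

    Because the priority depends only on whether an episode met its task criteria, this sketch works with arbitrary reward shapes, in the spirit of the restrictions the abstract says TaPER avoids.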