METHOD AND SYSTEM FOR TRAINING REINFORCEMENT LEARNING AGENT USING ADVERSARIAL SAMPLING

    Publication No.: US20210004647A1

    Publication Date: 2021-01-07

    Application No.: US16920598

    Application Date: 2020-07-03

    Abstract: Methods and systems for training a reinforcement learning (RL) agent for autonomous operation of a vehicle are described. The RL agent is first trained on uniformly sampled training samples to learn a policy. After the RL agent has achieved a predetermined performance goal, data is collected that includes sequences of sampled states and, for each sequence of sampled states, agent parameters and an indication of failure of the RL agent for that sequence. A failure predictor is trained, using samples from the collected data, to predict a probability of failure of the RL agent for a given sequence of states. Sequences of states are then collected by simulating interaction of the vehicle with the environment. A sequence of states is selected based on the probability of failure output by the failure predictor, and the RL agent is further trained on the selected sequence of states.
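    The selection step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the failure probability is turned into a selection, so the probability-weighted draw below is an assumption, as are the names select_sequence and toy_failure_predictor and the toy predictor itself, which stands in for the trained failure predictor.

```python
import random

def toy_failure_predictor(sequence, agent_params):
    # Hypothetical stand-in for the trained failure predictor: it treats a
    # sequence as certain to fail if any state exceeds a threshold stored in
    # the agent parameters, and as certain to succeed otherwise.
    return 1.0 if max(sequence) > agent_params["threshold"] else 0.0

def select_sequence(candidates, failure_predictor, agent_params, rng=None):
    # Assumption: sequences are drawn with probability proportional to their
    # predicted failure probability, so likely failures are sampled more often.
    rng = rng or random.Random()
    probs = [failure_predictor(seq, agent_params) for seq in candidates]
    if sum(probs) == 0:
        # No sequence is predicted to fail: fall back to uniform sampling.
        return rng.choice(candidates)
    return rng.choices(candidates, weights=probs, k=1)[0]

# Simulated state sequences (plain lists of numbers for illustration only).
candidates = [[0.1, 0.2], [0.3, 0.9], [0.2, 0.4]]
params = {"threshold": 0.5}
selected = select_sequence(candidates, toy_failure_predictor, params,
                           rng=random.Random(0))
```

    In this toy setup only the second sequence is predicted to fail, so it is the one handed back for further training. A real predictor would return graded probabilities, and the weighted draw would then mix hard and easy sequences.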

MACHINE-LEARNING BASED SYSTEM FOR PATH AND/OR MOTION PLANNING AND METHOD OF TRAINING THE SAME

    Publication No.: US20210276598A1

    Publication Date: 2021-09-09

    Application No.: US16810552

    Application Date: 2020-03-05

    IPC Classification: B60W60/00 B60W30/09 G06K9/00

    Abstract: A system and method for path and/or motion planning and for training such a system are described. In one aspect, the method comprises generating a sequence of predicted occupancy grid maps (OGMs) for T-T1 time steps based on a sequence of OGMs for 0-T1 time steps, a reference map of an environment in which an autonomous vehicle is operating, and a trajectory. A cost volume is generated for the sequence of predicted OGMs. The cost volume comprises a plurality of cost maps for T-T1 time steps. Each cost map corresponds to a predicted OGM in the sequence of predicted OGMs and has the same dimensions as the corresponding predicted OGM. Each cost map comprises a plurality of cells. Each cell in a cost map represents the cost of the corresponding cell in the corresponding predicted OGM being occupied, in accordance with a policy defined by a policy function.
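    The relationship between the predicted OGMs and the cost volume can be sketched in Python as follows. This is only an illustration under stated assumptions: the names build_cost_volume, expected_cost, and toy_policy_cost are hypothetical, the policy function is a toy, and combining costs with occupancy probabilities into one score is an assumed use of the cost volume that the abstract does not describe.

```python
def build_cost_volume(predicted_ogms, policy_cost):
    # One cost map per predicted OGM, with the same rows and columns.
    # policy_cost(t, row, col) is a hypothetical policy function returning
    # the cost of that cell being occupied at predicted time step t.
    volume = []
    for t, ogm in enumerate(predicted_ogms):
        cost_map = [[policy_cost(t, r, c) for c in range(len(row))]
                    for r, row in enumerate(ogm)]
        volume.append(cost_map)
    return volume

def expected_cost(predicted_ogms, cost_volume):
    # Assumption: score a trajectory by summing cost * occupancy probability
    # over every cell of every predicted time step.
    total = 0.0
    for ogm, cost_map in zip(predicted_ogms, cost_volume):
        for ogm_row, cost_row in zip(ogm, cost_map):
            for prob, cost in zip(ogm_row, cost_row):
                total += prob * cost
    return total

def toy_policy_cost(t, row, col):
    # Toy policy: only occupancy in row 0 (say, the ego lane) is penalized.
    return 1.0 if row == 0 else 0.0

# Two predicted OGMs of 2 x 3 cells holding occupancy probabilities.
predicted_ogms = [
    [[0.5, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0]],
]
cost_volume = build_cost_volume(predicted_ogms, toy_policy_cost)
score = expected_cost(predicted_ogms, cost_volume)
```

    Keeping each cost map the same shape as its OGM means the two can be combined cell by cell without any resampling.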