CONTROLLER FOR AUTONOMOUS AGENTS USING REINFORCEMENT LEARNING WITH CONTROL BARRIER FUNCTIONS TO OVERCOME INACCURATE SAFETY REGION

Invention Application

WO2023034028A1 CONTROLLER FOR AUTONOMOUS AGENTS USING REINFORCEMENT LEARNING WITH CONTROL BARRIER FUNCTIONS TO OVERCOME INACCURATE SAFETY REGION Pending - Published

Patent Title: CONTROLLER FOR AUTONOMOUS AGENTS USING REINFORCEMENT LEARNING WITH CONTROL BARRIER FUNCTIONS TO OVERCOME INACCURATE SAFETY REGION
Application No.: PCT/US2022/040687

Application Date: 2022-08-18
Publication No.: WO2023034028A1

Publication Date: 2023-03-09
Inventor: AKROTIRIANAKIS, Ioannis; DEY, Biswadip; CHAKRABORTY, Amit
Applicant: SIEMENS AKTIENGESELLSCHAFT; SIEMENS CORPORATION
Applicant Address: Werner-von-Siemens-Straße 1; 170 Wood Avenue South
Assignee: SIEMENS AKTIENGESELLSCHAFT; SIEMENS CORPORATION
Current Assignee: SIEMENS AKTIENGESELLSCHAFT; SIEMENS CORPORATION
Current Assignee Address: Werner-von-Siemens-Straße 1; 170 Wood Avenue South
Agency: VENEZIA, Anthony L.
Priority: US17/462,648 2021-08-31
Main IPC: G06N5/00
IPC: G06N5/00 ; G06N20/10 ; G05D1/00 ; G06N3/00

Abstract:

A system and method are disclosed for approximating unknown safety constraints during reinforcement learning of an autonomous agent. A controller for directing the autonomous agent includes a reinforcement learning (RL) algorithm configured to define a policy for behavior of the autonomous agent, and a control barrier function (CBF) algorithm configured to calculate a corrected policy that relocates policy states to an edge of a safety region. Iterations of the RL algorithm safely learn an optimal policy while exploration remains within the safety region. The CBF algorithm uses standard least squares to derive estimates of the coefficients of the linear constraints that define the safety region. This overcomes inaccurate estimation of the safety region constraints caused by one or more noisy observations of the constraints received by sensors.
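
The two ideas in the abstract (a least-squares estimate of a linear constraint from noisy observations, and a correction that moves an unsafe state to the edge of the estimated safety region) can be illustrated with a minimal sketch. It is not the claimed method. The measurement model (sensors report a noisy signed margin h(x) = a·x + c at sampled states), the function names `fit_linear_constraint` and `correct_state`, and all numeric values are assumptions introduced here for illustration only.

```python
import numpy as np


def fit_linear_constraint(states, margins):
    """Ordinary least-squares fit of h(x) = a @ x + c from noisy samples.

    Assumed (hypothetical) measurement model: each sensor reading is the
    signed margin of a state with respect to one linear constraint, plus noise.
    """
    states = np.asarray(states, dtype=float)
    margins = np.asarray(margins, dtype=float)
    # Append a column of ones so the intercept c is estimated with a.
    design = np.hstack([states, np.ones((states.shape[0], 1))])
    coeffs, *_ = np.linalg.lstsq(design, margins, rcond=None)
    return coeffs[:-1], coeffs[-1]


def correct_state(x, a, c):
    """Return x if a @ x + c >= 0, else its projection onto a @ x + c = 0."""
    x = np.asarray(x, dtype=float)
    h = a @ x + c
    if h >= 0.0:
        return x
    # Orthogonal projection onto the boundary of the estimated half-space.
    return x - (h / (a @ a)) * a


# Hypothetical ground truth: safe set {x : -x1 - 2*x2 + 4 >= 0}.
rng = np.random.default_rng(0)
true_a, true_c = np.array([-1.0, -2.0]), 4.0
states = rng.uniform(-5.0, 5.0, size=(200, 2))
margins = states @ true_a + true_c + rng.normal(0.0, 0.1, size=200)

a_hat, c_hat = fit_linear_constraint(states, margins)
unsafe = np.array([4.0, 3.0])                 # violates the true constraint
corrected = correct_state(unsafe, a_hat, c_hat)
safe = np.array([0.0, 0.0])
unchanged = correct_state(safe, a_hat, c_hat)
```

Averaging over many noisy readings is what makes the fitted coefficients more reliable than any single observation. A safety region with several linear constraints would repeat the fit once per constraint under the same assumptions.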

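IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06N	Computing arrangements based on specific computational models
G06N5/00	Computing arrangements using knowledge-based models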