Inverse reinforcement learning by density ratio estimation

Invention Grant

US10896382B2 Inverse reinforcement learning by density ratio estimation (Status: In force)

Patent Title: Inverse reinforcement learning by density ratio estimation
Application No.: US15329690

Application Date: 2015-08-07
Publication No.: US10896382B2

Publication Date: 2021-01-19
Inventor: Eiji Uchibe, Kenji Doya
Applicant: Okinawa Institute of Science and Technology School Corporation
Applicant Address: JP Okinawa
Assignee: Okinawa Institute of Science and Technology School Corporation
Current Assignee: Okinawa Institute of Science and Technology School Corporation
Current Assignee Address: JP Okinawa
Agency: Westerman, Hattori, Daniels & Adrian, LLP
International Application: PCT/JP2015/004001 (WO), 2015-08-07
International Announcement: WO2016/021210 (WO), 2016-02-11
Main IPC: G06N20/00
IPC: G06N20/00 ; G06N7/00

Abstract:

A method of inverse reinforcement learning for estimating cost and value functions of behaviors of a subject includes: acquiring data representing changes in state variables that define the behaviors of the subject; applying a modified Bellman equation given by Eq. (1) to the acquired data:

q(x) + gV(y) − V(x) = −ln{pi(y|x)/p(y|x)}   (1)

where q(x) and V(x) denote a cost function and a value function, respectively, at state x, g represents a discount factor, and p(y|x) and pi(y|x) denote state transition probabilities before and after learning, respectively; estimating a density ratio pi(y|x)/p(y|x) in Eq. (1); estimating q(x) and V(x) in Eq. (1) using the least squares method in accordance with the estimated density ratio pi(y|x)/p(y|x); and outputting the estimated q(x) and V(x).
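
The least squares step of the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes a small discrete state space, a tabular q and V, and, to isolate the least squares step, uses the exact log ratio ln{pi(y|x)/p(y|x)} of a synthetic problem in place of a ratio estimated from data. The helper names (solve_value, optimal_policy, fit_q_and_v) and the toy problem are hypothetical. Eq. (1) is unchanged when a constant k is added to V and (1 − g)k is added to q, so the fit recovers q and V only up to that shift.

```python
import numpy as np

def solve_value(q, p, gamma, iters=2000):
    """Fixed-point iteration for V(x) = q(x) - ln sum_y p(y|x) exp(-gamma V(y))."""
    V = np.zeros_like(q)
    for _ in range(iters):
        V = q - np.log(p @ np.exp(-gamma * V))
    return V

def optimal_policy(V, p, gamma):
    """pi(y|x) proportional to p(y|x) exp(-gamma V(y)); this pi satisfies Eq. (1)."""
    w = p * np.exp(-gamma * V)[None, :]
    return w / w.sum(axis=1, keepdims=True)

def fit_q_and_v(log_ratio, gamma):
    """Least squares fit of tabular q and V to q(x) + gamma V(y) - V(x) = -log_ratio[x, y]."""
    n = log_ratio.shape[0]
    rows, rhs = [], []
    for x in range(n):
        for y in range(n):
            a = np.zeros(2 * n)   # unknowns: [q(0..n-1), V(0..n-1)]
            a[x] += 1.0           # coefficient of q(x)
            a[n + y] += gamma     # coefficient of V(y)
            a[n + x] -= 1.0       # coefficient of V(x)
            rows.append(a)
            rhs.append(-log_ratio[x, y])
    A, b = np.array(rows), np.array(rhs)
    # The system is rank deficient by one (constant shift), so lstsq
    # returns the minimum-norm solution.
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return theta[:n], theta[n:], A, b

# Synthetic toy problem (assumed purely for illustration).
rng = np.random.default_rng(0)
n, gamma = 5, 0.9
p = rng.random((n, n))
p /= p.sum(axis=1, keepdims=True)        # transition probabilities before learning
q_true = rng.random(n)
V_true = solve_value(q_true, p, gamma)
pi = optimal_policy(V_true, p, gamma)    # transition probabilities after learning

log_ratio = np.log(pi / p)               # stands in for the estimated density ratio
q_est, V_est, A, b = fit_q_and_v(log_ratio, gamma)

residual = np.max(np.abs(A @ np.concatenate([q_est, V_est]) - b))
shift = V_est - V_true                   # constant across states, up to numerical error
print("max residual of Eq. (1):", residual)
print("V_est - V_true:", shift)
```

With data rather than known probabilities, log_ratio would first be obtained from a density ratio estimator applied to observed transitions, and some regularization would typically be added to the least squares fit.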

Public/Granted literature

US20170213151A1 INVERSE REINFORCEMENT LEARNING BY DENSITY RATIO ESTIMATION Public/Granted day: 2017-07-27

IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06N	Computer systems based on specific computational models
G06N20/00	Machine learning