Direct inverse reinforcement learning with density ratio estimation
Abstract:
A method of inverse reinforcement learning for estimating the reward and value functions underlying a subject's behaviors includes: acquiring data representing changes in the state variables that define the behaviors of the subject; applying a modified Bellman equation, given by Eq. (1), to the acquired data:

    r(x) + γ V(y) − V(x) = ln [ π(y|x) / b(y|x) ]                        (1)
                         = ln [ π(x,y) / b(x,y) ] − ln [ π(x) / b(x) ]   (2)

where r(x) and V(x) denote the reward function and the value function, respectively, at state x, γ represents a discount factor, and b(y|x) and π(y|x) denote the state transition probabilities before and after learning, respectively; estimating the logarithm of the density ratio π(x)/b(x) in Eq. (2); estimating r(x) and V(x) in Eq. (2) from the result of estimating the logarithm of the density ratio π(x,y)/b(x,y); and outputting the estimated r(x) and V(x).
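Once the log density ratios on the right-hand side of Eq. (2) are known (or estimated), Eq. (1) is linear in the unknowns r(x) and V(x), so they can be recovered by least squares. The following is a minimal sketch of that final step on a toy discrete state space, not the patent's estimator: the transition matrices `b` and `pi` and the uniform state distributions are hypothetical inputs standing in for quantities that would come from data, and a gauge constraint V(0) = 0 is added because r and V are only identified up to a joint shift.

```python
import numpy as np

n = 3            # number of discrete states (toy example)
gamma = 0.9      # discount factor

# Hypothetical transition probabilities before (b) and after (pi) learning.
b  = np.full((n, n), 1.0 / n)                      # uniform baseline behavior
pi = np.array([[0.6, 0.2, 0.2],
               [0.2, 0.6, 0.2],
               [0.2, 0.2, 0.6]])                   # learned behavior favors staying

# Stationary state distributions (taken uniform here, so ln pi(x)/b(x) = 0).
px = np.full(n, 1.0 / n)
bx = np.full(n, 1.0 / n)

# Build the linear system r(x) + gamma*V(y) - V(x) = RHS of Eq. (2),
# with pi(x,y) = pi(x) pi(y|x) and b(x,y) = b(x) b(y|x).
rows, rhs = [], []
for x in range(n):
    for y in range(n):
        log_ratio = (np.log(px[x] * pi[x, y]) - np.log(bx[x] * b[x, y])
                     - (np.log(px[x]) - np.log(bx[x])))
        row = np.zeros(2 * n)         # unknowns: [r(0..n-1), V(0..n-1)]
        row[x] = 1.0                  # r(x)
        row[n + y] += gamma           # + gamma * V(y)
        row[n + x] -= 1.0             # - V(x)
        rows.append(row)
        rhs.append(log_ratio)

# Gauge fixing: V(0) = 0 (r and V are only determined up to a joint shift).
rows.append(np.eye(2 * n)[n])
rhs.append(0.0)

sol, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
r_hat, V_hat = sol[:n], sol[n:]
print("estimated r:", np.round(r_hat, 3))
print("estimated V:", np.round(V_hat, 3))
```

In practice the log density ratios would themselves be estimated from observed trajectories (the density ratio estimation step named in the title) rather than computed from known transition matrices as in this toy example.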