- Patent Title: Direct inverse reinforcement learning with density ratio estimation
- Application No.: US15425924
- Application Date: 2017-02-06
- Publication No.: US10896383B2
- Publication Date: 2021-01-19
- Inventor: Eiji Uchibe, Kenji Doya
- Applicant: Okinawa Institute of Science and Technology School Corporation
- Applicant Address: JP Okinawa
- Assignee: Okinawa Institute of Science and Technology School Corporation
- Current Assignee: Okinawa Institute of Science and Technology School Corporation
- Current Assignee Address: JP Okinawa
- Agency: Westerman, Hattori, Daniels & Adrian, LLP
- Main IPC: G06N20/00
- IPC: G06N20/00 ; G06N7/00 ; G06K9/62

Abstract:
A method of inverse reinforcement learning for estimating reward and value functions of behaviors of a subject includes: acquiring data representing changes in state variables that define the behaviors of the subject; applying a modified Bellman equation given by Eq. (1) to the acquired data:

$$
\begin{aligned}
r(x) + \gamma V(y) - V(x) &= \ln \frac{\pi(y \mid x)}{b(y \mid x)}, && (1) \\
&= \ln \frac{\pi(x, y)}{b(x, y)} - \ln \frac{\pi(x)}{b(x)}, && (2)
\end{aligned}
$$

where r(x) and V(x) denote a reward function and a value function, respectively, at state x, and γ represents a discount factor, and b(y|x) and π(y|x) denote state transition probabilities before and after learning, respectively; estimating a logarithm of the density ratio π(x)/b(x) in Eq. (2); estimating r(x) and V(x) in Eq. (2) from the result of estimating a log of the density ratio π(x,y)/b(x,y); and outputting the estimated r(x) and V(x).
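The steps in the abstract can be illustrated with a small tabular sketch. It is not the patented implementation: it assumes a finite state set, a plug-in density ratio computed from strictly positive counts or probabilities, and an ordinary least-squares fit of Eq. (2). The names `log_ratio`, `solve_reward_value`, and `make_toy_problem` are hypothetical and exist only for this example.

```python
import numpy as np


def log_ratio(pi_weights, b_weights):
    """Plug-in estimate of ln(pi/b) from two arrays of positive counts or probabilities.

    Assumption for this sketch: every cell is strictly positive.
    """
    p = pi_weights / pi_weights.sum()
    q = b_weights / b_weights.sum()
    return np.log(p) - np.log(q)


def solve_reward_value(log_ratio_xy, log_ratio_x, gamma):
    """Least-squares fit of r(x) + gamma*V(y) - V(x) to the right-hand side of Eq. (2)."""
    n = log_ratio_x.shape[0]
    # ln pi(x,y)/b(x,y) - ln pi(x)/b(x), one target per (x, y) pair
    target = log_ratio_xy - log_ratio_x[:, None]
    rows, rhs = [], []
    for x in range(n):
        for y in range(n):
            row = np.zeros(2 * n)   # unknowns: [r(0..n-1), V(0..n-1)]
            row[x] += 1.0           # coefficient of r(x)
            row[n + y] += gamma     # coefficient of V(y)
            row[n + x] -= 1.0       # coefficient of V(x)
            rows.append(row)
            rhs.append(target[x, y])
    # The system is rank deficient by one; lstsq returns the minimum-norm solution.
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return theta[:n], theta[n:]


def make_toy_problem(n=4, gamma=0.9, seed=0):
    """Synthetic check: build b, r, V and the transition probabilities pi implied by Eq. (1)."""
    rng = np.random.default_rng(seed)
    b = rng.random((n, n)) + 0.1
    b /= b.sum(axis=1, keepdims=True)      # b(y|x), rows sum to 1
    r = rng.normal(size=n)
    V = np.zeros(n)
    for _ in range(2000):                  # fixed-point iteration, contraction for gamma < 1
        V = r + np.log(b @ np.exp(gamma * V))
    pi = b * np.exp(r[:, None] + gamma * V[None, :] - V[:, None])
    return b, pi, r, V
```

In this sketch r(x) and V(x) are recovered only up to a coupled offset: adding a constant c to V and (1 − γ)c to r leaves the left-hand side of Eq. (2) unchanged, so a regularizer or normalization would be needed to pin down a unique solution.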
Public/Granted literature
- US20170147949A1 DIRECT INVERSE REINFORCEMENT LEARNING WITH DENSITY RATIO ESTIMATION Public/Granted day: 2017-05-25