Direct inverse reinforcement learning with density ratio estimation
Abstract:
A method of inverse reinforcement learning for estimating the reward and value functions underlying a subject's behaviors includes: acquiring data representing changes in the state variables that define the behaviors of the subject; applying a modified Bellman equation, given by Eq. (1), to the acquired data:

    r(x) + γ V(y) − V(x) = ln [ π(y|x) / b(y|x) ]                        (1)
                         = ln [ π(x,y) / b(x,y) ] − ln [ π(x) / b(x) ]   (2)

where r(x) and V(x) denote the reward function and the value function, respectively, at state x, γ represents a discount factor, and b(y|x) and π(y|x) denote the state transition probabilities before and after learning, respectively; estimating the logarithm of the density ratio π(x)/b(x) in Eq. (2); estimating r(x) and V(x) in Eq. (2) from the result of estimating the logarithm of the density ratio π(x,y)/b(x,y); and outputting the estimated r(x) and V(x).
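Once the log density ratios on the right-hand side of Eq. (2) are known (or estimated), Eq. (1) is linear in the unknowns r(x) and V(x), so they can be recovered by least squares. The following is a minimal sketch of that final step on a toy discrete state space, not the patent's estimator: the transition matrices `b` and `pi` and the uniform state distributions are hypothetical inputs standing in for quantities that would come from data, and a gauge constraint V(0) = 0 is added because r and V are only identified up to a joint shift.

```python
import numpy as np

n = 3            # number of discrete states (toy example)
gamma = 0.9      # discount factor

# Hypothetical transition probabilities before (b) and after (pi) learning.
b  = np.full((n, n), 1.0 / n)                      # uniform baseline behavior
pi = np.array([[0.6, 0.2, 0.2],
               [0.2, 0.6, 0.2],
               [0.2, 0.2, 0.6]])                   # learned behavior favors staying

# Stationary state distributions (taken uniform here, so ln pi(x)/b(x) = 0).
px = np.full(n, 1.0 / n)
bx = np.full(n, 1.0 / n)

# Build the linear system r(x) + gamma*V(y) - V(x) = RHS of Eq. (2),
# with pi(x,y) = pi(x) pi(y|x) and b(x,y) = b(x) b(y|x).
rows, rhs = [], []
for x in range(n):
    for y in range(n):
        log_ratio = (np.log(px[x] * pi[x, y]) - np.log(bx[x] * b[x, y])
                     - (np.log(px[x]) - np.log(bx[x])))
        row = np.zeros(2 * n)         # unknowns: [r(0..n-1), V(0..n-1)]
        row[x] = 1.0                  # r(x)
        row[n + y] += gamma           # + gamma * V(y)
        row[n + x] -= 1.0             # - V(x)
        rows.append(row)
        rhs.append(log_ratio)

# Gauge fixing: V(0) = 0 (r and V are only determined up to a joint shift).
rows.append(np.eye(2 * n)[n])
rhs.append(0.0)

sol, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
r_hat, V_hat = sol[:n], sol[n:]
print("estimated r:", np.round(r_hat, 3))
print("estimated V:", np.round(V_hat, 3))
```

In practice the log density ratios would themselves be estimated from observed trajectories (the density ratio estimation step named in the title) rather than computed from known transition matrices as in this toy example.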