Publication number: US11580445B2
Publication date: 2023-02-14
Application number: US16653890
Filing date: 2019-10-15
Applicant: salesforce.com, inc.
Inventor: Hao Liu, Richard Socher, Caiming Xiong
Abstract: Systems and methods are provided for efficient off-policy credit assignment (ECA) in reinforcement learning. ECA allows principled credit assignment for off-policy samples, and therefore improves sample efficiency and asymptotic performance. One aspect of ECA is to formulate the optimization of expected return as approximate inference, in which the policy approximates a learned prior distribution; this leads to a principled way of utilizing off-policy samples. Other features are also provided.