Publication No.: US12165020B2
Publication Date: 2024-12-10
Application No.: US17139561
Filing Date: 2020-12-31
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Di Wu, Jikun Kang, Hang Li, Xi Chen, Yi Tian Xu, Dmitriy Rivkin, Taeseop Lee, Intaik Park, Michael Jenkin, Xue Liu, Gregory Lewis Dudek
Abstract: Rapid and data-efficient training of an artificial intelligence (AI) algorithm is disclosed. Ground truth data are not available, and a policy must be learned from limited interactions with a system. A policy bank is used to explore different policies on a target system with shallow probing. A target policy is chosen by comparing a good policy from the shallow probing with a base target policy that has evolved over other learning experiences. The target policy then interacts with the target system, and a replay buffer is built up. The base target policy is then updated using gradients found with respect to the transition experience stored in the replay buffer. The base target policy is learned quickly and is robust when applied to new, unseen systems.
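
The loop described in the abstract (probe a policy bank, compare the best probed policy with the base target policy, let the chosen policy fill a replay buffer, then take a gradient step on the base target policy) can be illustrated with a minimal toy sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the scalar policy parameter, the quadratic reward in make_system, the function names, the step counts, and in particular the reward-weighted regression loss used for the gradient update, which is a stand-in for whatever update the claims actually specify.

```python
import math
import random


def make_system(optimum):
    """Hypothetical toy target system: reward is highest when action == optimum."""
    def step(action):
        return -(action - optimum) ** 2
    return step


def shallow_probe(theta, system, rng, steps=5, noise=0.1):
    """Score a policy from only a few interactions with the system."""
    rewards = [system(theta + rng.gauss(0.0, noise)) for _ in range(steps)]
    return sum(rewards) / steps


def select_target_policy(policy_bank, base_theta, system, rng):
    """Pick the best probed bank policy, unless the base policy probes better."""
    best_score, best_theta = max(
        (shallow_probe(theta, system, rng), theta) for theta in policy_bank
    )
    base_score = shallow_probe(base_theta, system, rng)
    return best_theta if best_score > base_score else base_theta


def collect_transitions(theta, system, rng, steps=50, noise=0.3):
    """Let the target policy interact with the system and fill a replay buffer."""
    buffer = []
    for _ in range(steps):
        action = theta + rng.gauss(0.0, noise)
        buffer.append((action, system(action)))
    return buffer


def update_base_policy(base_theta, buffer, lr=0.5, temperature=1.0):
    """One gradient step on an assumed reward-weighted regression loss."""
    best_reward = max(reward for _, reward in buffer)
    weights = [math.exp((reward - best_reward) / temperature) for _, reward in buffer]
    # Loss: sum_i w_i * (a_i - theta)^2 / (2 * sum_i w_i); gradient with respect to theta:
    grad = sum(w * (base_theta - a) for w, (a, _) in zip(weights, buffer)) / sum(weights)
    return base_theta - lr * grad


def train_base_policy(optima, policy_bank, base_theta, rng):
    """Evolve the base policy over a sequence of toy systems, one per optimum."""
    for optimum in optima:
        system = make_system(optimum)
        target_theta = select_target_policy(policy_bank, base_theta, system, rng)
        buffer = collect_transitions(target_theta, system, rng)
        base_theta = update_base_policy(base_theta, buffer)
    return base_theta
```

In this sketch the base policy is updated from transitions gathered by whichever policy won the probing comparison, so a regression-style loss over the buffer is used because it stays meaningful when the data come from a different policy. Calling train_base_policy([1.5, 2.0, 2.5], [-1.0, 1.0, 3.0], 0.0, random.Random(0)) moves the base parameter from 0.0 toward the range of the toy optima.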