Publication No.: US20230237342A1
Publication Date: 2023-07-27
Application No.: US18158920
Filing Date: 2023-01-24
Applicant: NVIDIA Corporation
Inventors: Shie Mannor, Gal Chechik, Gal Dalal, Assaf Joseph Hallak, Aviv Rosenberg
IPC: G06N3/092
CPC classification number: G06N3/092
Abstract: A method is performed by an agent operating in an environment. The method comprises: computing a first value associated with each state of a number of states in the environment; determining a lookahead horizon for each state of the number of states based on the first value computed for that state; applying a first policy to compute a second value associated with each of at least one state of the number of states, based on the determined lookahead horizons; and determining a second policy based on the first policy and the second value for each state of the number of states in the environment.
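
The four steps in the abstract can be illustrated loosely on a small tabular Markov decision process. The sketch below is not the claimed method: the abstract does not say how the horizon is derived from the first value or how the second value is computed, so every concrete choice here is an assumption. In particular, the first value is taken to be the exact value of the first policy, the horizon rule (a deeper horizon where the one-step Bellman residual exceeds a tolerance) is hypothetical, and the second value is taken to be a multi-step lookahead backup of the first value. All function names, the arrays `P` and `R`, and the parameters `max_h` and `tol` are illustrative.

```python
import numpy as np

def evaluate_policy(P, R, policy, gamma):
    """Exact value of a deterministic policy: solve (I - gamma * P_pi) v = r_pi."""
    n = P.shape[0]
    idx = np.arange(n)
    P_pi = P[idx, policy]   # (S, S) transition matrix under the policy
    r_pi = R[idx, policy]   # (S,) reward under the policy
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)

def q_backup(P, R, v, gamma):
    """One-step backup: Q(s, a) = R(s, a) + gamma * sum_s' P(s, a, s') * v(s')."""
    return R + gamma * (P @ v)

def choose_horizons(P, R, v, gamma, max_h, tol):
    """Hypothetical rule (an assumption, not from the source): use horizon
    max_h where the one-step Bellman residual exceeds tol, else horizon 1."""
    residual = np.abs(q_backup(P, R, v, gamma).max(axis=1) - v)
    return np.where(residual > tol, max_h, 1)

def lookahead_improve(P, R, policy, gamma, max_h=3, tol=1e-8):
    """One improvement step with a per-state lookahead horizon."""
    v1 = evaluate_policy(P, R, policy, gamma)             # first value
    horizons = choose_horizons(P, R, v1, gamma, max_h, tol)
    # tables[k] holds the Bellman optimality operator applied k times to v1.
    tables = [v1]
    for _ in range(max_h - 1):
        tables.append(q_backup(P, R, tables[-1], gamma).max(axis=1))
    v2 = v1.copy()                                        # second value
    new_policy = policy.copy()                            # second policy
    for s in range(P.shape[0]):
        h = horizons[s]
        q = R[s] + gamma * (P[s] @ tables[h - 1])         # h-step lookahead Q(s, .)
        v2[s] = q.max()
        # Keep the first policy's action unless lookahead finds a strictly better one.
        if q.max() > q[policy[s]] + 1e-12:
            new_policy[s] = int(q.argmax())
    return v1, horizons, v2, new_policy

# Toy 3-state chain: action 0 stays, action 1 moves right; state 2 is absorbing
# and pays reward 1 per step.
P = np.zeros((3, 2, 3))
P[0, 0, 0] = P[1, 0, 1] = P[2, 0, 2] = 1.0
P[0, 1, 1] = P[1, 1, 2] = P[2, 1, 2] = 1.0
R = np.zeros((3, 2))
R[2, :] = 1.0
first_policy = np.zeros(3, dtype=int)   # always stay

v1, horizons, v2, second_policy = lookahead_improve(P, R, first_policy, gamma=0.9)
print(horizons, second_policy)
```

In this toy chain only state 1 has a nonzero one-step residual under the stay-everywhere policy, so it alone receives the deeper horizon and switches to moving right. State 0 keeps horizon 1 and is left unchanged, which shows a limitation of this particular hypothetical rule rather than anything about the patented method.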