Invention Application
- Patent Title: FORECASTING AND LEARNING ACCURATE AND EFFICIENT TARGET POLICY PARAMETERS FOR DYNAMIC PROCESSES IN NON-STATIONARY ENVIRONMENTS
- Application No.: US17072868
- Application Date: 2020-10-16
- Publication No.: US20220121968A1
- Publication Date: 2022-04-21
- Inventor: Yash Chandak, Georgios Theocharous, Sridhar Mahadevan
- Applicant: Adobe Inc.
- Applicant Address: US CA San Jose
- Assignee: Adobe Inc.
- Current Assignee: Adobe Inc.
- Current Assignee Address: US CA San Jose
- Main IPC: G06N5/04
- IPC: G06N5/04; G06Q10/06; G06Q10/10

Abstract:
The present disclosure relates to systems, methods, and non-transitory computer-readable media that determine target policy parameters that enable target policies to provide improved future performance, even in circumstances where the underlying environment is non-stationary. For example, in one or more embodiments, the disclosed systems utilize counter-factual reasoning to estimate what the performance of the target policy would have been if implemented during past episodes of action-selection. Based on the estimates, the disclosed systems forecast a performance of the target policy for one or more future decision episodes. In some implementations, the disclosed systems further determine a performance gradient for the forecasted performance with respect to varying a target policy parameter for the target policy. In some cases, the disclosed systems use the performance gradient to efficiently modify the target policy parameter, without undergoing the computational expense of expressly modeling variations in underlying environmental functions.
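The three steps named in the abstract (counterfactual estimation of past performance, forecasting of future performance, and a gradient of that forecast with respect to a policy parameter) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the application: a two-action, single-step setting, a sigmoid-parameterized target policy with scalar parameter `theta`, ordinary importance sampling as the counterfactual estimator, and an ordinary least-squares linear trend as the forecaster. All function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def target_prob(theta, actions):
    # Assumed target policy: picks action 1 with probability sigmoid(theta).
    p1 = sigmoid(theta)
    return np.where(actions == 1, p1, 1.0 - p1)

def past_performance(theta, actions, rewards, behavior_probs):
    # Counterfactual estimate per past episode via importance sampling:
    # reweight each observed reward by (target prob) / (behavior prob).
    rho = target_prob(theta, actions) / behavior_probs
    return rho * rewards

def forecast_weights(num_episodes):
    # Least-squares linear trend over episode index 1..K, evaluated at K+1.
    # The forecast is linear in the per-episode estimates, so it can be
    # written as a fixed weight vector applied to them.
    k = np.arange(1, num_episodes + 1, dtype=float)
    X = np.column_stack([np.ones_like(k), k])
    x_next = np.array([1.0, num_episodes + 1.0])
    return x_next @ np.linalg.pinv(X)

def forecast(theta, actions, rewards, behavior_probs):
    # Forecast performance of the target policy for the next episode.
    w = forecast_weights(len(rewards))
    return float(w @ past_performance(theta, actions, rewards, behavior_probs))

def forecast_gradient(theta, actions, rewards, behavior_probs):
    # Gradient of the forecast with respect to theta. Only the importance
    # ratios depend on theta, so no model of the environment is needed.
    p1 = sigmoid(theta)
    dp1 = p1 * (1.0 - p1)
    dprob = np.where(actions == 1, dp1, -dp1)
    w = forecast_weights(len(rewards))
    return float(w @ (dprob / behavior_probs * rewards))

# Toy logged data (made up): actions taken, rewards seen, and the
# probability the behavior policy assigned to each taken action.
actions = np.array([1, 0, 1, 1, 0, 1])
rewards = np.array([0.2, 0.1, 0.5, 0.6, 0.2, 0.9])
behavior_probs = np.full(6, 0.5)

theta = 0.3
step = 0.1  # hypothetical learning rate
theta_new = theta + step * forecast_gradient(theta, actions, rewards, behavior_probs)
```

In this sketch the gradient step moves `theta` in the direction that raises the forecasted next-episode performance rather than the average past performance, which is the distinction the abstract draws for non-stationary environments.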