FORECASTING AND LEARNING ACCURATE AND EFFICIENT TARGET POLICY PARAMETERS FOR DYNAMIC PROCESSES IN NON-STATIONARY ENVIRONMENTS

Invention Application

US20220121968A1 FORECASTING AND LEARNING ACCURATE AND EFFICIENT TARGET POLICY PARAMETERS FOR DYNAMIC PROCESSES IN NON-STATIONARY ENVIRONMENTS (legal status: in force)

Patent Title: FORECASTING AND LEARNING ACCURATE AND EFFICIENT TARGET POLICY PARAMETERS FOR DYNAMIC PROCESSES IN NON-STATIONARY ENVIRONMENTS
Application No.: US17072868

Application Date: 2020-10-16
Publication No.: US20220121968A1

Publication Date: 2022-04-21
Inventor: Yash Chandak, Georgios Theocharous, Sridhar Mahadevan
Applicant: Adobe Inc.
Applicant Address: US CA San Jose
Assignee: Adobe Inc.
Current Assignee: Adobe Inc.
Current Assignee Address: US CA San Jose
Main IPC: G06N5/04
IPC: G06N5/04; G06Q10/06; G06Q10/10

Abstract:

The present disclosure relates to systems, methods, and non-transitory computer-readable media that determine target policy parameters that enable target policies to provide improved future performance, even in circumstances where the underlying environment is non-stationary. For example, in one or more embodiments, the disclosed systems utilize counter-factual reasoning to estimate what the performance of the target policy would have been if implemented during past episodes of action-selection. Based on the estimates, the disclosed systems forecast a performance of the target policy for one or more future decision episodes. In some implementations, the disclosed systems further determine a performance gradient for the forecasted performance with respect to varying a target policy parameter for the target policy. In some cases, the disclosed systems use the performance gradient to efficiently modify the target policy parameter, without undergoing the computational expense of expressly modeling variations in underlying environmental functions.
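The three steps named in the abstract (counterfactual estimation over past episodes, forecasting of future performance, and a performance gradient with respect to a policy parameter) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the application: single-step episodes recorded as `(action, reward, behavior_prob)`, a hypothetical one-parameter two-action policy, importance sampling as the counterfactual estimator, an ordinary least-squares line as the forecaster, and a central finite difference standing in for whatever gradient computation the disclosed systems actually use.

```python
import math


def target_prob(theta, action):
    """Probability of `action` (0 or 1) under a hypothetical one-parameter policy."""
    p1 = 1.0 / (1.0 + math.exp(-theta))
    return p1 if action == 1 else 1.0 - p1


def counterfactual_estimates(episodes, theta):
    """Importance-sampling estimate of what the target policy would have earned
    in each past episode. Each episode is (action, reward, behavior_prob), where
    behavior_prob is the probability the logging policy gave to the action taken."""
    return [target_prob(theta, a) / b * r for (a, r, b) in episodes]


def forecast_next(values):
    """Fit a least-squares line over episode index 1..n and extrapolate to n + 1."""
    n = len(values)
    xs = range(1, n + 1)
    x_mean = sum(xs) / n
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
    slope = sxy / sxx
    return y_mean + slope * (n + 1 - x_mean)


def forecast_performance(episodes, theta):
    """Forecasted next-episode performance of the target policy with parameter theta."""
    return forecast_next(counterfactual_estimates(episodes, theta))


def forecast_gradient(episodes, theta, eps=1e-5):
    """Central finite-difference approximation (an assumed stand-in) of the
    derivative of the forecasted performance with respect to theta."""
    upper = forecast_performance(episodes, theta + eps)
    lower = forecast_performance(episodes, theta - eps)
    return (upper - lower) / (2.0 * eps)


# Hypothetical log: action 1 pays more in each successive episode,
# and the behavior policy chose it with probability 0.5.
episodes = [(1, 1.0, 0.5), (1, 2.0, 0.5), (1, 3.0, 0.5), (1, 4.0, 0.5)]
theta = 0.0
theta_updated = theta + 0.1 * forecast_gradient(episodes, theta)  # one ascent step
```

In this toy log the reward for action 1 trends upward, so the forecast rewards policies that choose action 1 more often and the ascent step increases `theta`. Note that the sketch never models how the environment itself changes; it only extrapolates the sequence of counterfactual estimates, which is the property the abstract highlights.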

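IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06N	Computer systems based on specific computational models
G06N5/00	Computer systems using knowledge-based models
G06N5/04	. Inference methods or devices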