-
Publication No.: US20200241878A1
Publication Date: 2020-07-30
Application No.: US16261092
Filing Date: 2019-01-29
Applicant: Adobe Inc.
Inventor: Yash Chandak , Georgios Theocharous
Abstract: The present disclosure relates to generating proposed digital actions in high-dimensional action spaces for client devices utilizing reinforcement learning models. For example, the disclosed systems can utilize a supervised machine learning model to train a latent representation decoder to determine proposed digital actions based on latent representations. Additionally, the disclosed systems can utilize a latent representation policy gradient model to train a state-based latent representation generation policy to generate latent representations based on the current state of client devices. Subsequently, the disclosed systems can identify the current state of a client device and a plurality of available actions, utilize the state-based latent representation generation policy to generate a latent representation based on the current state, and utilize the latent representation decoder to determine a proposed digital action from the plurality of available actions by analyzing the latent representation.
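The pipeline described above can be sketched in a few lines: a policy maps the current state to a point in a low-dimensional latent action space, and a decoder maps that latent representation back to one of many discrete actions. This is a minimal illustration, not the patented implementation; the linear-plus-tanh policy, the nearest-neighbor decoder, and all dimensions are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 1000 discrete actions embedded in a 2-D latent space.
# In the disclosure these embeddings would be learned with the decoder.
NUM_ACTIONS, LATENT_DIM, STATE_DIM = 1000, 2, 4
action_embeddings = rng.normal(size=(NUM_ACTIONS, LATENT_DIM))

def latent_policy(state, weights):
    """State-based latent representation generation policy: maps a
    client-device state vector to a point in the latent action space."""
    return np.tanh(weights @ state)

def decode_action(latent):
    """Latent representation decoder: proposes the available action whose
    embedding is closest to the generated latent representation."""
    dists = np.linalg.norm(action_embeddings - latent, axis=1)
    return int(np.argmin(dists))

state = rng.normal(size=STATE_DIM)
weights = rng.normal(size=(LATENT_DIM, STATE_DIM))
latent = latent_policy(state, weights)
proposed_action = decode_action(latent)
```

The appeal of this factoring is that the policy only has to search a small continuous space, regardless of how many discrete actions exist.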
-
Publication No.: US20220121968A1
Publication Date: 2022-04-21
Application No.: US17072868
Filing Date: 2020-10-16
Applicant: Adobe Inc.
Inventor: Yash Chandak , Georgios Theocharous , Sridhar Mahadevan
Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that determine target policy parameters that enable target policies to provide improved future performance, even in circumstances where the underlying environment is non-stationary. For example, in one or more embodiments, the disclosed systems utilize counter-factual reasoning to estimate what the performance of the target policy would have been if implemented during past episodes of action-selection. Based on the estimates, the disclosed systems forecast a performance of the target policy for one or more future decision episodes. In some implementations, the disclosed systems further determine a performance gradient for the forecasted performance with respect to varying a target policy parameter for the target policy. In some cases, the disclosed systems use the performance gradient to efficiently modify the target policy parameter, without undergoing the computational expense of expressly modeling variations in underlying environmental functions.
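The two stages in the abstract (counterfactual estimation over past episodes, then a forecast for future episodes) can be illustrated with a toy sketch. The importance-sampling estimator and the least-squares trend used here are common simple choices, assumed for illustration; the actual claimed method, including its performance gradient, is more involved.

```python
import numpy as np

rng = np.random.default_rng(1)

def counterfactual_estimate(behavior_probs, target_probs, episode_return):
    """Importance-sampling estimate of what the target policy would have
    earned had it selected actions in place of the behavior policy."""
    ratio = np.prod(target_probs / behavior_probs)
    return ratio * episode_return

# Hypothetical counterfactual estimates for 20 past episodes in a
# non-stationary environment whose rewards drift upward over time.
episodes = np.arange(20)
estimates = 1.0 + 0.05 * episodes + rng.normal(scale=0.1, size=20)

def forecast(episode_ids, estimates, future_id):
    """Fits a least-squares trend to the past counterfactual estimates and
    evaluates it at a future decision episode."""
    slope, intercept = np.polyfit(episode_ids, estimates, deg=1)
    return slope * future_id + intercept

future_performance = forecast(episodes, estimates, future_id=25)
```

A stationary method would average the 20 estimates; fitting a trend instead lets the forecast track the drift without explicitly modeling the changing environment functions.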
-
Publication No.: US12288074B2
Publication Date: 2025-04-29
Application No.: US16261092
Filing Date: 2019-01-29
Applicant: Adobe Inc.
Inventor: Yash Chandak , Georgios Theocharous
Abstract: The present disclosure relates to generating proposed digital actions in high-dimensional action spaces for client devices utilizing reinforcement learning models. For example, the disclosed systems can utilize a supervised machine learning model to train a latent representation decoder to determine proposed digital actions based on latent representations. Additionally, the disclosed systems can utilize a latent representation policy gradient model to train a state-based latent representation generation policy to generate latent representations based on the current state of client devices. Subsequently, the disclosed systems can identify the current state of a client device and a plurality of available actions, utilize the state-based latent representation generation policy to generate a latent representation based on the current state, and utilize the latent representation decoder to determine a proposed digital action from the plurality of available actions by analyzing the latent representation.
-
Publication No.: US11615293B2
Publication Date: 2023-03-28
Application No.: US16578863
Filing Date: 2019-09-23
Applicant: ADOBE INC.
Inventor: Georgios Theocharous , Yash Chandak
Abstract: Systems and methods are described for a decision-making process including actions characterized by stochastic availability. The systems and methods provide a Markov decision process (MDP) model that includes a stochastic action set based on the decision-making process, compute a policy function for the MDP model using a policy gradient based at least in part on a function representing the stochasticity of the stochastic action set, identify a probability distribution for one or more actions available at a time period using the policy function, and select an action for the time period based on the probability distribution.
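The core mechanic here is a policy that must act over whichever subset of actions happens to be available at each step. A minimal sketch, assuming a softmax policy masked and renormalized over the available set (the deterministic availability mask below is an illustrative stand-in for stochastic availability):

```python
import numpy as np

rng = np.random.default_rng(2)

def action_distribution(prefs, available):
    """Softmax policy restricted to the currently available actions:
    unavailable actions get zero probability and the rest renormalize."""
    masked = np.where(available, np.exp(prefs - prefs.max()), 0.0)
    return masked / masked.sum()

# Four actions with learned preferences; action 2 is unavailable this step.
prefs = np.array([2.0, 1.0, 0.5, 0.0])
available = np.array([True, True, False, True])

probs = action_distribution(prefs, available)
action = int(rng.choice(4, p=probs))
```

In the patented setting the policy gradient additionally accounts for the availability distribution itself, so the policy improves in expectation over the random action sets rather than for any single realization.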
-
Publication No.: US11501207B2
Publication Date: 2022-11-15
Application No.: US16578913
Filing Date: 2019-09-23
Applicant: ADOBE INC.
Inventor: Georgios Theocharous , Yash Chandak
Abstract: Systems and methods are described for a decision-making process that includes an increasing set of actions. The systems and methods compute a policy function for a Markov decision process (MDP) for the decision-making process, wherein the policy function is computed based on a state conditional function mapping states into an embedding space, an inverse dynamics function mapping state transitions into the embedding space, and an action selection function mapping the elements of the embedding space to actions, identify an additional set of actions in the increasing set of actions, update the inverse dynamics function based at least in part on the additional set of actions, update the policy function based on the updated inverse dynamics function and parameters learned during the computing of the policy function, and select an action based on the updated policy function.
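The key structural idea is that the policy acts in a fixed embedding space, so growing the action set only requires extending the action-to-embedding mapping. This toy sketch uses assumed stand-ins throughout: random vectors in place of embeddings learned via inverse dynamics, a state difference in place of a learned inverse dynamics network, and a nearest-neighbor rule for action selection.

```python
import numpy as np

rng = np.random.default_rng(3)
EMBED_DIM = 3

# Hypothetical embeddings for the initial five actions, which in the
# disclosure would be learned jointly with the inverse dynamics function.
action_embeddings = {a: rng.normal(size=EMBED_DIM) for a in range(5)}

def inverse_dynamics(state, next_state):
    """Maps a state transition (s, s') into the embedding space; here a
    truncated state difference stands in for a learned network."""
    return (next_state - state)[:EMBED_DIM]

def select_action(embedding):
    """Action selection function: the action whose embedding is nearest
    to the point the policy produced."""
    return min(action_embeddings,
               key=lambda a: np.linalg.norm(action_embeddings[a] - embedding))

def add_new_actions(new_ids):
    """When the action set grows, only the embedding table is extended;
    the policy over the embedding space is reused as-is."""
    for a in new_ids:
        action_embeddings[a] = rng.normal(size=EMBED_DIM)

add_new_actions([5, 6])  # the increasing action set
```

Because the embedding space does not change shape, previously learned policy parameters carry over when new actions arrive, which is the point of updating the inverse dynamics function rather than retraining from scratch.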
-
Publication No.: US20210089868A1
Publication Date: 2021-03-25
Application No.: US16578863
Filing Date: 2019-09-23
Applicant: ADOBE INC.
Inventor: Georgios Theocharous , Yash Chandak
Abstract: Systems and methods are described for a decision-making process including actions characterized by stochastic availability. The systems and methods provide a Markov decision process (MDP) model that includes a stochastic action set based on the decision-making process, compute a policy function for the MDP model using a policy gradient based at least in part on a function representing the stochasticity of the stochastic action set, identify a probability distribution for one or more actions available at a time period using the policy function, and select an action for the time period based on the probability distribution.