Patent search ap:("Google LLC") AND inv:"Craig Edgar Boutilier" Page 1

1.

Patent application
Systems and Methods for Simulating a Complex Reinforcement Learning Environment
Status: Pending (published)

Publication No.: US20200250575A1

Publication date: 2020-08-06

Application No.: US16288279

Filing date: 2019-02-28

Applicant: Google LLC

Inventor: Tze Way Eugene Ie, Sanmit Santosh Narvekar, Craig Edgar Boutilier

IPC: G06N20/00, G06N5/04

Abstract: A computing system for simulating allocation of resources to a plurality of entities is disclosed. The computing system can be configured to input an entity profile that describes a preference and/or demand of a simulated entity into a reinforcement learning agent model and receive, as an output of the reinforcement learning agent model, an allocation output that describes a resource allocation for the simulated entity. The computing system can select one or more resources based on the resource allocation described by the allocation output and provide the resource(s) to an entity model that is configured to simulate a simulated response output that describes a response of the simulated entity. The computing system can receive, as an output of the entity model, the simulated response output and update a resource profile that describes the at least one resource and/or the entity profile based on the simulated response output.
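Note: the loop this abstract describes (profile in, allocation out, resource selection, simulated response, profile update) can be sketched as below. This is a minimal illustration, not the patented implementation: the agent and entity models are replaced by simple stand-in functions, and every name here (EntityProfile, Resource, agent_allocate, select_resources, entity_respond, update_profiles, simulate) is a hypothetical chosen for the example.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EntityProfile:
    # Hypothetical entity profile: preference weight in [0, 1] per topic.
    preferences: Dict[str, float]


@dataclass
class Resource:
    name: str
    topic: str
    times_accepted: int = 0  # hypothetical "resource profile" statistic


def agent_allocate(profile: EntityProfile) -> Dict[str, float]:
    """Stand-in for the RL agent model: maps a profile to a share per topic."""
    total = sum(profile.preferences.values())
    if total <= 0:
        n = len(profile.preferences)
        return {topic: 1.0 / n for topic in profile.preferences}
    return {topic: w / total for topic, w in profile.preferences.items()}


def select_resources(allocation: Dict[str, float],
                     catalog: List[Resource], k: int = 1) -> List[Resource]:
    """Pick up to k resources from the topic with the largest allocation."""
    top_topic = max(allocation, key=allocation.get)
    return [r for r in catalog if r.topic == top_topic][:k]


def entity_respond(profile: EntityProfile, resources: List[Resource],
                   rng: random.Random) -> Dict[str, bool]:
    """Stand-in for the entity model: accept with probability = preference."""
    return {r.name: rng.random() < profile.preferences.get(r.topic, 0.0)
            for r in resources}


def update_profiles(profile: EntityProfile, resources: List[Resource],
                    response: Dict[str, bool], step: float = 0.1) -> None:
    """Update both the resource profile and the entity profile in place."""
    for r in resources:
        current = profile.preferences[r.topic]
        if response[r.name]:
            r.times_accepted += 1
            profile.preferences[r.topic] = min(1.0, current + step)
        else:
            profile.preferences[r.topic] = max(0.0, current - step)


def simulate(profile: EntityProfile, catalog: List[Resource],
             steps: int, seed: int = 0) -> List[Dict[str, bool]]:
    rng = random.Random(seed)
    history = []
    for _ in range(steps):
        allocation = agent_allocate(profile)            # agent model output
        chosen = select_resources(allocation, catalog)  # resource selection
        response = entity_respond(profile, chosen, rng) # simulated response
        update_profiles(profile, chosen, response)      # profile update
        history.append(response)
    return history
```

A call such as simulate(EntityProfile({"news": 0.6, "music": 0.4}), catalog, steps=5) returns one response dictionary per step; in a real system the stand-in functions would presumably be learned models.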

2.

Patent application
Systems and Methods for Simulating a Complex Reinforcement Learning Environment
Status: In force

Publication No.: US20230117499A1

Publication date: 2023-04-20

Application No.: US17967595

Filing date: 2022-10-17

Applicant: Google LLC

Inventor: Tze Way Eugene Ie, Sanmit Santosh Narvekar, Craig Edgar Boutilier

IPC: G06N20/00, G06N5/043

Abstract: A computing system for simulating allocation of resources to a plurality of entities is disclosed. The computing system can be configured to input an entity profile that describes a preference and/or demand of a simulated entity into a reinforcement learning agent model and receive, as an output of the reinforcement learning agent model, an allocation output that describes a resource allocation for the simulated entity. The computing system can select one or more resources based on the resource allocation described by the allocation output and provide the resource(s) to an entity model that is configured to simulate a simulated response output that describes a response of the simulated entity. The computing system can receive, as an output of the entity model, the simulated response output and update a resource profile that describes the at least one resource and/or the entity profile based on the simulated response output.

3.

Patent application
ANALYZING EMBEDDING SPACES USING LARGE LANGUAGE MODELS
Status: In force

Publication No.: US20250111157A1

Publication date: 2025-04-03

Application No.: US18900500

Filing date: 2024-09-27

Applicant: Google LLC

Inventor: Guy Tennenholtz, Yinlam Chow, Chih-wei Hsu, Jihwan Jeong, Lior Shani, Deepak Ramachandran, Martin Mirolyubov Mladenov, Craig Edgar Boutilier

IPC: G06F40/284, G06F40/40, G06N3/0455

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for analyzing embedding spaces using large language models. In one aspect, a method performed by one or more computers for analyzing a target embedding space using a neural network configured to perform a set of machine learning tasks is described. The method includes: obtaining, for each of one or more entities, a respective domain embedding representing the entity in the target embedding space; receiving a text prompt including a sequence of input tokens describing a particular machine learning task in the set to be performed on the one or more entities; preparing, for the neural network, an input sequence including each input token in the text prompt and each domain embedding; and processing the input sequence, using the neural network, to generate a sequence of output tokens describing a result of the particular machine learning task.
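Note: the "preparing an input sequence" step in this abstract can be illustrated with a small sketch. The abstract only says the sequence includes each prompt token and each domain embedding; where the embeddings are placed is not stated. The placement rule used here (insert an embedding right after a placeholder token, append any leftovers at the end), the PLACEHOLDER marker, and the function name prepare_input_sequence are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple, Union

Vector = List[float]
Item = Tuple[str, Union[str, Vector]]

# Hypothetical marker showing where an entity is mentioned in the prompt.
PLACEHOLDER = "<entity>"


def prepare_input_sequence(prompt_tokens: Sequence[str],
                           domain_embeddings: Sequence[Vector]) -> List[Item]:
    """Build a mixed sequence of prompt tokens and domain embeddings.

    Every prompt token is kept. Each placeholder token is followed by the
    next unused domain embedding; embeddings with no placeholder left are
    appended at the end, so each embedding appears exactly once.
    """
    remaining = list(domain_embeddings)
    sequence: List[Item] = []
    for token in prompt_tokens:
        sequence.append(("token", token))
        if token == PLACEHOLDER and remaining:
            sequence.append(("embedding", remaining.pop(0)))
    sequence.extend(("embedding", emb) for emb in remaining)
    return sequence
```

In a full system, a downstream model would still need to map tokens and domain embeddings into a common input space; that step is outside this sketch.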

4.

Granted patent
Systems and methods for simulating a complex reinforcement learning environment
Status: In force

Publication No.: US11475355B2

Publication date: 2022-10-18

Application No.: US16288279

Filing date: 2019-02-28

Applicant: Google LLC

Inventor: Tze Way Eugene Ie, Sanmit Santosh Narvekar, Craig Edgar Boutilier

IPC: G06N20/00, G06N5/04

Abstract: A computing system for simulating allocation of resources to a plurality of entities is disclosed. The computing system can be configured to input an entity profile that describes a preference and/or demand of a simulated entity into a reinforcement learning agent model and receive, as an output of the reinforcement learning agent model, an allocation output that describes a resource allocation for the simulated entity. The computing system can select one or more resources based on the resource allocation described by the allocation output and provide the resource(s) to an entity model that is configured to simulate a simulated response output that describes a response of the simulated entity. The computing system can receive, as an output of the entity model, the simulated response output and update a resource profile that describes the at least one resource and/or the entity profile based on the simulated response output.

5.

Patent application
DETERMINING CONTROL POLICIES BY MINIMIZING THE IMPACT OF DELUSION
Status: In force

Publication No.: US20210383218A1

Publication date: 2021-12-09

Application No.: US17289514

Filing date: 2019-10-29

Applicant: Google LLC

Inventor: Tian Lu, Dale Eric Schuurmans, Craig Edgar Boutilier

IPC: G06N3/08

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining a control policy for an agent interacting with an environment. One of the methods includes updating the control policy using policy-consistent backups using Q learning. To determine a policy-consistent backup, the system determining a policy-consistent backup for the control policy at the current observation—current action pair, comprising: for each of a plurality of actions in a set of possible actions that can be performed by the agent, identifying Q values assigned by the control policy to next observation—action pairs by the control policy and justified by at least one of the information sets; pruning, from the identified Q values, any Q values that are justified only by information sets that are not policy-class consistent; and determining, from the reward and only the identified Q values that were not pruned, the policy-consistent backup.
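Note: the pruning step in this abstract can be sketched roughly as follows. Each candidate Q value carries the information sets that justify it; a value is pruned when none of its information sets is policy-class consistent, and the backup is computed from the reward and the surviving values. The abstract does not say how the surviving values are combined, so the discounted maximum used here is an assumption borrowed from standard Q-learning. The function names and the toy consistency check are hypothetical, and a real check would depend on the policy class.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

# Hypothetical encoding: an information set is a set of
# (observation, action) commitments.
InfoSet = FrozenSet[Tuple[str, str]]
Candidate = Tuple[float, Sequence[InfoSet]]


def toy_is_consistent(info_set: InfoSet) -> bool:
    """Toy check (assumption): consistent if no observation is committed
    to two different actions."""
    chosen = {}
    for observation, action in info_set:
        if chosen.setdefault(observation, action) != action:
            return False
    return True


def policy_consistent_backup(reward: float,
                             candidates: Sequence[Candidate],
                             is_consistent: Callable[[InfoSet], bool],
                             discount: float = 0.99) -> float:
    """Prune Q values justified only by inconsistent information sets,
    then back up from the reward and the surviving values."""
    surviving: List[float] = [
        q for q, info_sets in candidates
        if any(is_consistent(s) for s in info_sets)
    ]
    if not surviving:
        raise ValueError("no policy-consistent Q value to back up from")
    # Assumption: standard Q-learning target over the surviving values.
    return reward + discount * max(surviving)
```

For example, if the largest candidate Q value is justified only by an information set that commits one observation to two different actions, it is pruned and the backup falls back to the next-best consistent value.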
