-
1.
Publication No.: US20240281654A1
Publication Date: 2024-08-22
Application No.: US18292165
Filing Date: 2022-08-12
Applicant: DeepMind Technologies Limited
Inventor: Scott Ellison Reed, Konrad Zolna, Emilio Parisotto, Tom Erez, Alexander Novikov, Jack William Rae, Misha Man Ray Denil, Joao Ferdinando Gomes de Freitas, Oriol Vinyals, Sergio Gomez, Ashley Deloris Edwards, Jacob Bruce, Gabriel Barth-Maron
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting actions to be performed by an agent to interact with an environment using an action selection neural network. In one aspect, a method comprises, at each time step in a sequence of time steps: generating a current representation of a state of a task being performed by the agent in the environment as of the current time step as a sequence of data elements; autoregressively generating a sequence of data elements representing a current action to be performed by the agent at the current time step; and after autoregressively generating the sequence of data elements representing the current action, causing the agent to perform the current action at the current time step.
-
2.
Publication No.: US20230061411A1
Publication Date: 2023-03-02
Application No.: US17410689
Filing Date: 2021-08-24
Applicant: DeepMind Technologies Limited
Inventor: Tom Erez, Alexander Novikov, Emilio Parisotto, Jack William Rae, Konrad Zolna, Misha Man Ray Denil, Joao Ferdinando Gomes de Freitas, Oriol Vinyals, Scott Ellison Reed, Sergio Gomez, Ashley Deloris Edwards, Jacob Bruce, Gabriel Barth-Maron
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting actions to be performed by an agent to interact with an environment using an action selection neural network. In one aspect, a method comprises, at each time step in a sequence of time steps: generating a current representation of a state of a task being performed by the agent in the environment as of the current time step as a sequence of data elements; autoregressively generating a sequence of data elements representing a current action to be performed by the agent at the current time step; and after autoregressively generating the sequence of data elements representing the current action, causing the agent to perform the current action at the current time step.
-
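Both abstracts above describe the same per-time-step loop: represent the task state as a sequence of data elements, autoregressively generate the elements of the current action, and only then have the agent perform it. The following is a minimal sketch of that loop, not the applicants' implementation. Everything named here is a hypothetical stand-in: `toy_model` is a deterministic rule in place of the action selection neural network, `ToyEnv` is an invented environment, and `VOCAB_SIZE`, `ACTION_LEN`, and the greedy token choice are assumptions made only for illustration.

```python
from typing import Callable, List, Sequence

VOCAB_SIZE = 16  # hypothetical size of the data-element (token) vocabulary
ACTION_LEN = 3   # hypothetical number of tokens that make up one action

Model = Callable[[Sequence[int]], List[float]]


def toy_model(context: Sequence[int]) -> List[float]:
    """Stand-in for the action selection network: one score per token.

    Toy rule (not a trained network): the token equal to
    (sum(context) + len(context)) mod VOCAB_SIZE gets the top score.
    """
    best = (sum(context) + len(context)) % VOCAB_SIZE
    return [1.0 if tok == best else 0.0 for tok in range(VOCAB_SIZE)]


def tokenize_observation(obs: Sequence[int]) -> List[int]:
    """Map an observation to a sequence of data elements (tokens)."""
    return [int(x) % VOCAB_SIZE for x in obs]


def generate_action(model: Model, context: List[int],
                    action_len: int = ACTION_LEN) -> List[int]:
    """Autoregressively generate action tokens, one at a time.

    Each generated token is appended to the model input before the
    next token is chosen. Greedy selection is an assumption here.
    """
    action: List[int] = []
    for _ in range(action_len):
        scores = model(context + action)
        action.append(max(range(len(scores)), key=scores.__getitem__))
    return action


class ToyEnv:
    """Hypothetical environment: the observation echoes the last action."""

    def __init__(self) -> None:
        self.performed: List[List[int]] = []

    def observe(self) -> List[int]:
        return self.performed[-1] if self.performed else [0, 0, 0]

    def step(self, action: Sequence[int]) -> None:
        self.performed.append(list(action))


def run_episode(model: Model, env: ToyEnv, num_steps: int) -> List[int]:
    """Run the per-time-step loop and return the full token history."""
    context: List[int] = []
    for _ in range(num_steps):
        # 1. Extend the state representation with the current observation.
        context += tokenize_observation(env.observe())
        # 2. Autoregressively generate the full action sequence.
        action = generate_action(model, context)
        # 3. Only after generation is complete, perform the action.
        env.step(action)
        context += action
    return context
```

Calling `run_episode(toy_model, ToyEnv(), num_steps=4)` returns the interleaved observation and action tokens for four time steps. Keeping past observations and actions in one growing `context` list is one possible reading of "current representation of a state of a task"; the abstracts do not say how that representation is built or truncated.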