TRAINING REINFORCEMENT LEARNING AGENTS USING AUGMENTED TEMPORAL DIFFERENCE LEARNING

    Publication No.: US20230376780A1

    Publication date: 2023-11-23

    Application No.: US18029979

    Filing date: 2021-10-01

    CPC classification number: G06N3/092 G06N3/0442

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network used to select actions performed by an agent interacting with an environment by performing actions that cause the environment to transition states. One of the methods includes maintaining a replay memory storing a plurality of transitions; selecting a plurality of transitions from the replay memory; and training the neural network on the plurality of transitions, comprising, for each transition: generating an initial Q value for the transition; determining a scaled Q value for the transition; determining a scaled temporal difference learning target for the transition; determining an error between the scaled temporal difference learning target and the scaled Q value; determining an update to the current values of the Q network parameters; and determining an update to the current value of the scaling term.
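
    Below is a minimal sketch of the training loop described in this abstract, under loudly stated assumptions. A tabular Q function stands in for the Q network. The scaling term (hypothetically named `sigma`) divides both the Q value and the temporal difference target. `sigma` is updated as a running average of the target magnitude. The abstract does not specify the scaling function or the scaling-term update rule, so these choices are illustrative only. `ReplayMemory` and `train_step` are hypothetical names.

```python
import random
from collections import defaultdict, deque


class ReplayMemory:
    """Fixed-capacity FIFO buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))


def train_step(q, sigma, batch, num_actions, gamma=0.99, lr=0.5, beta=0.01, min_sigma=1e-3):
    """Scaled TD update per transition; q is edited in place and the new sigma is returned.

    Assumption (not from the source): scaling means dividing by sigma.
    """
    for s, a, r, s_next, done in batch:
        initial_q = q[(s, a)]                    # initial Q value
        scaled_q = initial_q / sigma             # scaled Q value
        bootstrap = 0.0 if done else max(q[(s_next, b)] for b in range(num_actions))
        target = r + gamma * bootstrap           # ordinary one-step Q-learning target
        scaled_target = target / sigma           # scaled TD learning target
        error = scaled_target - scaled_q         # error in scaled units
        # Gradient descent on 0.5 * error**2 w.r.t. q[(s, a)] with the target held
        # fixed: dL/dq = -error / sigma.
        q[(s, a)] = initial_q + lr * error / sigma
        # Assumed scaling-term update: running average of |target|, floored at min_sigma.
        sigma = max(min_sigma, (1.0 - beta) * sigma + beta * abs(target))
    return sigma


# Usage: a single terminal transition with reward 1.0, so Q(0, 0) should approach 1.0.
q = defaultdict(float)
memory = ReplayMemory(capacity=100)
memory.add((0, 0, 1.0, 0, True))
sigma = 1.0
for _ in range(50):
    sigma = train_step(q, sigma, memory.sample(1), num_actions=1)
```

    With `sigma` held at 1.0 the update reduces to ordinary tabular Q-learning, `q += lr * (target - q)`, which makes a convenient sanity check for the sketch.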

    LOW-PASS RECURRENT NEURAL NETWORK SYSTEMS WITH MEMORY

    Publication No.: US20190251419A1

    Publication date: 2019-08-15

    Application No.: US16272880

    Filing date: 2019-02-11

    CPC classification number: G06N3/04 G06N3/08

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing and storing inputs for use in a neural network. One of the methods includes receiving input data for storage in a memory system comprising a first set of memory blocks, the memory blocks having an associated order; passing the input data to a highest ordered memory block; for each memory block for which there is a lower ordered memory block: applying a filter function to data currently stored by the memory block to generate filtered data and passing the filtered data to a lower ordered memory block; and for each memory block: combining the data currently stored in the memory block with the data passed to the memory block to generate updated data, and storing the updated data in the memory block.
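
    The write path in this abstract can be sketched in Python as follows. Several assumptions do not come from the source. Each memory block holds a plain list of floats. The filter function defaults to the identity (hypothetical `filter_fn`). "Combining" is an exponential moving average with a per-block rate (hypothetical `alphas`), which makes each lower ordered block a low-pass filtered copy of the block above it.

```python
def memory_step(blocks, x, alphas, filter_fn=lambda v: v):
    """One write into an ordered stack of memory blocks.

    blocks: list of equal-length float lists; blocks[0] is the lowest ordered,
            blocks[-1] the highest ordered.
    alphas: hypothetical per-block mixing rates in (0, 1]; smaller is slower.
    """
    n = len(blocks)
    incoming = [None] * n
    incoming[n - 1] = list(x)                   # input data goes to the highest block
    for k in range(1, n):                       # every block that has a lower neighbour
        incoming[k - 1] = filter_fn(blocks[k])  # filter the data *currently* stored
    # Combine stored and passed data (assumed EMA) to produce the updated blocks.
    return [
        [(1.0 - alphas[k]) * old + alphas[k] * new for old, new in zip(blocks[k], incoming[k])]
        for k in range(n)
    ]


# Usage: feed a constant input of 1.0 into three scalar blocks.
blocks = [[0.0], [0.0], [0.0]]
alphas = [0.25, 0.5, 1.0]
for _ in range(3):
    blocks = memory_step(blocks, [1.0], alphas)
# blocks == [[0.125], [0.75], [1.0]]
```

    The constant input reaches the lowest block only after passing through every block above it, and each block smooths what it receives, so lower ordered blocks respond more slowly. This is one way to read the "low-pass" in the title.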

    GATED ATTENTION NEURAL NETWORKS
    Invention application

    Publication No.: US20220366218A1

    Publication date: 2022-11-17

    Application No.: US17763984

    Filing date: 2020-09-07

    Abstract: A system including an attention neural network that is configured to receive an input sequence and to process the input sequence to generate an output is described. The attention neural network includes: an attention block configured to receive a query input, a key input, and a value input that are derived from an attention block input. The attention block includes an attention neural network layer configured to: receive an attention layer input derived from the query input, the key input, and the value input, and apply an attention mechanism to the query input, the key input, and the value input to generate an attention layer output for the attention neural network layer; and a gating neural network layer configured to apply a gating mechanism to the attention block input and the attention layer output of the attention neural network layer to generate a gated attention output.
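
    The following is a hedged sketch of one gated attention block in plain Python. The abstract leaves both the attention and the gating mechanisms open. This sketch therefore assumes single-head scaled dot-product attention, with the block input used directly as query, key, and value (no learned projections). It also assumes a hypothetical elementwise sigmoid gate with scalar parameters `w`, `u`, and `b` that mixes the block input with the attention layer output.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Single-head scaled dot-product attention over lists of vectors."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    return out


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_attention_block(block_input, w=1.0, u=1.0, b=-2.0):
    """Attention layer followed by a hypothetical elementwise sigmoid gate."""
    # Assumption: query, key and value inputs are the block input itself.
    attn_out = attention(block_input, block_input, block_input)
    gated = []
    for x_vec, y_vec in zip(block_input, attn_out):
        row = []
        for x, y in zip(x_vec, y_vec):
            g = sigmoid(w * x + u * y + b)     # gate value in (0, 1)
            row.append((1.0 - g) * x + g * y)  # convex mix of block input and attention output
        gated.append(row)
    return gated


# Usage: a sequence of two 2-dimensional token vectors.
example = gated_attention_block([[1.0, 0.0], [0.0, 1.0]])
```

    With a negative bias `b` the gate starts close to zero, so the block initially passes its input through almost unchanged. Other gates, for example GRU-style gating, would fit the same abstract equally well.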

    NEURAL NETWORKS FOR SELECTING ACTIONS TO BE PERFORMED BY A ROBOTIC AGENT

    Publication No.: US20220355472A1

    Publication date: 2022-11-10

    Application No.: US17872528

    Filing date: 2022-07-25

    Abstract: A system includes a neural network system implemented by one or more computers. The neural network system is configured to receive an observation characterizing a current state of a real-world environment being interacted with by a robotic agent to perform a robotic task and to process the observation to generate a policy output that defines an action to be performed by the robotic agent in response to the observation. The neural network system includes: (i) a sequence of deep neural networks (DNNs), in which the sequence of DNNs includes a simulation-trained DNN that has been trained on interactions of a simulated version of the robotic agent with a simulated version of the real-world environment to perform a simulated version of the robotic task, and (ii) a first robot-trained DNN that is configured to receive the observation and to process the observation to generate the policy output.
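
    The sketch below shows a minimal two-column policy of the kind this abstract implies, assuming discrete actions and a softmax policy output. The lateral connection from the simulation-trained column into the robot-trained column follows the progressive neural network style and is an assumption, not something the abstract states. All class, function, and parameter names are hypothetical, and random weights stand in for trained ones.

```python
import math
import random


def dense(x, weights, bias):
    """y = W x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, z) for z in v]


def softmax(v):
    m = max(v)
    e = [math.exp(z - m) for z in v]
    s = sum(e)
    return [z / s for z in e]


def random_layer(n_out, n_in, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out


class TwoColumnPolicy:
    """Simulation-trained column plus a robot-trained column with a lateral connection."""

    def __init__(self, obs_dim, hidden, num_actions, seed=0):
        rng = random.Random(seed)
        self.sim_hidden = random_layer(hidden, obs_dim, rng)    # stands in for sim-trained weights
        self.robot_hidden = random_layer(hidden, obs_dim, rng)  # robot-trained column
        self.lateral = random_layer(num_actions, hidden, rng)   # sim features -> robot output
        self.robot_out = random_layer(num_actions, hidden, rng)

    def policy(self, obs):
        h_sim = relu(dense(obs, *self.sim_hidden))
        h_robot = relu(dense(obs, *self.robot_hidden))
        logits = [a + b for a, b in zip(dense(h_robot, *self.robot_out), dense(h_sim, *self.lateral))]
        return softmax(logits)  # probability distribution over discrete actions


# Usage: policy output for one 3-dimensional observation.
obs = [0.1, -0.2, 0.3]
pi = TwoColumnPolicy(obs_dim=3, hidden=4, num_actions=2).policy(obs)
```

    In progressive-network style training, the simulation-trained column is kept fixed. Only the robot-trained column and its lateral weights are learned from real-robot experience.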
