CONTROLLING AGENTS USING STATE ASSOCIATIVE LEARNING FOR LONG-TERM CREDIT ASSIGNMENT

    Publication (announcement) No.: US20240086703A1

    Publication (announcement) date: 2024-03-14

    Application No.: US18275542

    Application date: 2022-02-04

    CPC classification number: G06N3/08

    Abstract: A computer-implemented reinforcement learning neural network system that learns a model of rewards in order to relate actions by an agent in an environment to their long-term consequences. The model learns to decompose the rewards into components explainable by different past states. That is, the model learns to associate when being in a particular state of the environment is predictive of a reward in a later state, even when the later state, and reward, is only achieved after a very long time delay.
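
    The reward decomposition described in the abstract can be illustrated with a toy sketch. This is not the patented system: it assumes three hypothetical discrete states (KEY, CORRIDOR, DOOR), replaces the neural network with a linear model solved by least squares, and uses made-up episodes. Each reward is modeled as a sum of contributions from earlier states plus a term for the current state.

```python
import numpy as np

# Hypothetical discrete states, invented for this illustration.
KEY, CORRIDOR, DOOR = 0, 1, 2
N = 3

def features(states):
    """One row per time step: counts of (past state, current state)
    pairs, followed by a one-hot encoding of the current state."""
    rows = []
    for t, cur in enumerate(states):
        pair_counts = np.zeros((N, N))
        for past in states[:t]:
            pair_counts[past, cur] += 1.0
        current = np.zeros(N)
        current[cur] = 1.0
        rows.append(np.concatenate([pair_counts.ravel(), current]))
    return np.array(rows)

# Made-up episodes: the door pays off only if the key was visited earlier.
episodes = [
    ([KEY, CORRIDOR, DOOR], [0.0, 0.0, 1.0]),
    ([CORRIDOR, CORRIDOR, DOOR], [0.0, 0.0, 0.0]),
    ([CORRIDOR, DOOR], [0.0, 0.0]),
]

X = np.vstack([features(states) for states, _ in episodes])
y = np.concatenate([rewards for _, rewards in episodes])

# Minimum-norm least-squares fit of the linear reward model.
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
assoc = theta[:N * N].reshape(N, N)  # assoc[i, j]: reward in state j explained by past state i
bias = theta[N * N:]                 # reward explained by the current state alone

print(np.round(assoc, 3))
print(np.round(bias, 3))
```

    In this toy data the fitted table attributes the door reward to the earlier KEY state rather than to the door or the corridor, which is the kind of association between a past state and a later reward that the abstract describes.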

Neural Networks with Relational Memory

    Publication (announcement) No.: US20210081795A1

    Publication (announcement) date: 2021-03-18

    Application No.: US17107621

    Application date: 2020-11-30

    Abstract: A system including one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to implement a memory and memory-based neural network is described. The memory is configured to store a respective memory vector at each of a plurality of memory locations in the memory. The memory-based neural network is configured to: at each of a plurality of time steps: receive an input; determine an update to the memory, wherein determining the update comprises applying an attention mechanism over the memory vectors in the memory and the received input; update the memory using the determined update to the memory; and generate an output for the current time step using the updated memory.
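
    The per-time-step loop in the abstract (receive an input, attend over the memory vectors and the input, update the memory, generate an output) can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: it assumes single-head scaled dot-product attention, a residual memory update, and a linear readout, and all weight matrices (Wq, Wk, Wv, Wo) and sizes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def memory_step(memory, x, Wq, Wk, Wv, Wo):
    """One time step. memory: (N, D) memory vectors; x: (D,) input."""
    mem_and_input = np.vstack([memory, x])       # (N + 1, D)
    q = memory @ Wq                              # queries from each memory slot
    k = mem_and_input @ Wk                       # keys from memory and input
    v = mem_and_input @ Wv                       # values from memory and input
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (N, N + 1)
    update = weights @ v                         # attention-derived update
    new_memory = memory + update                 # residual update (assumption)
    output = new_memory.reshape(-1) @ Wo         # linear readout (assumption)
    return new_memory, output

rng = np.random.default_rng(0)
N, D, OUT = 4, 8, 3                              # hypothetical sizes
memory = rng.normal(size=(N, D))
Wq, Wk, Wv = (0.1 * rng.normal(size=(D, D)) for _ in range(3))
Wo = 0.1 * rng.normal(size=(N * D, OUT))

# Run the memory over a short sequence of random inputs.
for x in rng.normal(size=(5, D)):
    memory, output = memory_step(memory, x, Wq, Wk, Wv, Wo)

print(memory.shape, output.shape)
```

    Because the keys and values include both the memory rows and the current input, each memory slot can attend to every other slot as well as to the new input, so the update mixes stored information with incoming information in a single step.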

Neural networks with relational memory

    Publication (announcement) No.: US10853725B2

    Publication (announcement) date: 2020-12-01

    Application No.: US16415954

    Application date: 2019-05-17

    Abstract: A system including one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to implement a memory and memory-based neural network is described. The memory is configured to store a respective memory vector at each of a plurality of memory locations in the memory. The memory-based neural network is configured to: at each of a plurality of time steps: receive an input; determine an update to the memory, wherein determining the update comprises applying an attention mechanism over the memory vectors in the memory and the received input; update the memory using the determined update to the memory; and generate an output for the current time step using the updated memory.
