GATED LINEAR CONTEXTUAL BANDITS
    1.
    发明申请

    公开(公告)号:US20230079338A1

    公开(公告)日:2023-03-16

    申请号:US17766854

    申请日:2020-10-08

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable storage media, for training a neural network to control a real-world agent interacting with a real-world environment to cause the real-world agent to perform a particular task. One of the methods includes training the neural network to determine first values of the parameters by optimizing a first task-specific objective that measures a performance of the policy neural network in controlling a simulated version of the real-world agent; obtaining real-world data generated from interactions of the real-world agent with the real-world environment; and training the neural network to determine trained values of the parameters from the first values of the parameters by jointly optimizing (i) a self-supervised objective that measures at least a performance of internal representations generated by the neural network on a self-supervised task performed on the real-world data and (ii) a second task-specific objective.

Patent Agency Ranking