GATED LINEAR CONTEXTUAL BANDITS
    Patent application

    Publication number: US20230079338A1

    Publication date: 2023-03-16

    Application number: US17766854

    Filing date: 2020-10-08

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable storage media, for training a neural network to control a real-world agent interacting with a real-world environment to cause the real-world agent to perform a particular task. One of the methods includes training the neural network to determine first values of the parameters by optimizing a first task-specific objective that measures a performance of the policy neural network in controlling a simulated version of the real-world agent; obtaining real-world data generated from interactions of the real-world agent with the real-world environment; and training the neural network to determine trained values of the parameters from the first values of the parameters by jointly optimizing (i) a self-supervised objective that measures at least a performance of internal representations generated by the neural network on a self-supervised task performed on the real-world data and (ii) a second task-specific objective.

    DETERMINING STATIONARY POINTS OF A LOSS FUNCTION USING CLIPPED AND UNBIASED GRADIENTS

    Publication number: US20240256861A1

    Publication date: 2024-08-01

    Application number: US18424545

    Filing date: 2024-01-26

    CPC classification number: G06N3/08

    Abstract: A method of optimizing a loss function defined by one or more numerical parameters is provided. The method comprises determining initial values of the parameters, and performing a plurality of training iterations. Each training iteration except the first comprises (i) determining a gradient of the loss function associated with the parameters, (ii) obtaining a clipped value generated in a previous training iteration, (iii) additively combining the gradient and the clipped value to generate a modified gradient, (iv) processing, using a clipping function based on a threshold value, the modified gradient to generate a clipped gradient, (v) updating the value of the one or more parameters based on the clipped gradient, and (vi) storing, as the clipped value for use in a next training iteration, a difference between the modified gradient and the clipped gradient.
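    Steps (i) to (vi) of this abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the claimed implementation: the clipping function is assumed to be L2-norm clipping, the parameter update is assumed to be plain gradient descent with a fixed learning rate, and the stored clipped value is assumed to start at zero, since the abstract does not say how the first iteration is handled. The names `clip_by_norm`, `clipped_step`, `minimize`, `lr`, and `num_iters` are hypothetical.

```python
import numpy as np


def clip_by_norm(vec, threshold):
    """Rescale vec so its L2 norm is at most threshold (assumed clipping function)."""
    norm = np.linalg.norm(vec)
    if norm <= threshold:
        return vec
    return vec * (threshold / norm)


def clipped_step(grad, carry, threshold):
    """One pass of steps (iii), (iv) and (vi) from the abstract.

    grad  : gradient of the loss at the current parameters, step (i)
    carry : clipped value stored by the previous iteration, step (ii)
    Returns the clipped gradient and the new value to store.
    """
    modified = grad + carry                       # (iii) additive combination
    clipped = clip_by_norm(modified, threshold)   # (iv) clip using the threshold
    new_carry = modified - clipped                # (vi) difference kept for next time
    return clipped, new_carry


def minimize(grad_fn, params, lr=0.1, threshold=1.0, num_iters=200):
    """Run the iteration with a plain gradient-descent update (an assumption)."""
    carry = np.zeros_like(params)  # assumption: nothing is carried into iteration 1
    for _ in range(num_iters):
        clipped, carry = clipped_step(grad_fn(params), carry, threshold)
        params = params - lr * clipped            # (v) update from the clipped gradient
    return params


if __name__ == "__main__":
    # Toy loss 0.5 * ||p||^2, whose gradient is simply p.
    print(minimize(lambda p: p, np.array([2.0])))
```

    In this sketch the part of the gradient removed by clipping is not discarded: at every iteration the clipped gradient plus the newly stored value equals the raw gradient plus the previously stored value, so clipped-off mass is re-applied in later iterations.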

    EVALUATING REPRESENTATIONS WITH READ-OUT MODEL SWITCHING

    Publication number: US20240119302A1

    Publication date: 2024-04-11

    Application number: US18475972

    Filing date: 2023-09-27

    CPC classification number: G06N3/092

    Abstract: A method of automatically selecting a neural network from a plurality of computer-implemented candidate neural networks, each candidate neural network comprising at least an encoder neural network trained to encode an input value as a latent representation. The method comprises: obtaining a sequence of data items, each of the data items comprising an input value and a target value; and determining a respective score for each of the candidate neural networks, comprising evaluating the encoder neural network of the candidate neural network using a plurality of read-out heads. Each read-out head comprises parameters for predicting a target value from a latent representation of an input value of a data item encoded using the encoder neural network of the candidate neural network. The method further comprises selecting the neural network from the plurality of candidate neural networks using the respective scores.
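    The selection procedure in this abstract can be illustrated with a small Python sketch. The abstract does not say how the scores are computed from the read-out heads, so everything specific here is an assumption: each read-out head is taken to be an online ridge regressor with its own L2 penalty, each head is scored by its cumulative squared error when predicting every target before training on it, and a candidate's score is the loss of its best single head. That last choice is a simplification and does not switch between heads along the sequence. The names `online_head_losses`, `select_encoder`, and `l2_penalties` are hypothetical.

```python
import numpy as np


def online_head_losses(features, targets, l2_penalties):
    """Cumulative squared error of each read-out head on a sequence.

    Every head predicts a target from the latent representation first and
    is only then updated with that data item.
    """
    dim = features.shape[1]
    losses = np.zeros(len(l2_penalties))
    for h, penalty in enumerate(l2_penalties):
        gram = penalty * np.eye(dim)   # regularised feature covariance
        moment = np.zeros(dim)         # running sum of target * feature
        for z, y in zip(features, targets):
            weights = np.linalg.solve(gram, moment)
            losses[h] += (y - z @ weights) ** 2   # predict before seeing y
            gram += np.outer(z, z)                # then update the head
            moment += y * z
    return losses


def select_encoder(encoders, inputs, targets, l2_penalties=(0.1, 1.0, 10.0)):
    """Score every candidate encoder and return the index of the best one.

    Assumed score: the lowest cumulative loss among the candidate's heads.
    """
    scores = []
    for encode in encoders:
        features = np.stack([encode(x) for x in inputs])
        scores.append(online_head_losses(features, targets, l2_penalties).min())
    return int(np.argmin(scores)), scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    inputs = rng.normal(size=(50, 2))
    targets = 3.0 * inputs[:, 0]
    # Two toy "encoders": one discards the input, one passes it through.
    candidates = [lambda x: np.ones(1), lambda x: x]
    best, scores = select_encoder(candidates, inputs, targets)
    print(best, scores)
```

    In the toy usage, the pass-through encoder keeps the information needed to predict the target, so it should receive the lower cumulative loss and be selected.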
