SYSTEM AND METHOD FOR MULTI-OBJECTIVE REINFORCEMENT LEARNING WITH GRADIENT MODULATION

    Publication No.: EP4270256A1

    Publication date: 2023-11-01

    Application No.: EP23170087.3

    Filing date: 2023-04-26

    Abstract: Systems and methods are provided for processing multiple input objectives by a reinforcement learning agent. The method may include: instantiating a reinforcement learning agent that maintains a reinforcement learning neural network and generates, according to outputs of the reinforcement learning neural network, signals for communicating task requests; receiving a plurality of input data representing a plurality of user objectives associated with a task request and a plurality of weights; generating a plurality of preferences based on the plurality of user objectives and the plurality of weights; computing a plurality of loss values; computing a plurality of first gradients based on the plurality of loss values; for a plurality of pairs of preferences, computing a plurality of similarity metrics; computing an updated gradient based on the first gradients and the plurality of similarity metrics; and updating the reinforcement learning neural network based on the updated gradient.
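The gradient-combination step in the abstract above can be illustrated with a minimal sketch. The abstract does not say which similarity metric or combination rule is used, so this example assumes cosine similarity between the gradients of each pair of preferences and a projection-based conflict removal (in the spirit of PCGrad) followed by averaging. The function names `cosine_similarity` and `modulate_gradients` are hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Cosine of the angle between two gradients (0.0 if either is zero)."""
    norm_u, norm_v = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot(u, v) / (norm_u * norm_v)


def modulate_gradients(grads):
    """Combine per-preference gradients into one updated gradient.

    Assumption (not stated in the abstract): when a pair of gradients
    conflicts (negative cosine similarity), the conflicting component is
    projected out of the first gradient; the results are then averaged.
    """
    adjusted = [list(g) for g in grads]
    for i, g_i in enumerate(adjusted):
        for j, g_j in enumerate(grads):
            if i == j:
                continue
            if cosine_similarity(g_i, g_j) < 0.0:
                # A negative similarity implies g_j is non-zero.
                scale = dot(g_i, g_j) / dot(g_j, g_j)
                for k in range(len(g_i)):
                    g_i[k] -= scale * g_j[k]
    n = len(adjusted)
    return [sum(g[k] for g in adjusted) / n for k in range(len(adjusted[0]))]


# Two conflicting first gradients, one per preference.
updated = modulate_gradients([[1.0, 0.0], [-1.0, 1.0]])
```

In a real agent the updated gradient would be passed to the optimizer of the reinforcement learning neural network; here it is just a list of floats.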

    APPARATUS AND METHOD OF DATA PROCESSING
    Invention publication

    Publication No.: EP4231202A1

    Publication date: 2023-08-23

    Application No.: EP23157258.7

    Filing date: 2023-02-17

    Abstract: A data processing apparatus comprises at least one processor configured to execute an input module to receive an input dataset comprising a plurality of samples, each assigned to one of a plurality of variables, an encoder module to map the input dataset to a latent representation, a decoder module to process the latent representation and indicate a link category for each pair of variables, wherein the link category is selected from a set of categories including 'no causal link', 'causally linked' and 'unknown', and a reinforcement learning (RL) module to: (i) compare the link category for each pair of variables with the samples for the associated variables, (ii) generate a score function including an error term based on a result of the comparison, and (iii) update one or more parameters of the encoder module and decoder module based on the score function.
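As a rough illustration of steps (i) and (ii) above, the sketch below scores a set of predicted link categories against the samples. The abstract does not describe how the comparison is made, so the correlation threshold, the fixed penalty for 'unknown', and the names `pearson` and `score` are all assumptions chosen for the example. A thresholded correlation is only a stand-in and is not a causal test.

```python
import math

NO_LINK, LINKED, UNKNOWN = "no causal link", "causally linked", "unknown"


def pearson(x, y):
    """Pearson correlation of two equal-length sample lists (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def score(links, samples, threshold=0.5, unknown_penalty=0.5):
    """Return a score whose error term counts disagreements with the data.

    links:   {(var_a, var_b): category} as indicated by the decoder
    samples: {var: [values]} from the input dataset
    Assumption: a pair counts as dependent when |correlation| >= threshold.
    """
    error = 0.0
    for (a, b), category in links.items():
        dependent = abs(pearson(samples[a], samples[b])) >= threshold
        if category == UNKNOWN:
            error += unknown_penalty
        elif (category == LINKED) != dependent:
            error += 1.0
    return -error  # higher is better; would drive the parameter update


samples = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [1, -1, -1, 1]}
good = score({("x", "y"): LINKED, ("x", "z"): NO_LINK}, samples)
bad = score({("x", "y"): NO_LINK, ("x", "z"): UNKNOWN}, samples)
```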

    ALGORITHM FOR MITIGATION OF IMPACT OF UPLINK/DOWNLINK BEAM MIS-MATCH

    Publication No.: EP4184804A1

    Publication date: 2023-05-24

    Application No.: EP22205050.2

    Filing date: 2022-11-02

    IPC classification: H04B7/08 G06N3/092

    Abstract: According to an aspect, there is provided an apparatus for performing the following. The apparatus implements, separately for at least one downlink beam, a reinforcement learning model, where a state defines which of the plurality of uplink beams belong to a priority beam set for uplink reception corresponding to a downlink beam; an action is defined as an addition of a new uplink beam to the priority beam set, a removal of an uplink beam from the priority beam set, or doing nothing; and a reward is calculated based on a change in uplink signal-to-noise ratio due to an action, adjusted with a cost for taking the action. The apparatus iteratively calculates at least one optimal state using at least one reinforcement learning model based on uplink signal-to-noise ratio statistics and on the plurality of optimal downlink beams for transmission to said plurality of terminal devices.
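The state, action and reward definitions above can be sketched as plain data and two small functions. This is an illustrative reading only: the beam indices, the dB units, the default cost value, and the choice that doing nothing incurs no cost are assumptions, and `apply_action` and `reward` are hypothetical names.

```python
from typing import FrozenSet, Optional, Tuple

# State: the set of uplink beam indices in the priority beam set
# associated with one downlink beam.
State = FrozenSet[int]
# Action: ("add", beam), ("remove", beam) or ("noop", None).
Action = Tuple[str, Optional[int]]


def apply_action(state: State, action: Action) -> State:
    """Return the priority beam set after taking the action."""
    kind, beam = action
    if kind == "add":
        return state | {beam}
    if kind == "remove":
        return state - {beam}
    return state  # "noop": leave the set unchanged


def reward(snr_before_db: float, snr_after_db: float, action: Action,
           action_cost_db: float = 0.1) -> float:
    """Change in uplink SNR caused by the action, minus a cost for acting.

    Assumption: doing nothing is free; add and remove share one cost.
    """
    kind, _ = action
    cost = 0.0 if kind == "noop" else action_cost_db
    return (snr_after_db - snr_before_db) - cost


state = frozenset({1, 2})
next_state = apply_action(state, ("add", 3))
r = reward(10.0, 12.0, ("add", 3), action_cost_db=0.5)
```

Keeping the state as a `frozenset` makes it hashable, so it could be used directly as a key in a tabular value function, one table per downlink beam.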

    LEARNING DEVICE, COMMUNICATION DEVICE, UNMANNED VEHICLE, WIRELESS COMMUNICATION SYSTEM, LEARNING METHOD, AND COMPUTER-READABLE STORAGE MEDIUM

    Publication No.: EP4163838A1

    Publication date: 2023-04-12

    Application No.: EP22196318.4

    Filing date: 2022-09-19

    IPC classification: G06N3/092 G06N3/0985

    Abstract: A learning device includes a setting unit configured to set a first value for a parameter of a communication device controlled by a computer using a learned model; a reinforcement learning unit configured to allow a learning model to learn; a model extraction unit configured to extract, as a learned model, the learning model; a model evaluation unit configured to determine whether performance of the learned model has reached a first requirement; an updating unit configured to update the first value to a second value when the performance is determined to have reached the first requirement; and a model selection unit. The model evaluation unit determines whether the performance of the learned model updated to the second value satisfies a second requirement. When the performance of the learned model updated to the second value is determined to satisfy the second requirement, the model selection unit selects that learned model.
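One possible reading of the control flow described above is a two-stage training loop. The sketch below is an assumption-laden illustration, not the claimed device: `train_step` and `evaluate` are hypothetical callables standing in for the reinforcement learning unit and the model evaluation unit, and performance is assumed to be a number compared against each requirement with `>=`.

```python
def train_and_select(train_step, evaluate, first_value, second_value,
                     first_requirement, second_requirement, max_rounds=100):
    """Train under a first parameter value, then a second, and select a model.

    train_step(model, value) -> model   (learning plus extraction)
    evaluate(model, value)   -> float   (performance under that value)
    Returns the selected model, or None if the second requirement is
    never met within max_rounds.
    """
    value, stage = first_value, 1
    model = None
    for _ in range(max_rounds):
        model = train_step(model, value)
        performance = evaluate(model, value)
        if stage == 1 and performance >= first_requirement:
            # Updating unit: switch the parameter to the second value.
            value, stage = second_value, 2
        elif stage == 2 and performance >= second_requirement:
            # Model selection unit: this learned model is selected.
            return model
    return None


# Toy stand-ins: the "model" is a skill counter, and the parameter value
# acts as a difficulty that divides the measured performance.
selected = train_and_select(
    train_step=lambda model, value: (model or 0) + 1,
    evaluate=lambda model, value: model / value,
    first_value=2, second_value=4,
    first_requirement=1.0, second_requirement=1.0,
)
```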

    METHOD AND APPARATUS FOR TRAINING A MODEL, AND METHOD AND APPARATUS FOR PREDICTING A TRAJECTORY

    Publication No.: EP4134878A2

    Publication date: 2023-02-15

    Application No.: EP22216178.8

    Filing date: 2022-12-22

    IPC classification: G06N3/092 G06N3/006 G06N3/04

    Abstract: A method and an apparatus for training a model, and a method and an apparatus for predicting a trajectory, are provided, which relate to the field of artificial intelligence technology, in particular to the fields of deep learning, autonomous driving and intelligent transportation technologies. The method includes: adjusting a model parameter of a to-be-trained model for an n-th round according to a first action selection strategy, so as to obtain an intermediate network model, where n = 1, ..., N, and N is an integer greater than 1; performing, by using the intermediate network model, at least one trajectory prediction action indicated by the first action selection strategy, so as to obtain a trajectory prediction result, where the at least one trajectory prediction action is based on training sample data; determining a second action selection strategy according to the trajectory prediction result and the first action selection strategy; and adjusting the model parameter of the to-be-trained model for an (n+1)-th round according to the second action selection strategy.
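The round structure in the abstract above (adjust, predict, derive the next strategy, repeat) can be written as a short loop. The abstract does not define the model, the strategy representation or the update rules, so `adjust`, `predict` and `update_strategy` are hypothetical callables supplied by the caller, and the toy usage below uses plain numbers purely to show the data flow.

```python
def train_rounds(params, strategy, adjust, predict, update_strategy, num_rounds):
    """Run rounds n = 1, ..., N of strategy-driven training.

    adjust(params, strategy)          -> params of the intermediate model
    predict(params, strategy)         -> trajectory prediction result
    update_strategy(result, strategy) -> strategy for round n + 1
    """
    for _ in range(num_rounds):
        params = adjust(params, strategy)             # intermediate network model
        result = predict(params, strategy)            # prediction on training data
        strategy = update_strategy(result, strategy)  # second strategy
    return params, strategy


# Toy stand-ins: the parameter is a scalar moved toward a target of 10,
# the "result" is the remaining error, and the next step is half of it.
final_params, final_strategy = train_rounds(
    params=0.0,
    strategy=4.0,
    adjust=lambda p, s: p + s,
    predict=lambda p, s: 10.0 - p,
    update_strategy=lambda result, s: result / 2.0,
    num_rounds=3,
)
```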