-
101.
Publication No.: EP4270256A1
Publication date: 2023-11-01
Application No.: EP23170087.3
Filing date: 2023-04-26
Applicant: Royal Bank of Canada
Abstract: Systems and methods are provided for processing multiple input objectives by a reinforcement learning agent. The method may include: instantiating a reinforcement learning agent that maintains a reinforcement learning neural network and generates, according to outputs of the reinforcement learning neural network, signals for communicating task requests; receiving a plurality of input data representing a plurality of user objectives associated with a task request and a plurality of weights; generating a plurality of preferences based on the plurality of user objectives and the plurality of weights; computing a plurality of loss values; computing a plurality of first gradients based on the plurality of loss values; for a plurality of pairs of preferences, computing a plurality of similarity metrics; computing an updated gradient based on the first gradients and the plurality of similarity metrics; and updating the reinforcement learning neural network based on the updated gradient.
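The gradient-combination step in this abstract can be illustrated with a minimal sketch. The abstract does not say which similarity metric is used or how the updated gradient is formed, so the code below assumes cosine similarity and a projection rule in the style of gradient surgery: when two per-preference gradients conflict (negative similarity), the conflicting component is projected away before averaging. The function names `cosine_similarity` and `combine_gradients` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two gradient vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def combine_gradients(grads):
    """Assumed rule, not taken from the patent: for each pair of gradients
    with negative similarity, remove from the first its component along the
    second, then average the adjusted gradients into one updated gradient."""
    adjusted = [list(g) for g in grads]
    for i, g_i in enumerate(adjusted):
        for j, g_j in enumerate(grads):
            if i == j:
                continue
            if cosine_similarity(g_i, g_j) < 0:
                # Negative similarity implies g_j is non-zero, so the
                # division below is safe.
                scale = sum(a * b for a, b in zip(g_i, g_j)) / sum(b * b for b in g_j)
                for k in range(len(g_i)):
                    g_i[k] -= scale * g_j[k]
    n = len(adjusted)
    return [sum(g[k] for g in adjusted) / n for k in range(len(adjusted[0]))]


# One gradient per preference; these two conflict along the first axis.
first_gradients = [[1.0, 0.0], [-1.0, 1.0]]
updated_gradient = combine_gradients(first_gradients)
```

In a real agent the updated gradient would then be passed to the optimizer that updates the reinforcement learning neural network; that step is omitted here.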
-
102.
Publication No.: EP4250184A1
Publication date: 2023-09-27
Application No.: EP23161940.4
Filing date: 2023-03-15
IPC classification: G06N3/0985, H04W24/02, G06N3/006, G06N3/0464, G06N3/092, G06N5/022
Abstract: A radio frequency (RF) system may include at least one RF sensor in an RF environment and at least one RF actuator. The RF system may also include at least one processor that includes a machine learning agent configured to use a machine learning algorithm to generate an RF model to operate the at least one RF actuator based upon the at least one RF sensor. The processor may also include a recommendation training agent configured to generate performance data from the machine learning agent, and change the RF environment based upon the performance data so that the machine learning agent updates the machine learning algorithm.
-
103.
Publication No.: EP3523761B1
Publication date: 2023-09-20
Application No.: EP17807935.6
Filing date: 2017-11-04
IPC classification: G06N3/092, G06N3/0442, G06N3/006, G06N3/0455, G06N3/0464, G06N3/084
-
104.
Publication No.: EP3924884B1
Publication date: 2023-08-30
Application No.: EP20838656.5
Filing date: 2020-12-04
-
105.
Publication No.: EP4231202A1
Publication date: 2023-08-23
Application No.: EP23157258.7
Filing date: 2023-02-17
IPC classification: G06N3/045, G06N3/0455, G06N3/092, G06N5/022
Abstract: A data processing apparatus comprises at least one processor configured to execute an input module to receive an input dataset comprising a plurality of samples, each assigned to one of a plurality of variables, an encoder module to map the input dataset to a latent representation, a decoder module to process the latent representation and indicate a link category for each pair of variables, wherein the link category is selected from a set of categories including 'no causal link', 'causally linked' and 'unknown', and a reinforcement learning (RL) module to: (i) compare the link category for each pair of variables with the samples for the associated variables, (ii) generate a score function including an error term based on a result of the comparison, and (iii) update one or more parameters of the encoder module and decoder module based on the score function.
-
106.
Publication No.: EP4202785A1
Publication date: 2023-06-28
Application No.: EP22212583.3
Filing date: 2022-12-09
Applicant: INTEL Corporation
Inventor(s): BUERKLE, Cornelius; SCHOLL, Kay-Ulrich; GRAEFE, Ralf; PENG, Yang; PASCH, Frederik; OBORIL, Fabian; GEISSLER, Florian
IPC classification: G06N3/092, G06N3/008, B25J9/16, A62C99/00, G06N3/044, G06N3/045, G06N3/0464, G06N3/047, G06N3/0495, G06N3/084, G06N5/01, G06N20/10, G06N20/20
Abstract: Techniques are disclosed for the exploration of environments for the estimation and detection of hazards or near hazards within the environment and the mitigation of hazards therein. The exploration of the environment and mitigation of hazards therein may use one or more autonomous agents, including a hazard response robot. The estimation of the hazards may use a policy learning engine, and the hazards may be detected, and the associated risks determined, using a hazard estimation system.
-
107.
Publication No.: EP4184804A1
Publication date: 2023-05-24
Application No.: EP22205050.2
Filing date: 2022-11-02
Inventor(s): KAYA, Aliye; VISWANATHAN, Harish
Abstract: According to an aspect, there is provided an apparatus for performing the following. The apparatus implements, separately for at least one downlink beam, a reinforcement learning model, where a state defines which of the plurality of uplink beams belong to a priority beam set for uplink reception corresponding to a downlink beam, an action is defined as an addition of a new uplink beam to the priority beam set, a removal of an uplink beam from the priority beam set, or doing nothing, and a reward is calculated based on a change in uplink signal-to-noise ratio due to an action, adjusted with a cost for taking the action. The apparatus calculates iteratively at least one optimal state using at least one reinforcement learning model based on uplink signal-to-noise ratio statistics and on the plurality of optimal downlink beams for transmission to said plurality of terminal devices.
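The state, action and reward described in this abstract can be made concrete with a short sketch. Everything named here is an assumption for illustration: beams are identified by integers, the state is a `frozenset` of beam indices, SNR values are in dB, and the fixed `action_cost` of 0.5 is an arbitrary placeholder rather than a value from the application.

```python
from typing import FrozenSet, Optional, Tuple

# Hypothetical encoding: ("add", beam), ("remove", beam) or ("noop", None).
Action = Tuple[str, Optional[int]]


def apply_action(state: FrozenSet[int], action: Action) -> FrozenSet[int]:
    """Return the priority beam set that results from taking the action."""
    kind, beam = action
    if kind == "add":
        return state | {beam}
    if kind == "remove":
        return state - {beam}
    return state  # "noop" leaves the set unchanged


def reward(snr_before: float, snr_after: float, action: Action,
           action_cost: float = 0.5) -> float:
    """Change in uplink SNR, reduced by a cost when the set is modified.
    Charging no cost for "noop" is an assumption of this sketch."""
    cost = 0.0 if action[0] == "noop" else action_cost
    return (snr_after - snr_before) - cost


state = frozenset({1})
next_state = apply_action(state, ("add", 2))
step_reward = reward(snr_before=10.0, snr_after=12.0, action=("add", 2))
```

Under these assumptions, a learning loop would keep one such model per downlink beam and feed it SNR statistics measured after each change to the priority beam set.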
-
108.
Publication No.: EP4163838A1
Publication date: 2023-04-12
Application No.: EP22196318.4
Filing date: 2022-09-19
IPC classification: G06N3/092, G06N3/0985
Abstract: A learning device includes a setting unit configured to set a first value for a parameter of a communication device controlled by a computer using a learned model; a reinforcement learning unit configured to allow a learning model to learn; a model extraction unit configured to extract, as a learned model, the learning model; a model evaluation unit configured to determine whether performance of the learned model has reached a first requirement; an updating unit configured to update the first value to a second value when the performance is determined to have reached the first requirement; and a model selection unit. The model evaluation unit determines whether the performance of the learned model updated to the second value satisfies a second requirement. When the performance of the learned model updated to the second value is determined to satisfy the second requirement, the model selection unit selects that learned model.
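The two-stage evaluation in this abstract can be sketched as a small control loop. This is one possible reading, not the claimed implementation: `train` and `evaluate` are hypothetical callables supplied by the caller, performance is assumed to be a single number where higher is better, and the abstract does not say whether training continues after the parameter is updated (here it does).

```python
def select_model(train, evaluate, first_value, second_value,
                 first_requirement, second_requirement, max_rounds=100):
    """Train under the first parameter value until the first requirement is
    met, switch to the second value, and return the first extracted model
    that also meets the second requirement. Returns None if none does."""
    value = first_value
    switched = False
    model = None
    for _ in range(max_rounds):
        model = train(model, value)          # reinforcement learning step
        score = evaluate(model, value)       # evaluate the extracted model
        if not switched and score >= first_requirement:
            value = second_value             # updating unit: first -> second
            switched = True
            score = evaluate(model, value)   # re-evaluate under the new value
        if switched and score >= second_requirement:
            return model                     # model selection unit
    return None


# Toy stand-ins: the "model" is a count of training steps, and performance
# drops when the parameter value grows.
toy_train = lambda model, value: (model or 0) + 1
toy_evaluate = lambda model, value: model / value

selected = select_model(toy_train, toy_evaluate, first_value=1, second_value=2,
                        first_requirement=3, second_requirement=3)
```

With the toy stand-ins, the first requirement is met after three training steps, and the model selected under the second value is the one reached after six steps.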
-
109.
Publication No.: EP4134878A2
Publication date: 2023-02-15
Application No.: EP22216178.8
Filing date: 2022-12-22
Inventor(s): ZHENG, Xinyue; LIU, Changchun; ZHU, Zhenguang; SUN, Hao
Abstract: A method and an apparatus for training a model, and a method and an apparatus for predicting a trajectory, are provided, relating to the field of artificial intelligence technology, in particular to the fields of deep learning, autonomous driving and intelligent transportation technologies. The method includes: adjusting a model parameter of a to-be-trained model for an n-th round according to a first action selection strategy, so as to obtain an intermediate network model, where n = 1, ..., N, and N is an integer greater than 1; performing, by using the intermediate network model, at least one trajectory prediction action indicated by the first action selection strategy, so as to obtain a trajectory prediction result, where the at least one trajectory prediction action is based on training sample data; determining a second action selection strategy according to the trajectory prediction result and the first action selection strategy; and adjusting the model parameter of the to-be-trained model for an (n+1)-th round according to the second action selection strategy.
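The round-by-round structure of this method can be outlined as a generic loop. The sketch below only mirrors the order of the steps in the abstract; `adjust`, `predict` and `update_strategy` are hypothetical callables, and carrying the adjusted parameters forward into the next round is an assumption the abstract does not spell out.

```python
def train_rounds(params, strategy, adjust, predict, update_strategy,
                 samples, num_rounds):
    """Run num_rounds of: adjust parameters under the current strategy,
    predict with the intermediate model, then derive the next strategy."""
    for _ in range(num_rounds):
        intermediate = adjust(params, strategy)            # intermediate network model
        result = predict(intermediate, strategy, samples)  # trajectory prediction result
        strategy = update_strategy(result, strategy)       # second action selection strategy
        params = intermediate                              # assumed carry-over to round n+1
    return params, strategy


# Toy stand-ins: the "model" is one number, the "strategy" is a step size,
# and the prediction result is the signed error against a target sample.
final_params, final_strategy = train_rounds(
    params=0.0,
    strategy=1.0,
    adjust=lambda p, s: p + s,
    predict=lambda model, s, samples: samples[0] - model,
    update_strategy=lambda result, s: result / 2,
    samples=[4.0],
    num_rounds=3,
)
```

In the toy run the parameter moves toward the target sample each round because the next step size is derived from the previous prediction error.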