Patent search: ipc:"G06N3/092", page 11

101.

Published invention application
SYSTEM AND METHOD FOR MULTI-OBJECTIVE REINFORCEMENT LEARNING WITH GRADIENT MODULATION (status: under examination, published)

Publication No.: EP4270256A1

Publication date: 2023-11-01

Application No.: EP23170087.3

Filing date: 2023-04-26

Applicant: Royal Bank of Canada

Inventors: HUANG, Hongfeng; YU, Zhuo; AZAM, Muhammad Mustajab; CHMURA, Jacob

IPC classification: G06N3/092, G06N3/084, G06N3/006, G06N7/01

Abstract: Systems and methods are provided for processing multiple input objectives by a reinforcement learning agent. The method may include: instantiating a reinforcement learning agent that maintains a reinforcement learning neural network and generates, according to outputs of the reinforcement learning neural network, signals for communicating task requests; receiving a plurality of input data representing a plurality of user objectives associated with a task request and a plurality of weights; generating a plurality of preferences based on the plurality of user objectives and the plurality of weights; computing a plurality of loss values; computing a plurality of first gradients based on the plurality of loss values; for a plurality of pairs of preferences, computing a plurality of similarity metrics; computing an updated gradient based on the first gradients and the plurality of similarity metrics; and updating the reinforcement learning neural network based on the updated gradient.
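The last steps of this abstract (per-preference gradients, pairwise similarity metrics, one updated gradient) can be illustrated with a minimal sketch. The abstract does not say which similarity metric is used or how it modulates the gradients, so everything here is an assumption: cosine similarity as the metric, and a projection step that removes the conflicting component whenever two gradients point in opposing directions, followed by averaging. The function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    # Assumed similarity metric; the abstract does not name one.
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot(u, v) / (nu * nv)

def modulate_gradients(grads):
    """Combine per-preference gradients into a single updated gradient.

    Assumption: when a pair has negative cosine similarity, the conflicting
    component of one gradient is projected out before averaging.
    """
    adjusted = [list(g) for g in grads]
    for i, g_i in enumerate(adjusted):
        for j, g_j in enumerate(grads):
            if i == j:
                continue
            if cosine_similarity(g_i, g_j) < 0.0:
                # Remove the part of g_i that points against g_j.
                scale = dot(g_i, g_j) / dot(g_j, g_j)
                for k in range(len(g_i)):
                    g_i[k] -= scale * g_j[k]
    n = len(adjusted)
    return [sum(g[k] for g in adjusted) / n for k in range(len(adjusted[0]))]

# Two conflicting gradients, one per preference (illustrative values).
updated = modulate_gradients([[1.0, 0.0], [-1.0, 1.0]])
```

In this toy case a plain average would give [0.0, 0.5], while the modulated result keeps a positive component along both axes.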

102.

Published invention application
RADIO FREQUENCY SYSTEM INCLUDING RECOMMENDATION TRAINING AGENT FOR MACHINE LEARNING ALGORITHM AND RELATED METHODS (status: under examination, published)

Publication No.: EP4250184A1

Publication date: 2023-09-27

Application No.: EP23161940.4

Filing date: 2023-03-15

Applicant: L3Harris Technologies, Inc.

Inventors: Haddadin, Osama S.; Hedzic, Armin; NELSON, Andrew; Terry, Boston C.; Thorup, Seth J.

IPC classification: G06N3/0985, H04W24/02, G06N3/006, G06N3/0464, G06N3/092, G06N5/022

Abstract: A radio frequency (RF) system may include at least one RF sensor in an RF environment and at least one RF actuator. The RF system may also include at least one processor that includes a machine learning agent configured to use a machine learning algorithm to generate an RF model to operate the at least one RF actuator based upon the at least one RF sensor. The processor may also include a recommendation training agent configured to generate performance data from the machine learning agent, and change the RF environment based upon the performance data so that the machine learning agent updates the machine learning algorithm.

103.

Granted invention patent
RECURRENT ENVIRONMENT PREDICTORS (status: in force)

Publication No.: EP3523761B1

Publication date: 2023-09-20

Application No.: EP17807935.6

Filing date: 2017-11-04

Inventors: WIERSTRA, Daniel Pieter; MOHAMED, Shakir; CHIAPPA, Silvia; RACANIERE, Sebastien Henri Andre

IPC classification: G06N3/092, G06N3/0442, G06N3/006, G06N3/0455, G06N3/0464, G06N3/084

104.

Granted invention patent
SYSTEM AND METHOD FOR ROBUST OPTIMIZATION FOR TRAJECTORY-CENTRIC MODEL-BASED REINFORCEMENT LEARNING (status: in force)

Publication No.: EP3924884B1

Publication date: 2023-08-30

Application No.: EP20838656.5

Filing date: 2020-12-04

Inventors: JHA, Devesh; KOLARIC, Patrik; RAGHUNATHAN, Arvind; BENOSMAN, Mouhacine; ROMERES, Diego

IPC classification: G06N3/092, G06N7/00

105.

Published invention application
APPARATUS AND METHOD OF DATA PROCESSING (status: under examination, published)

Publication No.: EP4231202A1

Publication date: 2023-08-23

Application No.: EP23157258.7

Filing date: 2023-02-17

Applicant: Impulse Innovations Limited

Inventors: Chockler, Hana; McNamee, Daniel; Lawrence, Andrew; Kleinegesse, Steven; Sipos, Maksim

IPC classification: G06N3/045, G06N3/0455, G06N3/092, G06N5/022

Abstract: A data processing apparatus comprises at least one processor configured to execute an input module to receive an input dataset comprising a plurality of samples, each assigned to one of a plurality of variables, an encoder module to map the input dataset to a latent representation, a decoder module to process the latent representation and indicate a link category for each pair of variables, wherein the link category is selected from a set of categories including 'no causal link', 'causally linked' and 'unknown', and a reinforcement learning (RL) module to: (i) compare the link category for each pair of variables with the samples for the associated variables, (ii) generate a score function including an error term based on a result of the comparison, and (iii) update one or more parameters of the encoder module and decoder module based on the score function.

106.

Published invention application
HAZARD EXPLORATION, ESTIMATION, AND RESPONSE SYSTEM AND METHOD (status: under examination, published)

Publication No.: EP4202785A1

Publication date: 2023-06-28

Application No.: EP22212583.3

Filing date: 2022-12-09

Applicant: INTEL Corporation

Inventors: BUERKLE, Cornelius; SCHOLL, Kay-Ulrich; GRAEFE, Ralf; PENG, Yang; PASCH, Frederik; OBORIL, Fabian; GEISSLER, Florian

IPC classification: G06N3/092, G06N3/008, B25J9/16, A62C99/00, G06N3/044, G06N3/045, G06N3/0464, G06N3/047, G06N3/0495, G06N3/084, G06N5/01, G06N20/10, G06N20/20

Abstract: Techniques are disclosed for the exploration of environments for the estimation and detection of hazards or near hazards within the environment and the mitigation of hazards therein. The exploration of the environment and mitigation of hazards therein may use one or more autonomous agents, including a hazard response robot. The estimation of the hazards may use a policy learning engine, and the hazards may be detected, and the associated risks determined, using a hazard estimation system.

107.

Published invention application
ALGORITHM FOR MITIGATION OF IMPACT OF UPLINK/DOWNLINK BEAM MIS-MATCH (status: under examination, published)

Publication No.: EP4184804A1

Publication date: 2023-05-24

Application No.: EP22205050.2

Filing date: 2022-11-02

Applicant: Nokia Solutions and Networks Oy

Inventors: KAYA, Aliye; VISWANATHAN, Harish

IPC classification: H04B7/08, G06N3/092

Abstract: According to an aspect, there is provided an apparatus for performing the following. The apparatus implements, separately for at least one downlink beam, a reinforcement learning model, where a state defines which of the plurality of uplink beams belong to a priority beam set for uplink reception corresponding to a downlink beam, an action is defined as an addition of a new uplink beam to the priority beam set, a removal of an uplink beam from the priority beam set or doing nothing, and a reward is calculated based on a change in uplink signal-to-noise ratio due to an action, adjusted with a cost for taking the action. The apparatus calculates iteratively at least one optimal state using at least one reinforcement learning model based on uplink signal-to-noise ratio statistics and on the plurality of optimal downlink beams for transmission to said plurality of terminal devices.
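The state, action and reward described in this abstract map onto a small data model, sketched below. The abstract does not name a learning algorithm, so tabular Q-learning is an assumption here, as are the cost value, the zero cost for doing nothing, and all function names.

```python
from collections import defaultdict

def apply_action(state, action):
    # State: frozenset of uplink beam indices in the priority beam set.
    kind, beam = action
    if kind == "add":
        return state | {beam}
    if kind == "remove":
        return state - {beam}
    return state  # "noop": do nothing

def legal_actions(state, all_beams):
    actions = [("noop", None)]
    actions += [("add", b) for b in sorted(all_beams - state)]
    actions += [("remove", b) for b in sorted(state)]
    return actions

def reward(snr_before, snr_after, action, action_cost=0.1):
    # Change in uplink SNR, adjusted with a cost for taking the action.
    # Assumption: doing nothing costs nothing.
    cost = 0.0 if action[0] == "noop" else action_cost
    return (snr_after - snr_before) - cost

def q_update(q, state, action, r, next_state, all_beams, alpha=0.5, gamma=0.9):
    # Standard tabular Q-learning update (assumed, not taken from the abstract).
    best_next = max(q[(next_state, a)] for a in legal_actions(next_state, all_beams))
    q[(state, action)] += alpha * (r + gamma * best_next - q[(state, action)])

# One illustrative step for a single downlink beam with three uplink beams.
q = defaultdict(float)
beams = frozenset({0, 1, 2})
state = frozenset({0})
action = ("add", 1)
next_state = apply_action(state, action)
q_update(q, state, action, reward(10.0, 12.0, action), next_state, beams)
```

Using a frozenset for the state keeps it hashable, so one Q-table per downlink beam can be keyed directly by (state, action) pairs.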

108.

Published invention application
LEARNING DEVICE, COMMUNICATION DEVICE, UNMANNED VEHICLE, WIRELESS COMMUNICATION SYSTEM, LEARNING METHOD, AND COMPUTER-READABLE STORAGE MEDIUM (status: under examination, published)

Publication No.: EP4163838A1

Publication date: 2023-04-12

Application No.: EP22196318.4

Filing date: 2022-09-19

Applicant: Mitsubishi Heavy Industries, Ltd.

Inventors: Kataoka, Yujiro; Ito, Masayuki; Matsunami, Natsuki

IPC classification: G06N3/092, G06N3/0985

Abstract: A learning device includes a setting unit configured to set a first value for a parameter of a communication device controlled by a computer using a learned model; a reinforcement learning unit configured to allow a learning model to learn; a model extraction unit configured to extract, as a learned model, the learning model; a model evaluation unit configured to determine whether performance of the learned model has reached a first requirement; an updating unit configured to update the first value to a second value when the performance is determined to have reached the first requirement; and a model selection unit. The model evaluation unit determines whether the performance of the learned model updated to the second value satisfies a second requirement. When the performance of the learned model updated to the second value is determined to satisfy the second requirement, the model selection unit selects that learned model.
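The two-stage flow in this abstract (train with a first parameter value, switch to a second value once a first requirement is met, select the model once a second requirement is met) can be sketched as a simple loop. This is one possible reading only: the abstract does not state whether training continues after the value is updated, nor how performance is measured. The callbacks `train_step` and `evaluate`, the numeric thresholds, and the step limit are all hypothetical.

```python
def staged_training(train_step, evaluate, first_value, second_value,
                    first_requirement, second_requirement, max_steps=1000):
    """Return the selected model, or None if the requirements are never met.

    Assumption: performance is a number and a requirement is met when the
    score is greater than or equal to its threshold.
    """
    value = first_value   # setting unit: start with the first parameter value
    stage = 1
    model = None
    for _ in range(max_steps):
        model = train_step(model, value)   # reinforcement learning unit
        score = evaluate(model, value)     # model evaluation unit
        if stage == 1 and score >= first_requirement:
            value = second_value           # updating unit
            stage = 2
        elif stage == 2 and score >= second_requirement:
            return model                   # model selection unit
    return None

# Toy stand-ins: the "model" is a step counter and its score is the count.
selected = staged_training(
    train_step=lambda m, v: (m or 0) + 1,
    evaluate=lambda m, v: m,
    first_value="first", second_value="second",
    first_requirement=3, second_requirement=5,
)
```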

109.

Published invention application
METHOD AND APPARATUS FOR TRAINING A MODEL, AND METHOD AND APPARATUS FOR PREDICTING A TRAJECTORY (status: under examination, published)

Publication No.: EP4134878A2

Publication date: 2023-02-15

Application No.: EP22216178.8

Filing date: 2022-12-22

Applicant: Apollo Intelligent Driving Technology (Beijing) Co., Ltd.

Inventors: ZHENG, Xinyue; LIU, Changchun; ZHU, Zhenguang; SUN, Hao

IPC classification: G06N3/092, G06N3/006, G06N3/04

Abstract: Provided are a method and an apparatus for training a model, and a method and an apparatus for predicting a trajectory, which relate to the field of artificial intelligence technology, in particular to the fields of deep learning, autonomous driving and intelligent transportation technologies. The method includes: adjusting a model parameter of a to-be-trained model for an n-th round according to a first action selection strategy, so as to obtain an intermediate network model, where n = 1, ..., N, and N is an integer greater than 1; performing, by using the intermediate network model, at least one trajectory prediction action indicated by the first action selection strategy, so as to obtain a trajectory prediction result, where the at least one trajectory prediction action is based on training sample data; determining a second action selection strategy according to the trajectory prediction result and the first action selection strategy; and adjusting the model parameter of the to-be-trained model for an (n+1)-th round according to the second action selection strategy.
