-
2.
Publication number: EP3360086B1
Publication date: 2024-10-23
Application number: EP16806345.1
Filing date: 2016-11-11
Inventors: SCHAUL, Tom; QUAN, John; SILVER, David
-
3.
Publication number: EP4435674A2
Publication date: 2024-09-25
Application number: EP24187863.6
Filing date: 2024-07-10
Inventors: SHI, Yixuan; LI, Wei; LIU, Jiachen; XIAO, Xinyan
Abstract: A computer-implemented method for training a Text-to-Image model includes: obtaining a first Text-to-Image model and a pre-trained reward model, wherein the first Text-to-Image model is used to generate a corresponding image based on input text, and the pre-trained reward model is used to score a data pair composed of the input text and the corresponding generated image; and adjusting the parameters of the first Text-to-Image model based on the pre-trained reward model and a reinforcement learning policy to obtain a second Text-to-Image model.
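For illustration, a minimal sketch of the reward-guided fine-tuning loop this abstract describes, assuming a REINFORCE-style policy-gradient update with placeholder generator and reward networks; none of the module names, shapes, or the specific objective below come from the patent.

```python
# Hypothetical sketch: a frozen reward model scores (text, image) pairs and a
# policy-gradient update adjusts the generator, as the abstract describes at a
# high level. All classes and dimensions are illustrative placeholders.
import torch

class TextToImageModel(torch.nn.Module):
    """Placeholder generator: maps a text embedding to an image sample and its log-prob."""
    def __init__(self, text_dim=64, image_dim=3 * 32 * 32):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(text_dim, 256), torch.nn.ReLU(),
            torch.nn.Linear(256, image_dim * 2),  # mean and log-std per pixel
        )

    def sample(self, text_emb):
        mean, log_std = self.net(text_emb).chunk(2, dim=-1)
        dist = torch.distributions.Normal(mean, log_std.exp())
        image = dist.rsample()
        return image, dist.log_prob(image).sum(-1)

class RewardModel(torch.nn.Module):
    """Stand-in for the pre-trained scorer of (text, image) pairs."""
    def __init__(self, text_dim=64, image_dim=3 * 32 * 32):
        super().__init__()
        self.net = torch.nn.Linear(text_dim + image_dim, 1)

    def forward(self, text_emb, image):
        return self.net(torch.cat([text_emb, image], dim=-1)).squeeze(-1)

generator = TextToImageModel()           # plays the role of the "first Text-to-Image model"
reward_model = RewardModel().eval()      # pre-trained and kept frozen
for p in reward_model.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.Adam(generator.parameters(), lr=1e-4)
for step in range(100):
    text_emb = torch.randn(8, 64)                  # stand-in prompt embeddings
    image, log_prob = generator.sample(text_emb)
    reward = reward_model(text_emb, image)         # score each (text, image) pair
    loss = -(reward.detach() * log_prob).mean()    # REINFORCE-style objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
# The fine-tuned `generator` corresponds to the "second Text-to-Image model".
```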
-
4.
Publication number: EP4420040A1
Publication date: 2024-08-28
Application number: EP22806056.2
Filing date: 2022-10-18
Applicant: AMGEN INC.
IPC classification: G06N3/0442, G06N3/045, G06N3/09, G06N3/048, G06N3/092
CPC classification: G06N3/0442, G06N3/048, G06N3/09, G06N3/045, G06N3/092, G06F40/35, G06F40/30, G06F40/211
-
5.
Publication number: EP4386624A3
Publication date: 2024-08-07
Application number: EP24173836.8
Filing date: 2017-11-04
Inventors: VIOLA, Fabio; MIROWSKI, Piotr Wojciech; BANINO, Andrea; PASCANU, Razvan; SOYER, Hubert Josef; BALLARD, Andrew James; KUMARAN, Sudarshan; HADSELL, Raia Thais; SIFRE, Laurent; GOROSHIN, Rostislav; KAVUKCUOGLU, Koray; DENIL, Misha Man Ray
IPC classification: G06N3/006, G06N3/0442, G06N3/045, G06N3/0464, G06N3/084, G06N3/092
CPC classification: G06N3/084, G06N3/006, G06N3/045, G06N3/0464, G06N3/0442, G06N3/092
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. In one aspect, a method of training an action selection policy neural network for use in selecting actions to be performed by an agent navigating through an environment to accomplish one or more goals comprises: receiving an observation image characterizing a current state of the environment; processing, using the action selection policy neural network, an input comprising the observation image to generate an action selection output; processing, using a loop closure prediction neural network, an intermediate output generated by the action selection policy neural network to predict whether the agent has returned to a location in the environment that the agent has already visited; and backpropagating a gradient of a loop closure based auxiliary loss into the action selection policy neural network to determine a loop closure based auxiliary update for current values of the network parameters.
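A minimal sketch of the auxiliary-loss mechanism described in this abstract, assuming a simple feed-forward policy encoder and a linear loop-closure head; the shapes, modules, and training details are illustrative assumptions rather than the patent's architecture.

```python
# Sketch: an auxiliary head predicts loop closure (revisiting a location) from an
# intermediate feature of the policy network, and the auxiliary loss gradient is
# backpropagated into the policy network. All sizes are illustrative.
import torch

class PolicyNetwork(torch.nn.Module):
    def __init__(self, obs_dim=128, num_actions=4, hidden=256):
        super().__init__()
        self.encoder = torch.nn.Sequential(
            torch.nn.Linear(obs_dim, hidden), torch.nn.ReLU())
        self.policy_head = torch.nn.Linear(hidden, num_actions)

    def forward(self, obs):
        features = self.encoder(obs)          # intermediate output
        logits = self.policy_head(features)   # action selection output
        return logits, features

policy = PolicyNetwork()
loop_closure_head = torch.nn.Linear(256, 1)   # loop closure prediction network
optimizer = torch.optim.Adam(
    list(policy.parameters()) + list(loop_closure_head.parameters()), lr=1e-4)

obs = torch.randn(16, 128)                        # stand-in observation features
revisited = torch.randint(0, 2, (16, 1)).float()  # 1 if the location was already visited

logits, features = policy(obs)
loop_pred = loop_closure_head(features)
aux_loss = torch.nn.functional.binary_cross_entropy_with_logits(loop_pred, revisited)

optimizer.zero_grad()
aux_loss.backward()   # the auxiliary gradient flows back into the policy network
optimizer.step()
```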
-
6.
Publication number: EP4407523A1
Publication date: 2024-07-31
Application number: EP24151917.2
Filing date: 2024-01-15
Applicant: Hitachi, Ltd.
IPC classification: G06N3/092, G06N5/045, G06N3/0442, G06N3/0455, G06N3/0464, G06N3/0499, G06N7/01, G06N20/10, G05B23/02, G06N3/047, G06N5/01
CPC classification: G06N3/0464, G06N20/10, G06N3/0499, G06N3/0455, G06N7/01, G06N3/0442, G06N3/092, G06N3/047, G06N5/01, G06N5/045, G06N3/006, G05B23/024
Abstract: A method for predictive maintenance of equipment. The method may include: receiving an expected future return value as input to a decision maker model, wherein the decision maker model is a machine learning model that predicts a maintenance action associated with the equipment; feeding recent observations and recent actions from the environment as inputs to the decision maker model; generating a next action as a model output of the decision maker model, wherein the next action is the predicted maintenance action; and executing the next action in the environment.
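A hedged sketch of the inference flow this abstract outlines, with an illustrative stand-in for the decision maker model; the input encoding, history length, and action space are assumptions made for the example.

```python
# Sketch: a learned model takes an expected future return plus recent
# observations and actions and outputs the next maintenance action, which would
# then be executed in the environment. Model and data below are stand-ins.
import torch

class DecisionMakerModel(torch.nn.Module):
    def __init__(self, obs_dim=8, num_actions=3, history=4, hidden=64):
        super().__init__()
        in_dim = 1 + history * (obs_dim + num_actions)  # return + observation/action history
        self.net = torch.nn.Sequential(
            torch.nn.Linear(in_dim, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, num_actions))

    def forward(self, expected_return, recent_obs, recent_actions):
        x = torch.cat(
            [expected_return, recent_obs.flatten(1), recent_actions.flatten(1)], dim=-1)
        return self.net(x)  # logits over maintenance actions

model = DecisionMakerModel()
expected_return = torch.tensor([[10.0]])   # expected future return value
recent_obs = torch.randn(1, 4, 8)          # last 4 sensor observations
recent_actions = torch.zeros(1, 4, 3)      # last 4 actions, one-hot encoded

logits = model(expected_return, recent_obs, recent_actions)
next_action = logits.argmax(dim=-1).item()          # predicted maintenance action
print("execute maintenance action:", next_action)   # would be applied to the equipment
```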
-
7.
Publication number: EP4407521A1
Publication date: 2024-07-31
Application number: EP23153925.5
Filing date: 2023-01-30
Inventors: RAO KRUPASHANKAR RAO, Raghavendra; KAKATHKAR, Varsha; VINCHU SUBRAMANIAN, Balakrishnan; GOYAL, Shivani; MANDHAN, Sunil; SISODIA, Rajendra Singh
IPC classification: G06N3/0475, G06N3/092, G06N3/094, G06N20/00
CPC classification: G06N3/0475, G06N3/094, G06N3/092, G06N20/00, G16H10/60, G16H40/67, G16H50/20, G16H50/70
Abstract: The present disclosure relates to methods and systems for training machine learning models in a data environment where the training data is subject to deletion and/or quarantine. In particular applications, the training data may include medical information for which an individual may revoke consent to use and/or access at any time. As described herein, the methods and systems involve: receiving a consent revocation notice, wherein the notice identifies a first subset of data from a comprehensive dataset; estimating an impact of the first subset of data on the machine learning model; generating a synthetic dataset corresponding to the first subset of data; generating a training dataset comprising the comprehensive dataset and the synthetic dataset; and training the machine learning model on the training dataset.
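A rough sketch of the described workflow, with crude stand-ins for the impact estimate and the synthetic-data generator; the patent does not specify these particular estimators, and the moment-matching generator below is only an assumption for illustration.

```python
# Sketch: on a consent-revocation notice, the identified subset is removed, its
# impact is (crudely) estimated, a synthetic stand-in is generated, and training
# proceeds on the retained data plus the synthetic records.
import numpy as np

rng = np.random.default_rng(0)
comprehensive = rng.normal(size=(1000, 5))   # full training dataset
revoked_idx = np.arange(50)                  # first subset named in the notice

revoked = comprehensive[revoked_idx]
retained = np.delete(comprehensive, revoked_idx, axis=0)

# Crude impact estimate: shift in feature means caused by removing the subset.
impact = np.linalg.norm(comprehensive.mean(axis=0) - retained.mean(axis=0))
print(f"estimated impact of revoked subset: {impact:.4f}")

# Synthetic records matching the revoked subset's first and second moments.
synthetic = rng.multivariate_normal(
    mean=revoked.mean(axis=0), cov=np.cov(revoked, rowvar=False), size=len(revoked))

# Training dataset combining the retained data and the synthetic stand-ins;
# the revoked records themselves are never reused.
training_dataset = np.vstack([retained, synthetic])
# train_model(training_dataset) would follow here.
```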
-
8.
Publication number: EP4376016A1
Publication date: 2024-05-29
Application number: EP23206119.2
Filing date: 2023-10-26
Applicant: Diabeloop
Inventor: LOUIS, Maxime
IPC classification: G16H20/17, G16H40/63, G16H40/67, G16H50/70, G06N20/00, G06N3/084, G06N3/092, G06N3/04
CPC classification: G16H20/17, G16H40/60, G16H40/63, G16H40/67, G16H50/70, G06N3/092, G06N3/084, G06N3/04
Abstract: Method for determining a recommendation value of a control parameter of a fluid infusion device (20). The method is implemented by a control device (30) and comprises the steps of: retrieving user data (40); feeding a deep reinforcement learning network (42); outputting a deep reinforcement learning network result (44); feeding uncertainty certificates (46); outputting an uncertainty certificates result (48); comparing the uncertainty certificates result (50); and determining the recommendation value (52) of a control parameter of the fluid infusion device (20), based on a state of the unique user, using either a control algorithm or the deep reinforcement learning network.
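A minimal sketch of the selection logic between the deep reinforcement learning network and a fallback control algorithm, with illustrative stand-in networks, threshold, and fallback rule; none of these values or modules are taken from the patent.

```python
# Sketch: user data feeds a deep RL network and an uncertainty certificate; if
# the certified uncertainty is acceptable the RL recommendation is used,
# otherwise a fallback control algorithm determines the parameter.
import torch

drl_network = torch.nn.Sequential(              # stand-in deep reinforcement learning network
    torch.nn.Linear(10, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
uncertainty_certificate = torch.nn.Sequential(  # stand-in uncertainty certificate network
    torch.nn.Linear(10, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))

def fallback_control_algorithm(user_data):
    return user_data.mean().item() * 0.1        # placeholder rule-based controller

UNCERTAINTY_THRESHOLD = 0.5                     # illustrative comparison threshold

user_data = torch.randn(1, 10)                  # retrieved user data (40)
drl_result = drl_network(user_data).item()      # steps (42)-(44)
uncertainty = uncertainty_certificate(user_data).abs().item()  # steps (46)-(48)

if uncertainty <= UNCERTAINTY_THRESHOLD:        # step (50): compare the certificate result
    recommendation = drl_result                 # trust the RL recommendation
else:
    recommendation = fallback_control_algorithm(user_data)  # conservative fallback
print("recommended control parameter:", recommendation)     # step (52)
```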
-
9.
Publication number: EP4372615A1
Publication date: 2024-05-22
Application number: EP22207703.4
Filing date: 2022-11-16
IPC classification: G06N3/0442, G06N3/092
CPC classification: G06N3/0442, G06N3/092, B60W30/18163, G08G1/167, G08G1/0112, G08G1/0133, G08G1/0145
Abstract: A computer-implemented method for determining control decisions for a vehicle comprises the following steps carried out by computer hardware components: acquiring sensor data; processing the acquired sensor data to determine one or more control decisions; wherein determining the one or more control decisions comprises: determining a probability distribution over a discrete action space based on the processing of the acquired sensor data and an accumulator value, wherein the accumulator value is indicative of control decisions taken in the past; sampling the probability distribution; and determining the control decision based on the sampling; wherein the accumulator value is updated based on the probability distribution and/or the determined control decision.
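An illustrative sketch of the decision loop with the accumulator value, assuming a softmax policy over a small discrete action space and a simple decay-style accumulator update; both the network and the update rule are assumptions, not the patent's specification.

```python
# Sketch: sensor features and an accumulator value (summarising past decisions)
# produce a distribution over discrete actions; the distribution is sampled and
# the accumulator is updated from the distribution and the chosen action.
import torch

policy_net = torch.nn.Sequential(
    torch.nn.Linear(16 + 1, 32), torch.nn.ReLU(), torch.nn.Linear(32, 3))

accumulator = torch.zeros(1)                     # reflects control decisions taken so far
for step in range(5):
    sensor_features = torch.randn(16)            # processed sensor data
    logits = policy_net(torch.cat([sensor_features, accumulator]))
    probs = torch.softmax(logits, dim=-1)        # distribution over the discrete action space
    action = torch.multinomial(probs, 1).item()  # sample the control decision
    # Example update: decay plus the probability mass of the chosen action.
    accumulator = 0.9 * accumulator + probs[action].detach()
    print(f"step {step}: action={action}, accumulator={accumulator.item():.3f}")
```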
-
10.
Publication number: EP4357976A1
Publication date: 2024-04-24
Application number: EP23206214.1
Filing date: 2019-05-29
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for reinforcement learning. One of the methods includes selecting an action to be performed by the agent using both a slow updating recurrent neural network and a fast updating recurrent neural network that receives a fast updating input that includes the hidden state of the slow updating recurrent neural network.
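A sketch of the two-timescale recurrent arrangement this abstract describes, assuming GRU cells, a fixed slow-update period K, and a greedy action head; all of these are illustrative choices rather than details from the patent.

```python
# Sketch: a slow-updating recurrent core is stepped only every K environment
# steps, while a fast-updating core is stepped every step and receives the slow
# core's hidden state as part of its input.
import torch

obs_dim, slow_dim, fast_dim, num_actions, K = 8, 32, 32, 4, 4

slow_rnn = torch.nn.GRUCell(obs_dim, slow_dim)
fast_rnn = torch.nn.GRUCell(obs_dim + slow_dim, fast_dim)  # fast input includes the slow state
action_head = torch.nn.Linear(fast_dim, num_actions)

slow_h = torch.zeros(1, slow_dim)
fast_h = torch.zeros(1, fast_dim)

for t in range(12):
    obs = torch.randn(1, obs_dim)
    if t % K == 0:                            # slow core updates only every K steps
        slow_h = slow_rnn(obs, slow_h)
    fast_input = torch.cat([obs, slow_h], dim=-1)
    fast_h = fast_rnn(fast_input, fast_h)     # fast core updates every step
    action = action_head(fast_h).argmax(dim=-1).item()
    print(f"t={t}: action={action}")
```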