-
Publication number: US11355097B2
Publication date: 2022-06-07
Application number: US17061437
Filing date: 2020-10-01
Applicant: DeepMind Technologies Limited
Inventor: Yutian Chen, Scott Ellison Reed, Aaron Gerard Antonius van den Oord, Oriol Vinyals, Heiga Zen, Ioannis Alexandros Assael, Brendan Shillingford, Joao Ferdinando Gomes de Freitas
IPC: G10L13/047, G10L13/033, G10L13/00, G06N3/04, G06N3/08
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating an adaptive audio-generation model. One of the methods includes generating an adaptive audio-generation model including learning a plurality of embedding vectors and parameter values of a neural network using training data comprising first text and audio data representing a plurality of different individual speakers speaking portions of the first text, wherein the plurality of embedding vectors represent respective voice characteristics of the plurality of different individual speakers. The adaptive audio-generation model is adapted for a new individual speaker using adaptation data comprising second text and audio data representing the new individual speaker speaking portions of the second text, the new individual speaker being different from each of the plurality of individual speakers, wherein adapting the audio-generation model includes learning a new embedding vector for the new individual speaker.
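The adaptation step in this abstract (learning a new embedding vector for an unseen speaker) can be illustrated with a toy sketch. Everything here is an assumption for illustration only: a small linear model stands in for the neural network, the shared weights are held fixed during adaptation (the abstract does not say whether they are), and the names `predict`, `mean_squared_error`, and `adapt_embedding` are hypothetical.

```python
def predict(shared_weights, embedding, feature):
    """Toy stand-in for the audio model: linear in the input feature and the speaker embedding."""
    w_feature, w_embedding = shared_weights
    return w_feature * feature + sum(w * e for w, e in zip(w_embedding, embedding))


def mean_squared_error(shared_weights, embedding, data):
    """Average squared prediction error over (feature, target) pairs."""
    return sum((predict(shared_weights, embedding, x) - y) ** 2 for x, y in data) / len(data)


def adapt_embedding(shared_weights, data, dim, steps=200, lr=0.05):
    """Learn a new speaker embedding by gradient descent.

    Assumption: the shared weights stay frozen and only the embedding is updated.
    """
    _, w_embedding = shared_weights
    embedding = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for x, y in data:
            err = predict(shared_weights, embedding, x) - y
            for j in range(dim):
                # d/d e_j of the mean squared error
                grad[j] += 2.0 * err * w_embedding[j] / len(data)
        embedding = [e - lr * g for e, g in zip(embedding, grad)]
    return embedding


# Hypothetical adaptation data generated from a "true" embedding for the new speaker.
shared = (1.0, [0.5, -0.5])
adaptation_data = [(x, predict(shared, [1.0, -1.0], x)) for x in [0.0, 1.0, 2.0]]
new_embedding = adapt_embedding(shared, adaptation_data, dim=2)
```

In this sketch the adapted embedding only needs to reproduce the new speaker's targets; it does not have to equal the embedding that generated the data, because several embeddings give the same predictions under this toy model.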
-
Publication number: US11354594B2
Publication date: 2022-06-07
Application number: US16601505
Filing date: 2019-10-14
Applicant: DeepMind Technologies Limited
Inventor: Yutian Chen, Joao Ferdinando Gomes de Freitas
Abstract: Methods and systems for determining an optimized setting for one or more process parameters of a machine learning training process are described. One of the methods includes processing a current network input using a recurrent neural network in accordance with first values of the network parameters to obtain a current network output, obtaining a measure of the performance of the machine learning training process with an updated setting defined by the current network output, and generating a new network input that includes (i) the updated setting defined by the current network output and (ii) the measure of the performance of the training process with the updated setting defined by the current network output.
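The feedback loop described in this abstract can be sketched as follows. This is a minimal illustration, not the patented method: the recurrent network is a tiny tanh RNN with random, untrained weights, the process parameter is a single scalar, and `toy_performance` is a hypothetical stand-in for running the training process and measuring how well it did.

```python
import math
import random


def rnn_step(params, hidden, x):
    """One step of a tiny tanh RNN; returns (new_hidden, scalar_output)."""
    w_xh, w_hh, w_hy = params
    new_hidden = [
        math.tanh(
            sum(w_xh[i][j] * x[j] for j in range(len(x)))
            + sum(w_hh[i][k] * hidden[k] for k in range(len(hidden)))
        )
        for i in range(len(hidden))
    ]
    output = sum(w_hy[i] * new_hidden[i] for i in range(len(new_hidden)))
    return new_hidden, output


def toy_performance(setting):
    """Hypothetical performance measure; a real system would run the training process."""
    return -(setting - 0.3) ** 2


def propose_settings(num_steps=5, hidden_size=4, seed=0):
    """Run the loop: RNN output -> updated setting -> performance -> next RNN input."""
    rng = random.Random(seed)
    params = (
        [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(hidden_size)],
        [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)] for _ in range(hidden_size)],
        [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)],
    )
    hidden = [0.0] * hidden_size
    network_input = [0.0, 0.0]  # (previous setting, previous performance measure)
    history = []
    for _ in range(num_steps):
        # The current network output defines the updated setting.
        hidden, setting = rnn_step(params, hidden, network_input)
        measure = toy_performance(setting)
        history.append((setting, measure))
        # The new network input includes (i) the setting and (ii) its measured performance.
        network_input = [setting, measure]
    return history
```

Because the weights here are random, the proposed settings are not expected to improve over time; the sketch only shows how each output and its measured performance are fed back as the next input.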
-
Publication number: US20200327413A1
Publication date: 2020-10-15
Application number: US16859811
Filing date: 2020-04-27
Applicant: DeepMind Technologies Limited
Inventor: Scott Ellison Reed, Joao Ferdinando Gomes de Freitas
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for neural programming. One of the methods includes processing a current neural network input using a core recurrent neural network to generate a neural network output; determining, from the neural network output, whether or not to end a currently invoked program and to return to a calling program from the set of programs; determining, from the neural network output, a next program to be called; determining, from the neural network output, contents of arguments to the next program to be called; receiving a representation of a current state of the environment; and generating a next neural network input from an embedding for the next program to be called and the representation of the current state of the environment.
-
Publication number: US12260334B2
Publication date: 2025-03-25
Application number: US18497924
Filing date: 2023-10-30
Applicant: DeepMind Technologies Limited
Inventor: Scott Ellison Reed, Joao Ferdinando Gomes de Freitas
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for neural programming. One of the methods includes processing a current neural network input using a core recurrent neural network to generate a neural network output; determining, from the neural network output, whether or not to end a currently invoked program and to return to a calling program from the set of programs; determining, from the neural network output, a next program to be called; determining, from the neural network output, contents of arguments to the next program to be called; receiving a representation of a current state of the environment; and generating a next neural network input from an embedding for the next program to be called and the representation of the current state of the environment.
-
Publication number: US20240394504A1
Publication date: 2024-11-28
Application number: US18637279
Filing date: 2024-04-16
Applicant: DeepMind Technologies Limited
Inventor: Misha Man Ray Denil, Sergio Gomez Colmenarejo, Serkan Cabi, David William Saxton, Joao Ferdinando Gomes de Freitas
Abstract: A reinforcement learning system is proposed comprising a plurality of property detector neural networks. Each property detector neural network is arranged to receive data representing an object within an environment, and to generate property data associated with a property of the object. A processor is arranged to receive an instruction indicating a task associated with an object having an associated property, and process the output of the plurality of property detector neural networks based upon the instruction to generate a relevance data item. The relevance data item indicates objects within the environment associated with the task. The processor also generates a plurality of weights based upon the relevance data item, and, based on the weights, generates modified data representing the plurality of objects within the environment. A neural network is arranged to receive the modified data and to output an action associated with the task.
-
Publication number: US20240281654A1
Publication date: 2024-08-22
Application number: US18292165
Filing date: 2022-08-12
Applicant: DeepMind Technologies Limited
Inventor: Scott Ellison Reed, Konrad Zolna, Emilio Parisotto, Tom Erez, Alexander Novikov, Jack William Rae, Misha Man Ray Denil, Joao Ferdinando Gomes de Freitas, Oriol Vinyals, Sergio Gomez, Ashley Deloris Edwards, Jacob Bruce, Gabriel Barth-Maron
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting actions to be performed by an agent to interact with an environment using an action selection neural network. In one aspect, a method comprises, at each time step in a sequence of time steps: generating a current representation of a state of a task being performed by the agent in the environment as of the current time step as a sequence of data elements; autoregressively generating a sequence of data elements representing a current action to be performed by the agent at the current time step; and after autoregressively generating the sequence of data elements representing the current action, causing the agent to perform the current action at the current time step.
-
Publication number: US20240265911A1
Publication date: 2024-08-08
Application number: US18571553
Filing date: 2022-06-15
Applicant: DeepMind Technologies Limited
CPC classification number: G10L15/063, G10L25/30
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing video data using an adaptive visual speech recognition model. One of the methods includes receiving a video that includes a plurality of video frames that depict a first speaker; obtaining a first embedding characterizing the first speaker; and processing a first input comprising (i) the video and (ii) the first embedding using a visual speech recognition neural network having a plurality of parameters, wherein the visual speech recognition neural network is configured to process the video and the first embedding in accordance with trained values of the parameters to generate a speech recognition output that defines a sequence of one or more words being spoken by the first speaker in the video.
-
Publication number: US11803746B2
Publication date: 2023-10-31
Application number: US16859811
Filing date: 2020-04-27
Applicant: DeepMind Technologies Limited
Inventor: Scott Ellison Reed, Joao Ferdinando Gomes de Freitas
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for neural programming. One of the methods includes processing a current neural network input using a core recurrent neural network to generate a neural network output; determining, from the neural network output, whether or not to end a currently invoked program and to return to a calling program from the set of programs; determining, from the neural network output, a next program to be called; determining, from the neural network output, contents of arguments to the next program to be called; receiving a representation of a current state of the environment; and generating a next neural network input from an embedding for the next program to be called and the representation of the current state of the environment.
-
Publication number: US11361403B2
Publication date: 2022-06-14
Application number: US16324061
Filing date: 2018-02-26
Applicant: DeepMind Technologies Limited
Inventor: Nal Emmerich Kalchbrenner, Daniel Belov, Sergio Gomez Colmenarejo, Aaron Gerard Antonius van den Oord, Ziyu Wang, Joao Ferdinando Gomes de Freitas, Scott Ellison Reed
Abstract: A method of generating an output image having an output resolution of N pixels×N pixels, each pixel in the output image having a respective color value for each of a plurality of color channels, the method comprising: obtaining a low-resolution version of the output image; and upscaling the low-resolution version of the output image to generate the output image having the output resolution by repeatedly performing the following operations: obtaining a current version of the output image having a current K×K resolution; and processing the current version of the output image using a set of convolutional neural networks that are specific to the current resolution to generate an updated version of the output image having a 2K×2K resolution.
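The repeated K×K to 2K×2K step in this abstract can be sketched as a simple loop. This is an illustrative sketch under stated assumptions: pixels are single scalars rather than multi-channel color values, and a nearest-neighbour doubling function (`double_resolution`, hypothetical) stands in for the resolution-specific convolutional networks, which are represented by a dictionary keyed on the current resolution K.

```python
def double_resolution(image):
    """Placeholder upscaler: nearest-neighbour 2x, standing in for a learned network."""
    out = []
    for row in image:
        wide = [pixel for pixel in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def upscale_to(low_res, n, upscalers):
    """Repeatedly double a square image until it reaches n x n.

    `upscalers` maps the current resolution K to the function used at that resolution,
    mirroring the idea of networks that are specific to the current resolution.
    """
    current = low_res
    while len(current) < n:
        k = len(current)
        current = upscalers[k](current)  # K x K -> 2K x 2K
    if len(current) != n:
        raise ValueError("n must be the starting resolution times a power of two")
    return current


low_res_image = [[1, 2], [3, 4]]
upscalers_by_resolution = {2: double_resolution, 4: double_resolution}
output_image = upscale_to(low_res_image, 8, upscalers_by_resolution)
```

Keying the upscalers by resolution keeps the loop itself independent of how each stage is implemented, so a different function can be swapped in at any resolution.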
-
Publication number: US20210110831A1
Publication date: 2021-04-15
Application number: US17043846
Filing date: 2019-05-20
Applicant: DeepMind Technologies Limited
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing visual speech recognition. In one aspect, a method comprises receiving a video comprising a plurality of video frames, wherein each video frame depicts a pair of lips; processing the video using a visual speech recognition neural network to generate, for each output position in an output sequence, a respective output score for each token in a vocabulary of possible tokens, wherein the visual speech recognition neural network comprises one or more volumetric convolutional neural network layers and one or more time-aggregation neural network layers; wherein the vocabulary of possible tokens comprises a plurality of phonemes; and determining a sequence of words expressed by the pair of lips depicted in the video using the output scores.