-
Publication number: US20210089909A1
Publication date: 2021-03-25
Application number: US17032578
Filing date: 2020-09-25
Applicant: DeepMind Technologies Limited
Inventor: Mikolaj Binkowski , Karen Simonyan , Jeffrey Donahue , Aidan Clark , Sander Etienne Lea Dieleman , Erich Konrad Elsen , Luis Carlos Cobo Rus , Norman Casagrande
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output audio examples using a generative neural network. One of the methods includes obtaining a training conditioning text input; processing a training generative input comprising the training conditioning text input using a feedforward generative neural network to generate a training audio output; processing the training audio output using each of a plurality of discriminators, wherein the plurality of discriminators comprises one or more conditional discriminators and one or more unconditional discriminators; determining a first combined prediction by combining the respective predictions of the plurality of discriminators; and determining an update to current values of a plurality of generative parameters of the feedforward generative neural network to increase a first error in the first combined prediction.
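The training step described in the abstract can be sketched in plain Python. The toy discriminator functions, the averaging rule for the combined prediction, and the non-saturating generator loss below are illustrative assumptions for the sketch, not details taken from the patent itself.

```python
# Sketch of one generator training step: score a generated audio output
# with several discriminators (conditional ones also see the text input),
# combine their predictions, and define a generator loss whose descent
# direction increases the error of that combined prediction.

def combined_prediction(audio, text, conditional_ds, unconditional_ds):
    """Average the real/fake scores of all discriminators.

    Conditional discriminators see the audio together with the
    conditioning text; unconditional ones see only the audio.
    """
    scores = [d(audio, text) for d in conditional_ds]
    scores += [d(audio) for d in unconditional_ds]
    return sum(scores) / len(scores)

def generator_loss(audio, text, conditional_ds, unconditional_ds):
    """Non-saturating generator loss: minimizing it increases the
    discriminators' error on the generated (fake) audio."""
    return -combined_prediction(audio, text, conditional_ds, unconditional_ds)

# Toy discriminators with fixed scores in [-1, 1]; higher = "more real".
cond_d = lambda audio, text: 0.2    # conditional: audio + text
uncond_d = lambda audio: -0.4       # unconditional: audio only

fake_audio, text = [0.0] * 16, "hello"
pred = combined_prediction(fake_audio, text, [cond_d], [uncond_d])
loss = generator_loss(fake_audio, text, [cond_d], [uncond_d])
```

In a real implementation the generator would be a feedforward neural network and the update would be a gradient step on the generative parameters; here the scores are constants so the combined prediction and loss can be checked by hand.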
-
Publication number: US12211484B2
Publication date: 2025-01-28
Application number: US18418025
Filing date: 2024-01-19
Applicant: DeepMind Technologies Limited
Inventor: Luis Carlos Cobo Rus , Nal Kalchbrenner , Erich Elsen , Chenjie Gu
IPC: G10L25/30 , G10L13/00 , G10L13/047 , G10L13/08
Abstract: Techniques are disclosed that enable generation of an audio waveform representing synthesized speech based on a difference signal determined using an autoregressive model. Various implementations include using a distribution of the difference signal values to represent sounds found in human speech with a higher level of granularity than sounds not frequently found in human speech. Additional or alternative implementations include using one or more speakers of a client device to render the generated audio waveform.
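The idea of giving common speech sounds finer granularity than rare ones can be illustrated with μ-law companding, a standard non-uniform quantization scheme, plus a cumulative sum that rebuilds a waveform from difference samples. This is a sketch under those assumptions; the patent's actual difference-signal distribution and autoregressive model are not reproduced here.

```python
import math

def mu_law_encode(x, mu=255):
    """Compand a sample in [-1, 1]: small amplitudes, which dominate
    speech difference signals, get finer resolution than large ones."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_decode(y, mu=255):
    """Invert the companding: y in [-1, 1] back to a linear sample."""
    return math.copysign((math.exp(abs(y) * math.log1p(mu)) - 1) / mu, y)

def reconstruct(differences, first_sample=0.0):
    """Rebuild a waveform by cumulatively summing difference samples,
    as an autoregressive model would emit them one step at a time."""
    wave, current = [first_sample], first_sample
    for d in differences:
        current += d
        wave.append(current)
    return wave

# Round trip: companding is lossless in continuous form.
restored = mu_law_decode(mu_law_encode(0.5))
# Waveform from difference samples starting at silence.
wave = reconstruct([0.1, -0.05], first_sample=0.0)
```

The companding step is where the "higher level of granularity" arises: after encoding, a fixed-width quantizer spends more of its levels on the small-amplitude region of the difference signal.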
-
Publication number: US20220254330A1
Publication date: 2022-08-11
Application number: US17610934
Filing date: 2019-05-20
Applicant: DeepMind Technologies Limited
Inventor: Luis Carlos Cobo Rus , Nal Kalchbrenner , Erich Elsen , Chenjie Gu
IPC: G10L13/047 , G10L13/08 , G10L25/30
Abstract: Techniques are disclosed that enable generation of an audio waveform representing synthesized speech based on a difference signal determined using an autoregressive model. Various implementations include using a distribution of the difference signal values to represent sounds found in human speech with a higher level of granularity than sounds not frequently found in human speech. Additional or alternative implementations include using one or more speakers of a client device to render the generated audio waveform.
-
Publication number: US20240161729A1
Publication date: 2024-05-16
Application number: US18418025
Filing date: 2024-01-19
Applicant: DeepMind Technologies Limited
Inventor: Luis Carlos Cobo Rus , Nal Kalchbrenner , Erich Elsen , Chenjie Gu
IPC: G10L13/047 , G10L13/08 , G10L25/30
CPC classification number: G10L13/047 , G10L13/08 , G10L25/30
Abstract: Techniques are disclosed that enable generation of an audio waveform representing synthesized speech based on a difference signal determined using an autoregressive model. Various implementations include using a distribution of the difference signal values to represent sounds found in human speech with a higher level of granularity than sounds not frequently found in human speech. Additional or alternative implementations include using one or more speakers of a client device to render the generated audio waveform.
-
Publication number: US11915682B2
Publication date: 2024-02-27
Application number: US17610934
Filing date: 2019-05-20
Applicant: DeepMind Technologies Limited
Inventor: Luis Carlos Cobo Rus , Nal Kalchbrenner , Erich Elsen , Chenjie Gu
IPC: G10L13/00 , G10L19/00 , G10L13/047 , G10L13/08 , G10L25/30
CPC classification number: G10L13/047 , G10L13/08 , G10L25/30
Abstract: Techniques are disclosed that enable generation of an audio waveform representing synthesized speech based on a difference signal determined using an autoregressive model. Various implementations include using a distribution of the difference signal values to represent sounds found in human speech with a higher level of granularity than sounds not frequently found in human speech. Additional or alternative implementations include using one or more speakers of a client device to render the generated audio waveform.