-
公开(公告)号:US20210295858A1
公开(公告)日:2021-09-23
申请号:US17222736
申请日:2021-04-05
Applicant: Google LLC
Inventor: Yonghui Wu , Jonathan Shen , Ruoming Pang , Ron J. Weiss , Michael Schuster , Navdeep Jaitly , Zongheng Yang , Zhifeng Chen , Yu Zhang , Yuxuan Wang , Russell John Wyatt Skerry-Ryan , Ryan M. Rifkin , Ioannis Agiomyrgiannakis
Abstract: Methods, systems, and computer program products for generating, from an input character sequence, an output sequence of audio data representing the input character sequence. The output sequence of audio data includes a respective audio output sample for each of a number of time steps. One example method includes, for each of the time steps: generating a mel-frequency spectrogram for the time step by processing a representation of a respective portion of the input character sequence using a decoder neural network; generating a probability distribution over a plurality of possible audio output samples for the time step by processing the mel-frequency spectrogram for the time step using a vocoder neural network; and selecting the audio output sample for the time step from the possible audio output samples in accordance with the probability distribution.
-
公开(公告)号:US11107457B2
公开(公告)日:2021-08-31
申请号:US16696101
申请日:2019-11-26
Applicant: Google LLC
Inventor: Samuel Bengio , Yuxuan Wang , Zongheng Yang , Zhifeng Chen , Yonghui Wu , Ioannis Agiomyrgiannakis , Ron J. Weiss , Navdeep Jaitly , Ryan M. Rifkin , Robert Andrew James Clark , Quoc V. Le , Russell J. Ryan , Ying Xiao
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating speech from text. One of the systems includes one or more computers and one or more storage devices storing instructions that when executed by one or more computers cause the one or more computers to implement: a sequence-to-sequence recurrent neural network configured to: receive a sequence of characters in a particular natural language, and process the sequence of characters to generate a spectrogram of a verbal utterance of the sequence of characters in the particular natural language; and a subsystem configured to: receive the sequence of characters in the particular natural language, and provide the sequence of characters as input to the sequence-to-sequence recurrent neural network to obtain as output the spectrogram of the verbal utterance of the sequence of characters in the particular natural language.
-
公开(公告)号:US10573293B2
公开(公告)日:2020-02-25
申请号:US16447862
申请日:2019-06-20
Applicant: Google LLC
Inventor: Samuel Bengio , Yuxuan Wang , Zongheng Yang , Zhifeng Chen , Yonghui Wu , Ioannis Agiomyrgiannakis , Ron J. Weiss , Navdeep Jaitly , Ryan M. Rifkin , Robert Andrew James Clark , Quoc V. Le , Russell J. Ryan , Ying Xiao
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating speech from text. One of the systems includes one or more computers and one or more storage devices storing instructions that when executed by one or more computers cause the one or more computers to implement: a sequence-to-sequence recurrent neural network configured to: receive a sequence of characters in a particular natural language, and process the sequence of characters to generate a spectrogram of a verbal utterance of the sequence of characters in the particular natural language; and a subsystem configured to: receive the sequence of characters in the particular natural language, and provide the sequence of characters as input to the sequence-to-sequence recurrent neural network to obtain as output the spectrogram of the verbal utterance of the sequence of characters in the particular natural language.
-
公开(公告)号:US20180357312A1
公开(公告)日:2018-12-13
申请号:US16105717
申请日:2018-08-20
Applicant: Google LLC
Inventor: Geremy A. Heitz, III , Adam Berenzweig , Jason E. Weston , Ron J. Weiss , Sally A. Goldman , Thomas Walters , Samy Bengio , Douglas Eck , Jay M. Ponte , Ryan M. Rifkin
IPC: G06F17/30
CPC classification number: G06F17/30772 , G06F17/30743
Abstract: Generating a playlist may include designating a seed track in an audio library; identifying audio tracks in the audio library having constructs that are within a range of a corresponding construct of the seed track, where the constructs for the audio tracks are derived from frequency representations of the audio tracks, and the corresponding construct for the seed track is derived from a frequency representation of the seed track; and generating the playlist using at least some of the audio tracks that were identified.
-
-
-