-
公开(公告)号:US11488575B2
公开(公告)日:2022-11-01
申请号:US17055951
申请日:2019-05-17
Applicant: Google LLC
Inventor: Ye Jia , Zhifeng Chen , Yonghui Wu , Jonathan Shen , Ruoming Pang , Ron J. Weiss , Ignacio Lopez Moreno , Fei Ren , Yu Zhang , Quan Wang , Patrick Nguyen
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech synthesis. The methods, systems, and apparatus include actions of obtaining an audio representation of speech of a target speaker, obtaining input text for which speech is to be synthesized in a voice of the target speaker, generating a speaker vector by providing the audio representation to a speaker encoder engine that is trained to distinguish speakers from one another, generating an audio representation of the input text spoken in the voice of the target speaker by providing the input text and the speaker vector to a spectrogram generation engine that is trained using voices of reference speakers to generate audio representations, and providing the audio representation of the input text spoken in the voice of the target speaker for output.
-
公开(公告)号:US11335333B2
公开(公告)日:2022-05-17
申请号:US16717746
申请日:2019-12-17
Applicant: Google LLC
Inventor: Wei Han , Chung-Cheng Chiu , Yu Zhang , Yonghui Wu , Patrick Nguyen , Sergey Kishchenko
Abstract: A method includes obtaining audio data for a long-form utterance and segmenting the audio data for the long-form utterance into a plurality of overlapping segments. The method also includes, for each overlapping segment of the plurality of overlapping segments: providing features indicative of acoustic characteristics of the long-form utterance represented by the corresponding overlapping segment as input to an encoder neural network; processing an output of the encoder neural network using an attender neural network to generate a context vector; and generating word elements using the context vector and a decoder neural network. The method also includes generating a transcription for the long-form utterance by merging the word elements from the plurality of overlapping segments and providing the transcription as an output of the automated speech recognition system.
-
公开(公告)号:US20210217404A1
公开(公告)日:2021-07-15
申请号:US17055951
申请日:2019-05-17
Applicant: Google LLC
Inventor: Ye Jia , Zhifeng Chen , Yonghui Wu , Jonathan Shen , Ruoming Pang , Ron J. Weiss , Ignacio Lopez Moreno , Fei Ren , Yu Zhang , Quan Wang , Patrick Nguyen
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech synthesis. The methods, systems, and apparatus include actions of obtaining an audio representation of speech of a target speaker, obtaining input text for which speech is to be synthesized in a voice of the target speaker, generating a speaker vector by providing the audio representation to a speaker encoder engine that is trained to distinguish speakers from one another, generating an audio representation of the input text spoken in the voice of the target speaker by providing the input text and the speaker vector to a spectrogram generation engine that is trained using voices of reference speakers to generate audio representations, and providing the audio representation of the input text spoken in the voice of the target speaker for output.
-
-