-
公开(公告)号:US20230352027A1
公开(公告)日:2023-11-02
申请号:US18219480
申请日:2023-07-07
Applicant: GOOGLE LLC
Inventor: Asaf Aharoni , Arun Narayanan , Nir Shabat , Parisa Haghani , Galen Tsai Chuang , Yaniv Leviathan , Neeraj Gaur , Pedro J. Moreno Mengibar , Rohit Prakash Prabhavalkar , Zhongdi Qu , Austin Severn Waters , Tomer Amiaz , Michiel A.U. Bacchiani
CPC classification number: G10L15/26 , H04M3/4286 , H04M1/663 , G10L15/32 , H04M3/5191 , H04M1/02
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for an automated calling system are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance spoken by a user who is having a telephone conversation with a bot. The actions further include determining a context of the telephone conversation. The actions further include determining a user intent of a first previous portion of the telephone conversation spoken by the user and a bot intent of a second previous portion of the telephone conversation outputted by a speech synthesizer of the bot. The actions further include, based on the audio data of the utterance, the context of the telephone conversation, the user intent, and the bot intent, generating synthesized speech of a reply by the bot to the utterance. The actions further include, providing, for output, the synthesized speech.
-
公开(公告)号:US20220309340A1
公开(公告)日:2022-09-29
申请号:US17544570
申请日:2021-12-07
Applicant: Google LLC
Inventor: Isabel Leal , Neeraj Gaur , Parisa Haghani , Brian Farris , Bhuvana Ramabhadran , Manasa Prasad , Pedro J. Moreno Mengibar , Yun Zhu
Abstract: A method for distilling one or more trained teacher automatic speech recognition (ASR) models into a multilingual student model includes receiving a plurality of teacher training examples and a plurality of student training examples. The method also includes training one or more teacher automatic speech recognition (ASR) models using the plurality of teacher training examples. Each teacher ASR model is configured to output a respective textual representation of a respective audio input. The method further includes generating a multi-lingual student ASR model by training the multi-lingual student ASR model using the plurality of student training examples and distilling the trained one or more teacher ASR models into the multilingual student ASR model using a tunable distillation loss weight. Each student ASR model is configured to receive an audio input and output a corresponding textual representation of the received audio input.
-
公开(公告)号:US11158321B2
公开(公告)日:2021-10-26
申请号:US16580726
申请日:2019-09-24
Applicant: Google LLC
Inventor: Asaf Aharoni , Arun Narayanan , Nir Shabat , Parisa Haghani , Galen Tsai Chuang , Yaniv Leviathan , Neeraj Gaur , Pedro J. Moreno Mengibar , Rohit Prakash Prabhavalkar , Zhongdi Qu , Austin Severn Waters , Tomer Amiaz , Michiel A. U. Bacchiani
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for an automated calling system are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance spoken by a user who is having a telephone conversation with a bot. The actions further include determining a context of the telephone conversation. The actions further include determining a user intent of a first previous portion of the telephone conversation spoken by the user and a bot intent of a second previous portion of the telephone conversation outputted by a speech synthesizer of the bot. The actions further include, based on the audio data of the utterance, the context of the telephone conversation, the user intent, and the bot intent, generating synthesized speech of a reply by the bot to the utterance. The actions further include, providing, for output, the synthesized speech.
-
公开(公告)号:US12080283B2
公开(公告)日:2024-09-03
申请号:US17701635
申请日:2022-03-22
Applicant: Google LLC
Inventor: Neeraj Gaur , Tongzhou Chen , Ehsan Variani , Bhuvana Ramabhadran , Parisa Haghani , Pedro J. Moreno Mengibar
IPC: G10L15/197 , G10L15/00 , G10L15/16 , G10L15/22
CPC classification number: G10L15/197 , G10L15/005 , G10L15/16 , G10L15/22
Abstract: A method includes receiving a sequence of acoustic frames extracted from audio data corresponding to an utterance. During a first pass, the method includes processing the sequence of acoustic frames to generate N candidate hypotheses for the utterance. During a second pass, and for each candidate hypothesis, the method includes: generating a respective un-normalized likelihood score; generating a respective external language model score; generating a standalone score that models prior statistics of the corresponding candidate hypothesis; and generating a respective overall score for the candidate hypothesis based on the un-normalized likelihood score, the external language model score, and the standalone score. The method also includes selecting the candidate hypothesis having the highest respective overall score from among the N candidate hypotheses as a final transcription of the utterance.
-
公开(公告)号:US20240203409A1
公开(公告)日:2024-06-20
申请号:US18589220
申请日:2024-02-27
Applicant: Google LLC
Inventor: Neeraj Gaur , Tongzhou Chen , Ehsan Variani , Bhuvana Ramabhadran , Parisa Haghani , Pedro J. Moreno Mengibar
IPC: G10L15/197 , G10L15/00 , G10L15/16 , G10L15/22
CPC classification number: G10L15/197 , G10L15/005 , G10L15/16 , G10L15/22
Abstract: A method includes receiving a sequence of acoustic frames extracted from audio data corresponding to an utterance. During a first pass, the method includes processing the sequence of acoustic frames to generate N candidate hypotheses for the utterance. During a second pass, and for each candidate hypothesis, the method includes: generating a respective un-normalized likelihood score; generating a respective external language model score; generating a standalone score that models prior statistics of the corresponding candidate hypothesis; and generating a respective overall score for the candidate hypothesis based on the un-normalized likelihood score, the external language model score, and the standalone score. The method also includes selecting the candidate hypothesis having the highest respective overall score from among the N candidate hypotheses as a final transcription of the utterance.
-
公开(公告)号:US20210090570A1
公开(公告)日:2021-03-25
申请号:US16580726
申请日:2019-09-24
Applicant: Google LLC
Inventor: Asaf Aharoni , Arun Narayanan , Nir Shabat , Parisa Haghani , Galen Tsai Chuang , Yaniv Leviathan , Neeraj Gaur , Pedro J. Moreno Mengibar , Rohit Prakash Prabhavalkar , Zhongdi Qu , Austin Severn Waters , Tomer Amiaz , Michiel A.U. Bacchiani
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for an automated calling system are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance spoken by a user who is having a telephone conversation with a bot. The actions further include determining a context of the telephone conversation. The actions further include determining a user intent of a first previous portion of the telephone conversation spoken by the user and a bot intent of a second previous portion of the telephone conversation outputted by a speech synthesizer of the bot. The actions further include, based on the audio data of the utterance, the context of the telephone conversation, the user intent, and the bot intent, generating synthesized speech of a reply by the bot to the utterance. The actions further include, providing, for output, the synthesized speech.
-
-
-
-
-