-
Publication No.: US20220284882A1
Publication Date: 2022-09-08
Application No.: US17190456
Application Date: 2021-03-03
Applicant: Google LLC
Inventor: Vijayaditya Peddinti, Bhuvana Ramabhadran, Andrew Rosenberg, Mateusz Golebiewski
IPC: G10L13/08, G10L15/187
Abstract: A method for instantaneous learning in text-to-speech (TTS) during dialog includes receiving a user pronunciation of a particular word present in a query spoken by a user. The method also includes receiving a TTS pronunciation of the same particular word that is present in a TTS input, where the TTS pronunciation of the particular word is different from the user pronunciation of the particular word. The method also includes obtaining user pronunciation-related features and TTS pronunciation-related features associated with the particular word. The method also includes generating a pronunciation decision selecting whichever of the user pronunciation or the TTS pronunciation of the particular word is associated with the highest confidence. The method also includes providing TTS audio that includes a synthesized speech representation of a response to the query using the user pronunciation or the TTS pronunciation for the particular word.
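The pronunciation decision described in this abstract can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the claimed method: the feature names, the weights, and the logistic confidence function are hypothetical, and handing the chosen pronunciation to the synthesizer through an SSML <phoneme> tag is one common way to override a TTS pronunciation, not something the abstract names.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Candidate:
    source: str  # "user" (heard in the query) or "tts" (from the TTS front end)
    ipa: str     # pronunciation written in IPA (illustrative)
    features: dict = field(default_factory=dict)  # hypothetical pronunciation-related features


def confidence(cand: Candidate, weights: dict) -> float:
    """Hypothetical confidence: logistic function of a weighted feature sum."""
    z = sum(weights.get(name, 0.0) * value for name, value in cand.features.items())
    return 1.0 / (1.0 + math.exp(-z))


def pronunciation_decision(user: Candidate, tts: Candidate, weights: dict) -> Candidate:
    """Select whichever pronunciation has the higher confidence (ties go to the user)."""
    return max((user, tts), key=lambda c: confidence(c, weights))


def to_ssml(response: str, word: str, chosen: Candidate) -> str:
    """Wrap every occurrence of `word` in an SSML <phoneme> tag carrying the chosen IPA."""
    tagged = f'<phoneme alphabet="ipa" ph="{chosen.ipa}">{word}</phoneme>'
    return "<speak>" + response.replace(word, tagged) + "</speak>"


if __name__ == "__main__":
    weights = {"asr_confidence": 2.0, "user_repetitions": 0.5, "lexicon_match": 0.5}
    user = Candidate("user", "nɪtʃ", {"asr_confidence": 0.9, "user_repetitions": 2})
    tts = Candidate("tts", "niːʃ", {"lexicon_match": 1.0})
    chosen = pronunciation_decision(user, tts, weights)
    print(chosen.source)  # user
    print(to_ssml("Here are results for niche.", "niche", chosen))
```

With these made-up weights the user's pronunciation scores about 0.94 against about 0.62 for the TTS lexicon entry, so the response is synthesized with the user's variant.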
-
Publication No.: US11676572B2
Publication Date: 2023-06-13
Application No.: US17190456
Application Date: 2021-03-03
Applicant: Google LLC
Inventor: Vijayaditya Peddinti, Bhuvana Ramabhadran, Andrew Rosenberg, Mateusz Golebiewski
IPC: G10L17/02, G10L13/08, G10L15/187
CPC classification number: G10L13/08, G10L15/187
Abstract: A method for instantaneous learning in text-to-speech (TTS) during dialog includes receiving a user pronunciation of a particular word present in a query spoken by a user. The method also includes receiving a TTS pronunciation of the same particular word that is present in a TTS input, where the TTS pronunciation of the particular word is different from the user pronunciation of the particular word. The method also includes obtaining user pronunciation-related features and TTS pronunciation-related features associated with the particular word. The method also includes generating a pronunciation decision selecting whichever of the user pronunciation or the TTS pronunciation of the particular word is associated with the highest confidence. The method also includes providing TTS audio that includes a synthesized speech representation of a response to the query using the user pronunciation or the TTS pronunciation for the particular word.
-
Publication No.: US20200279562A1
Publication Date: 2020-09-03
Application No.: US16874646
Application Date: 2020-05-14
Applicant: Google LLC
Inventor: Alexander H. Gruenstein, Taral Pradeep Joglekar, Vijayaditya Peddinti, Michiel A. U. Bacchiani
Abstract: A method includes obtaining, by data processing hardware, a plurality of non-watermarked speech samples. Each non-watermarked speech sample does not include an audio watermark sample. The method includes, from each non-watermarked speech sample of the plurality of non-watermarked speech samples, generating one or more corresponding watermarked speech samples that each include at least one audio watermark. The method includes training, using the plurality of non-watermarked speech samples and corresponding watermarked speech samples, a model to determine whether a given audio data sample includes an audio watermark, and, after training the model, transmitting the trained model to a user computing device.
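A toy version of the training-data flow in this abstract is sketched below, assuming the watermark is a pseudo-random ±1 sequence added to the waveform and the "model" is a single correlation threshold. Both are illustrative stand-ins: the abstract does not say how watermarks are generated or what kind of model is trained. Gaussian noise stands in for speech, and JSON serialization stands in for transmitting the model to a user device; all function names are hypothetical.

```python
import json
import random
from statistics import mean

CLIP_LEN = 4000  # samples per clip (illustrative)


def make_watermark(seed: int, length: int = CLIP_LEN) -> list:
    """Pseudo-random +/-1 sequence used as the watermark key (an assumption)."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def embed(clip: list, watermark: list, strength: float) -> list:
    """Derive a watermarked copy by adding the scaled key to a clean clip."""
    return [s + strength * w for s, w in zip(clip, watermark)]


def score(clip: list, watermark: list) -> float:
    """Normalized correlation between a clip and the watermark key."""
    return sum(s * w for s, w in zip(clip, watermark)) / len(watermark)


def build_training_set(clean_clips, watermark, strengths=(0.2, 0.3)):
    """Keep each clean clip (label 0) and add one watermarked copy per strength (label 1)."""
    data = []
    for clip in clean_clips:
        data.append((clip, 0))
        for strength in strengths:
            data.append((embed(clip, watermark, strength), 1))
    return data


def train(data, watermark) -> float:
    """Toy 'model': a decision threshold halfway between the mean class scores."""
    neg = mean(score(x, watermark) for x, y in data if y == 0)
    pos = mean(score(x, watermark) for x, y in data if y == 1)
    return (neg + pos) / 2


def export_model(seed: int, threshold: float) -> str:
    """Serialize the trained model so it can be sent to a user device."""
    return json.dumps({"watermark_seed": seed, "threshold": threshold})


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [[rng.gauss(0.0, 0.3) for _ in range(CLIP_LEN)] for _ in range(10)]
    wm = make_watermark(seed=42)
    threshold = train(build_training_set(clean, wm), wm)
    print(export_model(42, threshold))  # payload a server could push to devices
```

Deriving several watermarked copies per clean clip, as the abstract allows, gives the model positives at more than one embedding strength from the same underlying speech.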
-
Publication No.: US11967323B2
Publication Date: 2024-04-23
Application No.: US17849253
Application Date: 2022-06-24
Applicant: Google LLC
Inventor: Alexander H. Gruenstein, Taral Pradeep Joglekar, Vijayaditya Peddinti, Michiel A. U. Bacchiani
CPC classification number: G10L15/22, G10L15/063, G10L15/08, G10L15/30, G10L17/00, G10L17/22, G10L25/51, G10L2015/088
Abstract: A method includes adding, by a first computing device, a first audio watermark to first speech data corresponding to playback of a first utterance including a hotword used to invoke an attention of a second computing device. The method includes outputting, by the first computing device, the playback of the first utterance corresponding to the watermarked first speech data. The second computing device is configured to receive the watermarked first speech data and determine to cease processing of the watermarked first speech data.
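The two-device interaction in this abstract can be illustrated with a minimal sketch. Assumptions, none of which come from the abstract: both devices share a pseudo-random ±1 key, the first device mixes it additively into the speech data before playback, and the second device ceases processing when the normalized correlation with the key exceeds a fixed threshold. The class names MediaDevice and Assistant are hypothetical.

```python
import random

KEY_LEN = 2048  # hypothetical watermark length in samples


def watermark_key(seed: int = 7) -> list:
    """Shared pseudo-random +/-1 key known to both devices (an assumption)."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(KEY_LEN)]


class MediaDevice:
    """First device: watermarks speech data before playing it back."""

    def __init__(self, key: list, strength: float = 0.25):
        self.key, self.strength = key, strength

    def playback(self, speech: list) -> list:
        # Mix the key into the leading samples; any remaining samples pass through.
        head = [s + self.strength * k for s, k in zip(speech, self.key)]
        return head + speech[len(head):]


class Assistant:
    """Second device: ceases processing audio that carries the shared key."""

    def __init__(self, key: list, threshold: float = 0.1):
        self.key, self.threshold = key, threshold

    def handle(self, audio: list) -> str:
        corr = sum(a * k for a, k in zip(audio, self.key)) / len(self.key)
        return "cease" if corr > self.threshold else "continue"


if __name__ == "__main__":
    rng = random.Random(1)
    speech = [rng.gauss(0.0, 0.3) for _ in range(4000)]  # stand-in for a hotword utterance
    key = watermark_key()
    tv, assistant = MediaDevice(key), Assistant(key)
    print(assistant.handle(tv.playback(speech)))  # watermarked playback
    print(assistant.handle(speech))               # live speech without the watermark
```

Embedding only at the start of the utterance keeps the sketch short; a real system would need a watermark that survives the speaker-to-microphone path, which this additive toy does not model.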
-
Publication No.: US10692496B2
Publication Date: 2020-06-23
Application No.: US16418415
Application Date: 2019-05-21
Applicant: Google LLC
Inventor: Alexander H. Gruenstein, Taral Pradeep Joglekar, Vijayaditya Peddinti, Michiel A. U. Bacchiani
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for suppressing hotwords are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to playback of an utterance. The actions further include providing the audio data as an input to a model (i) that is configured to determine whether a given audio data sample includes an audio watermark and (ii) that was trained using watermarked audio data samples that each include an audio watermark sample and non-watermarked audio data samples that do not each include an audio watermark sample. The actions further include receiving, from the model, data indicating whether the audio data includes the audio watermark. The actions further include, based on the data indicating whether the audio data includes the audio watermark, determining to continue or cease processing of the audio data.
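The continue-or-cease decision can be sketched as a small gate around the trained model. This is a hypothetical Python illustration: the model and the hotword detector are injected as plain callables, the 0.5 threshold is arbitrary, and checking for the hotword before consulting the watermark model is a choice of this sketch rather than something the abstract specifies.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Audio = Sequence[float]


@dataclass
class HotwordGate:
    """Decide whether to keep processing audio, given a watermark model's output.

    Both callables are hypothetical stand-ins: `watermark_model` plays the role of
    the trained model and returns a probability that the audio carries the
    watermark; `hotword_detector` returns True when the hotword is heard.
    """

    watermark_model: Callable[[Audio], float]
    hotword_detector: Callable[[Audio], bool]
    threshold: float = 0.5

    def decide(self, audio: Audio) -> str:
        if not self.hotword_detector(audio):
            return "idle"  # no hotword: nothing to suppress or act on
        if self.watermark_model(audio) >= self.threshold:
            return "cease"  # hotword came from watermarked playback: suppress it
        return "continue"  # hotword from a live speaker: keep processing
```

For example, HotwordGate(lambda a: 0.9, lambda a: True).decide(audio) returns "cease", so a hotword heard in watermarked media playback is suppressed, while a model output of 0.1 would let processing continue.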
-
公开(公告)号:US11373652B2
公开(公告)日:2022-06-28
申请号:US16874646
申请日:2020-05-14
Applicant: Google LLC
Inventor: Alexander H. Gruenstein , Taral Pradeep Joglekar , Vijayaditya Peddinti , Michiel A. u. Bacchiani
Abstract: A method includes obtaining, by data processing hardware, a plurality of non-watermarked speech samples. Each non-watermarked speech sample does not include an audio watermark sample. The method includes, from each non-watermarked speech sample of the plurality of non-watermarked speech samples, generating one or more corresponding watermarked speech samples that each include at least one audio watermark. The method includes training, using the plurality of non-watermarked speech samples and corresponding watermarked speech samples, a model to determine whether a given audio data sample includes an audio watermark, and, after training the model, transmitting the trained model to a user computing device.
-
Publication No.: US20230274727A1
Publication Date: 2023-08-31
Application No.: US18312576
Application Date: 2023-05-04
Applicant: Google LLC
Inventor: Vijayaditya Peddinti, Bhuvana Ramabhadran, Andrew Rosenberg, Mateusz Golebiewski
IPC: G10L13/08, G10L15/187
CPC classification number: G10L13/08, G10L15/187
Abstract: A method for instantaneous learning in text-to-speech (TTS) during dialog includes receiving a user pronunciation of a particular word present in a query spoken by a user. The method also includes receiving a TTS pronunciation of the same particular word that is present in a TTS input, where the TTS pronunciation of the particular word is different from the user pronunciation of the particular word. The method also includes obtaining user pronunciation-related features and TTS pronunciation-related features associated with the particular word. The method also includes generating a pronunciation decision selecting whichever of the user pronunciation or the TTS pronunciation of the particular word is associated with the highest confidence. The method also includes providing TTS audio that includes a synthesized speech representation of a response to the query using the user pronunciation or the TTS pronunciation for the particular word.
-
Publication No.: US20190362719A1
Publication Date: 2019-11-28
Application No.: US16418415
Application Date: 2019-05-21
Applicant: Google LLC
Inventor: Alexander H. Gruenstein, Taral Pradeep Joglekar, Vijayaditya Peddinti, Michiel A. U. Bacchiani
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for suppressing hotwords are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to playback of an utterance. The actions further include providing the audio data as an input to a model (i) that is configured to determine whether a given audio data sample includes an audio watermark and (ii) that was trained using watermarked audio data samples that each include an audio watermark sample and non-watermarked audio data samples that do not each include an audio watermark sample. The actions further include receiving, from the model, data indicating whether the audio data includes the audio watermark. The actions further include, based on the data indicating whether the audio data includes the audio watermark, determining to continue or cease processing of the audio data.