-
Publication No.: US11232780B1
Publication Date: 2022-01-25
Application No.: US17033783
Filing Date: 2020-09-26
Applicant: Google LLC
Inventor: Rakesh Iyer , Vincent Wan
IPC: G10L13/027 , G10L13/10
Abstract: A method for predicting parametric vocoder parameter includes receiving a text utterance having one or more words, each word having one or more syllables, and each syllable having one or more phonemes. The method also includes receiving, as input to a vocoder model, prosodic features that represent an intended prosody for the text utterance and a linguistic specification. The prosodic features include a duration, pitch contour, and energy contour for the text utterance, while the linguistic specification includes sentence-level linguistic features, word-level linguistic features for each word, syllable-level linguistic features for each syllable, and phoneme-level linguistic features for each phoneme. The method also includes predicting vocoder parameters based on the prosodic features and the linguistic specification. The method also includes providing the predicted vocoder parameters and the prosodic features to a parametric vocoder configured to generate a synthesized speech representation of the text utterance having the intended prosody.
-
Publication No.: US20210406451A1
Publication Date: 2021-12-30
Application No.: US17291647
Filing Date: 2019-01-28
Applicant: Google LLC
Inventor: Rakesh Iyer , Lisha Ruan
IPC: G06F40/169 , G06K9/46 , G06K9/00 , G06K9/20
Abstract: Systems and methods for extracting information from documents are provided. In one example embodiment, a computer-implemented method includes obtaining one or more units of text from an image of a document. The method includes determining one or more annotated values from the one or more units of text and determining a set of candidate labels for each annotated value. The method determines each set of candidate labels by performing a search for the candidate labels based at least in part on a language associated with the document and a location of each annotated value. The method includes determining a canonical label for each annotated value based at least in part on the associated candidate labels, and mapping at least one annotated value to an action that is presented to a user based at least in part on the canonical label associated with the annotated value.
-
Publication No.: US12125469B2
Publication Date: 2024-10-22
Application No.: US18488735
Filing Date: 2023-10-17
Applicant: Google LLC
Inventor: Rakesh Iyer , Vincent Wan
IPC: G10L13/10 , G10L13/027
CPC classification number: G10L13/027 , G10L13/10
Abstract: A method for predicting parametric vocoder parameter includes receiving a text utterance having one or more words, each word having one or more syllables, and each syllable having one or more phonemes. The method also includes receiving, as input to a vocoder model, prosodic features that represent an intended prosody for the text utterance and a linguistic specification. The prosodic features include a duration, pitch contour, and energy contour for the text utterance, while the linguistic specification includes sentence-level linguistic features, word-level linguistic features for each word, syllable-level linguistic features for each syllable, and phoneme-level linguistic features for each phoneme. The method also includes predicting vocoder parameters based on the prosodic features and the linguistic specification. The method also includes providing the predicted vocoder parameters and the prosodic features to a parametric vocoder configured to generate a synthesized speech representation of the text utterance having the intended prosody.
-
Publication No.: US12033412B2
Publication Date: 2024-07-09
Application No.: US17291647
Filing Date: 2019-01-28
Applicant: Google LLC
Inventor: Rakesh Iyer , Lisha Ruan
IPC: G06V30/40 , G06F40/169 , G06V30/10
CPC classification number: G06V30/40 , G06F40/169 , G06V30/10
Abstract: Systems and methods for extracting information from documents are provided. In one example embodiment, a computer-implemented method includes obtaining one or more units of text from an image of a document. The method includes determining one or more annotated values from the one or more units of text and determining a set of candidate labels for each annotated value. The method determines each set of candidate labels by performing a search for the candidate labels based at least in part on a language associated with the document and a location of each annotated value. The method includes determining a canonical label for each annotated value based at least in part on the associated candidate labels, and mapping at least one annotated value to an action that is presented to a user based at least in part on the canonical label associated with the annotated value.
-
Publication No.: US20240404308A1
Publication Date: 2024-12-05
Application No.: US18671218
Filing Date: 2024-05-22
Applicant: Google LLC
Inventor: Rakesh Iyer , Lisha Ruan
IPC: G06V30/40 , G06F40/169 , G06V30/10
Abstract: Systems and methods for extracting information from documents are provided. In one example embodiment, a computer-implemented method includes obtaining one or more units of text from an image of a document. The method includes determining one or more annotated values from the one or more units of text and determining a set of candidate labels for each annotated value. The method determines each set of candidate labels by performing a search for the candidate labels based at least in part on a language associated with the document and a location of each annotated value. The method includes determining a canonical label for each annotated value based at least in part on the associated candidate labels, and mapping at least one annotated value to an action that is presented to a user based at least in part on the canonical label associated with the annotated value.
-
Publication No.: US20240331681A1
Publication Date: 2024-10-03
Application No.: US18128107
Filing Date: 2023-03-29
Applicant: GOOGLE LLC
Inventor: Rakesh Iyer , Jeffrey Robert Pitman , Pendar Yousefi , Te I , Tiruvilwamalai Raman
IPC: G10L13/047 , G06F40/58 , G10L13/033 , G10L13/08 , G10L15/00 , G10L15/16 , G10L15/22 , G10L25/90
CPC classification number: G10L13/047 , G06F40/58 , G10L13/0335 , G10L13/08 , G10L15/005 , G10L15/16 , G10L15/22 , G10L25/90
Abstract: A computer generated voice can automatically be adapted to be similar to a user's voice. Various implementations include processing audio data capturing a first language spoken utterance to identify one or more pitch characteristics. For example, the one or more pitch characteristics can include an estimated frequency range of the given user's voice. Additionally or alternatively, the system can process the audio data capturing the first language spoken utterance and a set of candidate computer generated voices using a computer generated voice selection model to select a candidate computer generated voice. Various implementations can include automatically modifying the selected candidate computer generated voice based on the one or more pitch characteristics to change the frequency range of the modified computer generated voice based on the user's voice.
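Illustrative note (not part of the patent record): the abstract above mentions estimating the frequency range of a user's voice and then changing the frequency range of a selected computer generated voice. The following is a minimal sketch of one way such a step could look, under stated assumptions. The function names, the percentile-based range estimate, and the log-frequency linear mapping are all hypothetical choices made for illustration; the abstract does not say how the patent performs either step.

```python
import math


def estimate_pitch_range(f0_hz, low_pct=5.0, high_pct=95.0):
    """Estimate a (low, high) frequency range from per-frame F0 values in Hz.

    Assumption: unvoiced frames are encoded as F0 <= 0 and are ignored.
    Percentiles are used so that isolated pitch-tracking errors do not
    dominate the estimate.
    """
    voiced = sorted(f for f in f0_hz if f > 0)
    if not voiced:
        raise ValueError("no voiced frames in input")

    def percentile(p):
        # Nearest-rank style lookup into the sorted voiced frames.
        index = round((p / 100.0) * (len(voiced) - 1))
        return voiced[index]

    return percentile(low_pct), percentile(high_pct)


def remap_pitch(f0_hz, source_range, target_range):
    """Map a pitch contour from source_range onto target_range.

    The mapping is linear in log-frequency (an assumption made here because
    pitch perception is roughly logarithmic). Unvoiced frames stay at 0.0.
    """
    s_lo, s_hi = (math.log(v) for v in source_range)
    t_lo, t_hi = (math.log(v) for v in target_range)
    # Fall back to a pure shift when the source range is degenerate.
    scale = (t_hi - t_lo) / (s_hi - s_lo) if s_hi > s_lo else 1.0

    remapped = []
    for f in f0_hz:
        if f <= 0:
            remapped.append(0.0)
        else:
            remapped.append(math.exp(t_lo + (math.log(f) - s_lo) * scale))
    return remapped
```

In this sketch, the user's range would come from estimate_pitch_range applied to the pitch track of the spoken utterance, and remap_pitch would then be applied to the pitch contour of the selected candidate voice with the user's range as target_range.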
-
Publication No.: US20230097338A1
Publication Date: 2023-03-30
Application No.: US17533401
Filing Date: 2021-11-23
Applicant: GOOGLE LLC
Inventor: Nnamdi Kalu , Fernando Fernandes , Uri First , Erwin Jansen , Rakesh Iyer , Lingfeng Yang
IPC: G10L13/08 , G10L13/02 , G10L15/26 , G06F40/279
Abstract: Systems and methods for synthesizing speech based on received text and one or more emulated speech parameters. Text is received with one or more emulated speech parameters that indicate one or more features for the synthesized speech. Synthesized speech audio is generated based on the received parameters. The synthesized speech audio data is provided to an emulated microphone component that provides the synthesized audio to an automatic speech recognizer. The automatic speech recognizer utilizes one or more speech recognition models to generate converted text based on the synthesized speech audio data.
-
Publication No.: US20240046915A1
Publication Date: 2024-02-08
Application No.: US18488735
Filing Date: 2023-10-17
Applicant: Google LLC
Inventor: Rakesh Iyer , Vincent Wan
IPC: G10L13/027 , G10L13/10
CPC classification number: G10L13/027 , G10L13/10
Abstract: A method for predicting parametric vocoder parameter includes receiving a text utterance having one or more words, each word having one or more syllables, and each syllable having one or more phonemes. The method also includes receiving, as input to a vocoder model, prosodic features that represent an intended prosody for the text utterance and a linguistic specification. The prosodic features include a duration, pitch contour, and energy contour for the text utterance, while the linguistic specification includes sentence-level linguistic features, word-level linguistic features for each word, syllable-level linguistic features for each syllable, and phoneme-level linguistic features for each phoneme. The method also includes predicting vocoder parameters based on the prosodic features and the linguistic specification. The method also includes providing the predicted vocoder parameters and the prosodic features to a parametric vocoder configured to generate a synthesized speech representation of the text utterance having the intended prosody.
-
Publication No.: US11830474B2
Publication Date: 2023-11-28
Application No.: US17647246
Filing Date: 2022-01-06
Applicant: Google LLC
Inventor: Rakesh Iyer , Vincent Wan
IPC: G10L13/10 , G10L13/027
CPC classification number: G10L13/027 , G10L13/10
Abstract: A method for predicting parametric vocoder parameter includes receiving a text utterance having one or more words, each word having one or more syllables, and each syllable having one or more phonemes. The method also includes receiving, as input to a vocoder model, prosodic features that represent an intended prosody for the text utterance and a linguistic specification. The prosodic features include a duration, pitch contour, and energy contour for the text utterance, while the linguistic specification includes sentence-level linguistic features, word-level linguistic features for each word, syllable-level linguistic features for each syllable, and phoneme-level linguistic features for each phoneme. The method also includes predicting vocoder parameters based on the prosodic features and the linguistic specification. The method also includes providing the predicted vocoder parameters and the prosodic features to a parametric vocoder configured to generate a synthesized speech representation of the text utterance having the intended prosody.
-
Publication No.: US20220130371A1
Publication Date: 2022-04-28
Application No.: US17647246
Filing Date: 2022-01-06
Applicant: Google LLC
Inventor: Rakesh Iyer , Vincent Wan
IPC: G10L13/027 , G10L13/10
Abstract: A method for predicting parametric vocoder parameter includes receiving a text utterance having one or more words, each word having one or more syllables, and each syllable having one or more phonemes. The method also includes receiving, as input to a vocoder model, prosodic features that represent an intended prosody for the text utterance and a linguistic specification. The prosodic features include a duration, pitch contour, and energy contour for the text utterance, while the linguistic specification includes sentence-level linguistic features, word-level linguistic features for each word, syllable-level linguistic features for each syllable, and phoneme-level linguistic features for each phoneme. The method also includes predicting vocoder parameters based on the prosodic features and the linguistic specification. The method also includes providing the predicted vocoder parameters and the prosodic features to a parametric vocoder configured to generate a synthesized speech representation of the text utterance having the intended prosody.
-