Predicting parametric vocoder parameters from prosodic features

    Publication No.: US11232780B1

    Publication Date: 2022-01-25

    Application No.: US17033783

    Filing Date: 2020-09-26

    Applicant: Google LLC

    Abstract: A method for predicting parametric vocoder parameters includes receiving a text utterance having one or more words, each word having one or more syllables, and each syllable having one or more phonemes. The method also includes receiving, as input to a vocoder model, prosodic features that represent an intended prosody for the text utterance and a linguistic specification. The prosodic features include a duration, pitch contour, and energy contour for the text utterance, while the linguistic specification includes sentence-level linguistic features, word-level linguistic features for each word, syllable-level linguistic features for each syllable, and phoneme-level linguistic features for each phoneme. The method also includes predicting vocoder parameters based on the prosodic features and the linguistic specification. The method also includes providing the predicted vocoder parameters and the prosodic features to a parametric vocoder configured to generate a synthesized speech representation of the text utterance having the intended prosody.
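
    Illustrative sketch (not part of the patent record): the abstract describes a data flow in which a word/syllable/phoneme hierarchy and per-frame prosodic features feed a vocoder model that predicts vocoder parameters. The Python below only mirrors that flow. Every name in it (Syllable, Word, ProsodicFeatures, count_units, predict_vocoder_parameters) is hypothetical, and the "model" is a stand-in that concatenates features rather than a trained network; the abstract does not specify the model architecture or the parameter format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Syllable:
    phonemes: List[str]  # one or more phonemes per syllable


@dataclass
class Word:
    text: str
    syllables: List[Syllable]  # one or more syllables per word


@dataclass
class ProsodicFeatures:
    duration_frames: int         # utterance duration, in frames (assumed unit)
    pitch_contour: List[float]   # one pitch value per frame
    energy_contour: List[float]  # one energy value per frame


def count_units(words: List[Word]) -> Tuple[int, int, int]:
    """Return (word count, syllable count, phoneme count) for the utterance."""
    n_syllables = sum(len(w.syllables) for w in words)
    n_phonemes = sum(len(s.phonemes) for w in words for s in w.syllables)
    return len(words), n_syllables, n_phonemes


def predict_vocoder_parameters(
    prosody: ProsodicFeatures, words: List[Word]
) -> List[List[float]]:
    """Stand-in for the vocoder model: one feature vector per frame.

    A real system would run a trained model here. This placeholder just joins
    the per-frame prosody with crude utterance-level linguistic counts.
    """
    if not (
        len(prosody.pitch_contour)
        == len(prosody.energy_contour)
        == prosody.duration_frames
    ):
        raise ValueError("pitch and energy contours must match the duration")
    linguistic = [float(n) for n in count_units(words)]
    return [
        [pitch, energy] + linguistic
        for pitch, energy in zip(prosody.pitch_contour, prosody.energy_contour)
    ]


words = [Word("hello", [Syllable(["h", "ə"]), Syllable(["l", "oʊ"])])]
prosody = ProsodicFeatures(3, [120.0, 125.0, 118.0], [0.5, 0.6, 0.4])
params = predict_vocoder_parameters(prosody, words)
```

    In the flow the abstract describes, both the predicted parameters and the prosodic features would then be passed to the parametric vocoder; that stage is omitted here.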

    Systems and Methods for Extracting Information from a Physical Document

    Publication No.: US20210406451A1

    Publication Date: 2021-12-30

    Application No.: US17291647

    Filing Date: 2019-01-28

    Applicant: Google LLC

    Abstract: Systems and methods for extracting information from documents are provided. In one example embodiment, a computer-implemented method includes obtaining one or more units of text from an image of a document. The method includes determining one or more annotated values from the one or more units of text and determining a set of candidate labels for each annotated value. The method determines each set of candidate labels by performing a search for the candidate labels based at least in part on a language associated with the document and a location of each annotated value. The method includes determining a canonical label for each annotated value based at least in part on the associated candidate labels, and mapping at least one annotated value to an action that is presented to a user based at least in part on the canonical label associated with the annotated value.
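
    Illustrative sketch (not part of the patent record): the abstract describes collecting candidate labels for each annotated value, reducing them to a canonical label, and mapping the value to a user-facing action. The Python below shows one simple way such a reduction could work, using a majority vote over a lookup table. The tables CANONICAL and ACTIONS, the function names, and the voting rule are all assumptions made for illustration; the abstract does not say how the canonical label is chosen, and the location-based search for candidates is not modeled.

```python
from collections import Counter
from typing import List, Optional

# Hypothetical lookup: (document language, candidate label text) -> canonical label.
CANONICAL = {
    ("en", "due date"): "due_date",
    ("en", "pay by"): "due_date",
    ("en", "total"): "amount_due",
    ("en", "amount due"): "amount_due",
}

# Hypothetical mapping from canonical label to an action offered to the user.
ACTIONS = {"due_date": "create_reminder", "amount_due": "open_payment"}


def canonical_label(language: str, candidate_labels: List[str]) -> Optional[str]:
    """Pick the canonical label that most of the recognized candidates map to."""
    votes = Counter(
        CANONICAL[(language, c.lower())]
        for c in candidate_labels
        if (language, c.lower()) in CANONICAL
    )
    return votes.most_common(1)[0][0] if votes else None


def action_for(language: str, candidate_labels: List[str]) -> Optional[str]:
    """Map an annotated value's candidate labels to an action, if any."""
    return ACTIONS.get(canonical_label(language, candidate_labels))


# Candidate labels found near an annotated date value on an English invoice.
label = canonical_label("en", ["Pay by", "Due date", "Total"])
action = action_for("en", ["Pay by", "Due date", "Total"])
```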

    Predicting parametric vocoder parameters from prosodic features

    Publication No.: US12125469B2

    Publication Date: 2024-10-22

    Application No.: US18488735

    Filing Date: 2023-10-17

    Applicant: Google LLC

    CPC classification number: G10L13/027 G10L13/10

    Abstract: A method for predicting parametric vocoder parameters includes receiving a text utterance having one or more words, each word having one or more syllables, and each syllable having one or more phonemes. The method also includes receiving, as input to a vocoder model, prosodic features that represent an intended prosody for the text utterance and a linguistic specification. The prosodic features include a duration, pitch contour, and energy contour for the text utterance, while the linguistic specification includes sentence-level linguistic features, word-level linguistic features for each word, syllable-level linguistic features for each syllable, and phoneme-level linguistic features for each phoneme. The method also includes predicting vocoder parameters based on the prosodic features and the linguistic specification. The method also includes providing the predicted vocoder parameters and the prosodic features to a parametric vocoder configured to generate a synthesized speech representation of the text utterance having the intended prosody.

    Systems and methods for extracting information from a physical document

    Publication No.: US12033412B2

    Publication Date: 2024-07-09

    Application No.: US17291647

    Filing Date: 2019-01-28

    Applicant: Google LLC

    CPC classification number: G06V30/40 G06F40/169 G06V30/10

    Abstract: Systems and methods for extracting information from documents are provided. In one example embodiment, a computer-implemented method includes obtaining one or more units of text from an image of a document. The method includes determining one or more annotated values from the one or more units of text and determining a set of candidate labels for each annotated value. The method determines each set of candidate labels by performing a search for the candidate labels based at least in part on a language associated with the document and a location of each annotated value. The method includes determining a canonical label for each annotated value based at least in part on the associated candidate labels, and mapping at least one annotated value to an action that is presented to a user based at least in part on the canonical label associated with the annotated value.

    Systems and Methods for Extracting Information from a Physical Document

    Publication No.: US20240404308A1

    Publication Date: 2024-12-05

    Application No.: US18671218

    Filing Date: 2024-05-22

    Applicant: Google LLC

    Abstract: Systems and methods for extracting information from documents are provided. In one example embodiment, a computer-implemented method includes obtaining one or more units of text from an image of a document. The method includes determining one or more annotated values from the one or more units of text and determining a set of candidate labels for each annotated value. The method determines each set of candidate labels by performing a search for the candidate labels based at least in part on a language associated with the document and a location of each annotated value. The method includes determining a canonical label for each annotated value based at least in part on the associated candidate labels, and mapping at least one annotated value to an action that is presented to a user based at least in part on the canonical label associated with the annotated value.

    GENERATING SYNTHESIZED SPEECH INPUT

    Publication No.: US20230097338A1

    Publication Date: 2023-03-30

    Application No.: US17533401

    Filing Date: 2021-11-23

    Applicant: GOOGLE LLC

    Abstract: Systems and methods for synthesizing speech based on received text and one or more emulated speech parameters. Text is received with one or more emulated speech parameters that indicate one or more features for the synthesized speech. Synthesized speech audio is generated based on the received parameters. The synthesized speech audio data is provided to an emulated microphone component that provides the synthesized audio to an automatic speech recognizer. The automatic speech recognizer utilizes one or more speech recognition models to generate converted text based on the synthesized speech audio data.

    Predicting Parametric Vocoder Parameters From Prosodic Features

    Publication No.: US20240046915A1

    Publication Date: 2024-02-08

    Application No.: US18488735

    Filing Date: 2023-10-17

    Applicant: Google LLC

    CPC classification number: G10L13/027 G10L13/10

    Abstract: A method for predicting parametric vocoder parameters includes receiving a text utterance having one or more words, each word having one or more syllables, and each syllable having one or more phonemes. The method also includes receiving, as input to a vocoder model, prosodic features that represent an intended prosody for the text utterance and a linguistic specification. The prosodic features include a duration, pitch contour, and energy contour for the text utterance, while the linguistic specification includes sentence-level linguistic features, word-level linguistic features for each word, syllable-level linguistic features for each syllable, and phoneme-level linguistic features for each phoneme. The method also includes predicting vocoder parameters based on the prosodic features and the linguistic specification. The method also includes providing the predicted vocoder parameters and the prosodic features to a parametric vocoder configured to generate a synthesized speech representation of the text utterance having the intended prosody.

    Predicting parametric vocoder parameters from prosodic features

    Publication No.: US11830474B2

    Publication Date: 2023-11-28

    Application No.: US17647246

    Filing Date: 2022-01-06

    Applicant: Google LLC

    CPC classification number: G10L13/027 G10L13/10

    Abstract: A method for predicting parametric vocoder parameters includes receiving a text utterance having one or more words, each word having one or more syllables, and each syllable having one or more phonemes. The method also includes receiving, as input to a vocoder model, prosodic features that represent an intended prosody for the text utterance and a linguistic specification. The prosodic features include a duration, pitch contour, and energy contour for the text utterance, while the linguistic specification includes sentence-level linguistic features, word-level linguistic features for each word, syllable-level linguistic features for each syllable, and phoneme-level linguistic features for each phoneme. The method also includes predicting vocoder parameters based on the prosodic features and the linguistic specification. The method also includes providing the predicted vocoder parameters and the prosodic features to a parametric vocoder configured to generate a synthesized speech representation of the text utterance having the intended prosody.

    Predicting Parametric Vocoder Parameters From Prosodic Features

    Publication No.: US20220130371A1

    Publication Date: 2022-04-28

    Application No.: US17647246

    Filing Date: 2022-01-06

    Applicant: Google LLC

    Abstract: A method for predicting parametric vocoder parameters includes receiving a text utterance having one or more words, each word having one or more syllables, and each syllable having one or more phonemes. The method also includes receiving, as input to a vocoder model, prosodic features that represent an intended prosody for the text utterance and a linguistic specification. The prosodic features include a duration, pitch contour, and energy contour for the text utterance, while the linguistic specification includes sentence-level linguistic features, word-level linguistic features for each word, syllable-level linguistic features for each syllable, and phoneme-level linguistic features for each phoneme. The method also includes predicting vocoder parameters based on the prosodic features and the linguistic specification. The method also includes providing the predicted vocoder parameters and the prosodic features to a parametric vocoder configured to generate a synthesized speech representation of the text utterance having the intended prosody.
