MULTILINGUAL SPEECH SYNTHESIS AND CROSS-LANGUAGE VOICE CLONING

    Publication No.: US20200380952A1

    Publication Date: 2020-12-03

    Application No.: US16855042

    Filing Date: 2020-04-22

    Applicant: Google LLC

    Abstract: A method includes receiving an input text sequence to be synthesized into speech in a first language and obtaining a speaker embedding, the speaker embedding specifying specific voice characteristics of a target speaker for synthesizing the input text sequence into speech that clones a voice of the target speaker. The target speaker includes a native speaker of a second language different than the first language. The method also includes generating, using a text-to-speech (TTS) model, an output audio feature representation of the input text by processing the input text sequence and the speaker embedding. The output audio feature representation includes the voice characteristics of the target speaker specified by the speaker embedding.
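
    The abstract states only that the TTS model processes the input text sequence together with the speaker embedding. It does not say how the two are combined. The sketch below assumes one common conditioning scheme: the same speaker vector is concatenated onto every encoder frame. `condition_on_speaker` and the toy vectors are hypothetical. Because the speaker vector does not depend on the text, an embedding taken from a native speaker of a second language can be paired with text in the first language.

```python
from typing import List

Vector = List[float]


def condition_on_speaker(encoder_outputs: List[Vector],
                         speaker_embedding: Vector) -> List[Vector]:
    """Concatenate one speaker embedding onto every encoder time step.

    Hypothetical helper: concatenation is only one common way to condition
    a TTS decoder on a speaker; the abstract does not specify the mechanism.
    """
    # The same speaker vector is reused for every frame, so the voice
    # identity is independent of which language the text is written in.
    return [list(frame) + list(speaker_embedding) for frame in encoder_outputs]


# Toy usage: two encoder frames for first-language text, plus an embedding
# assumed to come from a native speaker of a second language.
text_frames = [[0.1, 0.2], [0.3, 0.4]]
speaker = [1.0, -1.0]
print(condition_on_speaker(text_frames, speaker))
# [[0.1, 0.2, 1.0, -1.0], [0.3, 0.4, 1.0, -1.0]]
```

    A decoder that consumes these conditioned frames would then produce the output audio feature representation. That decoder is not shown here.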

    Large-Scale Multilingual Speech Recognition With A Streaming End-To-End Model

    Publication No.: US20200380215A1

    Publication Date: 2020-12-03

    Application No.: US16834342

    Filing Date: 2020-03-30

    Applicant: Google LLC

    Abstract: A method of transcribing speech using a multilingual end-to-end (E2E) speech recognition model includes receiving audio data for an utterance spoken in a particular native language, obtaining a language vector identifying the particular language, and processing, using the multilingual E2E speech recognition model, the language vector and acoustic features derived from the audio data to generate a transcription for the utterance. The multilingual E2E speech recognition model includes a plurality of language-specific adaptor modules that include one or more adaptor modules specific to the particular native language and one or more other adaptor modules specific to at least one other native language different than the particular native language. The method also includes providing the transcription for output.
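
    The abstract names a language vector and a set of language-specific adaptor modules. It does not describe their internals. The minimal sketch below assumes a one-hot language vector and a residual bottleneck adapter of the form x + up(relu(down(x))). Only the adapter for the identified language is applied to a shared-layer activation. `language_vector`, `ResidualAdapter`, `adapt`, and the weights are hypothetical illustrations, not the patented design.

```python
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Plain matrix-vector product (each row of m dotted with v)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def language_vector(language: str, languages: Sequence[str]) -> Vector:
    """One-hot language vector; the one-hot encoding is an assumption."""
    return [1.0 if lang == language else 0.0 for lang in languages]


class ResidualAdapter:
    """Bottleneck adapter computing x + up(relu(down(x))) (hypothetical design)."""

    def __init__(self, down: Matrix, up: Matrix) -> None:
        self.down = down  # shape: (bottleneck, d)
        self.up = up      # shape: (d, bottleneck)

    def __call__(self, x: Vector) -> Vector:
        hidden = [max(0.0, z) for z in matvec(self.down, x)]  # ReLU bottleneck
        delta = matvec(self.up, hidden)
        return [a + b for a, b in zip(x, delta)]  # residual connection


def adapt(hidden: Vector, language: str,
          adapters: Dict[str, ResidualAdapter]) -> Vector:
    """Route a shared-layer activation through one language's adapter only."""
    return adapters[language](hidden)


adapters = {
    "en": ResidualAdapter(down=[[1.0, 0.0]], up=[[1.0], [0.0]]),
    "hi": ResidualAdapter(down=[[0.0, -1.0]], up=[[0.0], [1.0]]),
}
print(language_vector("hi", ["en", "hi", "mr"]))  # [0.0, 1.0, 0.0]
print(adapt([2.0, 3.0], "en", adapters))          # [4.0, 3.0]
print(adapt([2.0, 3.0], "hi", adapters))          # [2.0, 3.0]
```

    With this routing, adapters for other languages add no computation for a given utterance, and the shared layers remain common to all languages.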

    REWARD AUGMENTED MODEL TRAINING
    Invention Application

    Publication No.: US20190188566A1

    Publication Date: 2019-06-20

    Application No.: US16328207

    Filing Date: 2017-08-25

    Applicant: Google LLC

    CPC classification number: G06N3/08 G06N20/00

    Abstract: A method includes obtaining data identifying a machine learning model to be trained to perform a machine learning task, the machine learning model being configured to receive an input example and to process the input example in accordance with current values of a plurality of model parameters to generate a model output for the input example; obtaining initial training data for training the machine learning model, the initial training data comprising a plurality of training examples and, for each training example, a ground truth output that should be generated by the machine learning model by processing the training example; generating modified training data from the initial training data; and training the machine learning model on the modified training data.
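
    The abstract does not say how the modified training data is generated. One well-known technique that fits the title is reward augmented maximum likelihood: each ground-truth output is replaced by samples drawn with probability proportional to exp(reward / temperature). The sketch below assumes that technique, uses negative Hamming distance as the reward, and restricts candidates to at most one token substitution. All function names and parameters are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Example = Tuple[str, List[str]]


def hamming_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def candidates_within_one_edit(target: Sequence[str],
                               vocab: Sequence[str]) -> List[List[str]]:
    """The target itself plus every single-token substitution of it."""
    candidates = [list(target)]
    for i, token in enumerate(target):
        for v in vocab:
            if v != token:
                edited = list(target)
                edited[i] = v
                candidates.append(edited)
    return candidates


def sample_augmented_target(target: Sequence[str], vocab: Sequence[str],
                            temperature: float, rng: random.Random) -> List[str]:
    """Sample y' with probability proportional to exp(r(y', y) / temperature),
    using the assumed reward r = -Hamming distance to the ground truth."""
    candidates = candidates_within_one_edit(target, vocab)
    weights = [math.exp(-hamming_distance(c, target) / temperature)
               for c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def modify_training_data(examples: Sequence[Example], vocab: Sequence[str],
                         temperature: float = 1.0, samples_per_example: int = 2,
                         seed: int = 0) -> List[Example]:
    """Replace each ground-truth output with reward-weighted sampled outputs."""
    rng = random.Random(seed)
    modified: List[Example] = []
    for x, y in examples:
        for _ in range(samples_per_example):
            modified.append((x, sample_augmented_target(y, vocab, temperature, rng)))
    return modified


data = [("x1", ["a", "b"]), ("x2", ["b", "b", "a"])]
for x, y in modify_training_data(data, vocab=["a", "b", "c"]):
    print(x, y)
```

    The temperature sets how far the modified targets stray from the ground truth. As it approaches zero, nearly every sample equals the original output, and training reduces to ordinary maximum likelihood on the initial data.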

    Phonemes And Graphemes for Neural Text-to-Speech

    Publication No.: US20240339106A1

    Publication Date: 2024-10-10

    Application No.: US18746809

    Filing Date: 2024-06-18

    Applicant: Google LLC

    CPC classification number: G10L13/086 G06F40/263 G06F40/279 G06N3/08 G10L13/047

    Abstract: A method includes receiving a text input including a sequence of words represented as an input encoder embedding. The input encoder embedding includes a plurality of tokens, with the plurality of tokens including a first set of grapheme tokens representing the text input as respective graphemes and a second set of phoneme tokens representing the text input as respective phonemes. The method also includes, for each respective phoneme token of the second set of phoneme tokens: identifying a respective word of the sequence of words corresponding to the respective phoneme token and determining a respective grapheme token representing the respective word of the sequence of words corresponding to the respective phoneme token. The method also includes generating an output encoder embedding based on a relationship between each respective phoneme token and the corresponding grapheme token determined to represent a same respective word as the respective phoneme token.
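
    The abstract pairs each phoneme token with the grapheme token that represents the same word. The sketch below assumes that every token carries the index of the word it belongs to. Under that assumption the pairing becomes a lookup by word index. It also assumes that the output embedding for each phoneme is its own embedding plus the mean embedding of that word's grapheme tokens. The abstract says only that the output depends on the phoneme-grapheme relationship, so the additive combination and the names `Token`, `align_phonemes_to_graphemes`, and `output_encoder_embedding` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

Vector = List[float]


@dataclass
class Token:
    symbol: str
    kind: str          # "grapheme" or "phoneme"
    word_index: int    # position of the word this token belongs to (assumed)
    embedding: Vector


def align_phonemes_to_graphemes(tokens: List[Token]) -> Dict[int, List[int]]:
    """Map each phoneme token index to the grapheme token indices of its word."""
    graphemes_by_word: Dict[int, List[int]] = {}
    for i, t in enumerate(tokens):
        if t.kind == "grapheme":
            graphemes_by_word.setdefault(t.word_index, []).append(i)
    return {i: graphemes_by_word.get(t.word_index, [])
            for i, t in enumerate(tokens) if t.kind == "phoneme"}


def output_encoder_embedding(tokens: List[Token]) -> List[Vector]:
    """One output vector per phoneme: its embedding plus the mean embedding
    of the grapheme tokens for the same word (assumed combination rule)."""
    alignment = align_phonemes_to_graphemes(tokens)
    outputs: List[Vector] = []
    for i, t in enumerate(tokens):
        if t.kind != "phoneme":
            continue
        g = alignment[i]
        dim = len(t.embedding)
        if g:
            mean = [sum(tokens[j].embedding[k] for j in g) / len(g) for k in range(dim)]
        else:
            mean = [0.0] * dim  # no grapheme found for this word
        outputs.append([p + m for p, m in zip(t.embedding, mean)])
    return outputs


# "hi you": one grapheme token per word, ARPAbet-style phoneme tokens.
tokens = [
    Token("hi", "grapheme", 0, [1.0, 0.0]),
    Token("you", "grapheme", 1, [0.0, 1.0]),
    Token("HH", "phoneme", 0, [0.5, 0.5]),
    Token("AY", "phoneme", 0, [0.0, 0.0]),
    Token("Y", "phoneme", 1, [1.0, 1.0]),
    Token("UW", "phoneme", 1, [2.0, 2.0]),
]
print(align_phonemes_to_graphemes(tokens))  # {2: [0], 3: [0], 4: [1], 5: [1]}
print(output_encoder_embedding(tokens))     # [[1.5, 0.5], [1.0, 0.0], [1.0, 2.0], [2.0, 3.0]]
```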
