USING SPEECH RECOGNITION TO IMPROVE CROSS-LANGUAGE SPEECH SYNTHESIS

    Publication No.: US20240282292A1

    Publication date: 2024-08-22

    Application No.: US18654278

    Filing date: 2024-05-03

    Applicant: Google LLC

    Abstract: A method for training a speech recognition model includes obtaining a multilingual text-to-speech (TTS) model. The method also includes generating a native synthesized speech representation for an input text sequence in a first language that is conditioned on speaker characteristics of a native speaker of the first language. The method also includes generating a cross-lingual synthesized speech representation for the input text sequence in the first language that is conditioned on speaker characteristics of a native speaker of a different second language. The method also includes generating a first speech recognition result for the native synthesized speech representation and a second speech recognition result for the cross-lingual synthesized speech representation. The method also includes determining a consistent loss term based on the first speech recognition result and the second speech recognition result and updating parameters of the speech recognition model based on the consistent loss term.
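As a rough illustration of the consistency idea in this abstract, the sketch below compares the recognizer's per-step output distributions for the native and the cross-lingual synthesized speech. The abstract does not say how the loss term is computed; the symmetric KL divergence used here, and the names `softmax`, `kl_divergence`, and `consistency_loss`, are assumptions made for illustration only.

```python
import math


def softmax(logits):
    """Convert one step's logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_loss(native_logits, cross_logits):
    """Mean symmetric KL between per-step distributions (an assumed loss form).

    native_logits: recognizer logits for the native synthesized speech.
    cross_logits:  recognizer logits for the cross-lingual synthesized speech.
    Both are lists of per-step logit lists with the same shape.
    """
    if not native_logits or len(native_logits) != len(cross_logits):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for n_step, c_step in zip(native_logits, cross_logits):
        p, q = softmax(n_step), softmax(c_step)
        total += 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return total / len(native_logits)
```

The loss is zero when the recognizer produces identical distributions for both voices and grows as they diverge, so minimizing it would push the recognizer toward speaker-independent output for the same text.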

    DYNAMIC SYSTEM RESPONSE CONFIGURATION
    Invention publication

    Publication No.: US20240185833A1

    Publication date: 2024-06-06

    Application No.: US18403041

    Filing date: 2024-01-03

    Abstract: A natural language processing system may use system response configuration data to determine customized output data forms when outputting data for a user. The system response configuration data may represent various output attributes the system may use when creating output data. The system may have a certain number of existing profiles where a profile is associated with certain settings for the system response configuration data/attributes. The system may also use various data such as context data, sentiment data, or the like to customize system response configuration data during a dialog. Other components, such as natural language generation (NLG), text-to-speech (TTS), or the like, may use the customized system response configuration data to determine the form, timing, etc. of output data to be presented to a user.

    METHOD FOR GENERATING CAPTIONS, SUBTITLES AND DUBBING FOR AUDIOVISUAL MEDIA

    Publication No.: US20240155205A1

    Publication date: 2024-05-09

    Application No.: US18403829

    Filing date: 2024-01-04

    Applicant: SYNCWORDS

    Abstract: The method for generating captions, subtitles and dubbing for audiovisual media uses a machine learning-based approach for automatically generating captions from the audio portion of audiovisual media, and further translates the captions to produce both subtitles and dubbing. A speech component of an audio portion of audiovisual media is converted into at least one text string which includes at least one word. Temporal start and end points for the at least one word are determined, and the at least one word is visually inserted into the video portion of the audiovisual media. The temporal start and end points for the at least one word are synchronized with corresponding temporal start and end points of the speech component of the audio portion of the audiovisual media. A latency period may be selectively inserted into broadcast of the audiovisual media such that the synchronization may be selectively adjusted during the latency period.
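To make the word-level timing in this abstract concrete, the sketch below turns words with start and end times into one caption cue. The abstract names no output format; the SubRip-style (SRT) cue layout, the `TimedWord` type, and the `offset` parameter (standing in for an adjustment made during the latency period) are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class TimedWord:
    text: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio


def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm (SRT-style), clamping negatives to zero."""
    ms = max(0, round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_cue(index, words, offset=0.0):
    """Build one caption cue spanning the first word's start to the last word's end.

    offset shifts both ends of the cue, modelling a manual sync adjustment.
    """
    if not words:
        raise ValueError("a cue needs at least one word")
    start = words[0].start + offset
    end = words[-1].end + offset
    text = " ".join(w.text for w in words)
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"


if __name__ == "__main__":
    cue = words_to_cue(1, [TimedWord("hello", 0.0, 0.4), TimedWord("world", 0.5, 1.25)])
    print(cue)
```

Keeping the offset separate from the recognized timings means the original word boundaries stay intact while the displayed cue is nudged earlier or later.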

    DYNAMIC LANGUAGE SELECTION OF AN AI VOICE ASSISTANCE SYSTEM

    Publication No.: US20230162721A1

    Publication date: 2023-05-25

    Application No.: US17530640

    Filing date: 2021-11-19

    IPC classification: G10L13/08 G10L15/00 G10L15/22

    Abstract: The computer-implemented method provides for a digital virtual assistant (DVA) receiving input spoken in a first language by a user. The DVA determines a context of a current situation based on language and identity of individuals within a proximity of the DVA. The DVA determines whether the context of the current situation includes providing a response using a second language. In response to determining the context of the current situation calls for providing the response in the second language, the DVA determines the second language based on the context, and the DVA responds to the input spoken in the first language by the user, such that the response includes a dynamic selection of the second language and is based on an interaction context of the user and the DVA, and reference to a corpus of interaction context usage of the second language in a historically similar situation.

    SPEECH RECOGNITION AND TEXT-TO-SPEECH LEARNING SYSTEM

    Publication No.: US20170287465A1

    Publication date: 2017-10-05

    Application No.: US15087696

    Filing date: 2016-03-31

    IPC classification: G10L13/10 G10L15/06 G10L13/08

    Abstract: An example text-to-speech learning system performs a method for generating a pronunciation sequence conversion model. The method includes generating a first pronunciation sequence from a speech input of a training pair and generating a second pronunciation sequence from a text input of the training pair. The method also includes determining a pronunciation sequence difference between the first pronunciation sequence and the second pronunciation sequence; and generating a pronunciation sequence conversion model based on the pronunciation sequence difference. An example speech recognition learning system performs a method for generating a pronunciation sequence conversion model. The method includes extracting an audio signal vector from a speech input and applying an audio signal conversion model to the audio signal vector to generate a converted audio signal vector. The method also includes adapting an acoustic model based on the converted audio signal vector to generate an adapted acoustic model.
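One simple way to picture the "pronunciation sequence difference" in this abstract is as an alignment between two phoneme sequences. The sketch below uses Python's standard `difflib.SequenceMatcher` for that alignment; the abstract does not specify how the difference is determined, so the alignment method, the function name `pronunciation_difference`, and the example phoneme labels are assumptions.

```python
from difflib import SequenceMatcher


def pronunciation_difference(from_speech, from_text):
    """List the edits that turn the text-derived sequence into the speech-derived one.

    Each entry is (operation, text_phonemes, speech_phonemes), where operation
    is "replace", "delete", or "insert". Matching spans are omitted.
    """
    matcher = SequenceMatcher(a=from_text, b=from_speech, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((tag, tuple(from_text[i1:i2]), tuple(from_speech[j1:j2])))
    return diffs


if __name__ == "__main__":
    # Illustrative phoneme labels only: the text-derived sequence has "EY"
    # where the speaker actually produced "AA".
    text_seq = ["T", "AH", "M", "EY", "T", "OW"]
    speech_seq = ["T", "AH", "M", "AA", "T", "OW"]
    print(pronunciation_difference(speech_seq, text_seq))
```

Substitution pairs collected this way over many training pairs could serve as the raw material for a conversion model, although the abstract leaves the model itself unspecified.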

    PROCESS FOR IMPROVING PRONUNCIATION OF PROPER NOUNS FOREIGN TO A TARGET LANGUAGE TEXT-TO-SPEECH SYSTEM
    Invention application (granted)

    Publication No.: US20160358596A1

    Publication date: 2016-12-08

    Application No.: US14733289

    Filing date: 2015-06-08

    Abstract: A system and method configured for use in a text-to-speech (TTS) system is provided. Embodiments may include identifying, using one or more processors, a word or phrase as a named entity and identifying a language of origin associated with the named entity. Embodiments may further include transliterating the named entity to a script associated with the language of origin. If the TTS system is operating in the language of origin, embodiments may include passing the transliterated script to the TTS system. If the TTS system is not operating in the language of origin, embodiments may include generating a phoneme sequence in the language of origin using a grapheme to phoneme (G2P) converter.
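The branching described in this abstract can be sketched as a small routing function. Everything here is hypothetical: `prepare_named_entity`, the toy lookup tables, and the two stub converters stand in for real transliteration and G2P components, which the abstract does not describe, and the phoneme symbols are illustrative only.

```python
# Toy lookup tables standing in for real transliteration and G2P components.
TOY_TRANSLIT = {("Tokyo", "ja"): "東京"}
TOY_G2P = {("東京", "ja"): ["t", "oː", "k", "j", "oː"]}


def toy_transliterate(entity, origin_lang):
    """Hypothetical transliterator: map an entity to its origin-language script."""
    return TOY_TRANSLIT[(entity, origin_lang)]


def toy_g2p(text, lang):
    """Hypothetical G2P converter: map origin-language text to phonemes."""
    return TOY_G2P[(text, lang)]


def prepare_named_entity(entity, origin_lang, tts_lang,
                         transliterate=toy_transliterate, g2p=toy_g2p):
    """Decide what to hand the TTS system for a named entity.

    Returns ("text", script) when the TTS system already operates in the
    entity's language of origin, otherwise ("phonemes", sequence) generated
    in the language of origin.
    """
    native_script = transliterate(entity, origin_lang)
    if tts_lang == origin_lang:
        return ("text", native_script)
    return ("phonemes", g2p(native_script, origin_lang))


if __name__ == "__main__":
    print(prepare_named_entity("Tokyo", origin_lang="ja", tts_lang="ja"))
    print(prepare_named_entity("Tokyo", origin_lang="ja", tts_lang="en"))
```

Passing the converters as parameters keeps the routing decision separate from whichever transliteration and G2P implementations are actually used.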
