Abstract:
A method for training a speech recognition model includes obtaining a multilingual text-to-speech (TTS) model. The method also includes generating a native synthesized speech representation for an input text sequence in a first language that is conditioned on speaker characteristics of a native speaker of the first language. The method also includes generating a cross-lingual synthesized speech representation for the input text sequence in the first language that is conditioned on speaker characteristics of a native speaker of a different second language. The method also includes generating a first speech recognition result for the native synthesized speech representation and a second speech recognition result for the cross-lingual synthesized speech representation. The method also includes determining a consistent loss term based on the first speech recognition result and the second speech recognition result and updating parameters of the speech recognition model based on the consistent loss term.
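The consistency term described above can be instantiated in many ways; the abstract does not fix a formula. A minimal sketch, assuming a symmetric KL divergence between the recognizer's output distributions for the native and cross-lingual syntheses (all names here are illustrative, not from the patent):

```python
import numpy as np

def softmax(logits):
    """Convert a vector of logits to a probability distribution."""
    e = np.exp(np.asarray(logits, dtype=float) - np.max(logits))
    return e / e.sum()

def consistency_loss(logits_native, logits_cross):
    """Symmetric KL divergence between the speech recognizer's output
    distributions for the native and cross-lingual synthesized speech.
    One common way to build a consistency loss term; the patent does
    not specify this particular form."""
    p = softmax(logits_native)
    q = softmax(logits_cross)
    kl_pq = float(np.sum(p * np.log(p / q)))
    kl_qp = float(np.sum(q * np.log(q / p)))
    return 0.5 * (kl_pq + kl_qp)
```

Identical predictions for the two syntheses yield a zero loss, while divergent predictions produce a positive penalty that gradient updates can reduce.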
Abstract:
A natural language processing system may use system response configuration data to determine customized output data forms when outputting data for a user. The system response configuration data may represent various output attributes the system may use when creating output data. The system may maintain a number of predefined profiles, where each profile is associated with particular settings for the system response configuration data/attributes. The system may also use various data such as context data, sentiment data, or the like to customize system response configuration data during a dialog. Other components, such as natural language generation (NLG), text-to-speech (TTS), or the like, may use the customized system response configuration data to determine the form, timing, etc. of output data to be presented to a user.
Abstract:
The method for generating captions, subtitles and dubbing for audiovisual media uses a machine learning-based approach for automatically generating captions from the audio portion of audiovisual media, and further translates the captions to produce both subtitles and dubbing. A speech component of an audio portion of audiovisual media is converted into at least one text string which includes at least one word. Temporal start and end points for the at least one word are determined, and the at least one word is visually inserted into the video portion of the audiovisual media. The temporal start and end points for the at least one word are synchronized with corresponding temporal start and end points of the speech component of the audio portion of the audiovisual media. A latency period may be selectively inserted into the broadcast of the audiovisual media such that the synchronization may be selectively adjusted during the latency period.
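The adjustable-latency synchronization above can be sketched as a timestamp shift bounded by the available latency budget. A minimal illustration with hypothetical helper names (not taken from the patent):

```python
from dataclasses import dataclass

@dataclass
class TimedWord:
    """A caption word with temporal start and end points, in seconds
    from the start of the media."""
    text: str
    start: float
    end: float

def resynchronize(words, offset, latency):
    """Shift caption timings by `offset` seconds, clamped to the
    selectively inserted latency budget, so captions can be realigned
    with the speech component during broadcast."""
    offset = max(-latency, min(latency, offset))
    return [TimedWord(w.text, w.start + offset, w.end + offset)
            for w in words]
```

A correction within the budget is applied as-is; a larger requested shift is clamped so playback never outruns the inserted latency period.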
Abstract:
A natural language processing system may select a synthesized speech quality using user profile data. The system may receive a natural language input and determine responsive output data. The system may, based at least in part on user profile data associated with the input, determine response configuration data corresponding to a quality of synthesized speech. The system may then determine further output data for presentation using the responsive output data and response configuration data.
Abstract:
A speech translation method using a multilingual text-to-speech synthesis model includes receiving input speech data of a first language and an articulatory feature of a speaker regarding the first language, converting the input speech data of the first language into a text of the first language, converting the text of the first language into a text of a second language, and generating output speech data for the text of the second language that simulates the speaker's speech by inputting the text of the second language and the articulatory feature of the speaker to a single artificial neural network text-to-speech synthesis model.
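The pipeline above is a three-stage composition: recognition, translation, then voice-preserving synthesis. A minimal wiring sketch, where the `asr`, `mt`, and `tts` callables stand in for the actual models (all names are assumptions for illustration):

```python
def translate_speech(audio, speaker_feature, asr, mt, tts):
    """Wire the described pipeline: recognize first-language speech,
    translate the resulting text into the second language, then
    synthesize second-language speech conditioned on the speaker's
    articulatory feature so the output simulates the speaker's voice."""
    source_text = asr(audio)                  # first-language text
    target_text = mt(source_text)             # second-language text
    return tts(target_text, speaker_feature)  # voice-preserving speech
```

The key point the abstract makes is that the final stage takes both the translated text and the speaker's articulatory feature as inputs to a single TTS model, rather than using a generic target-language voice.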
Abstract:
A method for synthesizing speech from a textual input includes receiving the textual input, the textual input including native words in a native language and foreign words in a foreign language, and processing the textual input to determine a phonetic representation of the textual input. The processing includes determining a native phonetic representation of the native words, and determining a nativized phonetic representation of the foreign words. Determining the nativized phonetic representation includes forming a foreign phonetic representation of the foreign words using a foreign phoneme set, and mapping the foreign phonetic representation to the nativized phonetic representation according to a model of a native speaker's pronunciation of foreign words.
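The final mapping step can be pictured as a substitution from foreign phonemes onto the native phoneme set. A minimal sketch assuming a hand-written lookup table; in practice such a mapping would be learned from data on how native speakers actually pronounce foreign words:

```python
# Hypothetical foreign-to-native phoneme mapping, for illustration only.
FOREIGN_TO_NATIVE = {
    "R": "r",   # e.g. a uvular /R/ rendered as a native /r/
    "y": "i",   # a front rounded vowel a native speaker may unround
}

def nativize(foreign_phonemes):
    """Map a foreign phonetic representation onto the native phoneme
    set, leaving phonemes shared by both languages untouched."""
    return [FOREIGN_TO_NATIVE.get(p, p) for p in foreign_phonemes]
```

Phonemes absent from the table pass through unchanged, so only the sounds that do not exist in the native inventory are replaced.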
Abstract:
The computer-implemented method provides for a digital virtual assistant (DVA) receiving input spoken in a first language by a user. The DVA determines a context of a current situation based on the language and identity of individuals within a proximity of the DVA. The DVA determines whether the context of the current situation includes providing a response using a second language. In response to determining that the context of the current situation calls for providing the response in the second language, the DVA determines the second language based on the context and responds to the input spoken in the first language by the user. The response includes a dynamic selection of the second language and is based on an interaction context of the user and the DVA, with reference to a corpus of interaction-context usage of the second language in historically similar situations.
Abstract:
An animation display system is provided. The animation display system includes a display; a storage configured to store a language model database, a phonetic-symbol lip-motion matching database and a lip motion synthesis database; and a processor electronically connected to the storage and the display. The processor includes a speech conversion module, a phonetic-symbol lip-motion matching module, and a lip motion synthesis module. A lip animation display method is also provided.
Abstract:
An example text-to-speech learning system performs a method for generating a pronunciation sequence conversion model. The method includes generating a first pronunciation sequence from a speech input of a training pair and generating a second pronunciation sequence from a text input of the training pair. The method also includes determining a pronunciation sequence difference between the first pronunciation sequence and the second pronunciation sequence; and generating a pronunciation sequence conversion model based on the pronunciation sequence difference. An example speech recognition learning system performs a method for generating a pronunciation sequence conversion model. The method includes extracting an audio signal vector from a speech input and applying an audio signal conversion model to the audio signal vector to generate a converted audio signal vector. The method also includes adapting an acoustic model based on the converted audio signal vector to generate an adapted acoustic model.
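The "pronunciation sequence difference" between the speech-derived and text-derived sequences can be quantified in several ways; a conventional choice is the Levenshtein distance over phonemes. A minimal sketch under that assumption (the abstract does not name a specific metric):

```python
def pronunciation_difference(seq_a, seq_b):
    """Levenshtein (edit) distance between two phoneme sequences:
    the minimum number of insertions, deletions, and substitutions
    needed to turn seq_a into seq_b."""
    m, n = len(seq_a), len(seq_b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if seq_a[i - 1] == seq_b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]
```

A conversion model trained against this difference would learn to map text-derived pronunciation sequences toward what the speech input actually exhibits.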
Abstract:
A system and method configured for use in a text-to-speech (TTS) system are provided. Embodiments may include identifying, using one or more processors, a word or phrase as a named entity and identifying a language of origin associated with the named entity. Embodiments may further include transliterating the named entity to a script associated with the language of origin. If the TTS system is operating in the language of origin, embodiments may include passing the transliterated script to the TTS system. If the TTS system is not operating in the language of origin, embodiments may include generating a phoneme sequence in the language of origin using a grapheme-to-phoneme (G2P) converter.
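The branching logic above reduces to a simple routing decision after transliteration. A minimal sketch where the `transliterate` and `g2p` callables are stand-ins for the actual converters (names are assumptions, not from the patent):

```python
def prepare_named_entity(entity, origin_language, tts_language,
                         transliterate, g2p):
    """Route a named entity as described: transliterate it into the
    origin-language script, then either hand that script straight to
    the TTS system (when the TTS system operates in that language) or
    derive an origin-language phoneme sequence via a G2P converter."""
    script = transliterate(entity, origin_language)
    if tts_language == origin_language:
        return ("script", script)
    return ("phonemes", g2p(script, origin_language))
```

Keeping the G2P step in the language of origin preserves the entity's native pronunciation even when the surrounding synthesis runs in a different language.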