TEXT-BASED SPEECH GENERATION
    Invention Publication

    Publication No.: US20240233706A1

    Publication Date: 2024-07-11

    Application No.: US18562962

    Filing Date: 2022-05-23

    Abstract: According to implementations of the subject matter described herein, a solution for text-to-speech is proposed. In this solution, an initial phoneme sequence corresponding to text is generated, the initial phoneme sequence comprising feature representations of a plurality of phonemes. A first phoneme sequence is generated by inserting a feature representation of an additional phoneme into the initial phoneme sequence, the additional phoneme being related to a characteristic of spontaneous speech. The duration of each phoneme among the plurality of phonemes and the additional phoneme is determined by using an expert model corresponding to that phoneme, and a second phoneme sequence is generated based on the first phoneme sequence. Spontaneous-style speech corresponding to the text is determined based on the second phoneme sequence. In this way, spontaneous-style speech with more varied rhythm can be generated based on spontaneous-style additional phonemes and multiple expert models.
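
    The two steps of this abstract, inserting a spontaneous-style phoneme representation and predicting each phoneme's duration with its own expert model, can be sketched roughly as below. The function names, the dict-of-callables "expert models", and the toy feature dimensions are all illustrative, not taken from the patent:

```python
import numpy as np

def insert_spontaneous_phoneme(phoneme_seq, position, filler_repr):
    # Insert the feature representation of an additional,
    # spontaneous-style phoneme (e.g. a filled pause) into the
    # initial phoneme sequence to form the first phoneme sequence.
    return np.insert(phoneme_seq, position, filler_repr, axis=0)

def predict_durations(phoneme_ids, phoneme_seq, experts):
    # Each phoneme's duration is produced by the expert model that
    # corresponds to that phoneme; a plain dict of phoneme id ->
    # callable stands in for the learned experts here.
    return [experts[pid](feat) for pid, feat in zip(phoneme_ids, phoneme_seq)]

# Toy run: three phonemes with 4-dimensional feature representations.
initial = np.ones((3, 4))
first = insert_spontaneous_phoneme(initial, 1, np.zeros(4))
experts = {0: lambda f: 3.0, 1: lambda f: 5.0}
durations = predict_durations([0, 1, 0, 1], first, experts)
```

    In this sketch the per-phoneme expert lookup is the key idea: duration modeling is not one shared predictor but a routing of each phoneme to its own model.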

    Wireless communication device using voice recognition and voice synthesis

    Publication No.: US11942072B2

    Publication Date: 2024-03-26

    Application No.: US17439197

    Filing Date: 2021-02-03

    Applicant: Sang Rae Park

    Inventor: Sang Rae Park

    Abstract: Disclosed is a wireless communication device including a voice recognition portion configured to convert a voice signal input through a microphone into a syllable information stream using voice recognition, an encoding portion configured to encode the syllable information stream to generate digital transmission data, a transmission portion configured to modulate the digital transmission data into a transmission signal and transmit the transmission signal through an antenna, a reception portion configured to demodulate a reception signal received through the antenna into digital reception data and output the digital reception data, a decoding portion configured to decode the digital reception data to generate the syllable information stream, and a voice synthesis portion configured to convert the syllable information stream into the voice signal using voice synthesis and output the voice signal through a speaker.
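
    The encode/decode legs of the pipeline above round-trip a syllable information stream through digital data. A minimal sketch, with UTF-8 bytes standing in for whatever transmission encoding the device actually uses (the patent does not specify one):

```python
def encode_syllables(syllable_stream):
    # Encode the syllable information stream into digital
    # transmission data; a space-joined UTF-8 byte string is an
    # illustrative stand-in for the device's real encoding.
    return " ".join(syllable_stream).encode("utf-8")

def decode_syllables(digital_data):
    # Decode digital reception data back into a syllable stream.
    return digital_data.decode("utf-8").split(" ")

sent = encode_syllables(["an", "nyeong"])
received = decode_syllables(sent)
```

    Transmitting syllables rather than sampled audio is what lets the device operate at a much lower bit rate than conventional voice links.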

    PAUSE ESTIMATION MODEL LEARNING APPARATUS, PAUSE ESTIMATION APPARATUS, METHODS AND PROGRAMS FOR THE SAME

    Publication No.: US20230005468A1

    Publication Date: 2023-01-05

    Application No.: US17779518

    Filing Date: 2019-11-26

    Abstract: A pause estimation model learning apparatus includes: a morphological analysis unit configured to perform morphological analysis on training text data to provide M types of information, M being an integer that is equal to or larger than 2; a feature selection unit configured to combine N pieces of information, among the M pieces of information, into an input feature when a predetermined certain condition is satisfied, and to select a predetermined one of the N pieces of information as the input feature when the certain condition is not satisfied, N being an integer that is equal to or larger than 2 and equal to or smaller than M; and a learning unit configured to learn a pause estimation model by using the input feature selected by the feature selection unit and a pause correct label.
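
    The feature selection unit's conditional logic can be sketched as below. The string concatenation, the feature names, and the default index are all hypothetical; the patent does not state what the "certain condition" is or how features are combined:

```python
def select_input_feature(info, condition_met, n=2, default_index=0):
    # info: the M pieces of information from morphological analysis.
    # When the condition holds, combine N of them into a single
    # input feature; otherwise fall back to one predetermined piece.
    if condition_met:
        return "+".join(info[:n])
    return info[default_index]

# Hypothetical morphological outputs per token.
features = ["surface", "pos", "reading"]
combined = select_input_feature(features, True)
single = select_input_feature(features, False)
```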

    Training method and apparatus for a speech synthesis model, and storage medium

    Publication No.: US11488577B2

    Publication Date: 2022-11-01

    Application No.: US16907006

    Filing Date: 2020-06-19

    Abstract: The present application discloses a training method and an apparatus for a speech synthesis model, an electronic device, and a storage medium. The method includes: taking a syllable input sequence, a phoneme input sequence and a Chinese character input sequence of a current sample as inputs of an encoder of a model to be trained, to obtain encoded representations of these three sequences at an output end of the encoder; fusing the encoded representations of these three sequences, to obtain a weighted combination of these three sequences; taking the weighted combination as an input of an attention module, to obtain a weighted average of the weighted combination at each moment at an output end of the attention module; taking the weighted average as an input of a decoder of the model to be trained, to obtain a speech Mel spectrum of the current sample at an output end of the decoder.
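
    The fusion step, forming a weighted combination of the three encoded sequences, can be sketched as follows. The scalar weights here are hypothetical stand-ins for whatever learned fusion the model uses, and the shapes are toy values:

```python
import numpy as np

def fuse_encodings(syl_enc, pho_enc, char_enc, weights):
    # Weighted combination of the syllable, phoneme, and Chinese
    # character encodings; the weights are normalized so that they
    # sum to one. All three encodings share shape (time, hidden).
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return w[0] * syl_enc + w[1] * pho_enc + w[2] * char_enc

# Toy run: three encodings of shape (time, hidden) = (2, 3).
syl = np.full((2, 3), 1.0)
pho = np.full((2, 3), 2.0)
chars = np.full((2, 3), 3.0)
fused = fuse_encodings(syl, pho, chars, [1.0, 1.0, 2.0])
```

    The fused tensor then feeds the attention module, which produces the per-step weighted averages consumed by the decoder.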

    Speech processing device, speech processing method, and computer program product using compensation parameters

    Publication No.: US11348569B2

    Publication Date: 2022-05-31

    Application No.: US16841839

    Filing Date: 2020-04-07

    Abstract: A speech processing device includes a hardware processor configured to receive input speech and extract speech frames from the input speech. The hardware processor is configured to calculate a spectrum parameter for each of the speech frames, calculate a first phase spectrum for each of the speech frames, calculate a group delay spectrum from the first phase spectrum based on a frequency component of the first phase spectrum, calculate a band group delay parameter in a predetermined frequency band from the group delay spectrum, and calculate a band group delay compensation parameter to compensate a difference between a second phase spectrum reconstructed from the band group delay parameter and the first phase spectrum. The hardware processor is configured to generate a speech waveform based on the spectrum parameter, the band group delay parameter, and the band group delay compensation parameter.
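
    The core quantity in this abstract, the group delay spectrum, is the negative derivative of the (unwrapped) phase spectrum with respect to frequency. A minimal sketch using a finite difference in place of the derivative (the patent's actual computation is not specified here):

```python
import numpy as np

def group_delay_spectrum(phase_spectrum, delta_omega=1.0):
    # Unwrap the phase to remove 2*pi jumps, then take the negative
    # finite difference with respect to frequency as an estimate of
    # the group delay.
    unwrapped = np.unwrap(phase_spectrum)
    return -np.diff(unwrapped) / delta_omega

# A linear phase of -2*omega corresponds to a constant group delay of 2.
phase = -2.0 * np.arange(8)
gd = group_delay_spectrum(phase)
```

    The band group delay parameter of the abstract would then be derived from such a spectrum restricted to a predetermined frequency band, with the compensation parameter absorbing the phase reconstruction error.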

    Learnable speed control for speech synthesis

    Publication No.: US11302301B2

    Publication Date: 2022-04-12

    Application No.: US16807801

    Filing Date: 2020-03-03

    Inventors: Chengzhu Yu, Dong Yu

    Abstract: A method, computer program, and computer system are provided for synthesizing speech at one or more speeds. A context associated with one or more phonemes corresponding to a speaking voice is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes and target acoustic frames, and a voice sample corresponding to the speaking voice is synthesized using the generated mel-spectrogram features.
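
    The speed control implied by this alignment step can be illustrated with a fixed-duration stand-in: scaling each phoneme's frame count by a speed factor changes how many target acoustic frames it occupies. The patent learns this alignment from the encoded context; everything below is illustrative:

```python
def align_phonemes_to_frames(durations, speed=1.0):
    # Map each phoneme index onto its target acoustic frames,
    # scaling the per-phoneme frame count by a speed factor
    # (higher speed -> fewer frames, hence faster speech).
    frames = []
    for idx, dur in enumerate(durations):
        n_frames = max(1, round(dur / speed))
        frames.extend([idx] * n_frames)
    return frames

normal = align_phonemes_to_frames([2, 4], speed=1.0)
fast = align_phonemes_to_frames([2, 4], speed=2.0)
```

    Mel-spectrogram frames would then be generated recursively over this frame-level alignment, so a single trained model can synthesize the same text at different speeds.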