-
Publication No.: US20240233706A1
Publication date: 2024-07-11
Application No.: US18562962
Filing date: 2022-05-23
Inventors: Xu TAN, Tao Qin, Sheng Zhao, Tie-Yan Liu
IPC classification: G10L13/10, G10L13/047, G10L13/06
CPC classification: G10L13/10, G10L13/047, G10L13/06, G10L2013/105
Abstract: According to implementations of the subject matter described herein, a solution is proposed for text to speech. In this solution, an initial phoneme sequence corresponding to text is generated, the initial phoneme sequence comprising feature representations of a plurality of phonemes. A first phoneme sequence is generated by inserting a feature representation of an additional phoneme into the initial phoneme sequence, the additional phoneme being related to a characteristic of spontaneous speech. The duration of a phoneme among the plurality of phonemes and the additional phoneme is determined by using an expert model corresponding to the phoneme, and a second phoneme sequence is generated based on the first phoneme sequence. Spontaneous-style speech corresponding to the text is determined based on the second phoneme sequence. In this way, spontaneous-style speech with more varying rhythms can be generated based on spontaneous-style additional phonemes and multiple expert models.
-
Publication No.: US11978431B1
Publication date: 2024-05-07
Application No.: US17326886
Filing date: 2021-05-21
Inventors: Arnaud Joly, Simon Slangen, Alexis Pierre Moinet, Thomas Renaud Drugman, Panagiota Karanasou, Syed Ammar Abbas, Sri Vishnu Kumar Karlapati
IPC classification: G10L13/027, G10L13/06, G10L13/07, G10L13/08, G10L15/32
CPC classification: G10L13/027, G10L13/06, G10L13/07, G10L13/08, G10L15/32
Abstract: A speech-processing system receives input data representing text. One or more encoders trained to predict audio properties corresponding to the text process the text to predict those properties. A speech decoder processes phoneme embeddings as well as the predicted properties to create data representing synthesized speech.
-
Publication No.: US11942072B2
Publication date: 2024-03-26
Application No.: US17439197
Filing date: 2021-02-03
Applicant: Sang Rae Park
Inventors: Sang Rae Park
CPC classification: G10L13/10, G10L13/033, G10L13/06, G10L15/22, G10L15/26, G10L19/0018
Abstract: Disclosed is a wireless communication device including a voice recognition portion configured to convert a voice signal input through a microphone into a syllable information stream using voice recognition, an encoding portion configured to encode the syllable information stream to generate digital transmission data, a transmission portion configured to modulate the digital transmission data into a transmission signal and transmit the transmission signal through an antenna, a reception portion configured to demodulate a reception signal received through the antenna into digital reception data and output the digital reception data, a decoding portion configured to decode the digital reception data to generate the syllable information stream, and a voice synthesis portion configured to convert the syllable information stream into the voice signal using voice synthesis and output the voice signal through a speaker.
-
Publication No.: US20240071343A1
Publication date: 2024-02-29
Application No.: US18272175
Filing date: 2022-01-13
Applicant: RIFFIT INC
CPC classification: G10H1/0025, G10L13/06, G10L13/10, G10H2210/056, G10H2210/111, G10H2250/455, G10L2013/105
Abstract: Described herein are musical translation devices and methods of use thereof. Exemplary uses of musical translation devices include optimizing the understanding and/or recall of an input message for a user and improving a cognitive process in a user.
-
Publication No.: US11676571B2
Publication date: 2023-06-13
Application No.: US17154372
Filing date: 2021-01-21
Inventors: Kyungguen Byun, Sunkuk Moon, Shuhua Zhang, Vahid Montazeri, Lae-Hoon Kim, Erik Visser
IPC classification: G10L13/10, G10L13/06, G10L15/22, G10L13/00, G10L13/047, G10L13/033, G10L19/02, G10L25/63, G06N3/045, G10L21/013
CPC classification: G10L13/047, G06N3/045, G10L13/033, G10L19/02, G10L25/63, G10L2021/0135
Abstract: A device for speech generation includes one or more processors configured to receive one or more control parameters indicating target speech characteristics. The one or more processors are also configured to process, using a multi-encoder, an input representation of speech based on the one or more control parameters to generate encoded data corresponding to an audio signal that represents a version of the speech based on the target speech characteristics.
-
Publication No.: US20230005468A1
Publication date: 2023-01-05
Application No.: US17779518
Filing date: 2019-11-26
Inventors: Mizuki NAGANO, Yusuke IJIMA, Nozomi KOBAYASHI
IPC classification: G10L13/10, G06F40/268, G10L13/047, G10L13/06
Abstract: A pause estimation model learning apparatus includes: a morphological analysis unit configured to perform morphological analysis on training text data to provide M types of information, M being an integer that is equal to or larger than 2; a feature selection unit configured to combine N pieces of information, among the M pieces of information, to be an input feature when a predetermined certain condition is satisfied, and select a predetermined one of the N pieces of information to be the input feature when the certain condition is not satisfied, N being an integer that is equal to or larger than 2 and equal to or smaller than M; and a learning unit configured to learn a pause estimation model by using the input feature selected by the feature selection unit and a pause correct label.
-
Publication No.: US11488577B2
Publication date: 2022-11-01
Application No.: US16907006
Filing date: 2020-06-19
Inventors: Zhipeng Chen, Jinfeng Bai, Lei Jia
IPC classification: G10L13/047, G06N3/08, G10L13/06, G10L13/08
Abstract: The present application discloses a training method and an apparatus for a speech synthesis model, an electronic device, and a storage medium. The method includes: taking a syllable input sequence, a phoneme input sequence and a Chinese character input sequence of a current sample as inputs of an encoder of a model to be trained, to obtain encoded representations of these three sequences at an output end of the encoder; fusing the encoded representations of these three sequences, to obtain a weighted combination of these three sequences; taking the weighted combination as an input of an attention module, to obtain a weighted average of the weighted combination at each moment at an output end of the attention module; and taking the weighted average as an input of a decoder of the model to be trained, to obtain a speech Mel spectrum of the current sample at an output end of the decoder.
-
Publication No.: US20220277728A1
Publication date: 2022-09-01
Application No.: US17631695
Filing date: 2020-06-17
Inventors: Shaofei Zhang, Lei He
IPC classification: G10L13/08, G10L13/047, G10L13/06, G10L25/30
Abstract: The present disclosure provides a method and apparatus for generating speech through neural text-to-speech (TTS) synthesis. A text input may be obtained. A phone feature of the text input may be generated. Context features of the text input may be generated based on a set of sentences associated with the text input. A speech waveform corresponding to the text input may be generated based on the phone feature and the context features.
-
Publication No.: US11348569B2
Publication date: 2022-05-31
Application No.: US16841839
Filing date: 2020-04-07
Inventors: Masatsune Tamura, Masahiro Morita
IPC classification: G10L25/18, G10L13/06, G10L13/047
Abstract: A speech processing device includes a hardware processor configured to receive input speech and extract speech frames from the input speech. The hardware processor is configured to calculate a spectrum parameter for each of the speech frames, calculate a first phase spectrum for each of the speech frames, calculate a group delay spectrum from the first phase spectrum based on a frequency component of the first phase spectrum, calculate a band group delay parameter in a predetermined frequency band from the group delay spectrum, and calculate a band group delay compensation parameter to compensate a difference between a second phase spectrum reconstructed from the band group delay parameter and the first phase spectrum. The hardware processor is configured to generate a speech waveform based on the spectrum parameter, the band group delay parameter, and the band group delay compensation parameter.
-
Publication No.: US11302301B2
Publication date: 2022-04-12
Application No.: US16807801
Filing date: 2020-03-03
Applicant: TENCENT AMERICA LLC
Inventors: Chengzhu Yu, Dong Yu
IPC classification: G10L13/07, G10L13/06, G10L13/10, G10L13/033, G10L25/18, G10L13/047, G10L25/24
Abstract: A method, computer program, and computer system are provided for synthesizing speech at one or more speeds. A context associated with one or more phonemes corresponding to a speaking voice is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes and target acoustic frames, and a voice sample corresponding to the speaking voice is synthesized using the generated mel-spectrogram features.
-