-
Publication No.: US12183324B2
Publication Date: 2024-12-31
Application No.: US17738651
Application Date: 2022-05-06
Inventor: Xiaoyin Fu , Zhijie Chen , Mingxin Liang , Mingshun Yang , Lei Jia , Haifeng Wang
IPC: G10L15/02 , G06F16/683 , G10L15/187 , G10L15/26
Abstract: The present disclosure provides speech recognition and codec methods and apparatuses, an electronic device and a storage medium, and relates to the field of artificial intelligence such as intelligent speech, deep learning and natural language processing. The speech recognition method may include: acquiring an audio feature of to-be-recognized speech; encoding the audio feature to obtain an encoding feature; truncating the encoding feature to obtain N consecutive feature segments, N being a positive integer greater than one; and acquiring, for any one of the feature segments, corresponding historical feature abstraction information, encoding the feature segment in combination with the historical feature abstraction information, and decoding an encoding result to obtain a recognition result corresponding to the feature segment, wherein the historical feature abstraction information is information obtained by feature abstraction of recognized historical feature segments.
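The segment-by-segment flow in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how truncation points are chosen or how history is abstracted, so the `boundaries` argument, the `mean_pool` summary, the concatenation of history with the current segment, and the caller-supplied `decode` function are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def truncate(encoded: Sequence[Vector], boundaries: Sequence[int]) -> List[List[Vector]]:
    """Split encoded frames into consecutive segments at the given end indices."""
    segments, start = [], 0
    for end in boundaries:
        segments.append(list(encoded[start:end]))
        start = end
    if start < len(encoded):
        segments.append(list(encoded[start:]))
    return segments


def mean_pool(segment: Sequence[Vector]) -> Vector:
    """Assumed stand-in for 'feature abstraction': average the frames of a segment."""
    dim = len(segment[0])
    return [sum(frame[d] for frame in segment) / len(segment) for d in range(dim)]


def recognize_streaming(
    encoded: Sequence[Vector],
    boundaries: Sequence[int],
    decode: Callable[[List[Vector]], object],
) -> List[object]:
    """Decode each segment together with summaries of already recognized segments."""
    history: List[Vector] = []  # one summary vector per recognized segment
    results = []
    for segment in truncate(encoded, boundaries):
        # Assumption: "in combination with" is modeled as simple concatenation.
        context = history + segment
        results.append(decode(context))
        history.append(mean_pool(segment))
    return results
```

With a hypothetical decoder that just reports its context length, `recognize_streaming([[1.0], [3.0], [5.0], [7.0]], [2], len)` shows the second segment seeing one extra history vector in addition to its own two frames.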
-
Publication No.: US12158839B2
Publication Date: 2024-12-03
Application No.: US17454900
Application Date: 2021-11-15
Abstract: The disclosure provides a method and an apparatus for allocating memory, and an electronic device. Multiple frames of speech data are received and input to a neural network model. The neural network model is configured to ask for multiple data tensors when processing the multiple frames of speech data, and the multiple data tensors share a common memory.
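The idea of several tensors sharing one memory region can be sketched in Python as follows. This is only an illustration under assumptions: the `SharedArena` class, its `request` method, and the toy `process_frames` function are hypothetical names, and the abstract does not describe how the shared memory is sized or partitioned.

```python
from typing import List, Sequence


class SharedArena:
    """One pre-allocated buffer that every tensor request is served from."""

    def __init__(self, size_bytes: int) -> None:
        self._buffer = bytearray(size_bytes)

    def request(self, nbytes: int) -> memoryview:
        """Return a writable view onto the shared buffer instead of allocating."""
        if nbytes > len(self._buffer):
            raise MemoryError("request exceeds the shared buffer")
        return memoryview(self._buffer)[:nbytes]


def process_frames(frames: Sequence[Sequence[int]], arena: SharedArena) -> List[int]:
    """Toy 'model': copy each frame into a tensor served from the arena and sum it."""
    totals = []
    for frame in frames:
        tensor = arena.request(len(frame))  # aliases the same underlying memory
        tensor[:] = bytes(frame)
        totals.append(sum(tensor))
    return totals
```

Because every view aliases the same `bytearray`, a later frame overwrites the bytes left by an earlier one, which is the trade-off of sharing: the tensors must not be needed at the same time.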
-
Publication No.: US11996084B2
Publication Date: 2024-05-28
Application No.: US17738186
Application Date: 2022-05-06
Inventor: Liqiang Zhang , Jiankang Hou , Tao Sun , Lei Jia
IPC: G10L13/02 , G06F40/20 , G10L13/04 , G10L13/047 , G10L13/10
CPC classification number: G10L13/10 , G06F40/20 , G10L13/047
Abstract: The present disclosure discloses a speech synthesis method and apparatus, a device and a computer storage medium, and relates to speech and deep learning technologies in the field of artificial intelligence technologies. A specific implementation solution involves: acquiring to-be-synthesized text; acquiring a prosody feature extracted from the text; inputting the text and the prosody feature into a speech synthesis model to obtain a vocoder feature; and inputting the vocoder feature into a vocoder to obtain synthesized speech.
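The four steps in this abstract form a simple pipeline, sketched below in Python. The three callables (`extract_prosody`, `synthesis_model`, `vocoder`) are hypothetical placeholders supplied by the caller; the abstract does not specify their interfaces, so the types here are assumptions.

```python
from typing import Callable, List, Sequence

Feature = Sequence[float]


def synthesize(
    text: str,
    extract_prosody: Callable[[str], Feature],
    synthesis_model: Callable[[str, Feature], Feature],
    vocoder: Callable[[Feature], List[float]],
) -> List[float]:
    """Text -> prosody feature -> vocoder feature -> synthesized waveform samples."""
    prosody = extract_prosody(text)                  # prosody feature extracted from the text
    vocoder_feature = synthesis_model(text, prosody)  # model consumes text and prosody together
    return vocoder(vocoder_feature)                  # vocoder turns features into speech
```

Passing the stages as arguments keeps the sketch independent of any particular model or vocoder.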
-
Publication No.: US12073822B2
Publication Date: 2024-08-27
Application No.: US18086004
Application Date: 2022-12-21
Inventor: Xinyong Zhou , Junteng Zhang , Tao Sun , Lei Jia
CPC classification number: G10L13/10 , G06F40/30 , G10L13/06 , G10L25/18 , G10L13/047 , G10L2013/105
Abstract: A voice generating method and apparatus, an electronic device and a storage medium are provided. The specific implementation solution includes: acquiring a text to be processed, and determining an associated text of the text to be processed; acquiring an associated prosodic feature of the associated text; determining an associated text feature of the associated text based on the text to be processed; determining a spectrum feature to be processed of the text to be processed based on the associated prosodic feature and the associated text feature; and generating a target voice corresponding to the text to be processed based on the spectrum feature to be processed.
-
Publication No.: US12062357B2
Publication Date: 2024-08-13
Application No.: US17455156
Application Date: 2021-11-16
Inventor: Wenfu Wang , Xilei Wang , Tao Sun , Han Yuan , Zhengkun Gao , Lei Jia
IPC: G10L13/02
CPC classification number: G10L13/02
Abstract: A method of registering an attribute in a speech synthesis model, an apparatus of registering an attribute in a speech synthesis model, an electronic device, and a medium are provided, which relate to the field of artificial intelligence technologies such as deep learning and intelligent speech. The method includes: acquiring a plurality of data associated with an attribute to be registered; and registering the attribute in the speech synthesis model by using the plurality of data associated with the attribute, wherein the speech synthesis model is trained in advance by using training data in a training data set.
-
Publication No.: US11984134B2
Publication Date: 2024-05-14
Application No.: US18071187
Application Date: 2022-11-29
Inventor: Jiankang Hou , Zhipeng Nie , Liqiang Zhang , Tao Sun , Lei Jia
Abstract: A method of processing audio data, an electronic device, and a storage medium are provided, which relate to the field of artificial intelligence, in particular to speech processing technology. The method includes: processing spectral data of the audio data to obtain first feature information; obtaining fundamental frequency indication information according to the first feature information, wherein the fundamental frequency indication information indicates valid audio data of the first feature information and invalid audio data of the first feature information; obtaining fundamental frequency information and spectral energy information according to the first feature information and the fundamental frequency indication information; and obtaining harmonic structure information of the audio data according to the fundamental frequency information and the spectral energy information.
-
Publication No.: US11861498B2
Publication Date: 2024-01-02
Application No.: US17968688
Application Date: 2022-10-18
Inventor: Guibin Wang , Shijun Cong , Hao Dong , Lei Jia
Abstract: A method for compressing a neural network model includes acquiring a to-be-compressed neural network model. A first bit width, a second bit width and a target thinning rate corresponding to the to-be-compressed neural network model are determined. A target value is obtained according to the first bit width, the second bit width and the target thinning rate. Then the to-be-compressed neural network model is compressed using the target value, the first bit width and the second bit width to obtain a compression result of the to-be-compressed neural network model.
-
Publication No.: US11769482B2
Publication Date: 2023-09-26
Application No.: US17489616
Application Date: 2021-09-29
Inventor: Wenfu Wang , Tao Sun , Xilei Wang , Junteng Zhang , Zhengkun Gao , Lei Jia
Abstract: The present disclosure provides a method and apparatus of synthesizing speech, a method and apparatus of training a speech synthesis model, an electronic device, and a storage medium. The method of synthesizing speech includes acquiring style information of the speech to be synthesized, tone information of the speech to be synthesized, and content information of a text to be processed; generating acoustic feature information of the text to be processed, by using a pre-trained speech synthesis model, based on the style information, the tone information, and the content information of the text to be processed; and synthesizing the speech for the text to be processed, based on the acoustic feature information of the text to be processed.
-
Publication No.: US20220147441A1
Publication Date: 2022-05-12
Application No.: US17454900
Application Date: 2021-11-15
Abstract: The disclosure provides a method and an apparatus for allocating memory, and an electronic device. Multiple frames of speech data are received and input to a neural network model. The neural network model is configured to ask for multiple data tensors when processing the multiple frames of speech data, and the multiple data tensors share a common memory.
-
Publication No.: US20230206943A1
Publication Date: 2023-06-29
Application No.: US17891596
Application Date: 2022-08-19
Inventor: Wenjie Li , Zhanjie Gao , Lei Jia
Abstract: An audio recognizing method, including: performing acoustic feature prediction on the audio to be recognized to obtain a first audio prediction result and an acoustic feature reference quantity for predicting an audio recognition result; obtaining a second audio prediction result based on the acoustic feature reference quantity; and determining the audio recognition result of the audio to be recognized based on the first audio prediction result and the second audio prediction result, the audio recognition result including unvoiced sound or voiced sound. When determining whether the audio is unvoiced sound or voiced sound, the first audio prediction result obtained by performing acoustic feature prediction on the audio to be recognized is used together with the second audio prediction result obtained from the acoustic feature reference quantity, which makes the unvoiced/voiced determination more accurate and improves audio quality in speech processing.
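A two-prediction voicing decision of this kind can be sketched in Python as below. The abstract does not say what the reference quantity is or how the two predictions are combined, so this sketch assumes the first prediction is a voiced probability, the reference quantity is a frame energy, and the two are combined with a logical AND; the function name and both thresholds are hypothetical.

```python
def decide_voicing(
    first_voiced_prob: float,
    reference_energy: float,
    prob_threshold: float = 0.5,     # assumed cut-off for the first prediction
    energy_threshold: float = 0.1,   # assumed cut-off for the reference quantity
) -> str:
    """Label a frame 'voiced' only when both predictions agree that it is voiced."""
    first_prediction = first_voiced_prob >= prob_threshold
    # Second prediction is derived from the reference quantity alone.
    second_prediction = reference_energy >= energy_threshold
    return "voiced" if first_prediction and second_prediction else "unvoiced"
```

Requiring agreement is one conservative choice; a weighted combination of the two predictions would be an equally plausible reading of the abstract.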