-
Publication No.: US20240420684A1
Publication Date: 2024-12-19
Application No.: US18706313
Application Date: 2023-01-17
Inventor: Saisai ZOU, Lei JIA, Haifeng WANG
Abstract: A speech wake-up method, an electronic device, and a storage medium are provided. The method includes: performing a word recognition on a speech to be recognized to obtain a wake-up word recognition result (S210); performing a syllable recognition on the speech to be recognized to obtain a wake-up syllable recognition result, in response to determining that the wake-up word recognition result represents that the speech to be recognized contains a predetermined wake-up word (S220); and determining that the speech to be recognized is a correct wake-up speech, in response to determining that the wake-up syllable recognition result represents that the speech to be recognized contains a predetermined syllable (S230).
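The cascade in this abstract, where a word-level check runs first and a syllable-level check runs only when it fires, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the scorer callables, the 0.5 thresholds, and the treatment of each recognition result as a single confidence score are all hypothetical.

```python
from typing import Callable, Sequence

# Hypothetical recognizer interface: returns a confidence in [0, 1] that the
# wake-up target (word or syllable) is present in the given speech features.
Scorer = Callable[[Sequence[float]], float]


def is_correct_wake_up(
    speech: Sequence[float],
    word_scorer: Scorer,
    syllable_scorer: Scorer,
    word_threshold: float = 0.5,
    syllable_threshold: float = 0.5,
) -> bool:
    """Two-stage check: syllable recognition runs only after the word stage fires."""
    # Stage 1 (S210): word recognition for the predetermined wake-up word.
    if word_scorer(speech) < word_threshold:
        return False
    # Stages 2-3 (S220/S230): syllable recognition confirms the predetermined syllable.
    return syllable_scorer(speech) >= syllable_threshold
```

With a word score of 0.9 and a syllable score of 0.8 the call returns `True`; if the word stage scores below its threshold, the syllable scorer is never invoked, so the second check only runs on candidate wake-ups.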
-
12.
Publication No.: US20230410794A1
Publication Date: 2023-12-21
Application No.: US18237976
Application Date: 2023-08-25
Inventor: Xiaoyin FU, Mingshun YANG, Qiguang ZANG, Zhijie CHEN, Yangkai XU, Guibin WANG, Lei JIA
CPC classification number: G10L15/063, G10L15/26, G10L15/02
Abstract: An audio recognition method, a method of training an audio recognition model, and an electronic device are provided, which relate to the fields of artificial intelligence, speech recognition, deep learning, and natural language processing technologies. The audio recognition method includes: truncating an audio feature of target audio data to obtain at least one first audio sequence feature corresponding to a predetermined duration; obtaining, according to peak information of the audio feature, peak sub-information corresponding to the first audio sequence feature; performing at least one decoding operation on the first audio sequence feature to obtain a recognition result for the first audio sequence feature, where the number of times the decoding operation is performed is identical to the number of peaks corresponding to the first audio sequence feature; and obtaining target text data for the target audio data according to the recognition result for the at least one first audio sequence feature.
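A rough sketch of the truncation and peak-counted decoding steps follows. It assumes frame-level features as a list of vectors, "peak information" as one boolean flag per frame (for example, spikes from a CTC-style model), a chunk length in frames standing in for the predetermined duration, and a hypothetical `decode_step` callback; none of these details come from the abstract.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = Sequence[float]
T = TypeVar("T")


def truncate(features: Sequence[Frame], chunk_frames: int) -> List[List[Frame]]:
    """Split frame-level features into consecutive fixed-length chunks."""
    return [list(features[i:i + chunk_frames]) for i in range(0, len(features), chunk_frames)]


def recognize(
    features: Sequence[Frame],
    peaks: Sequence[bool],
    chunk_frames: int,
    decode_step: Callable[[List[Frame], List[T]], T],
) -> List[T]:
    """Run one decoding step per peak inside each chunk and concatenate the outputs."""
    outputs: List[T] = []
    for index, chunk in enumerate(truncate(features, chunk_frames)):
        start = index * chunk_frames
        # Peak sub-information: the peak flags that fall inside this chunk.
        num_peaks = sum(1 for flag in peaks[start:start + chunk_frames] if flag)
        for _ in range(num_peaks):
            # Each step sees the current chunk and everything decoded so far.
            outputs.append(decode_step(chunk, outputs))
    return outputs
```

Tying the number of decoding steps to the peak count means a chunk with no peaks produces no output, and a chunk with two peaks produces exactly two tokens.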
-
13.
Publication No.: US20230360638A1
Publication Date: 2023-11-09
Application No.: US18221593
Application Date: 2023-07-13
Inventor: Saisai ZOU, Lei JIA, Haifeng WANG
CPC classification number: G10L15/02, G10L15/14, G10L2015/027
Abstract: A method of processing speech information, a method of training a speech model, a speech wake-up method, an electronic device, and a storage medium are provided, which relate to the field of artificial intelligence technology, in particular to the fields of human-computer interaction, deep learning, and intelligent speech technologies. A specific implementation solution includes: performing syllable recognition on speech information to obtain a posterior probability sequence for the speech information, where the speech information includes a speech frame sequence, the posterior probability sequence corresponds to the speech frame sequence, and each posterior probability in the sequence represents a similarity between the syllable in the speech frame matched with that posterior probability and a predetermined syllable; and determining a target peak speech frame from the speech frame sequence based on the posterior probability sequence.
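One plausible reading of "determining a target peak speech frame" is to pick the frame whose posterior for the predetermined syllable is a local maximum above a threshold. The sketch below does exactly that; the threshold, the local-maximum rule, and choosing the highest such peak are assumptions, not details given in the abstract.

```python
from typing import Optional, Sequence


def target_peak_frame(posteriors: Sequence[float], threshold: float = 0.5) -> Optional[int]:
    """Return the index of the highest local-maximum posterior above threshold, or None."""
    best: Optional[int] = None
    for i, p in enumerate(posteriors):
        if p < threshold:
            continue
        left = posteriors[i - 1] if i > 0 else float("-inf")
        right = posteriors[i + 1] if i + 1 < len(posteriors) else float("-inf")
        # A peak frame is at least as large as both of its neighbours.
        if p >= left and p >= right and (best is None or p > posteriors[best]):
            best = i
    return best
```

For the posterior sequence `[0.1, 0.7, 0.3, 0.9, 0.2]`, frames 1 and 3 are both local peaks, and frame 3 is returned because its posterior is higher.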
-
14.
Publication No.: US20230317060A1
Publication Date: 2023-10-05
Application No.: US18328135
Application Date: 2023-06-02
Inventor: Saisai ZOU, Li CHEN, Ruoxi ZHANG, Lei JIA, Haifeng WANG
CPC classification number: G10L15/063, G10L15/02
Abstract: The present disclosure provides a method and an apparatus for training a voice wake-up model, a method and an apparatus for voice wake-up, a device, and a storage medium, which relate to the field of artificial intelligence, particularly to the fields of deep learning and voice technology. A specific implementation includes: acquiring created voice recognition training data and voice wake-up training data; first training a base model on the voice recognition training data to obtain a model parameter of the base model when a model loss function converges; then updating, based on a model configuration instruction, a configuration parameter of a decoding module in the base model to obtain a first model; and finally training the first model on the voice wake-up training data to obtain a trained voice wake-up model when the model loss function converges.
-
Publication No.: US20230178067A1
Publication Date: 2023-06-08
Application No.: US18074023
Application Date: 2022-12-02
Inventor: Wenfu WANG, Tao SUN, Xilei WANG, Lei JIA
IPC: G10L13/047, G10L25/30
CPC classification number: G10L13/047, G10L25/30
Abstract: A method of training a speech synthesis model, a method of synthesizing speech, a device, and a storage medium are provided, which relate to the field of artificial intelligence technology, in particular to the field of speech synthesis technology. The specific implementation scheme includes: processing training data by using the speech synthesis model to determine a content encoding sequence, a style encoding sequence, a timbre encoding vector, a noise environment vector, and a target Mel spectrum sequence corresponding to the training data; determining a total loss value according to the content encoding sequence, the style encoding sequence, the timbre encoding vector, the noise environment vector, and the target Mel spectrum sequence; and adjusting a parameter of the speech synthesis model according to the total loss value.
-
Publication No.: US20230177326A1
Publication Date: 2023-06-08
Application No.: US17968688
Application Date: 2022-10-18
Inventor: Guibin WANG, Shijun CONG, Hao DONG, Lei JIA
CPC classification number: G06N3/08, G06N3/0454
Abstract: A technical solution for compressing a neural network model is disclosed, which relates to the field of artificial intelligence technologies such as deep learning and cloud services. The method for compressing a neural network model includes: acquiring a to-be-compressed neural network model; determining a first bit width, a second bit width, and a target thinning rate corresponding to the to-be-compressed neural network model; obtaining a target value according to the first bit width, the second bit width, and the target thinning rate; and compressing the to-be-compressed neural network model using the target value, the first bit width, and the second bit width to obtain a compression result of the to-be-compressed neural network model.
-
Publication No.: US20220292337A1
Publication Date: 2022-09-15
Application No.: US17832303
Application Date: 2022-06-03
Inventor: Chao TIAN, Lei JIA, Xiaoping YAN, Junhui WEN, Guanglai DENG, Qiang LI
Abstract: A neural network processing method, a neural network processing unit (NPU), and a processing device are provided. The method includes: obtaining, by a quantizing unit in the NPU, float-type input data, quantizing the float-type input data to obtain quantized input data, and providing the quantized input data to an operation unit; performing, by the operation unit of the NPU, a matrix-vector operation and/or a convolution operation on the quantized input data to obtain an operation result of the quantized input data; and performing, by the quantizing unit, inverse quantization on the operation result output by the operation unit to obtain an inverse quantization result.
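The quantize, integer-compute, inverse-quantize flow described above can be illustrated in plain software with symmetric per-tensor int8 quantization. The NPU's actual scheme is not given in the abstract: the 8-bit range, the per-tensor scales, and quantizing the weights with their own scale are illustrative assumptions.

```python
from typing import List, Tuple


def quantize(values: List[float], num_bits: int = 8) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization of floats to signed integers."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def quantized_matvec(weights: List[List[float]], x: List[float]) -> List[float]:
    # "Quantizing unit": float input data -> quantized input data.
    q_x, x_scale = quantize(x)
    # Assumption: weights are quantized per-tensor with their own scale.
    flat, w_scale = quantize([w for row in weights for w in row])
    cols = len(x)
    q_w = [flat[r * cols:(r + 1) * cols] for r in range(len(weights))]
    # "Operation unit": integer-only matrix-vector product.
    acc = [sum(a * b for a, b in zip(row, q_x)) for row in q_w]
    # "Quantizing unit" again: inverse quantization of the integer results.
    return [a * w_scale * x_scale for a in acc]
```

For `weights = [[1.0, -0.5], [0.25, 2.0]]` and `x = [0.5, -1.0]`, the exact product is `[1.0, -1.875]`; the quantized path lands within about 0.02 of each value, which is the rounding error introduced by the 8-bit grid.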
-
Publication No.: US20220108684A1
Publication Date: 2022-04-07
Application No.: US17644749
Application Date: 2021-12-16
Inventor: Xiaoyin FU, Mingxin LIANG, Zhijie CHEN, Qiguang ZANG, Zhengxiang JIANG, Liao ZHANG, Qi ZHANG, Lei JIA
IPC: G10L15/02, G10L15/16, G10L19/032
Abstract: The present disclosure provides a method of recognizing speech offline, an electronic device, and a storage medium, relating to fields of artificial intelligence such as speech recognition, natural language processing, and deep learning. The method may include: decoding speech data to be recognized into a syllable recognition result; and transforming the syllable recognition result into corresponding text as a speech recognition result of the speech data.
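The second step, transforming a syllable recognition result into text, can be illustrated with a greedy longest-match lookup against a syllable-to-word lexicon. The abstract does not say how the transformation is performed; the lexicon, the pinyin-style syllables, and the greedy strategy are hypothetical simplifications.

```python
from typing import Dict, List, Sequence, Tuple


def syllables_to_text(
    syllables: Sequence[str],
    lexicon: Dict[Tuple[str, ...], str],
    max_word_len: int = 4,
) -> str:
    """Greedily map a syllable sequence to text using the longest lexicon match."""
    out: List[str] = []
    i = 0
    while i < len(syllables):
        # Try the longest span first, falling back to shorter ones.
        for n in range(min(max_word_len, len(syllables) - i), 0, -1):
            word = lexicon.get(tuple(syllables[i:i + n]))
            if word is not None:
                out.append(word)
                i += n
                break
        else:
            # No lexicon entry: keep the raw syllable so no input is silently dropped.
            out.append(syllables[i])
            i += 1
    return "".join(out)
```

With a lexicon mapping `("ni", "hao")` to "你好" and `("shi", "jie")` to "世界", the syllables `["ni", "hao", "shi", "jie"]` become "你好世界".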
-