-
Publication No.: US20240395242A1
Publication date: 2024-11-28
Application No.: US18689668
Filing date: 2021-11-10
Inventor: Xin Fang, Junhua Liu
Abstract: Provided in the present application are a speech recognition method, apparatus and device, and a storage medium. The method comprises: acquiring a speech feature of target mixed speech and a speaker feature of a specified speaker; taking the direction of tending to a target speech feature as an extraction direction, and, according to the speech feature of the target mixed speech and a speaker feature of a target speaker, extracting a speech feature of the target speaker from the speech feature of the target mixed speech, so as to obtain an extracted speech feature of the target speaker; and acquiring a speech recognition result of the specified speaker according to an extracted speech feature of the specified speaker.
-
Publication No.: US20160189730A1
Publication date: 2016-06-30
Application No.: US14585582
Filing date: 2014-12-30
Inventor: Jun DU, Yong XU, Yanhui TU, Li-Rong DAI, Zhiguo WANG, Yu HU, Qingfeng LIU
IPC: G10L21/0272
CPC classification number: G10L21/0272
Abstract: An example of the present invention discloses a speech separation method and system. The method comprises: receiving a mixture speech signal to be separated; extracting a speech feature of the mixture speech signal; inputting the extracted speech feature of the mixture speech signal into a regression model for speech separation to obtain an estimated speech feature of a target speech signal; and synthesizing the target speech signal according to the estimated speech feature. The invention can effectively improve the speech separation effect.
-
Publication No.: US20230035947A1
Publication date: 2023-02-02
Application No.: US17780776
Filing date: 2020-12-14
Applicant: IFLYTEK CO., LTD.
Inventor: Genshun WAN, Jianqing GAO, Zhiguo WANG
IPC: G10L15/183, G06F40/289, G10L15/22, G06F40/247, G10L15/04
Abstract: A speech recognition method and related products are provided. The method includes acquiring text contents and text-associated time information transmitted by a plurality of terminals in a preset scenario and determining a shared text for the preset scenario based on the text contents and the text-associated time information, obtaining a customized language model for the preset scenario based on the shared text, and performing speech recognition for the preset scenario with the customized language model. The method provides improved speech recognition for the preset scenario due to the correlation between the customized language model and the preset scenario.
-
Publication No.: US20220375459A1
Publication date: 2022-11-24
Application No.: US17761217
Filing date: 2019-12-12
Applicant: IFLYTEK CO., LTD.
Inventor: Jianqing GAO, Zhiguo WANG, Guoping HU
IPC: G10L15/08, G10L15/183, G10L15/22
Abstract: A method for constructing a decoding network, a speech recognition method, a device, an apparatus, and a storage medium are provided. The method for constructing a decoding network includes: acquiring a general language model, a domain language model, and a general decoding network generated based on the general language model; generating a domain decoding network based on the domain language model and the general language model; and integrating the domain decoding network with the general decoding network to obtain a target decoding network. The speech recognition method includes: decoding to-be-recognized speech data by using a target decoding network to obtain a decoding path for the to-be-recognized speech data; and determining a speech recognition result for the to-be-recognized speech data based on the decoding path for the to-be-recognized speech data.
-
Publication No.: US20210051404A1
Publication date: 2021-02-18
Application No.: US16756967
Filing date: 2018-07-16
Applicant: IFLYTEK CO., LTD.
Inventor: Mingzi LI, Feng MA, Haikun WANG, Zhiguo WANG, Guoping HU
IPC: H04R3/04, H04R29/00, G10L21/0232
Abstract: An echo cancellation method based on delay estimation is provided. In the method, a microphone signal and a reference signal are received and preprocessed. In the preprocessed microphone signal and the preprocessed reference signal, frequency point signals with non-linearity in a current echo cancellation scenario are determined. A current delay estimation value is calculated based on frequency point signals without non-linearity in the microphone signal and the reference signal. The reference signal is shifted based on the current delay estimation value. An adaptive filter is updated based on the preprocessed microphone signal and the shifted reference signal, to perform echo cancellation.
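A minimal sketch of the last three steps in this abstract (estimate a delay, shift the reference, update an adaptive filter) may help make the flow concrete. It is not the patented method: it works in the time domain, estimates the delay by plain cross-correlation, uses an NLMS filter as a stand-in adaptive filter, and omits the selection of frequency points with and without non-linearity entirely. All function names, parameters, and the synthetic signals are assumptions made for illustration.

```python
import random


def estimate_delay(mic, ref, max_delay):
    """Return the lag in [0, max_delay] that maximizes the cross-correlation
    sum over n of mic[n] * ref[n - lag]. (Assumed estimator, time domain only.)"""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_delay + 1):
        score = sum(mic[n] * ref[n - lag] for n in range(lag, len(mic)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def shift_reference(ref, delay):
    """Delay the reference by `delay` samples, zero-padding the start and
    keeping the original length."""
    return [0.0] * delay + ref[:len(ref) - delay]


def nlms_cancel(mic, ref, taps=4, mu=0.5, eps=1e-8):
    """Run an NLMS adaptive filter and return the residual (mic minus the
    estimated echo) for every sample."""
    weights = [0.0] * taps
    residual = []
    for n in range(len(mic)):
        # Most recent `taps` reference samples, newest first.
        x = [ref[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        echo_estimate = sum(w * xk for w, xk in zip(weights, x))
        error = mic[n] - echo_estimate
        norm = sum(xk * xk for xk in x) + eps
        weights = [w + mu * error * xk / norm for w, xk in zip(weights, x)]
        residual.append(error)
    return residual


# Synthetic check: the "microphone" hears the reference attenuated and
# delayed by 5 samples, with no near-end speech and no noise.
rng = random.Random(0)
ref = [rng.uniform(-1.0, 1.0) for _ in range(200)]
true_delay = 5
mic = [0.0] * true_delay + [0.6 * x for x in ref[:-true_delay]]

delay = estimate_delay(mic, ref, max_delay=10)
residual = nlms_cancel(mic, shift_reference(ref, delay))
```

Aligning the reference first lets a short filter (4 taps here) model the echo path; without the shift, the filter would need at least as many taps as the delay before it could cancel anything.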
-
Publication No.: US20200342887A1
Publication date: 2020-10-29
Application No.: US16757905
Filing date: 2018-07-16
Applicant: IFLYTEK CO., LTD.
Inventor: Dongyang XU, Haikun WANG, Zhiguo WANG, Guoping HU
IPC: G10L21/0216, H04R3/00, G10L25/78
Abstract: A microphone array-based target voice acquisition method and device, said method comprising: receiving voice signals acquired on the basis of a microphone array (101); determining a pre-selected target voice signal and a direction thereof (102); performing strong directional gain and weak directional gain on the pre-selected target voice signal, so as to obtain a strong gain signal and a weak gain signal (103); performing an endpoint detection on the basis of the strong gain signal, so as to obtain an endpoint detection result (104); and performing endpoint processing on the weak gain signal according to the endpoint detection result, so as to obtain a final target voice signal (105). The present invention can obtain an accurate and reliable target voice signal, thereby avoiding an adverse effect of the target voice quality on subsequent target voice processing.
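Steps (104) and (105) can be sketched as follows: endpoints are detected on the strong gain signal and then applied to the weak gain signal. The abstract does not say how endpoints are detected, so this sketch assumes a simple frame-energy threshold; the beamforming that produces the two signals is not modeled, and the function names and parameters are hypothetical.

```python
def detect_endpoints(strong_gain_signal, frame_len=4, threshold=0.1):
    """Return (start, end) sample indices spanning the first through last
    frame whose mean energy exceeds `threshold`, or None if no frame does.
    The energy detector is an assumption, not taken from the abstract."""
    active_frames = []
    for start in range(0, len(strong_gain_signal), frame_len):
        frame = strong_gain_signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy > threshold:
            active_frames.append((start, start + len(frame)))
    if not active_frames:
        return None
    return active_frames[0][0], active_frames[-1][1]


def apply_endpoints(weak_gain_signal, endpoints):
    """Keep only the samples of the weak gain signal inside the endpoints."""
    if endpoints is None:
        return []
    start, end = endpoints
    return weak_gain_signal[start:end]


# Toy signals: the strong gain signal is silent, then active, then silent.
strong = [0.0] * 4 + [1.0] * 8 + [0.0] * 4
weak = [float(i) for i in range(16)]

endpoints = detect_endpoints(strong)
target = apply_endpoints(weak, endpoints)
```

Detection and output come from different signals on purpose: the strong gain signal is used only to decide where speech starts and stops, while the samples that are kept come from the weak gain signal.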
-
Publication No.: US20250061280A1
Publication date: 2025-02-20
Application No.: US18724188
Filing date: 2022-11-21
Applicant: IFLYTEK CO., LTD.
Inventor: Li YAN, Ting QI, Jianqing GAO, Jingting SUN
Abstract: A minutes determining method is provided. The method includes acquiring a to-be-used user record and a to-be-used record text; performing sentence segmentation processing on the to-be-used record text, to obtain at least one to-be-used sentence; performing semantic matching processing on the to-be-used user record and the at least one to-be-used sentence, to obtain a to-be-used semantic matching result; and determining to-be-used minutes content according to the to-be-used user record and the to-be-used semantic matching result.
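The pipeline in this abstract (segment the record text into sentences, match user records against them, build minutes from the matches) can be sketched as below. The abstract does not specify the segmentation or matching technique, so this example assumes punctuation-based splitting and uses Jaccard token overlap as a crude stand-in for semantic matching. The function names and the threshold are hypothetical.

```python
import re


def split_sentences(text):
    """Split on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokens(sentence):
    """Lowercased set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    """Token-overlap score in [0, 1]; a lexical proxy, not a semantic model."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def build_minutes(user_records, record_text, threshold=0.2):
    """Pair each user record with its best-matching sentence, or with None
    when no sentence reaches `threshold`."""
    sentences = split_sentences(record_text)
    minutes = []
    for note in user_records:
        scored = [(jaccard(note, s), s) for s in sentences]
        best_score, best_sentence = max(
            scored, key=lambda pair: pair[0], default=(0.0, None)
        )
        matched = best_sentence if best_score >= threshold else None
        minutes.append((note, matched))
    return minutes


record_text = (
    "We agreed to ship the beta on Friday. Lunch was late. "
    "Alice will update the budget sheet."
)
user_records = ["ship beta Friday", "budget sheet update", "parking"]
minutes = build_minutes(user_records, record_text)
```

In a real system the `jaccard` function would presumably be replaced by a learned sentence-similarity model; the surrounding structure would stay the same.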
-
Publication No.: US20230186912A1
Publication date: 2023-06-15
Application No.: US17925483
Filing date: 2020-12-02
Applicant: IFLYTEK CO., LTD.
Inventor: Shifu XIONG, Cong LIU, Si WEI, Qingfeng LIU, Jianqing GAO, Jia PAN
IPC: G10L15/22, G10L15/02, G10L15/197
CPC classification number: G10L15/22, G10L15/02, G10L15/197, G10L2015/088
Abstract: A speech recognition method and related products are provided. The method includes acquiring a to-be-recognized speech and a configured hot word library; determining, based on the to-be-recognized speech and the hot word library, an audio-related feature used at a current decoding time instant; determining, based on the audio-related feature, a hot word-related feature used at the current decoding time instant from the hot word library; and determining, based on the audio-related feature and the hot word-related feature, a recognition result of the to-be-recognized speech at the current decoding time instant.
-
Publication No.: US11651578B2
Publication date: 2023-05-16
Application No.: US16329368
Filing date: 2017-01-11
Applicant: IFLYTEK CO., LTD.
Inventor: Jia Pan, Shiliang Zhang, Shifu Xiong, Si Wei, Guoping Hu
Abstract: A method and a system for end-to-end modeling are provided. The method includes: determining a topological structure of a target-based end-to-end model, where the topological structure includes an input layer, an encoding layer, a code enhancement layer, a filtering layer, a decoding layer and an output layer, the code enhancement layer adds information of a target unit to a feature sequence outputted by the encoding layer, and the filtering layer filters the feature sequence to which the information of the target unit has been added; collecting multiple pieces of training data; and training parameters of the target-based end-to-end model by using the multiple pieces of training data.
-
Publication No.: US11508366B2
Publication date: 2022-11-22
Application No.: US16647284
Filing date: 2018-06-15
Applicant: IFLYTEK CO., LTD.
Inventor: Jia Pan, Cong Liu, Haikun Wang, Zhiguo Wang, Guoping Hu
Abstract: A method, an apparatus and a device for converting whispered speech, and a readable storage medium are provided. The method is implemented based on a whispered speech converting model. The model is trained in advance by using recognition results and whispered speech training acoustic features of whispered speech training data as samples, and using normal speech acoustic features of normal speech data parallel to the whispered speech training data as sample labels. A whispered speech acoustic feature and a preliminary recognition result of whispered speech data are acquired, and are then inputted into the preset whispered speech converting model to acquire a normal speech acoustic feature outputted by the model. In this way, the whispered speech can be converted to normal speech.