SPEECH RECOGNITION METHOD, APPARATUS AND DEVICE, AND STORAGE MEDIUM

    Publication No.: US20240395242A1

    Publication date: 2024-11-28

    Application No.: US18689668

    Filing date: 2021-11-10

    Inventor: Xin Fang, Junhua Liu

    Abstract: The present application provides a speech recognition method, apparatus and device, and a storage medium. The method comprises: acquiring a speech feature of target mixed speech and a speaker feature of a specified speaker; taking the direction of tending to a target speech feature as an extraction direction, and according to the speech feature of the target mixed speech and a speaker feature of a target speaker, extracting a speech feature of the target speaker from the speech feature of the target mixed speech, so as to obtain an extracted speech feature of the target speaker; and acquiring a speech recognition result of the specified speaker according to an extracted speech feature of the specified speaker.

    SPEECH SEPARATION METHOD AND SYSTEM
    Type: Invention application. Status: Under examination (published)

    Publication No.: US20160189730A1

    Publication date: 2016-06-30

    Application No.: US14585582

    Filing date: 2014-12-30

    CPC classification number: G10L21/0272

    Abstract: An example of the present invention discloses a speech separation method and system. The method comprises: receiving a mixture speech signal to be separated; extracting a speech feature of the mixture speech signal; inputting the extracted speech feature of the mixture speech signal into a regression model for speech separation to obtain an estimated speech feature of a target speech signal; and synthesizing the target speech signal according to the estimated speech feature. The present invention can effectively improve the speech separation effect.


    VOICE RECOGNITION METHOD AND RELATED PRODUCT

    Publication No.: US20230035947A1

    Publication date: 2023-02-02

    Application No.: US17780776

    Filing date: 2020-12-14

    Abstract: A speech recognition method and related products are provided. The method includes acquiring text contents and text-associated time information transmitted by a plurality of terminals in a preset scenario and determining a shared text for the preset scenario based on the text contents and the text-associated time information, obtaining a customized language model for the preset scenario based on the shared text, and performing speech recognition for the preset scenario with the customized language model. The method provides improved speech recognition for the preset scenario due to the correlation between the customized language model and the preset scenario.

    DECODING NETWORK CONSTRUCTION METHOD, VOICE RECOGNITION METHOD, DEVICE AND APPARATUS, AND STORAGE MEDIUM

    Publication No.: US20220375459A1

    Publication date: 2022-11-24

    Application No.: US17761217

    Filing date: 2019-12-12

    Abstract: A method for constructing a decoding network, a speech recognition method, a device, an apparatus, and a storage medium are provided. The method for constructing a decoding network includes: acquiring a general language model, a domain language model, and a general decoding network generated based on the general language model; generating a domain decoding network based on the domain language model and the general language model; and integrating the domain decoding network with the general decoding network to obtain a target decoding network. The speech recognition method includes: decoding to-be-recognized speech data by using a target decoding network to obtain a decoding path for the to-be-recognized speech data; and determining a speech recognition result for the to-be-recognized speech data based on the decoding path for the to-be-recognized speech data.

    ECHO CANCELLATION METHOD AND APPARATUS BASED ON TIME DELAY ESTIMATION

    Publication No.: US20210051404A1

    Publication date: 2021-02-18

    Application No.: US16756967

    Filing date: 2018-07-16

    Abstract: An echo cancellation method based on delay estimation is provided. In the method, a microphone signal and a reference signal are received and preprocessed. In the preprocessed microphone signal and the preprocessed reference signal, frequency point signals with non-linearity in a current echo cancellation scenario are determined. A current delay estimation value is calculated based on frequency point signals without non-linearity in the microphone signal and the reference signal. The reference signal is shifted based on the current delay estimation value. An adaptive filter is updated based on the preprocessed microphone signal and the shifted reference signal, to perform echo cancellation.
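The delay-then-filter flow in this abstract can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the patented method: the delay is estimated by plain time-domain cross-correlation rather than from selected frequency points, the non-linearity screening is omitted, the adaptive filter is a standard normalized LMS (NLMS) filter, and all function names, parameters, and the synthetic signals are hypothetical.

```python
import random

def estimate_delay(mic, ref, max_delay):
    """Return the lag (in samples) that maximizes the cross-correlation of mic with ref."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_delay + 1):
        score = sum(mic[n] * ref[n - lag] for n in range(lag, len(mic)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def shift(ref, delay):
    """Delay the reference by `delay` samples, zero-padding the start."""
    return [0.0] * delay + ref[:len(ref) - delay]

def nlms_cancel(mic, ref, taps=4, mu=0.5, eps=1e-8):
    """Normalized LMS: subtract the filtered reference from mic and return the residual."""
    w = [0.0] * taps
    residual = []
    for n in range(len(mic)):
        # Most recent `taps` reference samples, newest first.
        x = [ref[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        echo_estimate = sum(wk * xk for wk, xk in zip(w, x))
        e = mic[n] - echo_estimate
        norm = sum(xk * xk for xk in x) + eps
        w = [wk + mu * e * xk / norm for wk, xk in zip(w, x)]
        residual.append(e)
    return residual

# Synthetic test signal (assumption): the microphone hears only a delayed, scaled echo.
random.seed(0)
ref = [random.uniform(-1, 1) for _ in range(2000)]
mic = [0.6 * r for r in shift(ref, 37)]

delay = estimate_delay(mic, ref, max_delay=100)
residual = nlms_cancel(mic, shift(ref, delay))
```

Aligning the reference first lets a short filter (four taps here) model the echo path; without the shift, the filter would need to span the full delay.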

    MICROPHONE ARRAY-BASED TARGET VOICE ACQUISITION METHOD AND DEVICE

    Publication No.: US20200342887A1

    Publication date: 2020-10-29

    Application No.: US16757905

    Filing date: 2018-07-16

    Abstract: A microphone array-based target voice acquisition method and device, said method comprising: receiving voice signals acquired on the basis of a microphone array (101); determining a pre-selected target voice signal and a direction thereof (102); performing strong directional gain and weak directional gain on the pre-selected target voice signal, so as to obtain a strong gain signal and a weak gain signal (103); performing an endpoint detection on the basis of the strong gain signal, so as to obtain an endpoint detection result (104); and performing endpoint processing on the weak gain signal according to the endpoint detection result, so as to obtain a final target voice signal (105). The present invention can obtain an accurate and reliable target voice signal, thereby avoiding an adverse effect of the target voice quality on subsequent target voice processing.
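Steps (104) and (105) of this abstract, detecting endpoints on the strong gain signal and applying them to the weak gain signal, can be sketched as follows. This is an illustration under assumptions only: the directional gain stage is not modeled (the two signals are simply given as inputs), the endpoint detector is a basic frame-energy threshold, and the function names, frame length, threshold, and toy data are hypothetical.

```python
def frame_energies(signal, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(s * s for s in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]

def detect_endpoints(strong, frame_len=4, threshold=0.1):
    """Return (start, end) sample indices of the active region, or None if no frame is active."""
    energies = frame_energies(strong, frame_len)
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len

def trim_weak(weak, endpoints):
    """Keep only the part of the weak gain signal that lies inside the detected endpoints."""
    if endpoints is None:
        return []
    start, end = endpoints
    return weak[start:end]

# Toy data (made up): silence, a burst of "speech", then silence again.
strong = [0.0] * 8 + [1.0, -1.0] * 4 + [0.0] * 8
weak = [0.01] * 8 + [0.5, -0.5] * 4 + [0.01] * 8

endpoints = detect_endpoints(strong)
target = trim_weak(weak, endpoints)
```

The split reflects the idea stated in the abstract: the strongly gained signal suppresses interference and so gives cleaner endpoints, while the weakly gained signal is the one kept as the final target voice.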

    SUMMARY DETERMINATION METHOD AND RELATED DEVICE THEREOF

    Publication No.: US20250061280A1

    Publication date: 2025-02-20

    Application No.: US18724188

    Filing date: 2022-11-21

    Abstract: A minutes determining method is provided. The method includes acquiring a to-be-used user record and a to-be-used record text; performing sentence segmentation processing on the to-be-used record text, to obtain at least one to-be-used sentence; performing semantic matching processing on the to-be-used user record and the at least one to-be-used sentence, to obtain a to-be-used semantic matching result; and determining to-be-used minutes content according to the to-be-used user record and the to-be-used semantic matching result.
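The segmentation and matching steps in this abstract can be sketched as follows. The abstract does not say how semantic matching is performed, so this sketch substitutes a simple token-overlap (Jaccard) score as a stand-in; the regex-based sentence splitter, the threshold, the function names, and the sample text are likewise assumptions made for illustration.

```python
import re

def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace; a crude stand-in for real segmentation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokens(text):
    """Lowercased alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Token-overlap similarity in [0, 1]; a placeholder for a semantic matching model."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def match_notes(user_notes, record_text, threshold=0.2):
    """Map each user note to its best-matching sentence, or None if the score is below threshold."""
    sentences = split_sentences(record_text)
    result = {}
    for note in user_notes:
        best = max(sentences, key=lambda s: jaccard(note, s), default=None)
        if best is not None and jaccard(note, best) >= threshold:
            result[note] = best
        else:
            result[note] = None
    return result

# Made-up sample data.
record = ("We reviewed the budget for next quarter. "
          "The launch date moves to March. "
          "Lunch was served at noon.")
notes = ["budget next quarter", "launch date", "hiring plan"]
minutes = match_notes(notes, record)
```

Notes with no sufficiently similar sentence map to `None`, so a later step can decide whether to keep the bare note or drop it from the minutes.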

    End-to-end modelling method and system

    Publication No.: US11651578B2

    Publication date: 2023-05-16

    Application No.: US16329368

    Filing date: 2017-01-11

    CPC classification number: G06N3/08 G06N3/049 G06N99/00 G10L15/06

    Abstract: A method and a system for end-to-end modeling are provided. The method includes: determining a topological structure of a target-based end-to-end model, where the topological structure includes an input layer, an encoding layer, a code enhancement layer, a filtering layer, a decoding layer and an output layer, the code enhancement layer adds information of a target unit to a feature sequence outputted by the encoding layer, and the filtering layer filters the feature sequence to which the information of the target unit has been added; collecting multiple pieces of training data; and training parameters of the target-based end-to-end model by using the multiple pieces of training data.

    Whispering voice recovery method, apparatus and device, and readable storage medium

    Publication No.: US11508366B2

    Publication date: 2022-11-22

    Application No.: US16647284

    Filing date: 2018-06-15

    Abstract: A method, an apparatus and a device for converting a whispered speech, and a readable storage medium are provided. The method is implemented based on a whispered speech converting model. The whispered speech converting model is trained in advance by using recognition results and whispered speech training acoustic features of whispered speech training data as samples and using normal speech acoustic features of normal speech data parallel to the whispered speech training data as sample labels. A whispered speech acoustic feature and a preliminary recognition result of whispered speech data are acquired. The whispered speech acoustic feature and the preliminary recognition result are then inputted into the preset whispered speech converting model to acquire a normal speech acoustic feature outputted by the model. In this way, the whispered speech can be converted to normal speech.
