INTER-CHANNEL FEATURE EXTRACTION METHOD, AUDIO SEPARATION METHOD AND APPARATUS, AND COMPUTING DEVICE

    公开(公告)号:US20210375294A1

    公开(公告)日:2021-12-02

    申请号:US17401125

    申请日:2021-08-12

    Abstract: This application relates to a method of extracting an inter channel feature from a multi-channel multi-sound source mixed audio signal performed at a computing device. The method includes: transforming one channel component of a multi-channel multi-sound source mixed audio signal into a single-channel multi-sound source mixed audio representation in a feature space; performing a two-dimensional dilated convolution on the multi-channel multi-sound source mixed audio signal to extract inter-channel features; performing a feature fusion on the single-channel multi-sound source mixed audio representation and the inter-channel features; estimating respective weights of sound sources in the single-channel multi-sound source mixed audio representation based on a fused multi-channel multi-sound source mixed audio feature; obtaining respective representations of the plurality of sound sources according to the single-channel multi-sound source mixed audio representation and the respective weights; and transforming the respective representations of the sound sources into respective audio signals of the plurality of sound sources.

    Method, apparatus, and storage medium for segmenting sentences for speech recognition

    公开(公告)号:US11430428B2

    公开(公告)日:2022-08-30

    申请号:US17016573

    申请日:2020-09-10

    Abstract: The present disclosure describes a method, apparatus, and storage medium for performing speech recognition. The method includes acquiring, by an apparatus, first to-be-processed speech information. The apparatus includes a memory storing instructions and a processor in communication with the memory. The method includes acquiring, by the apparatus, a first pause duration according to the first to-be-processed speech information; and in response to the first pause duration being greater than or equal to a first threshold, performing, by the apparatus, speech recognition on the first to-be-processed speech information to obtain a first result of sentence segmentation of speech, the first result of sentence segmentation of speech being text information, the first threshold being determined according to speech information corresponding to a previous moment.

    SPEECH SIGNAL PROCESSING MODEL TRAINING METHOD, ELECTRONIC DEVICE AND STORAGE MEDIUM

    公开(公告)号:US20200051549A1

    公开(公告)日:2020-02-13

    申请号:US16655548

    申请日:2019-10-17

    Abstract: Embodiments of the present invention provide a speech signal processing model training method, an electronic device and a storage medium. The embodiments of the present invention determines a target training loss function based on a training loss function of each of one or more speech signal processing tasks; inputs a task input feature of each speech signal processing task into a starting multi-task neural network, and updates model parameters of a shared layer and each of one or more task layers of the starting multi-task neural network corresponding to the one or more speech signal processing tasks by minimizing the target training loss function as a training objective, until the starting multi-task neural network converges, to obtain a speech signal processing model.

    Inter-channel feature extraction method, audio separation method and apparatus, and computing device

    公开(公告)号:US11908483B2

    公开(公告)日:2024-02-20

    申请号:US17401125

    申请日:2021-08-12

    CPC classification number: G10L19/008 G10L25/03 G10L25/30 H04S3/02 H04S5/00

    Abstract: This application relates to a method of extracting an inter channel feature from a multi-channel multi-sound source mixed audio signal performed at a computing device. The method includes: transforming one channel component of a multi-channel multi-sound source mixed audio signal into a single-channel multi-sound source mixed audio representation in a feature space; performing a two-dimensional dilated convolution on the multi-channel multi-sound source mixed audio signal to extract inter-channel features; performing a feature fusion on the single-channel multi-sound source mixed audio representation and the inter-channel features; estimating respective weights of sound sources in the single-channel multi-sound source mixed audio representation based on a fused multi-channel multi-sound source mixed audio feature; obtaining respective representations of the plurality of sound sources according to the single-channel multi-sound source mixed audio representation and the respective weights; and transforming the respective representations of the sound sources into respective audio signals of the plurality of sound sources.

Patent Agency Ranking