-
1.
Publication No.: US11450337B2
Publication date: 2022-09-20
Application No.: US17023829
Filing date: 2020-09-17
Inventor: Lianwu Chen, Meng Yu, Yanmin Qian, Dan Su, Dong Yu
IPC: G10L21/0272, G06N3/04, G06N3/08, G10L25/30, G10L25/51
Abstract: A multi-person speech separation method is provided for a terminal. The method includes extracting a hybrid speech feature from a hybrid speech signal requiring separation, N human voices being mixed in the hybrid speech signal, N being a positive integer greater than or equal to 2; extracting a masking coefficient of the hybrid speech feature by using a generative adversarial network (GAN) model, to obtain a masking matrix corresponding to the N human voices, wherein the GAN model comprises a generative network model and an adversarial network model; and performing a speech separation on the masking matrix corresponding to the N human voices and the hybrid speech signal by using the GAN model, and outputting N separated speech signals corresponding to the N human voices.
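The masking step in the abstract above can be illustrated with a minimal sketch. It assumes the masking matrix has already been estimated (in the patent, by the GAN model, which is not reproduced here) and that separation is an element-wise product of each mask with the mixture's time-frequency magnitudes. The function name `apply_masks` and the toy numbers are hypothetical.

```python
def apply_masks(mixture, masks):
    """Apply N time-frequency masks to one mixture spectrogram.

    mixture: list of T frames, each a list of F magnitudes.
    masks:   list of N masks, each with the same T x F shape, values in [0, 1].
    Returns N masked spectrograms, one per voice.
    Assumption: masks come from some trained estimator (not shown here).
    """
    return [
        [[m * x for m, x in zip(mask_row, mix_row)]
         for mask_row, mix_row in zip(mask, mixture)]
        for mask in masks
    ]


# Toy example: 2 frames x 2 frequency bins, N = 2 voices.
mixture = [[1.0, 2.0], [4.0, 0.5]]
masks = [
    [[0.75, 0.25], [0.5, 1.0]],  # hypothetical mask for voice 1
    [[0.25, 0.75], [0.5, 0.0]],  # hypothetical mask for voice 2
]
separated = apply_masks(mixture, masks)
```

Because the two toy masks sum to 1 in every bin, the separated spectrograms add back up to the mixture; a real estimator need not satisfy that constraint.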
-
2.
Publication No.: US20210375294A1
Publication date: 2021-12-02
Application No.: US17401125
Filing date: 2021-08-12
Inventor: Rongzhi Gu, Shixiong Zhang, Lianwu Chen, Yong Xu, Meng Yu, Dan Su, Dong Yu
IPC: G10L19/008, G10L25/30, G10L25/03
Abstract: This application relates to a method of extracting an inter-channel feature from a multi-channel multi-sound source mixed audio signal performed at a computing device. The method includes: transforming one channel component of a multi-channel multi-sound source mixed audio signal into a single-channel multi-sound source mixed audio representation in a feature space; performing a two-dimensional dilated convolution on the multi-channel multi-sound source mixed audio signal to extract inter-channel features; performing a feature fusion on the single-channel multi-sound source mixed audio representation and the inter-channel features; estimating respective weights of sound sources in the single-channel multi-sound source mixed audio representation based on a fused multi-channel multi-sound source mixed audio feature; obtaining respective representations of the plurality of sound sources according to the single-channel multi-sound source mixed audio representation and the respective weights; and transforming the respective representations of the sound sources into respective audio signals of the plurality of sound sources.
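As a rough illustration of how a two-dimensional dilated convolution can expose inter-channel structure, the sketch below runs a "valid" dilated convolution (cross-correlation form, as is usual in neural networks) over a channels x time array. The fixed kernel `[[1.0], [-1.0]]` and the function name `dilated_conv2d` are assumptions for illustration; in the patent the kernels would be learned and the layer configuration is not given in the abstract.

```python
def dilated_conv2d(x, kernel, dilation_c=1, dilation_t=1):
    """Valid 2-D dilated convolution over a (channels x time) array.

    x:      list of C channels, each a list of T samples.
    kernel: list of kc rows, each a list of kt weights.
    dilation_c / dilation_t: spacing between kernel taps along the
    channel and time axes (hypothetical parameter names).
    """
    C, T = len(x), len(x[0])
    kc, kt = len(kernel), len(kernel[0])
    out_c = C - (kc - 1) * dilation_c
    out_t = T - (kt - 1) * dilation_t
    out = []
    for c in range(out_c):
        row = []
        for t in range(out_t):
            acc = 0.0
            for i in range(kc):
                for j in range(kt):
                    acc += kernel[i][j] * x[c + i * dilation_c][t + j * dilation_t]
            row.append(acc)
        out.append(row)
    return out


# Four microphone channels, three samples each (toy values).
signal = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [10.0, 11.0, 12.0],
]
# With channel dilation 2, this kernel subtracts channel c+2 from channel c,
# i.e. it pairs microphones that are two positions apart.
features = dilated_conv2d(signal, [[1.0], [-1.0]], dilation_c=2)
```

Changing `dilation_c` changes which channel pairs are compared without adding kernel weights, which is the general appeal of dilation along the channel axis.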
-
3.
Publication No.: US12087290B2
Publication date: 2024-09-10
Application No.: US16941503
Filing date: 2020-07-28
Inventor: Jingliang Bai, Caisheng Ouyang, Haikang Liu, Lianwu Chen, Qi Chen, Yulu Zhang, Min Luo, Dan Su
IPC: G10L15/183, G10L15/06, G10L15/22, G10L15/30, G10L21/0232, G10L25/21, G10L25/84, G10L25/78
CPC classification number: G10L15/183, G10L15/063, G10L15/22, G10L15/30, G10L21/0232, G10L25/21, G10L25/84, G10L2015/0636, G10L2025/783
Abstract: A data processing method based on simultaneous interpretation, applied to a server in a simultaneous interpretation system, including: obtaining audio transmitted by a simultaneous interpretation device; processing the audio by using a simultaneous interpretation model to obtain an initial text; transmitting the initial text to a user terminal; receiving a modified text fed back by the user terminal, the modified text being obtained after the user terminal modifies the initial text; and updating the simultaneous interpretation model according to the initial text and the modified text.
-
4.
Publication No.: US12051441B2
Publication date: 2024-07-30
Application No.: US17944067
Filing date: 2022-09-13
Inventor: Jimeng Zheng, Lianwu Chen, Weiwei Li, Zhiyi Duan, Meng Yu, Dan Su, Kaiyu Jiang
CPC classification number: G10L25/84, G06T7/20, G10L17/02, G10L17/22, G10L21/028, G10L25/21, G06T2207/30201
Abstract: This application discloses a multi-sound area-based speech detection method and related apparatus, and a storage medium, which is applied to the field of artificial intelligence. The method includes: obtaining sound area information corresponding to N sound areas including multiple users speaking simultaneously; generating a control signal corresponding to each target detection sound area according to user information corresponding to the target detection sound area; processing multi-user speech input signals by using the control signals, to obtain a speech output signal corresponding to each target detection sound area; generating a speech detection result of the target detection sound area according to the speech output signal corresponding to the target detection sound area; and selecting, among the multiple users, a main speaker based on the user information, the speech output signals and speech detection results of multiple users in the N sound areas.
-
5.
Publication No.: US11430428B2
Publication date: 2022-08-30
Application No.: US17016573
Filing date: 2020-09-10
Inventor: Lianwu Chen, Jingliang Bai, Min Luo
Abstract: The present disclosure describes a method, apparatus, and storage medium for performing speech recognition. The method includes acquiring, by an apparatus, first to-be-processed speech information. The apparatus includes a memory storing instructions and a processor in communication with the memory. The method includes acquiring, by the apparatus, a first pause duration according to the first to-be-processed speech information; and in response to the first pause duration being greater than or equal to a first threshold, performing, by the apparatus, speech recognition on the first to-be-processed speech information to obtain a first result of sentence segmentation of speech, the first result of sentence segmentation of speech being text information, the first threshold being determined according to speech information corresponding to a previous moment.
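The pause-based segmentation described in the abstract above can be sketched as follows. The sketch assumes per-frame voice-activity flags are already available and only marks sentence boundaries; the recognition step itself is omitted. The abstract says the threshold is determined from speech information at a previous moment but does not give the rule, so `next_threshold` (shorter required pause after a longer previous sentence) and the 10 ms frame hop are purely hypothetical.

```python
FRAME_MS = 10  # assumed frame hop in milliseconds


def next_threshold(prev_sentence_ms, base_ms=500, min_ms=200):
    """Hypothetical rule: the longer the previous sentence, the shorter
    the pause required to close the next one (never below min_ms)."""
    return max(min_ms, base_ms - prev_sentence_ms // 10)


def segment(vad_flags):
    """Split a stream of per-frame speech flags into sentences.

    Returns a list of (start_frame, end_frame) pairs, end exclusive.
    A sentence is closed once the pause since the last speech frame
    reaches the current threshold; the threshold is then recomputed
    from the sentence that was just closed.
    """
    segments = []
    threshold_ms = next_threshold(0)
    start = None
    last_speech = None
    for i, is_speech in enumerate(vad_flags):
        if is_speech:
            if start is None:
                start = i
            last_speech = i
        elif start is not None:
            pause_ms = (i - last_speech) * FRAME_MS
            if pause_ms >= threshold_ms:
                end = last_speech + 1
                segments.append((start, end))
                threshold_ms = next_threshold((end - start) * FRAME_MS)
                start = None
    if start is not None:  # flush a sentence still open at end of stream
        segments.append((start, last_speech + 1))
    return segments


# 1.0 s speech, 0.6 s pause, 0.5 s speech, 0.45 s pause, 0.1 s speech.
flags = [True] * 100 + [False] * 60 + [True] * 50 + [False] * 45 + [True] * 10
sentences = segment(flags)
```

In this toy stream the second pause (450 ms) would not split the speech under a fixed 500 ms threshold; it does here because the threshold dropped to 400 ms after the first, one-second sentence.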
-
6.
Publication No.: US20200051549A1
Publication date: 2020-02-13
Application No.: US16655548
Filing date: 2019-10-17
Inventor: Lianwu Chen, Meng Yu, Min Luo, Dan Su
Abstract: Embodiments of the present invention provide a speech signal processing model training method, an electronic device and a storage medium. The embodiments of the present invention determine a target training loss function based on a training loss function of each of one or more speech signal processing tasks; input a task input feature of each speech signal processing task into a starting multi-task neural network; and update model parameters of a shared layer and each of one or more task layers of the starting multi-task neural network corresponding to the one or more speech signal processing tasks by minimizing the target training loss function as a training objective, until the starting multi-task neural network converges, to obtain a speech signal processing model.
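The training scheme in the abstract above (one shared layer, one layer per task, a single target loss built from the per-task losses) can be sketched with a deliberately tiny model. Everything concrete here is an assumption for illustration: the shared layer is a single scalar weight `w`, each task layer is a scalar `heads[k]`, the target loss is the unweighted sum of per-task squared errors, and plain gradient descent stands in for whatever optimizer the patent uses.

```python
def train(data, steps=500, lr=0.01):
    """Jointly train a shared scalar layer and two task-specific scalar layers.

    data: list of (x, (y_task0, y_task1)) pairs.
    Prediction for task k is heads[k] * (w * x).
    Target loss (assumed): sum over tasks and samples of squared error.
    Returns the shared weight, the task weights, and the loss history.
    """
    w = 1.0             # shared-layer parameter
    heads = [1.0, 1.0]  # one parameter per task layer
    history = []
    for _ in range(steps):
        grad_w, grad_heads, loss = 0.0, [0.0, 0.0], 0.0
        for x, targets in data:
            hidden = w * x  # shared representation
            for k, y in enumerate(targets):
                err = heads[k] * hidden - y
                loss += err * err
                grad_w += 2 * err * heads[k] * x   # every task updates the shared layer
                grad_heads[k] += 2 * err * hidden  # each task updates only its own layer
        w -= lr * grad_w
        heads = [h - lr * g for h, g in zip(heads, grad_heads)]
        history.append(loss)
    return w, heads, history


# Toy data: task 0 wants y = 2x, task 1 wants y = 3x.
data = [(1.0, (2.0, 3.0)), (2.0, (4.0, 6.0))]
w, heads, history = train(data)
```

The point of the sketch is the gradient flow: the shared parameter accumulates gradient from both tasks, while each task parameter sees only its own error.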
-
7.
Publication No.: US11908483B2
Publication date: 2024-02-20
Application No.: US17401125
Filing date: 2021-08-12
Inventor: Rongzhi Gu, Shixiong Zhang, Lianwu Chen, Yong Xu, Meng Yu, Dan Su, Dong Yu
IPC: G10L19/008, G10L25/03, G10L25/30, H04S3/02, H04S5/00
CPC classification number: G10L19/008, G10L25/03, G10L25/30, H04S3/02, H04S5/00
Abstract: This application relates to a method of extracting an inter-channel feature from a multi-channel multi-sound source mixed audio signal performed at a computing device. The method includes: transforming one channel component of a multi-channel multi-sound source mixed audio signal into a single-channel multi-sound source mixed audio representation in a feature space; performing a two-dimensional dilated convolution on the multi-channel multi-sound source mixed audio signal to extract inter-channel features; performing a feature fusion on the single-channel multi-sound source mixed audio representation and the inter-channel features; estimating respective weights of sound sources in the single-channel multi-sound source mixed audio representation based on a fused multi-channel multi-sound source mixed audio feature; obtaining respective representations of the plurality of sound sources according to the single-channel multi-sound source mixed audio representation and the respective weights; and transforming the respective representations of the sound sources into respective audio signals of the plurality of sound sources.
-
8.
Publication No.: US11158304B2
Publication date: 2021-10-26
Application No.: US16655548
Filing date: 2019-10-17
Inventor: Lianwu Chen, Meng Yu, Min Luo, Dan Su
Abstract: Embodiments of the present invention provide a speech signal processing model training method, an electronic device and a storage medium. The embodiments of the present invention determine a target training loss function based on a training loss function of each of one or more speech signal processing tasks; input a task input feature of each speech signal processing task into a starting multi-task neural network; and update model parameters of a shared layer and each of one or more task layers of the starting multi-task neural network corresponding to the one or more speech signal processing tasks by minimizing the target training loss function as a training objective, until the starting multi-task neural network converges, to obtain a speech signal processing model.