Patent search ap:("TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED") AND inv:"Lianwu Chen" Page 1

1.

Granted invention patent
Multi-person speech separation method and apparatus using a generative adversarial network model (Status: in force)

Publication No.: US11450337B2

Publication date: 2022-09-20

Application No.: US17023829

Filing date: 2020-09-17

Applicant: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED

Inventor: Lianwu Chen, Meng Yu, Yanmin Qian, Dan Su, Dong Yu

IPC: G10L21/0272, G06N3/04, G06N3/08, G10L25/30, G10L25/51

Abstract: A multi-person speech separation method is provided for a terminal. The method includes extracting a hybrid speech feature from a hybrid speech signal requiring separation, N human voices being mixed in the hybrid speech signal, N being a positive integer greater than or equal to 2; extracting a masking coefficient of the hybrid speech feature by using a generative adversarial network (GAN) model, to obtain a masking matrix corresponding to the N human voices, wherein the GAN model comprises a generative network model and an adversarial network model; and performing a speech separation on the masking matrix corresponding to the N human voices and the hybrid speech signal by using the GAN model, and outputting N separated speech signals corresponding to the N human voices.
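
The masking step in this abstract can be illustrated with a minimal sketch. It assumes, as is common for mask-based separation but not stated in the abstract, that each voice is recovered by multiplying the hybrid speech feature element-wise by that voice's mask. The function name `apply_masks` and the toy mask values are hypothetical; in the patent the masks are estimated by the generative network, and the adversarial training is not shown here.

```python
from typing import List

Matrix = List[List[float]]  # rows = time frames, columns = frequency bins


def apply_masks(mixture: Matrix, masks: List[Matrix]) -> List[Matrix]:
    """Multiply the mixture feature element-wise by each speaker's mask."""
    separated = []
    for mask in masks:
        separated.append([
            [m * x for m, x in zip(mask_row, mix_row)]
            for mask_row, mix_row in zip(mask, mixture)
        ])
    return separated


# Toy magnitude feature: 2 frames x 3 frequency bins, N = 2 voices.
mixture = [[1.0, 2.0, 4.0], [0.5, 3.0, 2.0]]
# Hypothetical masks (assumption): hand-picked here, model-estimated in practice.
mask_a = [[1.0, 0.5, 0.25], [0.0, 0.5, 1.0]]
mask_b = [[1.0 - m for m in row] for row in mask_a]  # masks sum to 1 per bin

voice_a, voice_b = apply_masks(mixture, [mask_a, mask_b])
```

Because the two toy masks sum to 1 in every bin, the two separated features add back up to the mixture.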

2.

Invention application
Inter-channel feature extraction method, audio separation method and apparatus, and computing device (Status: in force)

Publication No.: US20210375294A1

Publication date: 2021-12-02

Application No.: US17401125

Filing date: 2021-08-12

Applicant: Tencent Technology (Shenzhen) Company Limited

Inventor: Rongzhi Gu, Shixiong Zhang, Lianwu Chen, Yong Xu, Meng Yu, Dan Su, Dong Yu

IPC: G10L19/008, G10L25/30, G10L25/03

Abstract: This application relates to a method of extracting an inter-channel feature from a multi-channel multi-sound source mixed audio signal performed at a computing device. The method includes: transforming one channel component of a multi-channel multi-sound source mixed audio signal into a single-channel multi-sound source mixed audio representation in a feature space; performing a two-dimensional dilated convolution on the multi-channel multi-sound source mixed audio signal to extract inter-channel features; performing a feature fusion on the single-channel multi-sound source mixed audio representation and the inter-channel features; estimating respective weights of sound sources in the single-channel multi-sound source mixed audio representation based on a fused multi-channel multi-sound source mixed audio feature; obtaining respective representations of the plurality of sound sources according to the single-channel multi-sound source mixed audio representation and the respective weights; and transforming the respective representations of the sound sources into respective audio signals of the plurality of sound sources.
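
To make the two-dimensional dilated convolution step concrete, the following sketch treats the multi-channel signal as a matrix (rows are channels, columns are time samples) and slides a small kernel over it with a dilation along the channel axis. This is an illustration under assumptions, not the patented network: the function `dilated_conv2d`, the fixed difference kernel, and the toy signal are hypothetical, and a trained model would learn its kernels rather than use a hand-written one.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def dilated_conv2d(x: Matrix, kernel: Matrix,
                   dilation: Tuple[int, int] = (1, 1)) -> Matrix:
    """Valid-mode 2-D dilated cross-correlation (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    dh, dw = dilation
    out_h = len(x) - dh * (kh - 1)
    out_w = len(x[0]) - dw * (kw - 1)
    return [
        [
            sum(
                kernel[i][j] * x[r + i * dh][c + j * dw]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Rows are microphone channels, columns are time samples (toy values).
signal = [
    [1.0, 2.0, 3.0, 4.0],  # channel 0
    [0.0, 1.0, 2.0, 3.0],  # channel 1
    [2.0, 2.0, 2.0, 2.0],  # channel 2
]
# Hypothetical 2x1 kernel: lower channel minus upper channel.
kernel = [[-1.0], [1.0]]

adjacent = dilated_conv2d(signal, kernel, dilation=(1, 1))  # ch1-ch0, ch2-ch1
spaced = dilated_conv2d(signal, kernel, dilation=(2, 1))    # ch2-ch0
```

Changing the channel-axis dilation changes which pairs of channels are compared, which is one way a dilated kernel can capture differences between non-adjacent channels.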

3.

Granted invention patent
Data processing method based on simultaneous interpretation, computer device, and storage medium (Status: in force)

Publication No.: US12087290B2

Publication date: 2024-09-10

Application No.: US16941503

Filing date: 2020-07-28

Applicant: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED

Inventor: Jingliang Bai, Caisheng Ouyang, Haikang Liu, Lianwu Chen, Qi Chen, Yulu Zhang, Min Luo, Dan Su

IPC: G10L15/183, G10L15/06, G10L15/22, G10L15/30, G10L21/0232, G10L25/21, G10L25/84, G10L25/78

CPC classification number: G10L15/183, G10L15/063, G10L15/22, G10L15/30, G10L21/0232, G10L25/21, G10L25/84, G10L2015/0636, G10L2025/783

Abstract: A data processing method based on simultaneous interpretation, applied to a server in a simultaneous interpretation system, including: obtaining audio transmitted by a simultaneous interpretation device; processing the audio by using a simultaneous interpretation model to obtain an initial text; transmitting the initial text to a user terminal; receiving a modified text fed back by the user terminal, the modified text being obtained after the user terminal modifies the initial text; and updating the simultaneous interpretation model according to the initial text and the modified text.
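
The abstract does not say how the initial and modified texts are used to update the model. One plausible first step, sketched here purely as an assumption, is to extract the spans the user changed so that they can serve as correction examples. The helper `correction_pairs` and the sample sentences are hypothetical; only Python's standard `difflib` module is used.

```python
import difflib
from typing import List, Tuple


def correction_pairs(initial: str, modified: str) -> List[Tuple[str, str]]:
    """Return (original words, corrected words) for each edited span."""
    a, b = initial.split(), modified.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep replacements, insertions and deletions
            pairs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return pairs


# Hypothetical server output and the user's correction of it.
initial_text = "the model uses a generative adversary network"
modified_text = "the model uses a generative adversarial network"
pairs = correction_pairs(initial_text, modified_text)
```

Identical texts yield an empty list, so unmodified feedback would contribute no correction examples under this scheme.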

4.

Granted invention patent
Multi-register-based speech detection method and related apparatus, and storage medium (Status: in force)

Publication No.: US12051441B2

Publication date: 2024-07-30

Application No.: US17944067

Filing date: 2022-09-13

Applicant: Tencent Technology (Shenzhen) Company Limited

Inventor: Jimeng Zheng, Lianwu Chen, Weiwei Li, Zhiyi Duan, Meng Yu, Dan Su, Kaiyu Jiang

IPC: G10L25/84, G06T7/20, G10L17/02, G10L17/22, G10L21/028, G10L25/21

CPC classification number: G10L25/84, G06T7/20, G10L17/02, G10L17/22, G10L21/028, G10L25/21, G06T2207/30201

Abstract: This application discloses a multi-sound area-based speech detection method and related apparatus, and a storage medium, which is applied to the field of artificial intelligence. The method includes: obtaining sound area information corresponding to N sound areas including multiple users speaking simultaneously; generating a control signal corresponding to each target detection sound area according to user information corresponding to the target detection sound area; processing multi-user speech input signals by using the control signals, to obtain a speech output signal corresponding to each target detection sound area; generating a speech detection result of the target detection sound area according to the speech output signal corresponding to the target detection sound area; and selecting, among the multiple users, a main speaker based on the user information, the speech output signals and speech detection results of multiple users in the N sound areas.

5.

Granted invention patent
Method, apparatus, and storage medium for segmenting sentences for speech recognition (Status: in force)

Publication No.: US11430428B2

Publication date: 2022-08-30

Application No.: US17016573

Filing date: 2020-09-10

Applicant: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED

Inventor: Lianwu Chen, Jingliang Bai, Min Luo

IPC: G10L15/04, G10L15/02, G10L15/26

Abstract: The present disclosure describes a method, apparatus, and storage medium for performing speech recognition. The method includes acquiring, by an apparatus, first to-be-processed speech information. The apparatus includes a memory storing instructions and a processor in communication with the memory. The method includes acquiring, by the apparatus, a first pause duration according to the first to-be-processed speech information; and in response to the first pause duration being greater than or equal to a first threshold, performing, by the apparatus, speech recognition on the first to-be-processed speech information to obtain a first result of sentence segmentation of speech, the first result of sentence segmentation of speech being text information, the first threshold being determined according to speech information corresponding to a previous moment.
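
The pause-based segmentation described in this abstract can be sketched as follows. The abstract states only that the threshold is determined from speech information at a previous moment; the specific rule in `next_threshold` (a longer previous sentence lowers the pause needed to cut the next one), its constants, and the input format are all assumptions made for illustration.

```python
from typing import List, Tuple

Chunk = Tuple[float, float]  # (speech duration, trailing pause), in seconds


def next_threshold(prev_sentence_duration: float,
                   base: float = 0.8, floor: float = 0.3) -> float:
    """Hypothetical rule: longer previous sentence, shorter required pause."""
    return max(floor, base - 0.1 * prev_sentence_duration)


def segment_sentences(chunks: List[Chunk]) -> List[List[int]]:
    """Group chunk indices into sentences, cutting when pause >= threshold."""
    sentences: List[List[int]] = []
    current: List[int] = []
    threshold = next_threshold(0.0)
    sentence_duration = 0.0
    for index, (speech, pause) in enumerate(chunks):
        current.append(index)
        sentence_duration += speech
        if pause >= threshold:
            sentences.append(current)
            # The next threshold depends on the sentence that just ended.
            threshold = next_threshold(sentence_duration)
            current, sentence_duration = [], 0.0
    if current:
        sentences.append(current)  # flush trailing speech without a long pause
    return sentences


chunks = [(1.0, 0.2), (2.0, 0.9), (1.0, 0.6), (0.5, 0.1)]
result = segment_sentences(chunks)
```

In this toy input the 0.6 s pause after the third chunk ends a sentence only because the threshold was lowered after the preceding 3-second sentence; with a fixed 0.8 s threshold it would not.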

6.

Invention application
Speech signal processing model training method, electronic device and storage medium (Status: under examination, published)

Publication No.: US20200051549A1

Publication date: 2020-02-13

Application No.: US16655548

Filing date: 2019-10-17

Applicant: Tencent Technology (Shenzhen) Company Limited

Inventor: Lianwu Chen, Meng Yu, Min Luo, Dan Su

IPC: G10L15/06, G10L15/16, G10L15/22, G10L15/183, G06N3/08, G06N3/04

Abstract: Embodiments of the present invention provide a speech signal processing model training method, an electronic device and a storage medium. The embodiments of the present invention determine a target training loss function based on a training loss function of each of one or more speech signal processing tasks; input a task input feature of each speech signal processing task into a starting multi-task neural network; and update model parameters of a shared layer and each of one or more task layers of the starting multi-task neural network corresponding to the one or more speech signal processing tasks by minimizing the target training loss function as a training objective, until the starting multi-task neural network converges, to obtain a speech signal processing model.
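
The training scheme in this abstract (one shared layer, one task layer per task, a single target loss) can be illustrated with a deliberately tiny model. Everything specific here is an assumption: the target loss is taken to be the plain sum of per-task mean squared errors, the "layers" are single scalar weights, and the task names `task_a` and `task_b` and their data are hypothetical.

```python
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (input feature, target)


def target_loss(shared: float, heads: Dict[str, float],
                data: Dict[str, List[Sample]]) -> float:
    """Target training loss: sum of each task's mean squared error."""
    total = 0.0
    for task, samples in data.items():
        errors = [(heads[task] * shared * x - y) ** 2 for x, y in samples]
        total += sum(errors) / len(errors)
    return total


def train_step(shared: float, heads: Dict[str, float],
               data: Dict[str, List[Sample]], lr: float = 0.01):
    """One gradient-descent step on the shared weight and every task weight."""
    grad_shared = 0.0
    new_heads = {}
    for task, samples in data.items():
        n = len(samples)
        grad_head = 0.0
        for x, y in samples:
            residual = heads[task] * shared * x - y
            grad_shared += 2 * residual * heads[task] * x / n  # all tasks add up
            grad_head += 2 * residual * shared * x / n         # this task only
        new_heads[task] = heads[task] - lr * grad_head
    return shared - lr * grad_shared, new_heads


data = {
    "task_a": [(1.0, 2.0), (2.0, 4.0)],
    "task_b": [(1.0, -1.0), (3.0, -3.0)],
}
shared, heads = 1.0, {"task_a": 1.0, "task_b": 1.0}
before = target_loss(shared, heads, data)
for _ in range(200):
    shared, heads = train_step(shared, heads, data)
after = target_loss(shared, heads, data)
```

The shared weight receives gradient from every task, while each task weight is updated only from its own task's error, which mirrors the shared-layer and task-layer split described in the abstract.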

7.

Granted invention patent
Inter-channel feature extraction method, audio separation method and apparatus, and computing device (Status: in force)

Publication No.: US11908483B2

Publication date: 2024-02-20

Application No.: US17401125

Filing date: 2021-08-12

Applicant: Tencent Technology (Shenzhen) Company Limited

Inventor: Rongzhi Gu, Shixiong Zhang, Lianwu Chen, Yong Xu, Meng Yu, Dan Su, Dong Yu

IPC: G10L19/008, G10L25/03, G10L25/30, H04S3/02, H04S5/00

CPC classification number: G10L19/008, G10L25/03, G10L25/30, H04S3/02, H04S5/00

Abstract: This application relates to a method of extracting an inter-channel feature from a multi-channel multi-sound source mixed audio signal performed at a computing device. The method includes: transforming one channel component of a multi-channel multi-sound source mixed audio signal into a single-channel multi-sound source mixed audio representation in a feature space; performing a two-dimensional dilated convolution on the multi-channel multi-sound source mixed audio signal to extract inter-channel features; performing a feature fusion on the single-channel multi-sound source mixed audio representation and the inter-channel features; estimating respective weights of sound sources in the single-channel multi-sound source mixed audio representation based on a fused multi-channel multi-sound source mixed audio feature; obtaining respective representations of the plurality of sound sources according to the single-channel multi-sound source mixed audio representation and the respective weights; and transforming the respective representations of the sound sources into respective audio signals of the plurality of sound sources.

8.

Granted invention patent
Training method of speech signal processing model with shared layer, electronic device and storage medium (Status: in force)

Publication No.: US11158304B2

Publication date: 2021-10-26

Application No.: US16655548

Filing date: 2019-10-17

Applicant: Tencent Technology (Shenzhen) Company Limited

Inventor: Lianwu Chen, Meng Yu, Min Luo, Dan Su

IPC: G10L15/16, G10L15/06, G06N3/04, G06N3/08, G10L15/183, G10L15/22

Abstract: Embodiments of the present invention provide a speech signal processing model training method, an electronic device and a storage medium. The embodiments of the present invention determine a target training loss function based on a training loss function of each of one or more speech signal processing tasks; input a task input feature of each speech signal processing task into a starting multi-task neural network; and update model parameters of a shared layer and each of one or more task layers of the starting multi-task neural network corresponding to the one or more speech signal processing tasks by minimizing the target training loss function as a training objective, until the starting multi-task neural network converges, to obtain a speech signal processing model.
