-
1.
Publication No.: US11908455B2
Publication Date: 2024-02-20
Application No.: US17672565
Filing Date: 2022-02-15
Inventor: Jun Wang, Wingyip Lam, Dan Su, Dong Yu
CPC classification number: G10L15/063, G10L15/05, G10L15/16
Abstract: A speech separation model training method and apparatus, a computer-readable storage medium, and a computer device are provided, the method including: obtaining first audio and second audio, the first audio including target audio and having corresponding labeled audio, and the second audio including noise audio; obtaining an encoding model, an extraction model, and an initial estimation model; performing unsupervised training on the encoding model, the extraction model, and the estimation model according to the second audio, and adjusting model parameters of the extraction model and the estimation model; performing supervised training on the encoding model and the extraction model according to the first audio and the labeled audio corresponding to the first audio, and adjusting a model parameter of the encoding model; and alternating the unsupervised training and the supervised training, so that the two overlap, until a training stop condition is met.
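The alternating schedule is the core of the method: the unsupervised phase updates only the extraction and estimation models, and the supervised phase updates only the encoding model. Below is a minimal PyTorch sketch of that loop; the module architectures, loss functions, and dummy batches are illustrative assumptions, not the patent's implementation.

```python
# Minimal sketch of the alternating training schedule; all modules and
# losses are illustrative stand-ins, not the patent's actual models.
import torch
import torch.nn as nn
import torch.nn.functional as F

enc = nn.Linear(256, 128)                                # encoding model (embeddings)
ext = nn.Sequential(nn.Linear(128, 128), nn.Sigmoid())   # extraction model (mask)
est = nn.Linear(128, 1)                                  # estimation model (score)

opt_unsup = torch.optim.Adam(list(ext.parameters()) + list(est.parameters()), lr=1e-4)
opt_sup = torch.optim.Adam(enc.parameters(), lr=1e-4)

def unsupervised_step(second_audio):          # noise audio, no labels
    emb = enc(second_audio).detach()          # encoder is frozen in this phase
    loss = est(ext(emb) * emb).pow(2).mean()  # placeholder unsupervised objective
    opt_unsup.zero_grad(); loss.backward(); opt_unsup.step()

def supervised_step(first_audio, labeled_audio):
    emb = enc(first_audio)
    mask = ext(emb).detach()                  # extractor is frozen in this phase
    loss = F.mse_loss(mask * emb, enc(labeled_audio))
    opt_sup.zero_grad(); loss.backward(); opt_sup.step()

for step in range(1000):                      # until a stop condition is met
    unsupervised_step(torch.randn(8, 256))    # dummy batches for illustration
    supervised_step(torch.randn(8, 256), torch.randn(8, 256))
```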
-
2.
Publication No.: US12014720B2
Publication Date: 2024-06-18
Application No.: US16999989
Filing Date: 2020-08-21
Inventor: Xixin Wu, Mu Wang, Shiyin Kang, Dan Su, Dong Yu
Abstract: This application relates to a speech synthesis method and apparatus, a model training method and apparatus, and a computer device. The method includes: obtaining to-be-processed linguistic data; encoding the linguistic data to obtain encoded linguistic data; obtaining an embedded vector for speech feature conversion, the embedded vector being generated according to a residual between synthesized reference speech data and reference speech data that correspond to the same reference linguistic data; and decoding the encoded linguistic data according to the embedded vector, to obtain target synthesized speech data on which the speech feature conversion is performed. The solution provided in this application can prevent the quality of synthesized speech from being affected by semantic features in the mel-frequency cepstrum.
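In other words, the style embedding is derived from what a baseline synthesizer gets wrong about the reference audio, and the decoder is conditioned on it. A small sketch of that conditioning, assuming recurrent encoder/decoder modules and mel-style features; every shape, module, and name here is a hypothetical stand-in.

```python
# Sketch of decoding conditioned on a residual-derived embedding; module
# types, feature sizes, and names are illustrative assumptions.
import torch
import torch.nn as nn

text_encoder = nn.GRU(input_size=64, hidden_size=128, batch_first=True)
residual_encoder = nn.Linear(80, 32)            # residual -> embedded vector
decoder = nn.GRU(input_size=128 + 32, hidden_size=80, batch_first=True)

def synthesize(linguistic, reference, synthesized_reference):
    encoded, _ = text_encoder(linguistic)                 # encode linguistic data
    residual = reference - synthesized_reference          # same reference text
    embed = residual_encoder(residual.mean(dim=1))        # utterance-level vector
    embed = embed.unsqueeze(1).expand(-1, encoded.size(1), -1)
    out, _ = decoder(torch.cat([encoded, embed], dim=-1)) # condition the decoding
    return out                                            # target speech features

mel = synthesize(torch.randn(2, 50, 64),   # linguistic features
                 torch.randn(2, 50, 80),   # reference speech data
                 torch.randn(2, 50, 80))   # synthesized reference speech data
```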
-
3.
Publication No.: US10699700B2
Publication Date: 2020-06-30
Application No.: US16050825
Filing Date: 2018-07-31
Inventor: Yanmin Qian, Dong Yu
Abstract: Provided are a speech recognition training processing method and an apparatus including the same. The speech recognition training processing method includes acquiring multi-talker mixed speech sequence data corresponding to a plurality of speakers, encoding the multi-talker mixed speech sequence data into embedded sequence data, generating speaker-specific context vectors at each frame based on the embedded sequence data, generating senone posteriors for each of the speakers based on the speaker-specific context vectors, and updating an acoustic model by performing permutation invariant training (PIT) model training based on the senone posteriors.
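PIT resolves the label-ambiguity problem in multi-talker training: the network's output streams are matched to the speakers' label sequences under whichever permutation gives the lowest loss. A minimal two-speaker sketch of that loss, assuming frame-level senone posteriors; for brevity it takes the minimum at the batch level, whereas per-utterance assignment is the usual refinement.

```python
# Two-speaker PIT loss sketch; posterior and label shapes are assumptions.
from itertools import permutations
import torch
import torch.nn.functional as F

def pit_loss(posteriors, targets):
    """posteriors: S tensors of (batch, frames, senones), one per output head;
    targets: S tensors of (batch, frames) integer senone labels."""
    best = None
    for perm in permutations(range(len(targets))):
        loss = sum(F.cross_entropy(posteriors[i].transpose(1, 2), targets[j])
                   for i, j in enumerate(perm))
        best = loss if best is None else torch.minimum(best, loss)
    return best    # train against the best-fitting label assignment

posts = [torch.randn(4, 100, 9000, requires_grad=True) for _ in range(2)]
labels = [torch.randint(0, 9000, (4, 100)) for _ in range(2)]
pit_loss(posts, labels).backward()
```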
-
4.
Publication No.: US12283087B2
Publication Date: 2025-04-22
Application No.: US17109072
Filing Date: 2020-12-01
Inventor: Haozhi Huang, Jiawei Li, Li Shen, Yonggen Ling, Wei Liu, Dong Yu
IPC: G06N20/00, G06F18/21, G06F18/214, G06V10/764, G06V10/774, G06V40/10, G06V40/16, H04N5/265
Abstract: A model training method includes obtaining an image sample set and brief-prompt information; generating a content mask set according to the image sample set and the brief-prompt information; generating a to-be-trained image set according to the content mask set; obtaining, based on the image sample set and the to-be-trained image set, a predicted image set through a to-be-trained information synthesis model, the predicted image set comprising at least one predicted image, each predicted image corresponding to an image sample; and training, based on the predicted image set and the image sample set, the to-be-trained information synthesis model by using a target loss function, to obtain an information synthesis model.
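The pipeline amounts to masking regions of the samples, having the model re-synthesize them, and penalizing the reconstruction. A toy sketch of that loop, assuming rectangular masks and a single-layer stand-in for the synthesis model; nothing here reflects the patent's actual architecture or loss.

```python
# Toy mask-then-reconstruct training loop; the mask shape, stand-in model,
# and masked-MSE loss are illustrative assumptions.
import torch
import torch.nn as nn

synthesis_model = nn.Conv2d(3, 3, kernel_size=3, padding=1)   # stand-in model
optimizer = torch.optim.Adam(synthesis_model.parameters(), lr=1e-4)

def content_mask(images, box):          # "brief-prompt" -> rectangular mask
    mask = torch.zeros_like(images[:, :1])
    y0, y1, x0, x1 = box
    mask[:, :, y0:y1, x0:x1] = 1.0
    return mask

for _ in range(10):
    samples = torch.rand(2, 3, 64, 64)                   # image sample set
    mask = content_mask(samples, (16, 48, 16, 48))       # content mask set
    to_train = samples * (1 - mask)                      # to-be-trained image set
    predicted = synthesis_model(to_train)                # predicted image set
    loss = ((predicted - samples) * mask).pow(2).mean()  # target loss on masked area
    optimizer.zero_grad(); loss.backward(); optimizer.step()
```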
-
5.
Publication No.: US11996091B2
Publication Date: 2024-05-28
Application No.: US16989844
Filing Date: 2020-08-10
IPC: G10L15/20, G10L15/02, G10L15/16, G10L15/22, G10L17/06, G10L21/02, G10L21/0272, G10L21/0208
CPC classification number: G10L15/20, G10L15/02, G10L15/16, G10L15/22, G10L17/06, G10L21/02, G10L21/0272, G10L2015/223, G10L2021/02087
Abstract: A mixed speech recognition method, a mixed speech recognition apparatus, and a computer-readable storage medium are provided. The mixed speech recognition method includes: monitoring speech input and detecting an enrollment speech and a mixed speech; acquiring speech features of a target speaker based on the enrollment speech; and determining speech belonging to the target speaker in the mixed speech based on the speech features of the target speaker. The enrollment speech includes preset speech information, and the mixed speech is non-enrollment speech inputted after the enrollment speech.
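The idea is enrollment-conditioned extraction: a speaker embedding computed from the enrollment utterance steers a mask network toward the target speaker's portion of the mixture. A minimal sketch under assumed feature shapes; the speaker encoder and mask network are hypothetical stand-ins.

```python
# Enrollment-conditioned target-speaker extraction sketch; the speaker
# encoder, mask network, and feature sizes are hypothetical stand-ins.
import torch
import torch.nn as nn

speaker_encoder = nn.GRU(input_size=80, hidden_size=64, batch_first=True)
mask_net = nn.Sequential(nn.Linear(80 + 64, 128), nn.ReLU(),
                         nn.Linear(128, 80), nn.Sigmoid())

def extract_target(enrollment, mixture):
    _, h = speaker_encoder(enrollment)                   # target speaker's speech features
    spk = h[-1].unsqueeze(1).expand(-1, mixture.size(1), -1)
    mask = mask_net(torch.cat([mixture, spk], dim=-1))   # frames dominated by that speaker
    return mask * mixture                                # speech belonging to the target speaker

target = extract_target(torch.randn(1, 40, 80),          # enrollment speech features
                        torch.randn(1, 200, 80))         # mixed speech features
```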
-
6.
Publication No.: US11908483B2
Publication Date: 2024-02-20
Application No.: US17401125
Filing Date: 2021-08-12
Inventor: Rongzhi Gu, Shixiong Zhang, Lianwu Chen, Yong Xu, Meng Yu, Dan Su, Dong Yu
IPC: G10L19/008, G10L25/03, G10L25/30, H04S3/02, H04S5/00
CPC classification number: G10L19/008, G10L25/03, G10L25/30, H04S3/02, H04S5/00
Abstract: This application relates to a method of extracting an inter-channel feature from a multi-channel multi-sound-source mixed audio signal, performed at a computing device. The method includes: transforming one channel component of the multi-channel multi-sound-source mixed audio signal into a single-channel multi-sound-source mixed audio representation in a feature space; performing a two-dimensional dilated convolution on the multi-channel multi-sound-source mixed audio signal to extract inter-channel features; performing a feature fusion on the single-channel multi-sound-source mixed audio representation and the inter-channel features; estimating respective weights of the sound sources in the single-channel multi-sound-source mixed audio representation based on the fused multi-channel multi-sound-source mixed audio feature; obtaining respective representations of the plurality of sound sources according to the single-channel multi-sound-source mixed audio representation and the respective weights; and transforming the respective representations of the sound sources into respective audio signals of the plurality of sound sources.
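Concretely, one reference channel is encoded on its own while a 2-D dilated convolution sweeps across the (microphone, time) plane of all channels to capture inter-channel cues, and the two feature maps are then fused. A shape-level sketch of that path; all dimensions, kernel sizes, and the random per-source weights are chosen for illustration only.

```python
# Shape-level sketch of the inter-channel path; kernel sizes, dilation,
# and the random per-source weights are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

n_mics, feat_dim = 4, 128
single_ch_encoder = nn.Conv1d(1, feat_dim, kernel_size=16, stride=8)
inter_ch_conv = nn.Conv2d(1, feat_dim, kernel_size=(n_mics, 3),
                          dilation=(1, 2), padding=(0, 2))   # 2-D dilated conv

mixture = torch.randn(1, n_mics, 16000)                  # 4 channels, 1 s at 16 kHz
ref = single_ch_encoder(mixture[:, :1])                  # single-channel representation
inter = inter_ch_conv(mixture.unsqueeze(1)).squeeze(2)   # inter-channel features
inter = F.interpolate(inter, size=ref.size(-1))          # align the time axes
fused = torch.cat([ref, inter], dim=1)                   # feature fusion
weights = torch.softmax(torch.randn(2, 1, ref.size(-1)), dim=0)  # 2 sources (placeholder)
sources = [w * ref for w in weights]                     # weighted source representations
```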
-
7.
Publication No.: US11900917B2
Publication Date: 2024-02-13
Application No.: US17230515
Filing Date: 2021-04-14
CPC classification number: G10L15/063, G06N3/04, G06N3/08, G10L15/02, G10L15/16, G10L15/22, G10L2015/0635
Abstract: A neural network training method is provided. The method includes: obtaining an audio data stream; performing, for the audio data of each time frame in the audio data stream, feature extraction through the layers of a neural network, to obtain a depth feature for the corresponding time frame; fusing, for a given label in the labeling data, an inter-class confusion measurement index and an intra-class distance penalty value relative to the given label in a set loss function for the audio data stream through the depth feature; and updating the parameters of the neural network by using the loss function value obtained through the fusion.
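A common way to realize such a fused objective is a softmax cross-entropy term (penalizing inter-class confusion) plus a center-loss-style term (penalizing intra-class distance to the label's centroid). The sketch below pairs those two terms; reading the abstract's indices this way is an assumption, not the patent's exact formulation.

```python
# Fused loss sketch: cross-entropy (inter-class confusion) plus a
# center-loss-style intra-class distance penalty; the pairing and the
# weight lam are assumptions about the abstract's objective.
import torch
import torch.nn.functional as F

n_classes, feat_dim = 10, 128
centers = torch.zeros(n_classes, feat_dim, requires_grad=True)  # one center per label

def fused_loss(depth_feat, logits, labels, lam=0.1):
    inter = F.cross_entropy(logits, labels)                      # inter-class confusion term
    intra = (depth_feat - centers[labels]).pow(2).sum(1).mean()  # distance to label's center
    return inter + lam * intra

feats = torch.randn(8, feat_dim, requires_grad=True)   # depth features per frame
fused_loss(feats, torch.randn(8, n_classes), torch.randint(0, n_classes, (8,))).backward()
```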
-
8.
Publication No.: US20220180882A1
Publication Date: 2022-06-09
Application No.: US17682399
Filing Date: 2022-02-28
Applicant: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
Inventor: Jun Wang, Wing Yip Lam, Dan Su, Dong Yu
Abstract: A method of training an audio separation network is provided. The method includes: obtaining a first separation sample set, the first separation sample set including at least two types of audio with dummy labels; obtaining a first sample set by performing interpolation on the first separation sample set based on perturbation data; obtaining a second separation sample set by separating the first sample set using an unsupervised network; determining losses of second separation samples in the second separation sample set; and adjusting network parameters of the unsupervised network based on the losses of the second separation samples, such that a first loss of a first separation result outputted by the adjusted unsupervised network meets a convergence condition.
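The interpolation step resembles mixup: each separation sample is blended with perturbation data before being passed through the unsupervised separator, and the separator is trained on the losses of the resulting outputs. A sketch of just that blending step, with the Beta-distributed mixing coefficient as an assumed choice.

```python
# Mixup-style interpolation of separation samples with perturbation data;
# the Beta-distributed coefficient is an assumed choice, not the patent's.
import torch

def interpolate_samples(samples, perturbation, alpha=0.75):
    lam = torch.distributions.Beta(alpha, alpha).sample()
    lam = torch.max(lam, 1 - lam)            # keep the original sample dominant
    return lam * samples + (1 - lam) * perturbation

first_separation_set = torch.randn(4, 16000)     # audio with dummy labels
perturbation_data = torch.randn(4, 16000)
first_sample_set = interpolate_samples(first_separation_set, perturbation_data)
# first_sample_set would then be separated by the unsupervised network, and the
# per-sample losses used to adjust its parameters until convergence.
```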
-
9.
Publication No.: US11222623B2
Publication Date: 2022-01-11
Application No.: US16884350
Filing Date: 2020-05-27
Abstract: A speech keyword recognition method includes: obtaining first speech segments based on a to-be-recognized speech signal; obtaining first probabilities respectively corresponding to the first speech segments by using a preset first classification model. A first probability of a first speech segment is obtained from probabilities of the first speech segment respectively corresponding to pre-determined word segmentation units of a pre-determined keyword. The method also includes obtaining second speech segments based on the to-be-recognized speech signal, and respectively generating first prediction characteristics of the second speech segments based on first probabilities of first speech segments that correspond to each second speech segment; performing classification based on the first prediction characteristics by using a preset second classification model, to obtain second probabilities respectively corresponding to the second speech segments related to the pre-determined keyword; and determining, based on the second probabilities, whether the pre-determined keyword exists in the to-be-recognized speech signal.
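The two-stage structure can be read as follows: a first classifier scores each short segment against the keyword's word-segmentation units, and a second classifier then judges longer segments using those unit probabilities as prediction features. A toy sketch with assumed segment sizes, unit counts, and stand-in models for the two preset classifiers.

```python
# Toy two-stage keyword decision; segment lengths, unit count, and both
# classifiers are assumed stand-ins for the preset models.
import torch
import torch.nn as nn

n_units = 3                                        # word segmentation units of the keyword
first_clf = nn.Sequential(nn.Linear(40, 64), nn.ReLU(),
                          nn.Linear(64, n_units), nn.Softmax(dim=-1))
second_clf = nn.Sequential(nn.Linear(5 * n_units, 32), nn.ReLU(),
                           nn.Linear(32, 1), nn.Sigmoid())

speech = torch.randn(100, 40)                      # to-be-recognized feature frames
first_probs = first_clf(speech)                    # unit probabilities per first segment
windows = first_probs.unfold(0, 5, 1)              # each second segment spans 5 first ones
pred_feats = windows.reshape(windows.size(0), -1)  # first prediction characteristics
keyword_probs = second_clf(pred_feats)             # P(keyword) per second segment
detected = bool((keyword_probs > 0.5).any())       # keyword exists in the signal?
```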
-
10.
Publication No.: US10699698B2
Publication Date: 2020-06-30
Application No.: US15940246
Filing Date: 2018-03-29
Inventor: Yanmin Qian, Dong Yu
Abstract: Provided are a speech recognition training processing method and an apparatus including the same. The speech recognition training processing method includes acquiring a stream of speech data from one or more speakers, extracting an auxiliary feature corresponding to a speech characteristic of the one or more speakers, and updating an acoustic model by performing permutation invariant training (PIT) model training based on the auxiliary feature.
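Relative to the PIT loss sketched under item 3, the addition here is conditioning the acoustic model on an auxiliary speaker feature appended to every frame. A minimal sketch of that augmentation; the feature dimensions and the i-vector-like auxiliary input are assumptions for illustration.

```python
# Appending an auxiliary speaker feature to every frame before the PIT
# objective; the i-vector-like auxiliary input and sizes are assumptions.
import torch
import torch.nn as nn

frames = torch.randn(4, 100, 40)                 # stream of speech features
aux = torch.randn(4, 16)                         # auxiliary speaker characteristic
aux_expanded = aux.unsqueeze(1).expand(-1, frames.size(1), -1)
augmented = torch.cat([frames, aux_expanded], dim=-1)     # (batch, frames, 40 + 16)

acoustic_model = nn.LSTM(input_size=56, hidden_size=128, batch_first=True)
outputs, _ = acoustic_model(augmented)           # would feed a PIT loss as in item 3
```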