Patent search: ap:("TENCENT America LLC") AND inv:"Dan Su" (page 1)

1.

Granted invention patent
Multi-task training architecture and strategy for attention-based speech recognition system (status: in force)

Publication number: US11972754B2

Publication date: 2024-04-30

Application number: US17559617

Filing date: 2021-12-22

Applicant: TENCENT AMERICA LLC

Inventors: Jia Cui, Chao Weng, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu

IPC classification: G10L15/06, G10L15/10, G10L25/03, G10L25/54

CPC classification: G10L15/063, G10L15/10, G10L25/03, G10L25/54

Abstract: Methods and apparatuses are provided for performing sequence to sequence (Seq2Seq) speech recognition training performed by at least one processor. The method includes acquiring a training set comprising a plurality of pairs of input data and target data corresponding to the input data, encoding the input data into a sequence of hidden states, performing a connectionist temporal classification (CTC) model training based on the sequence of hidden states, performing an attention model training based on the sequence of hidden states, and decoding the sequence of hidden states to generate target labels by independently performing the CTC model training and the attention model training.
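Editorial illustration (not part of the patent record): the abstract mentions a CTC branch that operates on the encoder's hidden states. As a minimal sketch of how a CTC output is commonly turned into labels, the snippet below performs greedy (best-path) CTC decoding: take the most likely symbol per frame, merge consecutive repeats, and drop blanks. The function name `ctc_greedy_decode`, the `blank=0` convention, and the toy posteriors are assumptions for illustration and are not taken from the patent.

```python
def ctc_greedy_decode(frame_posteriors, blank=0):
    """Greedy CTC decoding over per-frame symbol posteriors.

    frame_posteriors: list of per-frame probability lists (hypothetical input).
    blank: index of the CTC blank symbol (assumed to be 0 here).
    """
    # Most likely symbol index for each frame.
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_posteriors]

    labels = []
    prev = None
    for idx in best_path:
        # Keep a symbol only when it differs from the previous frame
        # and is not the blank symbol.
        if idx != prev and idx != blank:
            labels.append(idx)
        prev = idx
    return labels


# Toy posteriors over {blank, label 1, label 2} for five frames.
frames = [
    [0.1, 0.8, 0.1],
    [0.1, 0.8, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.8, 0.1],
    [0.1, 0.2, 0.7],
]
decoded = ctc_greedy_decode(frames)
```

The blank frame between the two runs of label 1 is what lets the same label appear twice in a row in the decoded output.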

2.

Granted invention patent
Multi-task training architecture and strategy for attention-based speech recognition system (status: in force)

Publication number: US11257481B2

Publication date: 2022-02-22

Application number: US16169512

Filing date: 2018-10-24

Applicant: TENCENT AMERICA LLC

Inventors: Jia Cui, Chao Weng, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu

IPC classification: G10L15/06, G10L25/03, G10L25/54, G10L15/10

Abstract: Methods and apparatuses are provided for performing sequence to sequence (Seq2Seq) speech recognition training performed by at least one processor. The method includes acquiring a training set comprising a plurality of pairs of input data and target data corresponding to the input data, encoding the input data into a sequence of hidden states, performing a connectionist temporal classification (CTC) model training based on the sequence of hidden states, performing an attention model training based on the sequence of hidden states, and decoding the sequence of hidden states to generate target labels by independently performing the CTC model training and the attention model training.

3.

Granted invention patent
N-best softmax smoothing for minimum bayes risk training of attention based sequence-to-sequence models (status: in force)

Publication number: US11803618B2

Publication date: 2023-10-31

Application number: US17989536

Filing date: 2022-11-17

Applicant: TENCENT AMERICA LLC

Inventors: Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu

IPC classification: G06F18/20, G06V10/70, G10L15/06, G06N3/044, G06N3/045, G06N3/08, G06F18/2415, G06N20/00, G06F40/47

CPC classification: G06F18/24155, G06F18/29, G06F40/47, G06N20/00, G06V10/768, G10L15/063

Abstract: A method and apparatus are provided that analyzing sequence-to-sequence data, such as sequence-to-sequence speech data or sequence-to-sequence machine translation data for example, by minimum Bayes risk (MBR) training a sequence-to-sequence model and within introduction of applications of softmax smoothing to an N-best generation of the MBR training of the sequence-to-sequence model.
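Editorial illustration (not part of the patent record): the abstract combines two ideas, a smoothed softmax used while generating an N-best list and an MBR objective computed over that list. The sketch below shows one common reading of each, under stated assumptions: smoothing is modeled as scaling the logits by a factor `beta` below 1 before the softmax, the risk is token-level edit distance, and hypothesis probabilities are renormalized over the N-best list. The names `smoothed_softmax`, `expected_risk`, and `beta` are hypothetical and the patent's actual formulation may differ.

```python
import math


def smoothed_softmax(logits, beta=0.8):
    """Softmax over beta-scaled logits; beta < 1 flattens the distribution."""
    scaled = [beta * z for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def edit_distance(hyp, ref):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (h != r)))  # substitution or match
        prev = cur
    return prev[-1]


def expected_risk(nbest, ref):
    """Expected edit distance over an N-best list.

    nbest: list of (tokens, log_prob) pairs; probabilities are
    renormalized over the list before weighting each hypothesis's risk.
    """
    m = max(lp for _, lp in nbest)
    weights = [math.exp(lp - m) for _, lp in nbest]
    total = sum(weights)
    return sum(w / total * edit_distance(toks, ref)
               for (toks, _), w in zip(nbest, weights))


# Two equally likely hypotheses, one correct and one with a single error.
nbest = [(["a", "b"], math.log(0.5)), (["a", "c"], math.log(0.5))]
risk = expected_risk(nbest, ["a", "b"])
```

A flatter output distribution tends to produce a more diverse N-best list, which gives the expected-risk sum more than one meaningfully weighted hypothesis to work with.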

4.

Granted invention patent
Input-feeding architecture for attention based end-to-end speech recognition (status: pending, published)

Publication number: US10672382B2

Publication date: 2020-06-02

Application number: US16160352

Filing date: 2018-10-15

Applicant: TENCENT AMERICA LLC

Inventors: Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu

IPC classification: G10L15/06, G10L15/14, G10L15/183, G10L15/22

Abstract: Methods and apparatuses are provided for performing end-to-end speech recognition training performed by at least one processor. The method includes receiving, by the at least one processor, one or more input speech frames, generating, by the at least one processor, a sequence of encoder hidden states by transforming the input speech frames, computing, by the at least one processor, attention weights based on each of the sequence of encoder hidden states and a current decoder hidden state, performing, by the at least one processor, a decoding operation based on a previous embedded label prediction information and a previous attentional hidden state information generated based on the attention weights; and generating a current embedded label prediction information based on a result of the decoding operation and the attention weights.
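Editorial illustration (not part of the patent record): the abstract describes computing attention weights from the encoder hidden states and the current decoder state, then feeding the previous attentional hidden state back into the decoder together with the previous label embedding. The sketch below shows those steps in plain Python under simplifying assumptions: dot-product scoring stands in for whatever scoring function the patent uses, and "input feeding" is modeled as simple concatenation. All function names are hypothetical.

```python
import math


def attention_weights(encoder_states, decoder_state):
    """Softmax over dot-product scores between each encoder state and the decoder state."""
    scores = [sum(h_i * s_i for h_i, s_i in zip(h, decoder_state))
              for h in encoder_states]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_vector(encoder_states, weights):
    """Attention-weighted sum of the encoder hidden states."""
    dim = len(encoder_states[0])
    return [sum(w * h[d] for w, h in zip(weights, encoder_states))
            for d in range(dim)]


def decoder_input(prev_label_embedding, prev_attentional_state):
    """Input feeding, modeled here as concatenating the previous label
    embedding with the previous attentional hidden state."""
    return list(prev_label_embedding) + list(prev_attentional_state)


# Toy example: two encoder states of dimension 2.
enc = [[1.0, 0.0], [0.0, 1.0]]
w = attention_weights(enc, [0.0, 0.0])   # equal scores give uniform weights
ctx = context_vector(enc, w)
step_input = decoder_input([1.0, 2.0], [3.0])
```

Feeding the previous attentional state back in gives the decoder a record of where it attended at the last step, which is the usual motivation for input-feeding designs.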

5.

Invention patent application
INPUT-FEEDING ARCHITECTURE FOR ATTENTION BASED END-TO-END SPEECH RECOGNITION (status: pending, published)

Publication number: US20200118547A1

Publication date: 2020-04-16

Application number: US16160352

Filing date: 2018-10-15

Applicant: TENCENT AMERICA LLC

Inventors: Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu

IPC classification: G10L15/06, G10L15/22

Abstract: Methods and apparatuses are provided for performing end-to-end speech recognition training performed by at least one processor. The method includes receiving, by the at least one processor, one or more input speech frames, generating, by the at least one processor, a sequence of encoder hidden states by transforming the input speech frames, computing, by the at least one processor, attention weights based on each of the sequence of encoder hidden states and a current decoder hidden state, performing, by the at least one processor, a decoding operation based on a previous embedded label prediction information and a previous attentional hidden state information generated based on the attention weights; and generating a current embedded label prediction information based on a result of the decoding operation and the attention weights.

6.

Granted invention patent
N-best softmax smoothing for minimum bayes risk training of attention based sequence-to-sequence models (status: in force)

Publication number: US11551136B2

Publication date: 2023-01-10

Application number: US16191027

Filing date: 2018-11-14

Applicant: TENCENT America LLC

Inventors: Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu

IPC classification: G06N20/00, G06N3/04, G06N3/08, G06F40/47, G06V10/70, G06K9/62, G10L15/06

Abstract: A method and apparatus are provided that analyzing sequence-to-sequence data, such as sequence-to-sequence speech data or sequence-to-sequence machine translation data for example, by minimum Bayes risk (MBR) training a sequence-to-sequence model and within introduction of applications of softmax smoothing to an N-best generation of the MBR training of the sequence-to-sequence model.