-
Publication No.: US12094481B2
Publication date: 2024-09-17
Application No.: US17455497
Filing date: 2021-11-18
Applicant: TENCENT AMERICA LLC
Inventors: Yong Xu, Meng Yu, Shi-Xiong Zhang, Dong Yu
IPC classification: G10L21/0208, G06N3/044, G06N3/08, G10L21/0216, G10L21/0264, G10L25/30
CPC classification: G10L21/0208
Abstract: There is included a method and apparatus comprising computer code for generating enhanced target speech from audio data, performed by a computing device, the method comprising: receiving audio data corresponding to one or more speakers; generating an estimated target speech, an estimated noise, and an estimated echo simultaneously based on the audio data using a jointly trained complex ratio mask; predicting frame-level multi-tap time-frequency (T-F) spatio-temporal-echo filter weights based on the estimated target speech, the estimated noise, and the estimated echo using a trained neural network model; and predicting enhanced target speech based on the frame-level multi-tap T-F spatio-temporal-echo filter weights.
-
Publication No.: US20240211501A1
Publication date: 2024-06-27
Application No.: US18146765
Filing date: 2022-12-27
Applicant: TENCENT AMERICA LLC
Inventors: Hongming Zhang, Xiaoman Pan, Wenlin YAO, Jianshu Chen, Dong Yu
CPC classification: G06F16/3344, G06F16/355
Abstract: There is included a method and apparatus comprising computer code for instance-wise adaptive knowledge injection in a pre-trained language model (PTLM) including determining a necessity of external knowledge in a plurality of queries of a first dataset based on a likelihood that a respective query is solved by internal knowledge of a target model. Then, the one or more queries determined to need external knowledge may be augmented with pieces of external knowledge. A combined dataset may be generated by combining the first dataset and the one or more augmented queries, and the combined dataset may be applied to the target model.
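Illustrative sketch (not taken from the patent): the flow in this abstract can be pictured as a filter-and-augment pass over a dataset. The names inject_knowledge, solve_likelihood, retrieve and threshold are hypothetical, and the fixed threshold is an assumption; the abstract does not say how the necessity decision is made.

```python
from typing import Callable, List


def inject_knowledge(
    queries: List[str],
    solve_likelihood: Callable[[str], float],  # hypothetical: P(model answers q unaided)
    retrieve: Callable[[str], str],            # hypothetical: external knowledge lookup
    threshold: float = 0.5,                    # assumed cut-off, not from the patent
) -> List[str]:
    """Return the original queries plus augmented copies of the hard ones."""
    augmented = []
    for q in queries:
        # A low likelihood suggests internal knowledge is probably not enough.
        if solve_likelihood(q) < threshold:
            augmented.append(f"{retrieve(q)} {q}")
    # The combined dataset keeps every original query and adds the augmented ones.
    return list(queries) + augmented
```

With a scorer that rates "easy" at 0.9 and "hard" at 0.1, only "hard" would receive an augmented copy.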
-
Publication No.: US11972754B2
Publication date: 2024-04-30
Application No.: US17559617
Filing date: 2021-12-22
Applicant: TENCENT AMERICA LLC
Inventors: Jia Cui, Chao Weng, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu
CPC classification: G10L15/063, G10L15/10, G10L25/03, G10L25/54
Abstract: Methods and apparatuses are provided for performing sequence to sequence (Seq2Seq) speech recognition training performed by at least one processor. The method includes acquiring a training set comprising a plurality of pairs of input data and target data corresponding to the input data, encoding the input data into a sequence of hidden states, performing a connectionist temporal classification (CTC) model training based on the sequence of hidden states, performing an attention model training based on the sequence of hidden states, and decoding the sequence of hidden states to generate target labels by independently performing the CTC model training and the attention model training.
-
Publication No.: US11682379B2
Publication date: 2023-06-20
Application No.: US17679790
Filing date: 2022-02-24
Applicant: TENCENT AMERICA LLC
Inventors: Chengzhu Yu, Dong Yu
IPC classification: G10L13/033, G10L13/047, G10L13/02, G10L13/04, G10L13/07, G10L25/18, G10L13/06, G10L25/24
CPC classification: G10L13/033, G10L13/047, G10L13/06, G10L25/18, G10L25/24
Abstract: A method, computer program, and computer system are provided for synthesizing speech at one or more speeds. A context associated with one or more phonemes corresponding to a speaking voice is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes and target acoustic frames, and a voice sample corresponding to the speaking voice is synthesized using the generated mel-spectrogram features.
-
Publication No.: US11430431B2
Publication date: 2022-08-30
Application No.: US16783807
Filing date: 2020-02-06
Applicant: TENCENT AMERICA LLC
Inventors: Chengzhu Yu, Heng Lu, Chao Weng, Dong Yu
Abstract: A method, computer program, and computer system are provided for converting a singing voice of a first person associated with a first speaker to a singing voice of a second person using a speaking voice of the second person associated with a second speaker. A context associated with one or more phonemes corresponding to the singing voice of the first person is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes, the target acoustic frames, and a sample of the speaking voice of the second person. A sample corresponding to the singing voice of the first person is converted to a sample corresponding to the second singing voice using the generated mel-spectrogram features.
-
Publication No.: US11257480B2
Publication date: 2022-02-22
Application No.: US16807851
Filing date: 2020-03-03
Applicant: TENCENT AMERICA LLC
Inventors: Chengzhu Yu, Heng Lu, Chao Weng, Dong Yu
IPC classification: G10L25/48, G10L25/30, G10L13/00, G10L13/033, G10L25/90, G10L13/047
Abstract: A method, a computer readable medium, and a computer system are provided for singing voice conversion. Data corresponding to a singing voice is received. One or more features and pitch data are extracted from the received data using one or more adversarial neural networks. One or more audio samples are generated based on the extracted pitch data and the one or more features.
-
Publication No.: US11128435B2
Publication date: 2021-09-21
Application No.: US16505368
Filing date: 2019-07-08
Applicant: Tencent America LLC
Inventors: Shixiong Zhang, Dong Yu
Abstract: This disclosure relates to a cloud-local joint or collaborative data analytics framework that provides data analytics models trained and hosted in backend servers for processing data items preprocessed and encrypted by remote terminal devices. The data analytics models are configured to generate encrypted output data items that are then communicated to the local terminal devices for decryption and post-processing. This framework functions without exposing decryption keys of the local terminal devices to the backend servers and the communication network. The encryption/decryption and data analytics in the backend servers are configured to process and communicate data items efficiently to provide real-time or near real-time system response to requests for data analytics from the remote terminal devices.
-
Publication No.: US20240220709A1
Publication date: 2024-07-04
Application No.: US18090132
Filing date: 2022-12-28
Applicant: Tencent America LLC
Inventors: Sangwoo Cho, Kaiqiang Song, Xiaoyang Wang, Dong Yu
IPC classification: G06F40/166, G06F40/289, G06N20/00
CPC classification: G06F40/166, G06F40/289, G06N20/00
Abstract: A method including receiving an input comprising natural language texts; segmenting the natural language texts into sections; summarizing the natural language texts; developing a first model based on the plurality of sections and the summary of the natural language texts; identifying one or more salient sentences within the natural language texts using the first model; determining a sentence quality score based on how informative a salient sentence is; determining a sentence similarity score based on a salient sentence's similarity to another salient sentence; developing a second model based on the sentence quality score and the sentence similarity score; combining the first model and the second model into a final model; selecting sentences based on the final model; and generating an extractive summarization using the selected sentences.
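Illustrative sketch (not taken from the patent): the interplay of a sentence quality score and a sentence similarity score can be pictured with a greedy, MMR-style selector. This is a swapped-in stand-in for the learned models the abstract describes. The function names, the Jaccard word-overlap similarity, and the penalty weight are all assumptions, and the quality scores are supplied by the caller.

```python
from typing import List, Sequence


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity, used here as an assumed similarity score."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def select_sentences(
    sentences: Sequence[str],
    quality: Sequence[float],  # assumed per-sentence informativeness scores
    k: int,
    penalty: float = 1.0,      # assumed weight on redundancy
) -> List[str]:
    """Greedily pick up to k sentences that score well and overlap little."""
    chosen: List[int] = []
    candidates = list(range(len(sentences)))
    while candidates and len(chosen) < k:
        def score(i: int) -> float:
            # Redundancy is the highest similarity to any sentence already chosen.
            redundancy = max(
                (jaccard(sentences[i], sentences[j]) for j in chosen), default=0.0
            )
            return quality[i] - penalty * redundancy
        best = max(candidates, key=score)
        chosen.append(best)
        candidates.remove(best)
    # Emit the selection in original document order.
    return [sentences[i] for i in sorted(chosen)]
```

In this sketch a near-duplicate of an already chosen sentence is penalized heavily, so a lower-quality but novel sentence can be selected ahead of it.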
-
Publication No.: US11776556B2
Publication date: 2023-10-03
Application No.: US17485943
Filing date: 2021-09-27
Applicant: TENCENT AMERICA LLC
IPC classification: G10L21/0224, H04R3/04, G06N3/02, G10L21/0216, G10L21/0208
CPC classification: G10L21/0224, G06N3/02, H04R3/04, G10L2021/02082, G10L2021/02163
Abstract: A method, computer program, and computer system are provided for an all-deep-learning based AEC system by recurrent neural networks. The model consists of two stages: an echo estimation stage and an echo suppression stage. Two different schemes for echo estimation are presented herein: linear echo estimation by multi-tap filtering on the far-end reference signal and non-linear echo estimation by single-tap masking on the microphone signal. A microphone signal waveform and a far-end reference signal waveform are received. An echo signal waveform is estimated based on the microphone signal waveform and the far-end reference signal waveform. A near-end speech signal waveform is output based on subtracting the estimated echo signal waveform from the microphone signal waveform, and echoes are suppressed within the near-end speech signal waveform.
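Illustrative sketch (not taken from the patent): the linear scheme, multi-tap filtering of the far-end reference followed by subtraction from the microphone signal, can be pictured with a plain time-domain FIR filter. The fixed taps here stand in for whatever the recurrent network would predict; the function names and the time-domain formulation are assumptions.

```python
from typing import List, Sequence


def estimate_echo(far_end: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Estimate the echo as a multi-tap (FIR) filtering of the far-end reference."""
    echo = []
    for n in range(len(far_end)):
        acc = 0.0
        for k, w in enumerate(taps):
            if n - k >= 0:  # samples before the start are treated as silence
                acc += w * far_end[n - k]
        echo.append(acc)
    return echo


def remove_echo(
    mic: Sequence[float], far_end: Sequence[float], taps: Sequence[float]
) -> List[float]:
    """Subtract the estimated echo from the microphone signal, sample by sample."""
    echo = estimate_echo(far_end, taps)
    return [m - e for m, e in zip(mic, echo)]
```

If the taps match the true echo path exactly, the subtraction recovers the near-end signal; in practice a residual remains, which is what the abstract's second (suppression) stage addresses.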
-
Publication No.: US20230196087A1
Publication date: 2023-06-22
Application No.: US17510782
Filing date: 2021-10-26
Applicant: TENCENT AMERICA LLC
Inventors: Lifeng Jin, Linfeng Song, Kun Xu, Dong Yu
CPC classification: G06N3/08, G06K9/6256, G06N3/0454
Abstract: There is included a method and apparatus comprising computer code for a joint training method using neural networks with noise-robust losses comprising encoding input tokens from a noisy dataset into input vectors using an input encoder; predicting a label based on the input vectors using a classifier model; calculating a beta value based on the input vectors and the label using a label quality predictor model, wherein the beta value is instance-specific for each training instance; and jointly training more than one model using a first modified loss function based on the beta value and an entropy value.