ADL-UFE: all deep learning unified front-end system

    Publication No.: US12094481B2

    Publication date: 2024-09-17

    Application No.: US17455497

    Application date: 2021-11-18

    CPC classification: G10L21/0208

    Abstract: There is included a method and apparatus comprising computer code for generating enhanced target speech from audio data, performed by a computing device, the method comprising: receiving audio data corresponding to one or more speakers; generating an estimated target speech, an estimated noise, and an estimated echo simultaneously based on the audio data using a jointly trained complex ratio mask; predicting frame-level multi-tap time-frequency (T-F) spatio-temporal-echo filter weights based on the estimated target speech, the estimated noise, and the estimated echo using a trained neural network model; and predicting enhanced target speech based on the frame-level multi-tap T-F spatio-temporal-echo filter weights.
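As a rough illustration of the two operations named in the abstract, the sketch below applies a complex ratio mask to a few time-frequency bins and then applies frame-level multi-tap filter weights along the time axis of a single frequency bin. Everything here is an assumption made for illustration: the function names `apply_complex_mask` and `multi_tap_filter`, the single-channel, single-bin layout, and the conjugate-weight convention are not taken from the patent. In the claimed method the mask and the weights come from trained networks rather than being supplied by hand.

```python
def apply_complex_mask(mixture, mask):
    """Multiply each complex T-F bin of the mixture by its complex mask value."""
    return [m * y for m, y in zip(mask, mixture)]


def multi_tap_filter(frames, weights):
    """Filter one frequency bin over time with per-frame tap weights.

    out[t] = sum_k conj(weights[t][k]) * frames[t - k]
    Taps that would reach before the first frame are skipped.
    """
    out = []
    for t, taps in enumerate(weights):
        acc = 0j
        for k, w in enumerate(taps):
            if t - k >= 0:
                acc += w.conjugate() * frames[t - k]
        out.append(acc)
    return out
```

A spatio-temporal-echo filter as described in the abstract would presumably extend the same sum over microphone channels and the echo reference; that extension is left out here to keep the sketch short.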

    INSTANCE-LEVEL ADAPTIVE PROPULSION OF EXTERNAL KNOWLEDGE (IAPEK)

    Publication No.: US20240211501A1

    Publication date: 2024-06-27

    Application No.: US18146765

    Application date: 2022-12-27

    IPC classification: G06F16/33 G06F16/35

    CPC classification: G06F16/3344 G06F16/355

    Abstract: There is included a method and apparatus comprising computer code for instance-wise adaptive knowledge injection in a pre-trained language model (PTLM) including determining a necessity of external knowledge in a plurality of queries of a first dataset based on a likelihood that a respective query is solved by internal knowledge of a target model. Then, the one or more queries determined to need external knowledge may be augmented with pieces of external knowledge. A combined dataset may be generated by combining the first dataset and the one or more augmented queries, and the combined dataset may be applied to the target model.
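The data flow in this abstract can be sketched in a few lines. The example below is a minimal sketch under stated assumptions: `internal_confidence` stands in for whatever estimates the likelihood that the target model solves a query from internal knowledge, `retrieve` stands in for the external knowledge source, and the `threshold` value and the `[KNOWLEDGE]` separator are hypothetical. None of these names appear in the patent.

```python
from typing import Callable, List


def build_combined_dataset(
    queries: List[str],
    internal_confidence: Callable[[str], float],  # hypothetical scorer in [0, 1]
    retrieve: Callable[[str], str],               # hypothetical knowledge lookup
    threshold: float = 0.5,                       # assumed cut-off
) -> List[str]:
    """Augment only the queries judged to need external knowledge, then
    combine the original dataset with the augmented queries."""
    augmented = [
        f"{q} [KNOWLEDGE] {retrieve(q)}"
        for q in queries
        if internal_confidence(q) < threshold
    ]
    return list(queries) + augmented
```

The combined list would then be passed to the target model; queries the model can already answer are left without augmentation, which is the instance-wise part of the idea.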

    Learning singing from speech
    Granted invention patent

    Publication No.: US11430431B2

    Publication date: 2022-08-30

    Application No.: US16783807

    Application date: 2020-02-06

    Abstract: A method, computer program, and computer system is provided for converting a singing voice of a first person associated with a first speaker to a singing voice of a second person using a speaking voice of the second person associated with a second speaker. A context associated with one or more phonemes corresponding to the singing voice of a first person is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes, the target acoustic frames, and a sample of the speaking voice of the second person. A sample corresponding to the singing voice of a first person is converted to a sample corresponding to the second singing voice using the generated mel-spectrogram features.

    Distributed and collaborative analytics of encrypted data using deep polynomial networks

    Publication No.: US11128435B2

    Publication date: 2021-09-21

    Application No.: US16505368

    Application date: 2019-07-08

    Abstract: This disclosure relates to a cloud-local joint or collaborative data analytics framework that provides data analytics models trained and hosted in backend servers for processing data items preprocessed and encrypted by remote terminal devices. The data analytics models are configured to generate encrypted output data items that are then communicated to the local terminal devices for decryption and post-processing. This framework functions without exposing decryption keys of the local terminal devices to the backend servers and the communication network. The encryption/decryption and data analytics in the backend servers are configured to process and communicate data items efficiently to provide real-time or near real-time system response to requests for data analytics from the remote terminal devices.

    UNIFYING TEXT SEGMENTATION AND LONG DOCUMENT SUMMARIZATION

    Publication No.: US20240220709A1

    Publication date: 2024-07-04

    Application No.: US18090132

    Application date: 2022-12-28

    Abstract: A method including receiving an input comprising natural language texts; segmenting the natural language texts into sections; summarizing the natural language texts; developing a first model based on the plurality of sections and the summary of the natural language texts; identifying one or more salient sentences within the natural language texts using the first model; determining a sentence quality score based on how informative a salient sentence is; determining a sentence similarity score based on a salient sentence's similarity to another salient sentence; developing a second model based on the sentence quality score and the sentence similarity score; combining the first model and the second model into a final model; selecting sentences based on the final model; and generating an extractive summarization using the selected sentences.
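The interplay of a quality score and a similarity score during sentence selection can be illustrated with a simple greedy selector. This is not the patent's model: it swaps in a maximal-marginal-relevance-style heuristic, uses word-set Jaccard overlap as a stand-in similarity, and takes the quality scores as given. The names `jaccard`, `select_sentences`, `k`, and `penalty` are all hypothetical.

```python
def jaccard(a, b):
    """Word-set overlap between two sentences (a stand-in similarity score)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def select_sentences(sentences, quality, k=2, penalty=1.0):
    """Greedily pick k sentences, trading quality against redundancy.

    Each step picks the candidate maximizing
    quality[i] - penalty * (max similarity to any already selected sentence).
    """
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def gain(i):
            redundancy = max(
                (jaccard(sentences[i], sentences[j]) for j in selected),
                default=0.0,
            )
            return quality[i] - penalty * redundancy

        best = max(candidates, key=gain)
        selected.append(best)
        candidates.remove(best)
    # Return the chosen sentences in their original document order.
    return [sentences[i] for i in sorted(selected)]
```

With sentences "the cat sat", "the cat sat down", and "dogs bark loudly" scored 0.9, 0.8, and 0.5, the selector keeps the first and third: the second is informative but nearly duplicates the first.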

    Unified deep neural network model for acoustic echo cancellation and residual echo suppression

    Publication No.: US11776556B2

    Publication date: 2023-10-03

    Application No.: US17485943

    Application date: 2021-09-27

    Inventors: Meng Yu, Dong Yu

    Abstract: A method, computer program, and computer system is provided for an all-deep-learning based AEC system by recurrent neural networks. The model consists of two stages, echo estimation stage and echo suppression stage, respectively. Two different schemes for echo estimation are presented herein: linear echo estimation by multi-tap filtering on far-end reference signal and non-linear echo estimation by single-tap masking on microphone signal. A microphone signal waveform and a far-end reference signal waveform are received. An echo signal waveform is estimated based on the microphone signal waveform and a far-end reference signal waveform. A near-end speech signal waveform is output based on subtracting the estimated echo signal waveform from the microphone signal waveform, and echoes are suppressed within the near-end speech signal waveform.
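The linear scheme in this abstract (multi-tap filtering of the far-end reference, then subtraction from the microphone signal) can be sketched as follows. The sketch assumes real-valued time-domain samples and fixed, known tap values purely for readability; in the described system the filter is estimated by a recurrent network, and the function names `estimate_echo` and `cancel_echo` are hypothetical. The residual echo suppression stage is not shown.

```python
def estimate_echo(far_end, taps):
    """Linear echo estimate: echo[n] = sum_k taps[k] * far_end[n - k]."""
    return [
        sum(taps[k] * far_end[n - k] for k in range(len(taps)) if n - k >= 0)
        for n in range(len(far_end))
    ]


def cancel_echo(mic, far_end, taps):
    """Subtract the estimated echo from the microphone signal."""
    echo = estimate_echo(far_end, taps)
    return [m - e for m, e in zip(mic, echo)]
```

If the taps match the true echo path exactly, the output equals the near-end speech; any mismatch leaves residual echo, which is what the second (suppression) stage addresses.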

    INSTANCE ADAPTIVE TRAINING WITH NOISE ROBUST LOSSES AGAINST NOISY LABELS

    Publication No.: US20230196087A1

    Publication date: 2023-06-22

    Application No.: US17510782

    Application date: 2021-10-26

    IPC classification: G06N3/08 G06N3/04 G06K9/62

    Abstract: There is included a method and apparatus comprising computer code for a joint training method using neural networks with noise-robust losses comprising encoding input tokens from a noisy dataset into input vectors using an input encoder; predicting a label based on the input vectors using a classifier model; calculating a beta value based on the input vectors and the label using a label quality predictor model, wherein the beta value is instance-specific for each training instance; and joint training more than one model using a first modified loss function based on the beta value and an entropy value.
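The abstract does not state the form of the modified loss. As one known example of a loss that depends on both a beta value and an entropy value, the sketch below uses the soft bootstrapping loss, which is equal to beta times the cross-entropy against the given label plus (1 - beta) times the entropy of the prediction. Treating this as the patent's loss would be an assumption; it is shown only to make the role of an instance-specific beta concrete. The function names are hypothetical.

```python
import math


def cross_entropy(probs, label):
    """Negative log-probability assigned to the (possibly noisy) label."""
    return -math.log(probs[label])


def entropy(probs):
    """Shannon entropy of the predicted distribution (zero terms skipped)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def soft_bootstrap_loss(probs, label, beta):
    """Assumed form: beta * CE(label, probs) + (1 - beta) * H(probs).

    beta near 1 trusts the given label; beta near 0 leans on the
    model's own prediction instead.
    """
    return beta * cross_entropy(probs, label) + (1 - beta) * entropy(probs)
```

In the setting the abstract describes, beta would be produced per training instance by the label quality predictor rather than fixed globally.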