Audio enhancement through supervised latent variable representation of target speech and noise

    Publication No.: US11763832B2

    Publication date: 2023-09-19

    Application No.: US16865111

    Filing date: 2020-05-01

    Abstract: Systems and methods for generating an enhanced audio signal comprise a trained neural network configured to receive an input audio signal and generate an enhanced target signal. The trained neural network comprises a pre-processing neural network configured to receive a segment of the input audio signal and output an audio classification, the pre-processing neural network including at least one hidden layer comprising an embedding vector, and a noise reduction neural network configured to receive the segment of the input audio signal and the embedding vector and to generate the enhanced target signal. The pre-processing neural network may comprise a target signal pre-processing neural network configured to output a target signal classification and comprising at least one hidden layer comprising a target embedding vector. The pre-processing neural network may comprise a noise pre-processing neural network configured to output a noise classification and comprising at least one hidden layer comprising a noise embedding vector.
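    The data flow in this abstract (a classifier whose hidden layer doubles as an embedding, fed together with the audio segment into a noise reduction network) can be sketched as below. This is a minimal sketch with untrained random weights, assuming magnitude-spectrogram segments and a per-bin mask output; the class names, layer sizes, segment averaging, and masking approach are all hypothetical choices, not the patented architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class PreProcessingNet:
    """Hypothetical classifier: its hidden-layer activation is the embedding vector."""
    def __init__(self, n_bins, n_emb, n_classes):
        self.w1 = 0.1 * rng.standard_normal((n_bins, n_emb))
        self.w2 = 0.1 * rng.standard_normal((n_emb, n_classes))

    def __call__(self, segment_mag):
        # Summarize the segment (frames x bins) by its mean log-magnitude (assumption).
        x = np.log1p(segment_mag).mean(axis=0)
        embedding = relu(x @ self.w1)                  # hidden layer = embedding vector
        classification = softmax(embedding @ self.w2)  # class probabilities
        return classification, embedding

class NoiseReductionNet:
    """Hypothetical mask estimator conditioned on target and noise embeddings."""
    def __init__(self, n_bins, n_emb, n_hidden):
        self.w1 = 0.1 * rng.standard_normal((n_bins + 2 * n_emb, n_hidden))
        self.w2 = 0.1 * rng.standard_normal((n_hidden, n_bins))

    def __call__(self, segment_mag, target_emb, noise_emb):
        n_frames = segment_mag.shape[0]
        # Append the same segment-level embeddings to every frame's features.
        cond = np.tile(np.concatenate([target_emb, noise_emb]), (n_frames, 1))
        x = np.hstack([np.log1p(segment_mag), cond])
        mask = sigmoid(relu(x @ self.w1) @ self.w2)  # per-bin gain in (0, 1)
        return mask * segment_mag                    # enhanced magnitude

n_frames, n_bins, n_emb = 8, 129, 16
segment = np.abs(rng.standard_normal((n_frames, n_bins)))
target_net = PreProcessingNet(n_bins, n_emb, n_classes=2)
noise_net = PreProcessingNet(n_bins, n_emb, n_classes=5)
nr_net = NoiseReductionNet(n_bins, n_emb, n_hidden=64)

target_cls, target_emb = target_net(segment)
noise_cls, noise_emb = noise_net(segment)
enhanced = nr_net(segment, target_emb, noise_emb)
```

    In training, the classifiers would presumably be fit on their classification targets first so their hidden layers carry useful target and noise information; only the forward pass is shown here.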

    MULTIPLE-SOURCE TRACKING AND VOICE ACTIVITY DETECTIONS FOR PLANAR MICROPHONE ARRAYS

    Publication No.: US20210314701A1

    Publication date: 2021-10-07

    Application No.: US17349589

    Filing date: 2021-06-16

    Abstract: Embodiments described herein provide a combined multi-source time difference of arrival (TDOA) tracking and voice activity detection (VAD) mechanism that is applicable to generic array geometries, e.g., a microphone array that lies on a plane. The combined multi-source TDOA tracking and VAD mechanism scans the azimuth and elevation angles of the microphone array in microphone pairs, from which a planar locus of physically admissible TDOAs is formed in the multi-dimensional TDOA space of multiple microphone pairs. By performing the TDOA search for each dimension separately, the multi-dimensional TDOA tracking reduces the number of calculations usually involved in a traditional TDOA search.
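    The "locus of physically admissible TDOAs" can be illustrated by mapping each scanned (azimuth, elevation) pair to its vector of pairwise TDOAs. For a far-field source and microphones in one plane, every TDOA is linear in the in-plane components of the arrival direction, so all admissible vectors lie on a two-dimensional (planar) subset of the pairwise TDOA space. The sketch below assumes a 5 cm square array, a speed of sound of 343 m/s, and a brute-force nearest-neighbour lookup; the function names are hypothetical, and it does not reproduce the per-dimension search that gives the patent its computational savings.

```python
import math
from itertools import combinations

C = 343.0  # speed of sound in m/s (assumed)

def direction(az_deg, el_deg):
    """Unit vector pointing from the array toward a far-field source."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))

def tdoa_vector(mics, az_deg, el_deg):
    """TDOA t_j - t_i in seconds for every mic pair (i, j); negative means j hears it first."""
    u = direction(az_deg, el_deg)
    return [sum((a - b) * c for a, b, c in zip(pi, pj, u)) / C
            for pi, pj in combinations(mics, 2)]

def build_locus(mics, az_step=10, el_step=10):
    """Admissible TDOA vectors on an angle grid. Elevation is limited to [0, 90)
    because a planar array cannot distinguish +el from -el."""
    return [((az, el), tdoa_vector(mics, az, el))
            for az in range(0, 360, az_step)
            for el in range(0, 90, el_step)]

def nearest_direction(locus, observed):
    """Grid direction whose admissible TDOA vector is closest to the observation."""
    def dist2(v):
        return sum((a - b) ** 2 for a, b in zip(v, observed))
    return min(locus, key=lambda item: dist2(item[1]))[0]

# Hypothetical 5 cm square array lying in the z = 0 plane.
mics = [(0.0, 0.0, 0.0), (0.05, 0.0, 0.0), (0.05, 0.05, 0.0), (0.0, 0.05, 0.0)]
locus = build_locus(mics)
```

    In practice the observed TDOA vector would come from per-pair cross-correlation (for example GCC-PHAT), and several sources would be tracked as separate peaks rather than a single nearest point.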

    Efficient connectionist temporal classification for binary classification

    Publication No.: US10762417B2

    Publication date: 2020-09-01

    Application No.: US15894872

    Filing date: 2018-02-12

    Abstract: A classification system and method for training a neural network includes receiving a stream of segmented, labeled training data having a sequence of frames, computing a stream of input features data for the sequence of frames, and generating neural network outputs for the sequence of frames in a forward pass through the training data in accordance with weights and biases. The weights and biases are updated in a backward pass through the training data, including determining Region of Target (ROT) information from the segmented, labeled training data, computing modified forward and backward variables based on the neural network outputs and the ROT information, deriving a signal error for each frame within the sequence of frames based on the modified forward and backward variables, and updating the weights and biases based on the derived signal error. An adaptive learning module is provided to improve the convergence rate of the neural network.
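    The "forward variables" mentioned here are, in standard connectionist temporal classification (CTC), the quantities alpha_t(s) computed over the label sequence with blanks inserted. The sketch below implements only the standard CTC forward recursion for a binary setup (class 0 = blank, class 1 = target), assuming per-frame softmax outputs are already available; the patent's ROT-based modification of these variables is not reproduced, and the function names are hypothetical.

```python
import math

def ctc_forward(probs, labels, blank=0):
    """Standard CTC forward pass.

    probs[t][k]: network output probability of class k at frame t.
    labels: target label sequence without blanks, e.g. [1].
    Returns (alpha, total probability of all paths that collapse to labels).
    """
    ext = [blank]
    for lab in labels:
        ext += [lab, blank]          # blank-interleaved sequence l'
    S, T = len(ext), len(probs)
    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = probs[0][blank]
    if S > 1:
        alpha[0][1] = probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s]                    # stay on the same symbol
            if s >= 1:
                a += alpha[t - 1][s - 1]           # advance by one
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1][s - 2]           # skip a blank between distinct labels
            alpha[t][s] = a * probs[t][ext[s]]
    total = alpha[T - 1][S - 1] + (alpha[T - 1][S - 2] if S > 1 else 0.0)
    return alpha, total

def ctc_loss(probs, labels, blank=0):
    """Negative log-likelihood of the label sequence."""
    return -math.log(ctc_forward(probs, labels, blank)[1])
```

    The backward variables are computed by the mirror-image recursion from the last frame, and the per-frame error signal combines both; a practical implementation would also work in the log domain to avoid underflow on long sequences.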

    Robust acoustic echo cancellation for loosely paired devices based on semi-blind multichannel demixing

    Publication No.: US10038795B2

    Publication date: 2018-07-31

    Application No.: US15701374

    Filing date: 2017-09-11

    IPC classification: H04R9/08 H04M9/08

    CPC classification: H04M9/082

    Abstract: A method for echo cancellation in multichannel audio signals includes receiving a plurality of time-domain signals, including multichannel audio signals and at least one reference signal, and transforming the time-domain signals into K under-sampled complex-valued subband signals using an analysis filter bank. A probability of acoustic echo dominance is produced using a single-double talk estimator, and multichannel source separation is performed based on this probability to decompose the audio signals into a near-end source signal and residual echoes. The residual echo components are removed from the near-end source signal using a spectral filter bank, and the subband audio signals are reconstructed into a multichannel time-domain audio signal using a subband synthesis filter.
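    The analysis and synthesis filter banks bracket all the subband processing described in this abstract. As a stand-in, the sketch below uses a plain short-time Fourier transform (STFT) with a square-root periodic Hann window at 50 % overlap, which gives K = n_fft/2 + 1 complex subbands decimated by the hop size and perfect reconstruction by overlap-add. The frame length, hop, and window are assumptions; the patent's actual filter bank design is not specified here.

```python
import numpy as np

def _window(n_fft):
    # Periodic Hann window; its square root is used at both analysis and synthesis,
    # so the product is a Hann window, which sums to 1 at 50 % overlap.
    return np.sqrt(np.hanning(n_fft + 1)[:-1])

def analysis(x, n_fft=256, hop=128):
    """Split a time signal into K = n_fft // 2 + 1 complex subband signals."""
    win = _window(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)      # shape (n_frames, K)

def synthesis(X, n_fft=256, hop=128):
    """Reconstruct a time signal from subband frames by weighted overlap-add."""
    win = _window(n_fft)
    frames = np.fft.irfft(X, n=n_fft, axis=1) * win
    y = np.zeros(hop * (len(frames) - 1) + n_fft)
    for i, frame in enumerate(frames):
        y[i * hop:i * hop + n_fft] += frame
    return y
```

    For a multichannel input, `analysis` would be applied to each microphone channel and to the reference signal; echo-dominance estimation, demixing, and post-filtering would then operate on the resulting (frames x K) arrays before `synthesis`.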

    ROBUST ACOUSTIC ECHO CANCELLATION FOR LOOSELY PAIRED DEVICES BASED ON SEMI-BLIND MULTICHANNEL DEMIXING

    Publication No.: US20170374201A1

    Publication date: 2017-12-28

    Application No.: US15701374

    Filing date: 2017-09-11

    IPC classification: H04M9/08

    CPC classification: H04M9/082

    Abstract: A method for echo cancellation in multichannel audio signals includes receiving a plurality of time-domain signals, including multichannel audio signals and at least one reference signal, and transforming the time-domain signals into K under-sampled complex-valued subband signals using an analysis filter bank. A probability of acoustic echo dominance is produced using a single-double talk estimator, and multichannel source separation is performed based on this probability to decompose the audio signals into a near-end source signal and residual echoes. The residual echo components are removed from the near-end source signal using a spectral filter bank, and the subband audio signals are reconstructed into a multichannel time-domain audio signal using a subband synthesis filter.
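    The final step, removing residual echo with a spectral filter, is commonly done with a per-subband suppression gain. The sketch below uses a generic spectral-subtraction style gain G = max(1 - |R|^2 / |Y|^2, g_min), where Y is the near-end subband estimate and R an estimate of the residual echo in the same bins. The gain rule, the floor g_min, and the function names are assumptions for illustration; the abstract does not state which spectral filter the patent uses.

```python
import numpy as np

def residual_echo_gain(Y, R_hat, g_min=0.1):
    """Per-bin gain G = max(1 - |R_hat|^2 / |Y|^2, g_min).

    Y: near-end subband estimate; R_hat: residual-echo estimate
    (complex arrays of the same shape). g_min limits suppression depth.
    """
    eps = 1e-12  # avoid division by zero in empty bins
    ratio = np.abs(R_hat) ** 2 / np.maximum(np.abs(Y) ** 2, eps)
    return np.maximum(1.0 - ratio, g_min)

def apply_post_filter(Y, R_hat, g_min=0.1):
    """Scale each bin by its real-valued gain, keeping the original phase."""
    return residual_echo_gain(Y, R_hat, g_min) * Y
```

    The floor g_min trades residual echo against near-end distortion; real systems usually also smooth the gain over time and frequency to limit musical noise.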

    Robust speaker localization in presence of strong noise interference systems and methods

    Publication No.: US11264017B2

    Publication date: 2022-03-01

    Application No.: US16900790

    Filing date: 2020-06-12

    IPC classification: G10L15/20 H04S3/00

    Abstract: Systems and methods include a plurality of audio input components configured to generate a plurality of audio input signals, and a logic device configured to receive the plurality of audio input signals, determine whether the audio signals comprise target audio associated with an audio source, estimate a relative location of the audio source with respect to the audio input components based on the audio signals and on the determination of whether they comprise the target audio, and process the audio signals to generate an audio output signal by enhancing the target audio based on the estimated relative location. The logic device is further configured to use relative transfer-based covariance to construct a directional covariance matrix aligned across frequency bands, and to find a direction that minimizes beam power subject to a distortionless criterion.
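    "Minimizing beam power subject to a distortionless criterion" is the defining property of the minimum variance distortionless response (MVDR) beamformer: for a steering vector d and covariance R, the weights w = R^-1 d / (d^H R^-1 d) minimize w^H R w subject to w^H d = 1, and the resulting power is 1 / (d^H R^-1 d). The sketch below scans that power over candidate directions for a single frequency band, which is the standard Capon estimator (the source is taken where the power peaks). It assumes a uniform linear array, a known far-field steering model, and 343 m/s for the speed of sound; it does not reproduce the patent's relative-transfer-based covariance aligned across bands, and the function names are hypothetical.

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumed)

def steering(mic_x, theta_deg, freq):
    """Far-field steering vector for a linear array with mic positions mic_x (m)."""
    tau = mic_x * np.sin(np.deg2rad(theta_deg)) / C
    return np.exp(-2j * np.pi * freq * tau)

def mvdr_weights(R, d):
    """Weights minimizing w^H R w subject to w^H d = 1."""
    Rinv_d = np.linalg.solve(R, d)
    return Rinv_d / (np.conj(d) @ Rinv_d)

def mvdr_spectrum(R, mic_x, freq, angles):
    """Minimum (distortionless) beam power 1 / (d^H R^-1 d) for each candidate angle."""
    R_inv = np.linalg.inv(R)
    return np.array([
        1.0 / np.real(np.conj(d) @ R_inv @ d)
        for d in (steering(mic_x, th, freq) for th in angles)
    ])

# Hypothetical example: 4 mics, 4 cm spacing, one source at 30 degrees, 1 kHz band.
mic_x = np.arange(4) * 0.04
f = 1000.0
a = steering(mic_x, 30.0, f)
R = 10.0 * np.outer(a, np.conj(a)) + 0.1 * np.eye(4)
angles = np.arange(-90, 91)
P = mvdr_spectrum(R, mic_x, f, angles)
```

    With several bands, the per-band spectra (or suitably aligned covariances, as the abstract describes) would be combined before picking the direction.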