-
1.
Publication No.: US11763832B2
Publication date: 2023-09-19
Application No.: US16865111
Filing date: 2020-05-01
Inventors: Francesco Nesta, Minje Kim, Sanna Wager
IPC classification: G10L21/00, G10L15/16, G10L21/0264, G10L21/0216, G06N3/08
CPC classification: G10L21/0264, G06N3/08, G10L21/0216
Abstract: Systems and methods for generating an enhanced audio signal comprise a trained neural network configured to receive an input audio signal and generate an enhanced target signal, the trained neural network comprising a pre-processing neural network configured to receive a segment of the input audio signal and output an audio classification, the pre-processing neural network including at least one hidden layer comprising an embedding vector, and a noise reduction neural network configured to receive the segment of the input audio signal and the embedding vector and generate the enhanced target signal. The pre-processing neural network may comprise a target signal pre-processing neural network configured to output a target signal classification and comprising at least one hidden layer comprising a target embedding vector. The pre-processing neural network may comprise a noise pre-processing neural network configured to output a noise classification and comprising at least one hidden layer comprising a noise embedding vector.
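The data flow in this abstract (a classifier whose hidden layer supplies an embedding, and a noise reduction network that consumes the segment together with that embedding) can be sketched roughly as follows. This is only a wiring illustration: the class names, layer sizes, tanh activation, and random untrained weights are all assumptions and are not taken from the patent.

```python
import math
import random


def dense(x, weights, bias):
    """Affine layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def random_layer(rng, n_out, n_in):
    """Untrained placeholder weights (assumption: a real system would train these)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class PreProcessingNet:
    """Hypothetical classifier whose hidden activation is used as the embedding."""

    def __init__(self, rng, seg_len, embed_dim, n_classes):
        self.hidden = random_layer(rng, embed_dim, seg_len)
        self.out = random_layer(rng, n_classes, embed_dim)

    def forward(self, segment):
        embedding = [math.tanh(v) for v in dense(segment, *self.hidden)]
        logits = dense(embedding, *self.out)  # audio classification scores
        return logits, embedding


class NoiseReductionNet:
    """Hypothetical denoiser that sees the segment concatenated with the embedding."""

    def __init__(self, rng, seg_len, embed_dim):
        self.layer = random_layer(rng, seg_len, seg_len + embed_dim)

    def forward(self, segment, embedding):
        return dense(segment + embedding, *self.layer)


def enhance(segment, pre_net, nr_net):
    """Run the classifier, discard its class scores, and condition the denoiser."""
    _, embedding = pre_net.forward(segment)
    return nr_net.forward(segment, embedding)
```

The classification output is only needed while training the pre-processing network; at enhancement time this sketch keeps just the hidden-layer embedding.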
-
2.
Publication No.: US20210314701A1
Publication date: 2021-10-07
Application No.: US17349589
Filing date: 2021-06-16
Abstract: Embodiments described herein provide a combined multi-source time difference of arrival (TDOA) tracking and voice activity detection (VAD) mechanism that is applicable for generic array geometries, e.g., a microphone array that lies on a plane. The combined multi-source TDOA tracking and VAD mechanism scans the azimuth and elevation angles of the microphone array in microphone pairs, based on which a planar locus of physically admissible TDOAs can be formed in the multi-dimensional TDOA space of multiple microphone pairs. In this way, the multi-dimensional TDOA tracking reduces the number of calculations that were usually involved in traditional TDOA by performing the TDOA search for each dimension separately.
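As a rough illustration of the planar locus mentioned in this abstract, the sketch below scans azimuth and elevation for a planar array and computes the far-field TDOA of each microphone pair. The array geometry, pair selection, speed of sound, sign convention, and function names are all assumptions made for the example, not details from the patent.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed

# Hypothetical planar array: four microphones in the z = 0 plane (metres).
MICS = [(0.0, 0.0, 0.0), (0.05, 0.0, 0.0), (0.0, 0.05, 0.0), (0.05, 0.05, 0.0)]
PAIRS = [(1, 0), (2, 0), (3, 0)]


def direction(az, el):
    """Unit vector for azimuth az and elevation el, both in radians."""
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def pair_tdoas(mics, pairs, az, el):
    """Far-field TDOA in seconds for each pair: (p_i - p_j) . u / c."""
    u = direction(az, el)
    tdoas = []
    for i, j in pairs:
        baseline = [a - b for a, b in zip(mics[i], mics[j])]
        tdoas.append(sum(b * c for b, c in zip(baseline, u)) / SPEED_OF_SOUND)
    return tdoas


def scan(step_deg=30):
    """TDOA vectors over a coarse azimuth/elevation grid."""
    grid = []
    for az_deg in range(0, 360, step_deg):
        for el_deg in range(0, 91, step_deg):
            grid.append(pair_tdoas(MICS, PAIRS, math.radians(az_deg), math.radians(el_deg)))
    return grid
```

Because every baseline here lies in the array plane and the third baseline is the sum of the first two, each scanned vector satisfies t3 = t1 + t2, so the admissible points fall on a plane in the three-dimensional TDOA space.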
-
3.
Publication No.: US10762417B2
Publication date: 2020-09-01
Application No.: US15894872
Filing date: 2018-02-12
Abstract: A classification system and method for training a neural network includes receiving a stream of segmented, labeled training data having a sequence of frames, computing a stream of input features data for the sequence of frames, and generating neural network outputs for the sequence of frames in a forward pass through the training data and in accordance with weights and biases. The weights and biases are updated in a backward pass through the training data, including determining Region of Target (ROT) information from the segmented, labeled training data, computing modified forward and backward variables based on the neural network outputs and the ROT information, deriving a signal error for each frame within the sequence of frames based on the modified forward and backward variables, and updating the weights and biases based on the derived signal error. An adaptive learning module is provided to improve a convergence rate of the neural network.
-
4.
Publication No.: US10679617B2
Publication date: 2020-06-09
Application No.: US15833977
Filing date: 2017-12-06
IPC classification: G10L21/00, G10L15/20, G10L15/22, G10L21/038, G10L25/84, H04R3/00, G10L21/0232, G10L25/21, H04R27/00, H04R1/40, G10L25/78, H04R5/04, G10L21/0216, H04R5/02
Abstract: A real-time audio signal processing system includes an audio signal processor configured to process audio signals using a modified generalized eigenvalue (GEV) beamforming technique to generate an enhanced target audio output signal. The digital signal processor includes sub-band decomposition circuitry configured to decompose the audio signal into sub-band frames in the frequency domain and a target activity detector configured to detect whether a target audio is present in the sub-band frames. Based on information related to the sub-band frames and the determination of whether the target audio is present in the sub-band frames, the digital signal processor is configured to use the modified GEV technique to estimate the relative transfer function (RTF) of the target audio source, and generate a filter based on the estimated RTF. The filter may then be applied to the audio signals to generate the enhanced audio output signal.
-
5.
Publication No.: US20180308503A1
Publication date: 2018-10-25
Application No.: US15957829
Filing date: 2018-04-19
IPC classification: G10L21/0232, G10L21/038
CPC classification: G10L21/0232, G10L21/038, G10L2021/02082
Abstract: Systems and methods for processing an audio signal include an audio input operable to receive an input signal comprising a time-domain, single-channel audio signal, a subband analysis block operable to transform the input signal to a frequency domain input signal comprising a plurality of k-spaced under-sampled subband signals, a reverberation reduction block operable to reduce reverberation effects, including late reverberation, in the plurality of k-spaced under-sampled subband signals, a noise reduction block operable to reduce background noise from the plurality of k-spaced under-sampled subband signals, and a subband synthesis block operable to transform the subband signals to the time-domain, thereby producing an enhanced output signal.
-
6.
Publication No.: US10038795B2
Publication date: 2018-07-31
Application No.: US15701374
Filing date: 2017-09-11
CPC classification: H04M9/082
Abstract: A method for echo cancellation in multichannel audio signals includes receiving a plurality of time-domain signals, including multichannel audio signals and at least one reference signal, and transforming the time-domain signals to K under-sampled complex-valued subband signals using an analysis filter bank. A probability of acoustic echo dominance is produced using a single-double talk estimator, and a multichannel source separation is performed based on the probability to decompose the audio signals into a near-end source signal and residual echoes using source separation. The residual echo components are removed from the near-end source signal using a spectral filter bank, and the subband audio signals are reconstructed to a multichannel time-domain audio signal using a subband synthesis filter.
-
7.
Publication No.: US20170374201A1
Publication date: 2017-12-28
Application No.: US15701374
Filing date: 2017-09-11
IPC classification: H04M9/08
CPC classification: H04M9/082
Abstract: A method for echo cancellation in multichannel audio signals includes receiving a plurality of time-domain signals, including multichannel audio signals and at least one reference signal, and transforming the time-domain signals to K under-sampled complex-valued subband signals using an analysis filter bank. A probability of acoustic echo dominance is produced using a single-double talk estimator, and a multichannel source separation is performed based on the probability to decompose the audio signals into a near-end source signal and residual echoes using source separation. The residual echo components are removed from the near-end source signal using a spectral filter bank, and the subband audio signals are reconstructed to a multichannel time-domain audio signal using a subband synthesis filter.
-
8.
Publication No.: US11694710B2
Publication date: 2023-07-04
Application No.: US17484208
Filing date: 2021-09-24
IPC classification: G10L21/0364, G10L25/60, G10L15/22, G10L25/84, H04R1/40, H04R3/00, H04S3/00, H04L65/60
CPC classification: G10L21/0364, G10L15/22, G10L25/60, G10L25/84, H04R1/406, H04R3/005, H04S3/008, H04L65/60, H04S2400/01
Abstract: Audio processing systems and methods include an audio sensor array configured to receive a multichannel audio input and generate a corresponding multichannel audio signal, target-speech detection logic, and an automatic speech recognition engine or VoIP application. An audio processing device includes a target speech enhancement engine configured to analyze a multichannel audio input signal and generate a plurality of enhanced target streams, a multi-stream target-speech detection generator comprising a plurality of target-speech detector engines each configured to determine a probability of detecting a specific target-speech of interest in the stream, wherein the multi-stream target-speech detection generator is configured to determine a plurality of weights associated with the enhanced target streams, and a fusion subsystem configured to apply the plurality of weights to the enhanced target streams to generate an enhancement output signal.
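The fusion step described in this abstract amounts to weighting several enhanced streams and combining them. A minimal sketch, assuming the weights are simply the per-stream detection probabilities normalized to sum to one (the abstract does not say how the weights are derived, and both function names are hypothetical):

```python
def fusion_weights(probabilities):
    """Normalize detection probabilities into weights that sum to 1.

    Assumption: falls back to uniform weights when no detector fires.
    """
    total = sum(probabilities)
    if total == 0:
        return [1.0 / len(probabilities)] * len(probabilities)
    return [p / total for p in probabilities]


def fuse(streams, weights):
    """Weighted sample-by-sample sum of equal-length enhanced streams."""
    n_samples = len(streams[0])
    return [sum(w * s[n] for w, s in zip(weights, streams)) for n in range(n_samples)]
```

For example, two streams with detector scores 3 and 1 would be mixed with weights 0.75 and 0.25, so the stream more likely to contain the target speech dominates the output.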
-
9.
Publication No.: US11373667B2
Publication date: 2022-06-28
Application No.: US15957829
Filing date: 2018-04-19
IPC classification: H04B15/00, G10L21/0232, G10L21/038, G10L21/0208, G10L25/18, G10L21/00
Abstract: Systems and methods for processing an audio signal include an audio input operable to receive an input signal comprising a time-domain, single-channel audio signal, a subband analysis block operable to transform the input signal to a frequency domain input signal comprising a plurality of k-spaced under-sampled subband signals, a reverberation reduction block operable to reduce reverberation effects, including late reverberation, in the plurality of k-spaced under-sampled subband signals, a noise reduction block operable to reduce background noise from the plurality of k-spaced under-sampled subband signals, and a subband synthesis block operable to transform the subband signals to the time-domain, thereby producing an enhanced output signal.
-
10.
Publication No.: US11264017B2
Publication date: 2022-03-01
Application No.: US16900790
Filing date: 2020-06-12
Abstract: Systems and methods include a plurality of audio input components configured to generate a plurality of audio input signals, and a logic device configured to receive the plurality of audio input signals, determine whether the plurality of audio signals comprise target audio associated with an audio source, estimate a relative location of the audio source with respect to the plurality of audio input components based on the plurality of audio signals and a determination of whether the plurality of audio signals comprise the target audio, and process the plurality of audio signals to generate an audio output signal by enhancing the target audio based on the estimated relative location. The logic device is further configured to use relative transfer-based covariance to construct a directional covariance matrix aligned across frequency bands and find a direction that minimizes beam power subject to distortionless criteria.