-
Publication number: US11277685B1
Publication date: 2022-03-15
Application number: US16180890
Application date: 2018-11-05
Applicant: Amazon Technologies, Inc.
Inventor: Robert Ayrapetian, Philip Ryan Hilmes, Mohamed Mansour, Carlo Murgia
IPC: G10L21/02, H04R3/00, H04R5/04, H04R5/027, G10L21/0224, G06F3/16, G10L21/0272, G10L21/0208, G10L21/0216, G10L25/93, G10L25/51, H03H21/00, G10L25/78
Abstract: Techniques for improving adaptive interference cancellation (AIC) using cascaded AIC algorithms are described. To improve an accuracy of detecting speech, a device may perform a first stage of AIC to generate isolated audio data and may generate speech mask data indicating time windows when speech is detected in the isolated audio data. Based on the speech mask data, the device may perform second AIC to generate output audio data, with adaptation of the adaptive filter enabled when the speech is not detected and disabled when the speech is detected. Thus, the first AIC improves the accuracy with which the device detects that speech is present and the second AIC reduces distortion in the output audio data by not updating filter coefficient values when the speech is present. The first AIC may use playback audio data, microphone audio data or beamformed audio data as reference signals.
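As a rough illustration of the cascade described in this abstract, the sketch below runs a normalized LMS (NLMS) canceller twice: once with adaptation always enabled to produce isolated audio, and once with adaptation frozen wherever a speech mask is set. NLMS, the energy-threshold mask, and the names `gated_nlms`, `speech_mask`, and `cascaded_aic` are assumptions made for this example; the abstract does not name the adaptive algorithm or the speech detector.

```python
def gated_nlms(mic, ref, adapt, taps=4, mu=0.5, eps=1e-8):
    """Cancel `ref` from `mic` with an NLMS filter.

    Filter weights are updated only at samples where adapt[n] is True.
    Returns the error (output) signal and the final weights.
    """
    w = [0.0] * taps
    out = []
    for n in range(len(mic)):
        # Most recent `taps` reference samples, newest first, zero-padded at the start.
        x = [ref[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        estimate = sum(wk * xk for wk, xk in zip(w, x))
        error = mic[n] - estimate
        out.append(error)
        if adapt[n]:
            norm = sum(xk * xk for xk in x) + eps
            w = [wk + mu * error * xk / norm for wk, xk in zip(w, x)]
    return out, w


def speech_mask(signal, frame=8, threshold=0.01):
    """Per-sample mask: True where the frame's mean energy exceeds `threshold`.

    This is a crude, hypothetical stand-in for a real speech detector.
    """
    mask = []
    for start in range(0, len(signal), frame):
        chunk = signal[start:start + frame]
        energy = sum(s * s for s in chunk) / len(chunk)
        mask.extend([energy > threshold] * len(chunk))
    return mask


def cascaded_aic(mic, ref, taps=4):
    """Stage 1 isolates audio to build the mask; stage 2 adapts only when no speech is flagged."""
    isolated, _ = gated_nlms(mic, ref, [True] * len(mic), taps)
    mask = speech_mask(isolated)
    adapt = [not speech for speech in mask]
    out, w = gated_nlms(mic, ref, adapt, taps)
    return out, mask, w
```

Both stages use the same reference signal here for brevity; the abstract notes that the first stage may instead use playback, microphone, or beamformed audio data as its reference.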
-
Publication number: US11107488B1
Publication date: 2021-08-31
Application number: US16662696
Application date: 2019-10-24
Applicant: Amazon Technologies, Inc.
Inventor: Mohamed Mansour, Shobha Devi Kuruba Buchannagari
IPC: G10L21/0232, H04R1/40, H04R3/00, G10L21/0208, H04R3/12, G10L21/0216
Abstract: A system configured to perform echo cancellation using a reduced number of reference signals. The system may perform multi-channel acoustic echo cancellation (MCAEC) processing on a first portion of a microphone audio signal that corresponds to early reflections and may perform single-channel acoustic echo cancellation (AEC) processing on a second portion of the microphone audio signal that corresponds to late reverberations. For example, the system may use MCAEC processing on a plurality of reference audio signals to generate a first echo estimate signal and may subtract the first echo estimate signal from the microphone audio signal to generate a residual audio signal. The system may delay the first echo estimate signal, perform the AEC processing to generate a second echo estimate signal, and subtract the second echo estimate signal from the residual audio signal to generate an output audio signal. This reduces an overall complexity associated with performing echo cancellation.
-
Publication number: US10147439B1
Publication date: 2018-12-04
Application number: US15474197
Application date: 2017-03-30
Applicant: Amazon Technologies, Inc.
Inventor: Trausti Thor Kristjansson, Mohamed Mansour, Amit Singh Chhetri, Ludger Solbach
IPC: G06F3/00, G10L21/0364, G10L15/22, G10L21/0232, G10L13/00, G10L21/0216
Abstract: A speech-capturing device that can modulate its output audio data volume based on environmental sound conditions at the location of a user speaking to the device. The device detects the sound pressure of a spoken utterance at the device location and determines the distance of the user from the device. The device also detects the sound pressure of noise at the device and uses information about the location of the noise source and user to determine the sound pressure of noise at the location of the talker. The device can then adjust the gain for output audio (such as a spoken response to the utterance) to ensure that the output audio is at a certain desired sound pressure when it reaches the location of the user.
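The level arithmetic implied by this abstract can be sketched as follows. The example assumes free-field spreading from a point source (level drops by 20·log10 of the distance ratio), which a real room will only approximate; the function names and the 10 dB margin over the noise are hypothetical choices, not values from the patent.

```python
import math


def spl_at_distance(spl_at_1m, distance_m):
    """Level (dB SPL) at `distance_m`, assuming free-field 1/r spreading."""
    return spl_at_1m - 20.0 * math.log10(distance_m)


def noise_spl_at_user(noise_spl_at_device, noise_to_device_m, noise_to_user_m):
    """Translate the noise level measured at the device to the talker's position."""
    return noise_spl_at_device + 20.0 * math.log10(noise_to_device_m / noise_to_user_m)


def required_output_spl_at_1m(noise_at_user_db, user_distance_m, margin_db=10.0):
    """Output level (referenced to 1 m) so the response arrives `margin_db` above the noise."""
    target_at_user = noise_at_user_db + margin_db
    return target_at_user + 20.0 * math.log10(user_distance_m)
```

For example, with 50 dB SPL of noise at the talker and the talker 10 m away, this sketch asks for 80 dB SPL at 1 m so that roughly 60 dB SPL reaches the talker.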
-
Publication number: US12101599B1
Publication date: 2024-09-24
Application number: US17952806
Application date: 2022-09-26
Applicant: Amazon Technologies, Inc.
Inventor: Mohamed Mansour
Abstract: Disclosed are techniques for an improved method for performing sound source localization (SSL) to determine a direction of arrival of an audible sound using a combination of timing information and amplitude information. For example, a device may decompose an observed sound field into directional components, then estimate a time-delay likelihood value and an energy-based likelihood value for each of the directional components. Using a combination of these likelihood values, the device can determine the direction of arrival corresponding to a maximum likelihood value. In some examples, the device may perform Acoustic Wave Decomposition (AWD) processing to determine the directional components. To reduce the processing cost of AWD processing, the device splits this process into two phases: a search phase that selects a subset of a device dictionary to reduce complexity, and a decomposition phase that solves an optimization problem using the subset of the device dictionary.
-
Publication number: US11785409B1
Publication date: 2023-10-10
Application number: US17529560
Application date: 2021-11-18
Applicant: Amazon Technologies, Inc.
Inventor: Mohamed Mansour
IPC: H04S7/00, G10L19/008, H04R3/00
CPC classification number: H04S7/302, G10L19/008, H04R3/005
Abstract: Disclosed are techniques for an improved method for performing Acoustic Wave Decomposition (AWD) processing that reduces complexity and processing cost. The improved method enables a device to perform AWD processing to decompose an observed sound field into directional components, enabling the device to perform additional processing such as sound source separation, dereverberation, sound source localization, sound field reconstruction, and/or the like. The improved method splits the solution into two phases: a search phase that selects a subset of a device dictionary to reduce complexity, and a decomposition phase that solves an optimization problem using the subset of the device dictionary.
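The two-phase split can be illustrated with a toy, real-valued example. The search phase below ranks dictionary entries by their correlation with the observation and keeps the strongest few; the decomposition phase then solves a small ridge-regularized least-squares problem over only those entries. The correlation ranking, the least-squares objective, and all function names are assumptions for illustration; the abstract does not state which search criterion or optimization problem is used, and a real device dictionary would likely hold complex-valued acoustic responses.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def search_phase(observation, dictionary, keep):
    """Return the indices of the `keep` entries most correlated with the observation."""
    scores = [abs(dot(observation, atom)) for atom in dictionary]
    ranked = sorted(range(len(dictionary)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


def solve(matrix, rhs):
    """Solve a small dense linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def decomposition_phase(observation, dictionary, subset, ridge=1e-9):
    """Least-squares weights for the selected entries, via the normal equations."""
    atoms = [dictionary[i] for i in subset]
    gram = [[dot(a, b) + (ridge if i == j else 0.0) for j, b in enumerate(atoms)]
            for i, a in enumerate(atoms)]
    rhs = [dot(a, observation) for a in atoms]
    return dict(zip(subset, solve(gram, rhs)))
```

The saving comes from the second phase working on a `keep`-by-`keep` system instead of one sized by the full dictionary.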
-
Publication number: US11483644B1
Publication date: 2022-10-25
Application number: US17222275
Application date: 2021-04-05
Applicant: Amazon Technologies, Inc.
Inventor: Mohamed Mansour
Abstract: A system that performs early reflections filtering to suppress early reflections and improve sound source localization (SSL). During music playback and/or when a device is placed in a corner, acoustic reflections from nearby surfaces get boosted due to constructive interference, negatively impacting SSL and other processing of the device. To suppress these early reflections, the device uses an Early Reflections Filter (ERF) that makes use of Linear Prediction Coding (LPC), which is already being performed during speech processing. For example, the device generates raw audio signals using multi-channel LPC coefficients and then uses single-channel LPC coefficients for each raw audio signal in order to generate a filter that estimates the reflections. The device then uses this filter to suppress the early reflections and generate filtered audio signals, thus resulting in better audio processing and better overall device performance.
-
Publication number: US11380312B1
Publication date: 2022-07-05
Application number: US16447550
Application date: 2019-06-20
Applicant: Amazon Technologies, Inc.
Inventor: Mohamed Mansour
IPC: G10L15/22, G10L15/20, G10L25/21, G10L25/60, G10L15/06, G10L15/08, G10L21/0232, G10L25/84, G10L21/0208
Abstract: A system configured to improve wakeword detection. The system may selectively rectify (e.g., attenuate) a portion of an audio signal based on energy statistics corresponding to a keyword (e.g., wakeword). For example, a device may perform echo cancellation to generate isolated audio data, may use the energy statistics to calculate signal quality metric values for a plurality of frequency bands of the isolated audio data, and may select a fixed number of frequency bands (e.g., 5-10%) associated with lowest signal quality metric values. To detect a specific keyword, the system determines a threshold λ(f) corresponding to an expected energy value at each frequency band. During runtime, the device determines signal quality metric values by subtracting residual music from the expected energy values. Thus, the device attenuates only a portion of the total number of frequency bands that include more energy than expected based on the energy statistics of the wakeword.
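A minimal sketch of the band selection described above: the per-band metric is the expected wakeword energy λ(f) minus the residual energy, and a fixed fraction of the bands with the lowest metric is attenuated. The function names, the 10% default, and the flat attenuation gain are assumptions for illustration only.

```python
def select_bands(expected_energy, residual_energy, fraction=0.1):
    """Pick the fixed fraction of bands with the lowest signal quality metric.

    metric[f] = expected_energy[f] - residual_energy[f], so bands carrying far
    more residual energy than the wakeword statistics predict score lowest.
    """
    metrics = [lam - res for lam, res in zip(expected_energy, residual_energy)]
    count = max(1, round(fraction * len(metrics)))
    ranked = sorted(range(len(metrics)), key=lambda f: metrics[f])
    return sorted(ranked[:count]), metrics


def attenuate(band_energies, bands, gain=0.1):
    """Scale only the selected bands; leave every other band untouched."""
    chosen = set(bands)
    return [e * gain if f in chosen else e for f, e in enumerate(band_energies)]
```

Because the number of attenuated bands is fixed, most of the spectrum passes through unchanged regardless of how loud the residual music is.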
-
Publication number: US10978081B2
Publication date: 2021-04-13
Application number: US16141578
Application date: 2018-09-25
Applicant: Amazon Technologies, Inc.
Inventor: Yuan-Yen Tai, Mohamed Mansour, Parind Shah
IPC: G10L19/018, G10L19/16, G10L13/08, G10L15/22, G10L15/05
Abstract: A system may embed audio watermarks in audio data using a sign sequence. The system may detect audio watermarks in audio data despite the effects of reverberation. For example, the system may embed multiple repetitions of an audio watermark before generating output audio using loudspeaker(s). To detect the audio watermark in audio data generated by a microphone, the system may perform a self-correlation that indicates where the audio watermark is repeated. In some examples, the system may encode the audio watermark using multiple repetitions of a multi-segment Eigenvector. Additionally or alternatively, the system may encode the audio watermark using a binary sequence of positive and negative values, which may be used as a shared key for encoding/decoding the audio watermark. The audio watermark can be embedded in output audio data to enable wakeword suppression (e.g., avoid cross-talk between devices) and/or local signal transmission between devices in proximity to each other.
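The sign-sequence idea can be sketched in a few lines: a short segment is repeated, each repetition multiplied by +1 or -1 from a shared key, and the detector correlates the repetitions against each other after undoing the signs. The time-domain embedding, the scoring rule, and the names `embed` and `self_correlation_score` are assumptions for this example rather than the patented encoding.

```python
def embed(host, segment, signs, strength=0.05):
    """Add sign-modulated repetitions of `segment` to the start of `host`."""
    seg_len = len(segment)
    if len(host) < seg_len * len(signs):
        raise ValueError("host audio is too short for the watermark")
    out = list(host)
    for rep, sign in enumerate(signs):
        for i, value in enumerate(segment):
            out[rep * seg_len + i] += strength * sign * value
    return out


def self_correlation_score(audio, seg_len, signs):
    """Sum of sign-corrected correlations between every pair of repetitions.

    The score is large and positive when the audio contains repetitions that
    follow `signs`, and tends to be smaller for a mismatched key.
    """
    score = 0.0
    for a in range(len(signs)):
        for b in range(a + 1, len(signs)):
            pair = sum(audio[a * seg_len + i] * audio[b * seg_len + i]
                       for i in range(seg_len))
            score += signs[a] * signs[b] * pair
    return score
```

The detector never needs the segment itself, only the key and the segment length, since it compares the received repetitions with one another.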
-
Publication number: US20250016499A1
Publication date: 2025-01-09
Application number: US18889896
Application date: 2024-09-19
Applicant: Amazon Technologies, Inc.
Inventor: Mohamed Mansour
Abstract: Disclosed are techniques for an improved method for performing sound source localization (SSL) to determine a direction of arrival of an audible sound using a combination of timing information and amplitude information. For example, a device may decompose an observed sound field into directional components, then estimate a time-delay likelihood value and an energy-based likelihood value for each of the directional components. Using a combination of these likelihood values, the device can determine the direction of arrival corresponding to a maximum likelihood value. In some examples, the device may perform Acoustic Wave Decomposition (AWD) processing to determine the directional components. To reduce the processing cost of AWD processing, the device splits this process into two phases: a search phase that selects a subset of a device dictionary to reduce complexity, and a decomposition phase that solves an optimization problem using the subset of the device dictionary.
-
Publication number: US12143789B1
Publication date: 2024-11-12
Application number: US17825613
Application date: 2022-05-26
Applicant: Amazon Technologies, Inc.
Inventor: Spencer Russell, Carlos Renato Nakagawa, Mohamed Mansour
Abstract: A system configured to improve user localization used to determine a listening position and/or user orientation for a device map. Multiple devices may generate audio data representing user speech and the system may use the audio data to determine a first spatial likelihood function (SLF) based on angle measurements, determine a second SLF based on timing information, and determine a location of the user based on a combination of the two SLFs. The SLFs represent the environment using a grid comprising a plurality of grid cells, and each grid cell has a value indicating a likelihood that the grid cell corresponds to the location of the user. An individual device may generate a portion of the angle measurements based on multi-channel audio data generated using multiple microphones of the device, while the system may generate the timing information based on single-channel audio data received from each of the multiple devices.
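One simple way to combine two grid-based spatial likelihood functions is to multiply them cell by cell and take the cell with the largest product. The sketch below does exactly that; the product rule and the function names are assumptions, since the abstract says only that the location is based on "a combination of the two SLFs".

```python
def combine_slfs(angle_slf, timing_slf):
    """Cell-wise product of two likelihood grids with identical shape."""
    return [[a * t for a, t in zip(angle_row, timing_row)]
            for angle_row, timing_row in zip(angle_slf, timing_slf)]


def most_likely_cell(slf):
    """Return (row, column) of the grid cell with the highest likelihood value."""
    value, row, col = max((v, r, c)
                          for r, cells in enumerate(slf)
                          for c, v in enumerate(cells))
    return row, col
```

With a product rule, a cell must be plausible under both the angle measurements and the timing information to win, which is one reason it is a common choice when the two sources are treated as independent.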