-
Publication No.: US11830471B1
Publication date: 2023-11-28
Application No.: US17007681
Application date: 2020-08-31
Applicant: Amazon Technologies, Inc.
Inventor: Mohamed Mansour, Wontak Kim, Yuancheng Luo
CPC classification number: G10K11/36, G06T15/06, H04R3/005, G10K2210/505, H04R2203/12
Abstract: Disclosed are techniques for performing ray-based acoustic modeling that models scattering of acoustic waves by a surface of a device. The acoustic modeling uses two parameters, a room response representing acoustics and geometry of a room and a device response representing acoustics and geometry of the device. The room response is determined using ray-based acoustic modeling, such as ray tracing. The device response can be measured in an actual environment or simulated and represents an acoustic response of the device to individual acoustic plane waves. The device applies a superposition of the room response and the plane wave scattering from the device response to determine acoustic pressure values and generate microphone audio data. The device can estimate room impulse response (RIR) data using data from the microphones, and can use the RIR data to perform audio processing such as sound equalization, acoustic echo cancellation, audio beamforming, and/or the like.
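The superposition step in this abstract can be illustrated with a minimal sketch. It assumes the ray tracer yields a list of rays (arrival direction, amplitude, delay) and that the device response is available as one complex value per direction at the frequency of interest. The function name `mic_pressure`, the direction labels, and the data layout are hypothetical and are not taken from the patent.

```python
import cmath
import math


def mic_pressure(rays, device_response, freq_hz):
    """Sum plane-wave contributions at one microphone for one frequency.

    rays: list of (direction_key, amplitude, delay_seconds) tuples, standing in
        for the room response produced by ray tracing (assumed layout).
    device_response: dict mapping direction_key to the complex response of the
        microphone to a unit plane wave from that direction (assumed layout).
    """
    total = 0j
    for direction, amplitude, delay in rays:
        # Propagation delay becomes a phase rotation at this frequency.
        phase = cmath.exp(-2j * math.pi * freq_hz * delay)
        # Each ray is treated as a plane wave scattered by the device surface.
        total += amplitude * phase * device_response[direction]
    return total


# Hypothetical data: a direct ray plus one reflection arriving 1 ms later.
rays = [("front", 1.0, 0.0), ("left", 0.5, 0.001)]
device_response = {"front": 1.0 + 0j, "left": 0.5j}
pressure = mic_pressure(rays, device_response, freq_hz=250.0)
```

Repeating the sum per microphone and per frequency would give the frequency-domain values from which microphone audio data could be synthesized.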
-
Publication No.: US11222647B2
Publication date: 2022-01-11
Application No.: US16934668
Application date: 2020-07-21
Applicant: Amazon Technologies, Inc.
Inventor: Mohamed Mansour, Shobha Devi Kuruba Buchannagari
IPC: H04B15/00, H04R3/02, G10L21/0232, G10L25/21, G10L25/51, G10L15/22, G10L15/08, G10L21/0208
Abstract: A system configured to perform cascade echo cancellation processing to improve performance when reference signals are asymmetric (e.g., dominant reference signal(s) overshadow weak reference signal(s)). The system may perform cascade echo cancellation processing to separately adapt filter coefficients between the dominant reference signal(s) and the weak reference signal(s). For example, the system may use a dominant reference signal to process a microphone audio signal and generate a residual audio signal, using the residual audio signal to adapt first filter coefficient values corresponding to the dominant reference signal. Separately, the system may use a weak reference signal to process the residual audio signal and generate an output audio signal, using the output audio signal to adapt second filter coefficient values corresponding to the weak reference signal.
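The two-stage structure described in this abstract can be sketched as follows. The abstract does not name the adaptation rule, so normalized LMS (NLMS) is assumed here. The function names, step size, filter length, and the synthetic echo paths in `simulate_echo` are all hypothetical.

```python
import random


def nlms_step(weights, ref_window, target, mu=0.1, eps=1e-8):
    """Run one NLMS step: return the error and update `weights` in place."""
    estimate = sum(w * x for w, x in zip(weights, ref_window))
    error = target - estimate
    norm = sum(x * x for x in ref_window) + eps
    for i, x in enumerate(ref_window):
        weights[i] += mu * error * x / norm
    return error


def window(signal, n, taps):
    """Most recent `taps` samples ending at index n, zero-padded at the start."""
    return [signal[n - k] if n - k >= 0 else 0.0 for k in range(taps)]


def cascade_aec(mic, dominant_ref, weak_ref, taps=3):
    """Stage 1 cancels the dominant reference, stage 2 the weak reference."""
    w_dom, w_weak, output = [0.0] * taps, [0.0] * taps, []
    for n in range(len(mic)):
        # Stage 1: the residual drives adaptation of the dominant filter only.
        residual = nlms_step(w_dom, window(dominant_ref, n, taps), mic[n])
        # Stage 2: the output drives adaptation of the weak filter only.
        output.append(nlms_step(w_weak, window(weak_ref, n, taps), residual))
    return output, w_dom, w_weak


def simulate_echo(num_samples=5000, seed=0):
    """Synthetic test signals: a loud and a quiet reference, echo-only mic."""
    rng = random.Random(seed)
    dominant = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    weak = [rng.gauss(0.0, 0.05) for _ in range(num_samples)]
    h_dom, h_weak = [0.6, 0.3, -0.1], [0.5, -0.2, 0.1]  # made-up echo paths
    mic = [
        sum(h * x for h, x in zip(h_dom, window(dominant, n, 3)))
        + sum(h * x for h, x in zip(h_weak, window(weak, n, 3)))
        for n in range(num_samples)
    ]
    return mic, dominant, weak


mic, dominant, weak = simulate_echo()
output, w_dom, w_weak = cascade_aec(mic, dominant, weak)
```

Because each stage normalizes its step by the energy of its own reference, the weak reference adapts at its own scale instead of being swamped by the dominant one in a single joint update.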
-
Publication No.: US10582299B1
Publication date: 2020-03-03
Application No.: US16216599
Application date: 2018-12-11
Applicant: Amazon Technologies, Inc.
Inventor: Mohamed Mansour, Guangdong Pan
Abstract: Techniques for simulating a microphone array and generating synthetic audio data to analyze the microphone array geometry. This reduces the development cost of new microphone arrays by enabling an evaluation of performance metrics (False Rejection Rate (FRR), Word Error Rate (WER), etc.) without building device hardware or collecting data. To generate the synthetic audio data, the system performs acoustic modeling to determine a room impulse response associated with a prototype device (e.g., potential microphone array) in a room. The acoustic modeling is based on two parameters—a device response (information about acoustics and geometry of the prototype device) and a room response (information about acoustics and geometry of the room). The device response can be simulated based on the microphone array geometry, and the room response can be determined using a specialized microphone and a plane wave decomposition algorithm.
-
Publication No.: US11425495B1
Publication date: 2022-08-23
Application No.: US17234233
Application date: 2021-04-19
Applicant: Amazon Technologies, Inc.
Inventor: Mohamed Mansour
IPC: H04R1/40
Abstract: A system that performs sound source localization (SSL) using acoustic wave decomposition (AWD) or an approximation. When a device detects a wakeword represented in audio data, the device performs SSL processing in order to determine a position of the user relative to the device (e.g., estimate angle of the user). The device calculates noise statistics based on first audio data representing the wakeword and second audio data preceding the wakeword. Thus, upon detecting the wakeword, the device calculates the noise statistics and a signal quality metric corresponding to the wakeword. In addition, the device uses Multi-Channel Linear Prediction Coding (MCLPC) coefficients to average out the room impulse response. Using the noise statistics, the MCLPC coefficients, and the audio data, the device performs AWD processing to decompose the sound field to disjoint acoustic plane waves, enabling the device to identify the most likely direction for the line-of-sight component of speech.
-
Publication No.: US10986444B2
Publication date: 2021-04-20
Application No.: US16798706
Application date: 2020-02-24
Applicant: Amazon Technologies, Inc.
Inventor: Mohamed Mansour, Guangdong Pan
Abstract: Techniques for simulating a microphone array and generating synthetic audio data to analyze the microphone array geometry. This reduces the development cost of new microphone arrays by enabling an evaluation of performance metrics (False Rejection Rate (FRR), Word Error Rate (WER), etc.) without building device hardware or collecting data. To generate the synthetic audio data, the system performs acoustic modeling to determine a room impulse response associated with a prototype device (e.g., potential microphone array) in a room. The acoustic modeling is based on two parameters—a device response (information about acoustics and geometry of the prototype device) and a room response (information about acoustics and geometry of the room). The device response can be simulated based on the microphone array geometry, and the room response can be determined using a specialized microphone and a plane wave decomposition algorithm.
-
Publication No.: US10887709B1
Publication date: 2021-01-05
Application No.: US16582820
Application date: 2019-09-25
Applicant: Amazon Technologies, Inc.
Inventor: Mohamed Mansour, Carlos Renato Nakagawa
IPC: H04R29/00, G10K11/34, H04R1/32, G10L21/0216, H04R3/00
Abstract: A system configured to perform aligned beam merger (ABM) processing to combine multiple beamformed signals. The system may capture audio data and perform beamforming to generate beamformed audio signals corresponding to a plurality of directions. The system may apply an ABM algorithm to select a number of the beamformed audio signals, align the selected audio signals, and merge the selected audio signals to generate a distortionless output audio signal. The system may scale the selected audio signals based on relative magnitude and apply a complex correction factor to compensate for a phase error for each of the selected audio signals.
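The select, align, and merge steps in this abstract can be sketched for a single frequency bin. This is only one possible reading. The energy-based selection, the cross-correlation phase estimate, and the magnitude-proportional weights are assumptions, the function name is hypothetical, and the sketch does not attempt the distortionless normalization that the abstract mentions.

```python
import cmath


def aligned_beam_merge(beams, num_selected=2):
    """Merge the strongest beams after phase-aligning them to the strongest.

    beams: list of beams, each a list of complex STFT values for one
        frequency bin over several frames (assumed layout).
    """
    def energy(beam):
        return sum(abs(v) ** 2 for v in beam)

    # Select the beams with the most energy; the strongest is the reference.
    selected = sorted(beams, key=energy, reverse=True)[:num_selected]
    reference = selected[0]
    total_mag = sum(energy(b) ** 0.5 for b in selected)
    if total_mag == 0:
        return list(reference)

    merged = [0j] * len(reference)
    for beam in selected:
        # Complex correction factor: undo the average phase offset between
        # this beam and the reference, estimated from their cross-correlation.
        cross = sum(v * r.conjugate() for v, r in zip(beam, reference))
        correction = cross.conjugate() / abs(cross) if abs(cross) > 0 else 1 + 0j
        # Scale by magnitude relative to the other selected beams.
        weight = energy(beam) ** 0.5 / total_mag
        for t, v in enumerate(beam):
            merged[t] += weight * correction * v
    return merged


# Hypothetical beams: the same source seen with different gains and phases.
source = [1 + 0j, 0.5j, -1 + 0j, 0.5 - 0.5j]
beam_a = list(source)
beam_b = [0.8 * cmath.exp(1.0j) * v for v in source]
beam_c = [0.05 * cmath.exp(2.0j) * v for v in source]
merged = aligned_beam_merge([beam_c, beam_a, beam_b])
```

In this example the two selected beams add coherently after alignment, so the merged signal is a real, positive multiple of the source instead of a partially cancelled sum.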
-
Publication No.: US10811029B1
Publication date: 2020-10-20
Application No.: US16669980
Application date: 2019-10-31
Applicant: Amazon Technologies, Inc.
Inventor: Mohamed Mansour, Shobha Devi Kuruba Buchannagari
IPC: G10K15/08, H04B15/00, G10L21/0232, G10L25/21, G10L25/51, G10L15/22, G10L15/08, G10L21/0208
Abstract: A system configured to perform cascade echo cancellation processing to improve performance when reference signals are asymmetric (e.g., dominant reference signal(s) overshadow weak reference signal(s)). The system may perform cascade echo cancellation processing to separately adapt filter coefficients between the dominant reference signal(s) and the weak reference signal(s). For example, the system may use a dominant reference signal to process a microphone audio signal and generate a residual audio signal, using the residual audio signal to adapt first filter coefficient values corresponding to the dominant reference signal. Separately, the system may use a weak reference signal to process the residual audio signal and generate an output audio signal, using the output audio signal to adapt second filter coefficient values corresponding to the weak reference signal.
-
Publication No.: US20200098380A1
Publication date: 2020-03-26
Application No.: US16141578
Application date: 2018-09-25
Applicant: Amazon Technologies, Inc.
Inventor: Yuan-Yen Tai, Mohamed Mansour, Parind Shah
IPC: G10L19/018, G10L19/16, G10L15/05, G10L13/08, G10L15/22
Abstract: A system may embed audio watermarks in audio data using a sign sequence. The system may detect audio watermarks in audio data despite the effects of reverberation. For example, the system may embed multiple repetitions of an audio watermark before generating output audio using loudspeaker(s). To detect the audio watermark in audio data generated by a microphone, the system may perform a self-correlation that indicates where the audio watermark is repeated. In some examples, the system may encode the audio watermark using multiple repetitions of a multi-segment Eigenvector. Additionally or alternatively, the system may encode the audio watermark using a binary sequence of positive and negative values, which may be used as a shared key for encoding/decoding the audio watermark. The audio watermark can be embedded in output audio data to enable wakeword suppression (e.g., avoid cross-talk between devices) and/or local signal transmission between devices in proximity to each other.
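The sign-sequence idea in this abstract can be sketched as follows: one base segment is repeated with a shared sequence of signs, and the detector correlates the repetitions with each other. The sketch is simplified. It assumes the detector already knows where the watermark starts, uses a three-tap filter as a stand-in for reverberation, and uses an exaggerated watermark gain so the effect is easy to see. All names, the key, and the signal parameters are hypothetical.

```python
import random


def embed(host, base_segment, key, gain=1.0):
    """Add one signed copy of base_segment per key element to the host audio."""
    out = list(host)
    seg_len = len(base_segment)
    for i, sign in enumerate(key):
        for n, v in enumerate(base_segment):
            out[i * seg_len + n] += gain * sign * v
    return out


def reverberate(signal, taps=(1.0, 0.5, 0.25)):
    """Toy reverberation: convolve with a short, made-up impulse response."""
    return [
        sum(h * signal[n - k] for k, h in enumerate(taps) if n - k >= 0)
        for n in range(len(signal))
    ]


def detect_score(received, key, seg_len):
    """Sign-corrected correlation between all pairs of segments.

    Reverberation distorts every repetition in the same way, so the
    repetitions still correlate with each other. The score is near zero
    without a watermark and clearly positive with one.
    """
    segs = [received[i * seg_len:(i + 1) * seg_len] for i in range(len(key))]
    mean_energy = sum(sum(v * v for v in s) for s in segs) / len(segs)
    total, pairs = 0.0, 0
    for i in range(len(key)):
        for j in range(i + 1, len(key)):
            dot = sum(a * b for a, b in zip(segs[i], segs[j]))
            total += key[i] * key[j] * dot
            pairs += 1
    return total / (pairs * mean_energy)


rng = random.Random(1)
key = [1, -1, -1, 1, 1, -1]  # shared sign sequence (hypothetical)
seg_len = 256
base = [rng.gauss(0.0, 1.0) for _ in range(seg_len)]
host = [rng.gauss(0.0, 1.0) for _ in range(seg_len * len(key))]
score_marked = detect_score(reverberate(embed(host, base, key)), key, seg_len)
score_clean = detect_score(reverberate(host), key, seg_len)
```

A detector without the key cannot apply the sign correction, which is why the sequence can act as a shared secret between encoder and decoder.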
-
Publication No.: US11812237B2
Publication date: 2023-11-07
Application No.: US17553976
Application date: 2021-12-17
Applicant: Amazon Technologies, Inc.
Inventor: Robert Ayrapetian, Philip Ryan Hilmes, Mohamed Mansour, Carlo Murgia
IPC: G10L21/02, H04R3/00, H04R5/04, H04R5/027, G10L21/0224, G06F3/16, G10L21/0272, G10L21/0208, G10L21/0216, G10L25/93, G10L25/51, H03H21/00, G10L25/78
CPC classification number: H04R3/005, G06F3/167, G10L21/02, G10L21/0208, G10L21/0224, G10L21/0272, H04R5/027, H04R5/04, G10L25/51, G10L25/78, G10L25/93, G10L2021/02082, G10L2021/02166, H03H21/0012
Abstract: Techniques for improving adaptive interference cancellation (AIC) using cascaded AIC algorithms are described. To improve the accuracy of speech detection, a device may perform a first stage of AIC to generate isolated audio data and may generate speech mask data indicating time windows when speech is detected in the isolated audio data. Based on the speech mask data, the device may perform a second stage of AIC to generate output audio data, with adaptation of the adaptive filter enabled when speech is not detected and disabled when speech is detected. Thus, the first AIC improves the accuracy with which the device detects that speech is present, and the second AIC reduces distortion in the output audio data by not updating filter coefficient values when speech is present. The first AIC may use playback audio data, microphone audio data, or beamformed audio data as reference signals.
-
Publication No.: US11714157B2
Publication date: 2023-08-01
Application No.: US17174941
Application date: 2021-02-12
Applicant: Amazon Technologies, Inc.
CPC classification number: G01S3/8003, G01S3/7864, H04R1/406, H04R3/005
Abstract: A device has a microphone array that acquires sound data and a camera that acquires image data. A portion of the device may be moveable by one or more actuators. Responsive to the user, the portion of the device is moved toward an estimated direction of the user. The estimated direction is based on sensor data including the sound data and the image data. First variance values for individual sound direction values are calculated. Data derived from the image data or data from other sensors may be used to modify the first variance values and determine second data comprising second variances. The second data may be processed to determine the estimated direction of the user. For example, the second data may be processed by both a forward and a backward Kalman filter, and the output combined to determine an estimated direction toward the user.
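The forward and backward Kalman filtering mentioned in this abstract can be sketched for a scalar direction angle. The abstract does not say how the two outputs are combined, so this sketch assumes inverse-variance weighting of the forward filtered estimate and the backward predicted estimate. It also assumes a random-walk motion model and ignores angle wraparound. Function names and the sample values are hypothetical.

```python
def kalman_pass(measurements, variances, process_var=0.01):
    """Scalar random-walk Kalman filter with per-measurement variances.

    Returns (priors, posteriors), each a list of (mean, variance) tuples.
    """
    mean, var = 0.0, 1e9  # diffuse initial state
    priors, posteriors = [], []
    for z, r in zip(measurements, variances):
        var += process_var  # predict: the direction may drift
        priors.append((mean, var))
        gain = var / (var + r)  # noisier measurements get less weight
        mean += gain * (z - mean)
        var *= 1.0 - gain
        posteriors.append((mean, var))
    return priors, posteriors


def forward_backward(measurements, variances, process_var=0.01):
    """Combine a forward pass with a time-reversed pass at every step."""
    _, fwd_post = kalman_pass(measurements, variances, process_var)
    bwd_prior_rev, _ = kalman_pass(measurements[::-1], variances[::-1], process_var)
    # The backward prior at time t uses only measurements after t, so no
    # measurement is counted twice in the combination.
    bwd_prior = bwd_prior_rev[::-1]
    combined = []
    for (mf, vf), (mb, vb) in zip(fwd_post, bwd_prior):
        precision = 1.0 / vf + 1.0 / vb
        combined.append(((mf / vf + mb / vb) / precision, 1.0 / precision))
    return combined


# Hypothetical sound-direction values in degrees. The fifth value is an
# outlier whose variance has been inflated, for example because the camera
# data disagreed with it.
angles = [28.0, 32.0, 29.0, 31.0, 90.0, 30.0, 29.0, 31.0]
variances = [4.0, 4.0, 4.0, 4.0, 1e4, 4.0, 4.0, 4.0]
estimates = forward_backward(angles, variances)
```

Inflating the variance of an unreliable measurement reduces its influence without discarding it, which matches the role the modified variance values play in the abstract.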