-
Publication number: US11830471B1
Publication date: 2023-11-28
Application number: US17007681
Filing date: 2020-08-31
Applicant: Amazon Technologies, Inc.
Inventor: Mohamed Mansour, Wontak Kim, Yuancheng Luo
CPC classification number: G10K11/36, G06T15/06, H04R3/005, G10K2210/505, H04R2203/12
Abstract: Disclosed are techniques for performing ray-based acoustic modeling that models scattering of acoustic waves by a surface of a device. The acoustic modeling uses two parameters, a room response representing acoustics and geometry of a room and a device response representing acoustics and geometry of the device. The room response is determined using ray-based acoustic modeling, such as ray tracing. The device response can be measured in an actual environment or simulated and represents an acoustic response of the device to individual acoustic plane waves. The device applies a superposition of the room response and the plane wave scattering from the device response to determine acoustic pressure values and generate microphone audio data. The device can estimate room impulse response (RIR) data using data from the microphones, and can use the RIR data to perform audio processing such as sound equalization, acoustic echo cancellation, audio beamforming, and/or the like.
-
Publication number: US20210120332A1
Publication date: 2021-04-22
Application number: US17001854
Filing date: 2020-08-25
Applicant: Amazon Technologies, Inc.
Inventor: Yuancheng Luo, Wontak Kim, Mihir Dhananjay Shetye
IPC: H04R1/40, H04R25/00, H04S3/00, G10L19/008, H04R5/02
Abstract: A system configured to improve spatial coverage of output audio and a corresponding user experience by performing upmixing and loudspeaker beamforming to stereo input signals. The system can perform upmixing to the stereo (e.g., two channel) input signal to extract a center channel and generate three-channel audio data. The system may then perform loudspeaker beamforming to the three-channel audio data to enable two loudspeakers to generate output audio having three distinct beams. The user may interpret the three distinct beams as originating from three separate locations, resulting in the user perceiving a wide virtual sound stage despite the loudspeakers being spaced close together on the device.
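The upmixing step can be illustrated with a minimal sketch. The abstract does not say how the center channel is extracted, so the passive mid/side split below (center as the average of left and right) is an assumption, and `upmix_stereo` is a hypothetical name. The loudspeaker beamforming stage is not shown.

```python
def upmix_stereo(left, right):
    """Split a stereo signal into (left, center, right) sample lists.

    Assumption: the center is the plain average of both channels, and that
    average is subtracted from each side. The patent may use another method.
    """
    center = [0.5 * (l + r) for l, r in zip(left, right)]
    new_left = [l - c for l, c in zip(left, center)]
    new_right = [r - c for r, c in zip(right, center)]
    return new_left, center, new_right


if __name__ == "__main__":
    print(upmix_stereo([1.0, 0.0], [1.0, 2.0]))
```

With this split, content common to both channels lands in the center channel, while the side channels keep only the difference between left and right.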
-
Publication number: US12288566B1
Publication date: 2025-04-29
Application number: US17849823
Filing date: 2022-06-27
Applicant: Amazon Technologies, Inc.
Inventor: Anshuman Ganguly, Srivatsan Kandadai, Trausti Thor Kristjansson, Wontak Kim
IPC: G10L21/0216, G10L21/0264, G10L25/51, G10L25/78
Abstract: A device capable of using data from multiple sensors to determine an estimated position/direction of a user with respect to the device. The device may use estimated position data, along with confidence data, that originated from a plurality of sensors to fuse the data to determine the user's estimated position and comprehensive confidence of the estimated position. The system may use the location information to perform beamforming/beam steering and/or other downstream operations using the comprehensive estimated position.
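One way to picture fusing per-sensor direction estimates with their confidences is a confidence-weighted circular mean. This is only a sketch: the abstract does not specify the fusion rule, and `fuse_directions` and its combined-confidence definition are assumptions made for illustration.

```python
import math


def fuse_directions(estimates):
    """Fuse (angle_degrees, confidence) pairs into one direction.

    Assumption: each estimate is a unit vector scaled by its confidence.
    The fused angle is the direction of the vector sum, and the combined
    confidence is the sum's length divided by the total confidence, so it
    is 1.0 when the sensors agree and lower when they disagree.
    """
    total = sum(conf for _, conf in estimates)
    if total <= 0:
        raise ValueError("at least one estimate needs positive confidence")
    x = sum(conf * math.cos(math.radians(a)) for a, conf in estimates)
    y = sum(conf * math.sin(math.radians(a)) for a, conf in estimates)
    angle = math.degrees(math.atan2(y, x)) % 360.0
    return angle, math.hypot(x, y) / total


if __name__ == "__main__":
    # Two sensors straddling 0 degrees with equal confidence.
    print(fuse_directions([(350.0, 1.0), (10.0, 1.0)]))
```

Averaging on the unit circle rather than on raw angles avoids the wrap-around problem, where 350 and 10 degrees would otherwise average to 180.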
-
Publication number: US11762052B1
Publication date: 2023-09-19
Application number: US17475888
Filing date: 2021-09-15
Applicant: Amazon Technologies, Inc.
Inventor: Anshuman Ganguly, Mrudula V. Athi, Spencer Russell, Alexander M. Epstein, Wontak Kim
CPC classification number: G01S3/8083, G01S5/20, G06T7/70, G10L15/22, G10L2015/223
Abstract: Techniques for improving sound source localization (SSL) are provided. A method for probabilistic SSL using a deep neural network (DNN) may include receiving audio data including a representation of audio such as a wakeword from a microphone array. The audio data may be processed by a DNN to output a plurality of values where each value indicates a probability that the audio originated from a direction corresponding to that value. A sensor may provide computer vision or other data which may be used to inform the plurality of values based on detecting presence of a human or obstacle. A probability that the audio originated from one of the directions of the plurality of directions may be determined based at least in part on the DNN output and the computer vision or other data.
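As a rough sketch of how sensor data might inform the per-direction DNN probabilities, the snippet below multiplies each probability by a vision-derived weight and renormalizes. The multiplicative rule, the function name `fuse_direction_probs`, and the weight values are assumptions. The abstract only states that the final probability is based on both inputs.

```python
def fuse_direction_probs(dnn_probs, vision_weights):
    """Reweight per-direction probabilities with vision-derived weights.

    Assumption: a weight near 1 marks a direction where a person was
    detected, and a weight near 0 marks a direction blocked by an obstacle.
    If every weighted value is zero, the DNN output is returned unchanged.
    """
    weighted = [p * w for p, w in zip(dnn_probs, vision_weights)]
    total = sum(weighted)
    if total == 0:
        return list(dnn_probs)
    return [v / total for v in weighted]


if __name__ == "__main__":
    # Three candidate directions; vision rules out the middle one.
    print(fuse_direction_probs([0.5, 0.3, 0.2], [1.0, 0.0, 1.0]))
```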
-
Publication number: US12192695B1
Publication date: 2025-01-07
Application number: US17708607
Filing date: 2022-03-30
Applicant: Amazon Technologies, Inc.
Inventor: Andrew Jackson Stockton X, Mihir Dhananjay Shetye, Wontak Kim, Zhen Sun
IPC: H04R1/10, G10K11/178
Abstract: A wearable audio output device (e.g., headphones) having an open design that allows ambient noise to pass to a listener without physically isolating the listener from a surrounding environment. The device may include an open earcup design that may partially or completely surround the listener's ear, and in some examples a portion of the listener's head may be uncovered by the open earcup. To improve comfort, the device includes a floating audio component configured to generate output audio in a direction of the listener's ear without contacting the listener's ear. To reduce an amount of ambient noise, the device may be configured to perform active noise cancellation (ANC) processing using feedforward and/or feedback microphones. The device may include an acoustic structure configured to direct the output audio in the direction of the listener's ear and/or position the feedback microphone(s) closer to the listener's ear.
-
Publication number: US11128953B2
Publication date: 2021-09-21
Application number: US17001854
Filing date: 2020-08-25
Applicant: Amazon Technologies, Inc.
Inventor: Yuancheng Luo, Wontak Kim, Mihir Dhananjay Shetye
IPC: H04R1/40, H04R25/00, H04S3/00, G10L19/008, H04R5/02
Abstract: A system configured to improve spatial coverage of output audio and a corresponding user experience by performing upmixing and loudspeaker beamforming to stereo input signals. The system can perform upmixing to the stereo (e.g., two channel) input signal to extract a center channel and generate three-channel audio data. The system may then perform loudspeaker beamforming to the three-channel audio data to enable two loudspeakers to generate output audio having three distinct beams. The user may interpret the three distinct beams as originating from three separate locations, resulting in the user perceiving a wide virtual sound stage despite the loudspeakers being spaced close together on the device.
-
Publication number: US10764676B1
Publication date: 2020-09-01
Application number: US16573472
Filing date: 2019-09-17
Applicant: Amazon Technologies, Inc.
Inventor: Yuancheng Luo, Wontak Kim, Mihir Dhananjay Shetye
IPC: H04R1/40, H04R25/00, H04S3/00, G10L19/008, H04R5/02
Abstract: A system configured to improve spatial coverage of output audio and a corresponding user experience by performing upmixing and loudspeaker beamforming to stereo input signals. The system can perform upmixing to the stereo (e.g., two channel) input signal to extract a center channel and generate three-channel audio data. The system may then perform loudspeaker beamforming to the three-channel audio data to enable two loudspeakers to generate output audio having three distinct beams. The user may interpret the three distinct beams as originating from three separate locations, resulting in the user perceiving a wide virtual sound stage despite the loudspeakers being spaced close together on the device.
-
Publication number: US12200449B1
Publication date: 2025-01-14
Application number: US18081477
Filing date: 2022-12-14
Applicant: Amazon Technologies, Inc.
Inventor: Mahathir Monjur, Mrudula V Athi, Md Tamzeed Islam, Wontak Kim
Abstract: A system configured to perform user orientation estimation to determine a direction a user is facing using a deep neural network (DNN). As a directionality of human speech increases with frequency, the DNN may estimate the user orientation by comparing high-frequency components detected by each of the multiple devices. For example, a group of devices may individually generate feature data, which represents audio features and spatial information, and send the feature data to the other devices. Thus, each device in the group receives feature data generated by the other devices and processes this feature data using a DNN to determine an estimate of user orientation. In some examples, the DNN may also generate sound source localization (SSL) data and/or a confidence score associated with the user orientation estimate. A post-processing step may process the individual user orientation estimates generated by the individual devices and determine a final user orientation estimate.
-
Publication number: US11158335B1
Publication date: 2021-10-26
Application number: US16368107
Filing date: 2019-03-28
Applicant: Amazon Technologies, Inc.
Inventor: Anshuman Ganguly, Srivatsan Kandadai, Wontak Kim
Abstract: A voice-controlled device includes a beamformer for determining audio data corresponding to one or more directions and a beam selector for selecting in which direction a source of target audio lies. The device determines magnitude spectrums for each beam and for each frequency bin in each beam for each frame of audio data. The device determines frame-by-frame changes in the magnitude and filters the changes to smooth them. The device selects the beam having the greatest smoothed change in magnitude as corresponding to speech.
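The beam-selection steps in this abstract can be sketched as follows. The abstract does not say how per-bin changes are aggregated or which smoothing filter is used. Summing absolute differences across bins and applying a one-pole exponential filter with coefficient `alpha` are assumptions, and `select_beam` is a hypothetical name.

```python
def select_beam(beam_magnitudes, alpha=0.8):
    """Return the index of the beam with the greatest smoothed change.

    beam_magnitudes[b][t][k] is the magnitude for beam b, frame t, bin k.
    Assumptions: the frame-to-frame change is the sum of absolute per-bin
    differences, and smoothing is s = alpha * s + (1 - alpha) * change.
    """
    smoothed = []
    for frames in beam_magnitudes:
        s = 0.0
        for prev, cur in zip(frames, frames[1:]):
            change = sum(abs(c - p) for p, c in zip(prev, cur))
            s = alpha * s + (1.0 - alpha) * change
        smoothed.append(s)
    return max(range(len(smoothed)), key=smoothed.__getitem__)


if __name__ == "__main__":
    steady = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]   # no change across frames
    varying = [[1.0, 1.0], [3.0, 1.0], [1.0, 4.0]]  # speech-like fluctuation
    print(select_beam([steady, varying]))
```

The intuition is that speech fluctuates more from frame to frame than stationary noise, so the beam with the largest smoothed change is the likeliest to point at the talker.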
-
Publication number: US10986437B1
Publication date: 2018-06-21
Application number: US16014275
Filing date: 2018-06-21
Applicant: Amazon Technologies, Inc.
Inventor: Guangdong Pan, Chad Jackman, Wontak Kim
Abstract: A beamformer system isolates a desired direction of an audio signal received from a first microphone array disposed on a first plane of the system and a second microphone array disposed on a second plane of the system. A spatial covariance matrix (SCM) defines the spatial covariance between pairs of microphones. A diagonal of the SCM is varied based on the placement of the microphones; values corresponding to one microphone array are increased, and values corresponding to the other microphone array are decreased.
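A minimal sketch of the diagonal adjustment described above, assuming the SCM is a square list of lists and each array is given as a list of microphone indices. The scale factors `boost` and `cut` and the function name `adjust_scm_diagonal` are illustrative assumptions. The abstract gives no values and does not say whether the change is multiplicative or additive.

```python
def adjust_scm_diagonal(scm, first_array, second_array, boost=1.1, cut=0.9):
    """Return a copy of the SCM with per-array diagonal scaling.

    Assumption: diagonal entries for microphones in first_array are
    multiplied by boost (> 1) and those in second_array by cut (< 1).
    Off-diagonal entries are left unchanged.
    """
    out = [row[:] for row in scm]
    for i in first_array:
        out[i][i] *= boost
    for i in second_array:
        out[i][i] *= cut
    return out


if __name__ == "__main__":
    scm = [[1.0, 0.2, 0.1],
           [0.2, 1.0, 0.3],
           [0.1, 0.3, 1.0]]
    # Microphones 0 and 1 on the first plane, microphone 2 on the second.
    print(adjust_scm_diagonal(scm, [0, 1], [2]))
```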