-
Publication No.: US11514928B2
Publication Date: 2022-11-29
Application No.: US16708296
Filing Date: 2019-12-09
Applicant: Apple Inc.
Inventor: Mehrez Souden , Ante Jukic , Jason Wung , Ashrith Deshpande , Joshua D. Atkins
IPC: G10L25/78 , G10L25/81 , G10L25/18 , G10L21/0232 , G10L15/22 , G06N7/00 , G06N20/00 , G10L15/25 , G06V40/16 , G10L21/0208 , G10L21/0216 , G10L17/00
Abstract: A device implementing a system for processing speech in an audio signal includes at least one processor configured to receive an audio signal corresponding to at least one microphone of a device, and to determine, using a first model, a first probability that a speech source is present in the audio signal. The at least one processor is further configured to determine, using a second model, a second probability that an estimated location of a source of the audio signal corresponds to an expected position of a user of the device, and to determine a likelihood that the audio signal corresponds to the user of the device based on the first and second probabilities.
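The abstract above does not say how the two probabilities are combined. The following is a minimal sketch only, assuming the two model outputs can be treated as independent so that their product serves as the likelihood score. The function name, the example values, and the 0.5 threshold are hypothetical and are not taken from the patent.

```python
def user_speech_likelihood(p_speech: float, p_location: float) -> float:
    """Fuse two probabilities into a single likelihood score.

    ASSUMPTION: the outputs of the first model (speech presence) and the
    second model (source location matches the expected user position) are
    treated as independent, so the fused score is their product. The
    abstract does not specify the combination rule.
    """
    for name, p in (("p_speech", p_speech), ("p_location", p_location)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {p}")
    return p_speech * p_location


# Hypothetical example values, not taken from the patent.
score = user_speech_likelihood(p_speech=0.9, p_location=0.8)
is_user = score >= 0.5  # 0.5 is an arbitrary illustrative threshold
```

A product is only one possible fusion rule; a weighted combination or a learned classifier over both probabilities would fit the abstract's wording equally well.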
-
Publication No.: US11012774B2
Publication Date: 2021-05-18
Application No.: US16575178
Filing Date: 2019-09-18
Applicant: Apple Inc.
Inventor: Jonathan D. Sheaffer , Joshua D. Atkins , Peter A. Raffensperger , Symeon Delikaris Manias
Abstract: A method for producing a target directivity function that includes a set of spatially biased HRTFs. A set of left ear and right ear head related transfer functions (HRTFs) are selected. The left ear and right ear HRTFs are multiplied with an on-camera emphasis function (OCE), to produce the spatially biased HRTFs. The OCE may be designed to shape the sound profile of the HRTFs to provide emphasis in a desired location or direction that is a function of the specific orientation of the device as it is being used to make a video recording. Other aspects are also described and claimed.
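The multiplication step described above can be sketched as a per-direction gain applied to each left/right HRTF pair. This is an illustration under stated assumptions: the cardioid-like shape of the emphasis function, the `floor` parameter, the single-frequency complex values, and all names are hypothetical, since the abstract does not define the OCE.

```python
import math


def on_camera_emphasis(azimuth_deg: float, look_deg: float = 0.0,
                       floor: float = 0.25) -> float:
    """Hypothetical OCE: a cardioid-like gain peaking at the look direction.

    Returns 1.0 toward look_deg and `floor` in the opposite direction.
    """
    c = math.cos(math.radians(azimuth_deg - look_deg))
    return floor + (1.0 - floor) * 0.5 * (1.0 + c)


def bias_hrtfs(hrtfs: dict, look_deg: float = 0.0) -> dict:
    """Multiply each (left, right) HRTF pair by the OCE gain for its azimuth.

    `hrtfs` maps azimuth in degrees to a pair of complex responses at one
    frequency bin (a simplification for illustration).
    """
    biased = {}
    for azimuth, (left, right) in hrtfs.items():
        gain = on_camera_emphasis(azimuth, look_deg)
        biased[azimuth] = (left * gain, right * gain)
    return biased


# Made-up single-bin HRTF values for two directions.
example = {0.0: (1 + 0j, 1 + 0j), 180.0: (0.8 + 0.2j, 0.8 - 0.2j)}
biased = bias_hrtfs(example, look_deg=0.0)
```

In this sketch the look direction would follow the device orientation during recording, so sound arriving from in front of the camera keeps full level while sound from behind is attenuated.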
-
Publication No.: US10546593B2
Publication Date: 2020-01-28
Application No.: US15830955
Filing Date: 2017-12-04
Applicant: Apple Inc.
Inventor: Jason Wung , Mehrez Souden , Ramin Pishehvar , Joshua D. Atkins
IPC: G10L21/00 , G10L19/00 , G10L21/02 , G10L15/02 , G10L21/0232 , G10L25/30 , H04R1/40 , G10L25/03 , G10L21/0208
Abstract: A number of features are extracted from a current frame of a multi-channel speech pickup and from side information that is a linear echo estimate, a diffuse signal component, or a noise estimate of the multi-channel speech pickup. A DNN-based speech presence probability (SPP) is produced for the current frame, where the SPP value is produced in response to the extracted features being input to the DNN. The DNN-based SPP value is applied to configure a multi-channel filter whose input is the multi-channel speech pickup and whose output is a single audio signal. In one aspect, the system is designed to run online, at low enough latency for real-time applications such as voice trigger detection. Other aspects are also described and claimed.
-
Publication No.: US10403299B2
Publication Date: 2019-09-03
Application No.: US15613127
Filing Date: 2017-06-02
Applicant: Apple Inc.
Inventor: Jason Wung , Joshua D. Atkins , Ramin Pishehvar , Mehrez Souden
IPC: H04M9/08 , G10L21/02 , G10L21/0208 , G10L21/0216 , G10L21/0232 , G10L21/0272 , G10L21/038
Abstract: A digital speech enhancement system that performs a specific chain of digital signal processing operations upon multi-channel sound pickup, to result in a single, enhanced speech signal. The operations are designed to be computationally less complex yet as a whole yield an enhanced speech signal that produces accurate voice trigger detection and low word error rates by an automatic speech recognizer (ASR). The constituent operations or components of the system have been chosen so that the overall system is robust to changing acoustic conditions, and can deliver the enhanced speech signal with low enough latency so that the system can be used online (enabling real-time voice trigger detection and streaming ASR). Other embodiments are also described and claimed.
-
Publication No.: US10390131B2
Publication Date: 2019-08-20
Application No.: US15721654
Filing Date: 2017-09-29
Applicant: Apple Inc.
Inventor: Jonathan D. Sheaffer , Darius A. Satongar , Joshua D. Atkins , Martin E. Johnson
Abstract: A microphone array included in a portable electronic device is used to generate various virtual studio microphones by combining one or more microphone signals to produce one or more acoustic pickup beams. An error is determined in a position of the microphone array relative to an audio source to be recorded. An interface is displayed to instruct a user on repositioning the microphone array relative to the instrument and the instrument is recorded using the repositioned microphone array.
-
Publication No.: US20190104357A1
Publication Date: 2019-04-04
Application No.: US15721644
Filing Date: 2017-09-29
Applicant: Apple Inc.
Inventor: Joshua D. Atkins , Mehrez Souden , Symeon Delikaris-Manias , Peter Raffensperger
CPC classification number: H04R1/406 , G06F16/61 , G06N3/08 , H04R2201/405 , H04R2430/23 , H04R2499/11 , H04S2400/01 , H04S2400/15
Abstract: Impulse responses of a device are measured. A database of sound files is generated by convolving source signals with the impulse responses of the device. The sound files from the database are transformed into the time-frequency domain. One or more sub-band directional features are estimated at each sub-band of the time-frequency domain. A deep neural network (DNN) is trained for each sub-band based on the estimated one or more sub-band directional features and a target directional feature.
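The database-generation step above (convolving source signals with measured impulse responses) can be sketched as follows. This is an illustration only: the function names, the dictionary layout, and the toy signals are hypothetical, and a direct-form convolution is used for clarity rather than an FFT-based method that a real pipeline would more likely use.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two finite sequences."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def build_database(sources, impulse_responses):
    """Convolve every source with every impulse response.

    Both arguments map a label to a list of samples. The result maps
    (source_label, ir_label) to the simulated device recording.
    """
    return {
        (src_name, ir_name): convolve(src, ir)
        for src_name, src in sources.items()
        for ir_name, ir in impulse_responses.items()
    }


# Toy data standing in for real recordings and measured responses.
sources = {"click": [1.0, 2.0]}
impulse_responses = {"front": [1.0, 1.0], "back": [0.5]}
database = build_database(sources, impulse_responses)
```

Each entry in `database` would then be transformed into the time-frequency domain before sub-band directional features are estimated, a step this sketch leaves out.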
-
Publication No.: US10034092B1
Publication Date: 2018-07-24
Application No.: US15273396
Filing Date: 2016-09-22
Applicant: Apple Inc.
Inventor: Ismael H. Nawfal , Joshua D. Atkins , Stephen J. Nimick , Guy C. Nicholson , Jason M. Harlow
Abstract: Digital audio signal processing techniques used to provide an acoustic transparency function in a pair of headphones. A number of transparency filters can be computed at once, using optimization techniques or using a closed form solution, that are based on multiple re-seatings of the headphones and that are as a result robust for a population of wearers. In another embodiment, a transparency hearing filter of a headphone is computed by an adaptive system that takes into consideration the changing acoustic to electrical path between an earpiece speaker and an interior microphone of that headphone while worn by a user. Other embodiments are also described and claimed.
-
Publication No.: US20180040333A1
Publication Date: 2018-02-08
Application No.: US15227885
Filing Date: 2016-08-03
Applicant: Apple Inc.
Inventor: Jason Wung , Ramin Pishehvar , Daniele Giacobello , Joshua D. Atkins
IPC: G10L21/0232 , G10L25/87 , G10L25/30
CPC classification number: G10L21/0232 , G10L25/30 , G10L25/87 , G10L2021/02082
Abstract: Method for performing speech enhancement using a Deep Neural Network (DNN)-based signal starts with training DNN offline by exciting a microphone using target training signal that includes signal approximation of clean speech. Loudspeaker is driven with a reference signal and outputs loudspeaker signal. Microphone then generates microphone signal based on at least one of: near-end speaker signal, ambient noise signal, or loudspeaker signal. Acoustic-echo-canceller (AEC) generates AEC echo-cancelled signal based on reference signal and microphone signal. Loudspeaker signal estimator generates estimated loudspeaker signal based on microphone signal and AEC echo-cancelled signal. DNN receives microphone signal, reference signal, AEC echo-cancelled signal, and estimated loudspeaker signal and generates a speech reference signal that includes signal statistics for residual echo or for noise. Noise suppressor generates a clean speech signal by suppressing noise or residual echo in the microphone signal based on speech reference signal. Other embodiments are described.
-
Publication No.: US12141347B1
Publication Date: 2024-11-12
Application No.: US18055600
Filing Date: 2022-11-15
Applicant: Apple Inc.
Inventor: Mehrez Souden , Symeon Delikaris Manias , Ante Jukic , John Woodruff , Joshua D. Atkins
Abstract: An audio processing device may generate a plurality of microphone signals from a plurality of microphones of the audio processing device. The audio processing device may determine a gaze of a user who is wearing a playback device that is separate from the audio processing device, the gaze of the user being determined relative to the audio processing device. The audio processing device may extract speech that correlates to the gaze of the user, from the plurality of microphone signals of the audio processing device by applying the plurality of microphone signals of the audio processing device and the gaze of the user to a machine learning model. The extracted speech may be played to the user through the playback device.
-
Publication No.: US20240236610A1
Publication Date: 2024-07-11
Application No.: US18611407
Filing Date: 2024-03-20
Applicant: Apple Inc.
Inventor: Christopher T. Eubank , Joshua D. Atkins , Soenke Pelzer , Dirk Schroeder
IPC: H04S7/00
CPC classification number: H04S7/305 , H04S2420/01
Abstract: Processing sound in an enhanced reality environment can include generating, based on an image of a physical environment, an acoustic model of the physical environment. Audio signals captured by a microphone array can capture a sound in the physical environment. Based on these audio signals, one or more measured acoustic parameters of the physical environment can be generated. A target audio signal can be processed using the model of the physical environment and the measured acoustic parameters, resulting in a plurality of output audio channels having a virtual sound source with a virtual location. The output audio channels can be used to drive a plurality of speakers. Other aspects are also described and claimed.
-