Learning-based distance estimation
    Granted invention patent

    Publication No.: US11222652B2

    Publication date: 2022-01-11

    Application No.: US16516780

    Filing date: 2019-07-19

    Applicant: Apple Inc.

    Abstract: A learning-based system such as a deep neural network (DNN) is disclosed to estimate a distance from a device to a speech source. The deep learning system may estimate the distance of the speech source at each time frame based on speech signals received by a compact microphone array. Supervised deep learning may be used to learn the effect of the acoustic environment on the non-linear mapping between the speech signals and the distance using multi-channel training data. The deep learning system may estimate the direct speech component that contains information about the direct signal propagation from the speech source to the microphone array and the reverberant speech signal that contains the reverberation effect and noise. The deep learning system may extract signal characteristics of the direct signal component and the reverberant signal component and estimate the distance based on the extracted signal characteristics using the learned mapping.
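One signal characteristic that is commonly derived from separated direct and reverberant components is the direct-to-reverberant energy ratio (DRR), which tends to fall as the source moves away from the microphones. The abstract does not name the features it extracts, so the following is only a minimal sketch under that assumption. The function name `drr_db` and its inputs are hypothetical, and a real system would pass such features to the trained network rather than use them directly.

```python
import math

def drr_db(direct, reverberant, eps=1e-12):
    """Direct-to-reverberant energy ratio of one frame, in dB.

    `direct` and `reverberant` are sequences of samples for the two
    estimated components (hypothetical inputs, not taken from the patent).
    """
    direct_energy = sum(x * x for x in direct)
    reverb_energy = sum(x * x for x in reverberant)
    # eps keeps the ratio and the logarithm finite for silent frames.
    return 10.0 * math.log10((direct_energy + eps) / (reverb_energy + eps))
```

A frame whose direct component carries 100 times the energy of the reverberant component yields roughly 20 dB, while equal energies yield roughly 0 dB.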

    Machine learning based sound field analysis

    Publication No.: US10334357B2

    Publication date: 2019-06-25

    Application No.: US15721644

    Filing date: 2017-09-29

    Applicant: Apple Inc.

    Abstract: Impulse responses of a device are measured. A database of sound files is generated by convolving source signals with the impulse responses of the device. The sound files from the database are transformed into the time-frequency domain. One or more sub-band directional features are estimated at each sub-band of the time-frequency domain. A deep neural network (DNN) is trained for each sub-band based on the estimated one or more sub-band directional features and a target directional feature.
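The database-generation step described above (convolving source signals with measured impulse responses) can be sketched in plain Python. This is an illustration only: `convolve` and `build_database` are hypothetical names, the signals are short lists of samples, and the abstract does not say how the database is stored or keyed.

```python
def convolve(signal, impulse_response):
    """Full linear convolution of two sample sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def build_database(sources, impulse_responses):
    """Convolve every source with every measured impulse response.

    Both arguments are dicts mapping a label to a list of samples
    (an assumed layout). Returns a dict keyed by (source, response).
    """
    return {
        (s_name, ir_name): convolve(s, ir)
        for s_name, s in sources.items()
        for ir_name, ir in impulse_responses.items()
    }
```

With two sources and two impulse responses, `build_database` returns four simulated recordings, each of which would then be transformed to the time-frequency domain for feature estimation.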

    DEEP LEARNING DRIVEN MULTI-CHANNEL FILTERING FOR SPEECH ENHANCEMENT

    Publication No.: US20190172476A1

    Publication date: 2019-06-06

    Application No.: US15830955

    Filing date: 2017-12-04

    Applicant: Apple Inc.

    Abstract: A number of features are extracted from a current frame of a multi-channel speech pickup and from side information that is a linear echo estimate, a diffuse signal component, or a noise estimate of the multi-channel speech pickup. A DNN-based speech presence probability (SPP) value is produced for the current frame, where the SPP value is produced in response to the extracted features being input to the DNN. The DNN-based SPP value is applied to configure a multi-channel filter whose input is the multi-channel speech pickup and whose output is a single audio signal. In one aspect, the system is designed to run online, at low enough latency for real-time applications such as voice trigger detection. Other aspects are also described and claimed.
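To illustrate how an SPP value might configure a filter that reduces several channels to one signal, the sketch below uses a deliberately simplified stand-in: SPP-weighted recursive noise tracking per channel, followed by inverse-noise-power channel weighting. The abstract does not specify the filter design, so the function names, the smoothing constant `alpha`, and the weighting rule are all assumptions rather than the claimed method.

```python
def update_noise_power(noise_power, frame_power, spp, alpha=0.9):
    """SPP-weighted recursive noise power update for one channel.

    An SPP near 1 (speech likely) freezes the estimate; an SPP near 0
    lets the estimate track the power of the current frame.
    """
    alpha_eff = alpha + (1.0 - alpha) * spp
    return alpha_eff * noise_power + (1.0 - alpha_eff) * frame_power

def combine_channels(samples, noise_powers, eps=1e-12):
    """Weight each channel by inverse noise power and sum to one sample."""
    weights = [1.0 / (p + eps) for p in noise_powers]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, samples)) / total
```

Both functions operate on one frame at a time, which is consistent with the online, low-latency operation the abstract mentions.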

    Spatial Audio Upscaling Using Machine Learning

    Publication No.: US20240312468A1

    Publication date: 2024-09-19

    Application No.: US18605688

    Filing date: 2024-03-14

    Applicant: Apple Inc.

    CPC classification number: G10L19/008 H04S7/30 H04S2420/11

    Abstract: A sound scene is represented as first order Ambisonics (FOA) audio. A processor formats each signal of the FOA audio to a stream of audio frames, provides the formatted FOA audio to a machine learning model that reformats the formatted FOA audio in a target or desired higher order Ambisonics (HOA) format, and obtains output audio of the sound scene in the desired HOA format from the machine learning model. The output audio in the desired HOA format may then be rendered according to a playback audio format of choice. Other aspects are also described and claimed.
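The formatting step (each FOA signal turned into a stream of audio frames) and the size of the target format can be sketched as follows. A full three-dimensional Ambisonics set of order N has (N + 1)^2 signals, so FOA has 4. The frame length, the zero-padding of the last frame, and the function names are assumptions for illustration; the machine learning model itself is not shown.

```python
def ambisonic_channel_count(order):
    """Number of signals in a full 3D Ambisonics set of the given order."""
    return (order + 1) ** 2

def to_frames(samples, frame_len):
    """Split one signal into consecutive non-overlapping frames.

    The last frame is zero-padded to frame_len (an assumed convention).
    """
    frames = []
    for start in range(0, len(samples), frame_len):
        frame = list(samples[start:start + frame_len])
        frame += [0.0] * (frame_len - len(frame))
        frames.append(frame)
    return frames

def format_foa(foa, frame_len):
    """Frame each FOA signal; `foa` maps a channel name to its samples."""
    return {name: to_frames(sig, frame_len) for name, sig in foa.items()}
```

Under these assumptions, a model that upscales FOA to third-order HOA would take 4 framed signals as input and produce `ambisonic_channel_count(3)`, that is 16, framed signals as output.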

    Extracting Ambience From A Stereo Input
    Published invention application

    Publication No.: US20240314509A1

    Publication date: 2024-09-19

    Application No.: US18605701

    Filing date: 2024-03-14

    Applicant: Apple Inc.

    CPC classification number: H04S7/30 H04S1/007 H04S2420/11

