Learning-based distance estimation
    12.
    Granted invention patent

    Publication No.: US11222652B2

    Publication date: 2022-01-11

    Application No.: US16516780

    Filing date: 2019-07-19

    Applicant: Apple Inc.

    Abstract: A learning based system such as a deep neural network (DNN) is disclosed to estimate a distance from a device to a speech source. The deep learning system may estimate the distance of the speech source at each time frame based on speech signals received by a compact microphone array. Supervised deep learning may be used to learn the effect of the acoustic environment on the non-linear mapping between the speech signals and the distance using multi-channel training data. The deep learning system may estimate the direct speech component that contains information about the direct signal propagation from the speech source to the microphone array and the reverberant speech signal that contains the reverberation effect and noise. The deep learning system may extract signal characteristics of the direct signal component and the reverberant signal component and estimate the distance based on the extracted signal characteristics using the learned mapping.
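
The abstract does not say which signal characteristics are extracted from the direct and reverberant components. One commonly used distance cue is the direct-to-reverberant energy ratio (DRR), which tends to fall as the source moves away. The sketch below is an illustration under that assumption, not the patented method; the function names and toy signals are hypothetical.

```python
import math


def energy(samples):
    """Sum of squared sample values."""
    return sum(v * v for v in samples)


def direct_to_reverberant_ratio_db(direct, reverberant, eps=1e-12):
    """DRR in dB for one frame (assumed feature, not named in the abstract).

    `direct` and `reverberant` stand in for the two components that the
    abstract says the deep learning system estimates.
    """
    return 10.0 * math.log10((energy(direct) + eps) / (energy(reverberant) + eps))


# Hypothetical toy frames: the direct part weakens as the talker moves away.
near_direct, far_direct = [1.0] * 4, [0.2] * 4
reverb = [0.1] * 4

drr_near = direct_to_reverberant_ratio_db(near_direct, reverb)
drr_far = direct_to_reverberant_ratio_db(far_direct, reverb)
```

In a learned system, a per-frame feature such as this would be one input to the distance regressor rather than the estimate itself.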

    Machine learning based sound field analysis

    Publication No.: US10334357B2

    Publication date: 2019-06-25

    Application No.: US15721644

    Filing date: 2017-09-29

    Applicant: Apple Inc.

    Abstract: Impulse responses of a device are measured. A database of sound files is generated by convolving source signals with the impulse responses of the device. The sound files from the database are transformed into the time-frequency domain. One or more sub-band directional features are estimated at each sub-band of the time-frequency domain. A deep neural network (DNN) is trained for each sub-band based on the estimated one or more sub-band directional features and a target directional feature.
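
The first two steps of this abstract (convolving source signals with impulse responses, then moving to the time-frequency domain) can be sketched as follows. This is a minimal standard-library illustration; the toy signal, the impulse responses, the Hann window, and the frame and hop sizes are all assumptions, and feature extraction and DNN training are left out.

```python
import cmath
import math


def convolve(source, impulse_response):
    """Direct-form linear convolution of two sample sequences."""
    out = [0.0] * (len(source) + len(impulse_response) - 1)
    for i, s in enumerate(source):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def stft(signal, frame_len=8, hop=4):
    """Naive short-time Fourier transform with a Hann window.

    Returns a list of frames, each holding frame_len // 2 + 1 complex bins.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        bins = [
            sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(bins)
    return frames


# Hypothetical toy data: one source signal, two "measured" impulse responses.
source = [math.sin(2 * math.pi * 0.125 * n) for n in range(32)]
impulse_responses = {"mic0": [1.0, 0.0, 0.3], "mic1": [0.0, 1.0, 0.0, 0.2]}

# Database of simulated recordings, then their time-frequency representation.
database = {name: convolve(source, h) for name, h in impulse_responses.items()}
spectrograms = {name: stft(x) for name, x in database.items()}
```

Each row of bins across frames corresponds to one sub-band, which is where the per-sub-band directional features and networks described in the abstract would attach.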

    DEEP LEARNING DRIVEN MULTI-CHANNEL FILTERING FOR SPEECH ENHANCEMENT

    Publication No.: US20190172476A1

    Publication date: 2019-06-06

    Application No.: US15830955

    Filing date: 2017-12-04

    Applicant: Apple Inc.

    Abstract: A number of features are extracted from a current frame of a multi-channel speech pickup and from side information that is a linear echo estimate, a diffuse signal component, or a noise estimate of the multi-channel speech pickup. A DNN-based speech presence probability (SPP) is produced for the current frame, where the SPP value is produced in response to the extracted features being input to the DNN. The DNN-based SPP value is applied to configure a multi-channel filter whose input is the multi-channel speech pickup and whose output is a single audio signal. In one aspect, the system is designed to run online, at low enough latency for real-time applications such as voice trigger detection. Other aspects are also described and claimed.
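
The abstract does not specify how the SPP value configures the multi-channel filter. One widely used approach, shown here purely as an assumption, is to let the SPP gate a recursive update of the noise spatial covariance for one frequency bin, so that the noise statistics only adapt when speech is unlikely. A beamformer could then be derived from those statistics. The function name and the smoothing constant `alpha` are hypothetical.

```python
def update_noise_covariance(prev_cov, frame, spp, alpha=0.9):
    """SPP-gated recursive noise covariance update for one frequency bin.

    prev_cov: M x M list of complex values (previous noise covariance)
    frame:    length-M list of complex microphone values for this bin
    spp:      speech presence probability in [0, 1], e.g. from a DNN
    alpha:    base smoothing constant (assumed value)
    """
    # Effective smoothing: spp = 1 freezes the estimate, spp = 0 uses alpha.
    a = alpha + (1.0 - alpha) * spp
    m = len(frame)
    return [
        [a * prev_cov[i][j] + (1.0 - a) * frame[i] * frame[j].conjugate()
         for j in range(m)]
        for i in range(m)
    ]


# Hypothetical two-microphone frame for a single bin.
zero_cov = [[0j, 0j], [0j, 0j]]
frame = [1 + 0j, 1j]

cov_noise_only = update_noise_covariance(zero_cov, frame, spp=0.0)
cov_speech = update_noise_covariance(cov_noise_only, frame, spp=1.0)
```

Because the update is a single pass per frame, it fits the online, low-latency setting the abstract describes.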

    End-to-end time-domain multitask learning for ML-based speech enhancement

    Publication No.: US11996114B2

    Publication date: 2024-05-28

    Application No.: US17321411

    Filing date: 2021-05-15

    Applicant: Apple Inc.

    CPC classification number: G10L21/0216 G06N20/00 G10L15/16 G10L2021/02166

    Abstract: Disclosed is a multi-task machine learning model such as a time-domain deep neural network (DNN) that jointly generates an enhanced target speech signal and target audio parameters from a mixed signal of target speech and interference signal. The DNN may encode the mixed signal, determine masks used to jointly estimate the target signal and the target audio parameters based on the encoded mixed signal, apply the masks to separate the target speech from the interference signal to jointly estimate the target signal and the target audio parameters, and decode the masked features to enhance the target speech signal and to estimate the target audio parameters. The target audio parameters may include a voice activity detection (VAD) flag of the target speech. The DNN may leverage multi-channel audio signals and multi-modal signals such as video signals of the target speaker to improve the robustness of the enhanced target speech signal.

    Spatial Blending of Audio
    17.
    Published invention application

    Publication No.: US20240098442A1

    Publication date: 2024-03-21

    Application No.: US18458077

    Filing date: 2023-08-29

    Applicant: Apple Inc.

    CPC classification number: H04S7/302 H04S2400/11

    Abstract: An audio processing system may obtain a size of a visual object to present to a display. The audio processing system may determine a virtual placement for each of a plurality of virtual speakers at least based on the size of the visual object. Each of the plurality of virtual speakers may be spatially rendered at each virtual placement through binaural audio, for playback through head-worn speakers. Other aspects are also described and claimed.
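
The abstract states only that virtual speaker placement depends at least on the size of the visual object. As a rough illustration, and assuming a rule the abstract does not give, the sketch below spreads the virtual speakers evenly across the angle the object subtends at the listener. The function name, the viewing-distance parameter, and the even spacing are hypothetical.

```python
import math


def virtual_speaker_azimuths(object_width_m, distance_m, num_speakers=2):
    """Azimuths in degrees (0 = straight ahead) for the virtual speakers.

    Assumed rule: speakers span the angular width of the visual object,
    with the outermost speakers at its left and right edges.
    """
    half_angle = math.degrees(math.atan2(object_width_m / 2.0, distance_m))
    if num_speakers == 1:
        return [0.0]
    step = 2.0 * half_angle / (num_speakers - 1)
    return [-half_angle + i * step for i in range(num_speakers)]


# A 2 m wide virtual screen viewed from 1 m, rendered with three speakers.
wide = virtual_speaker_azimuths(2.0, 1.0, num_speakers=3)
# The same screen shrunk to 0.5 m pulls the speakers toward the center.
narrow = virtual_speaker_azimuths(0.5, 1.0, num_speakers=3)
```

Each resulting azimuth would then be handed to a binaural renderer for playback through head-worn speakers, which is outside the scope of this sketch.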
