ROTATION OF SOUND COMPONENTS FOR ORIENTATION-DEPENDENT CODING SCHEMES

    Publication No.: US20240013793A1

    Publication Date: 2024-01-11

    Application No.: US18255232

    Application Date: 2021-12-02

    Abstract: A method for encoding scene-based audio is provided. In some implementations, the method involves determining, by an encoder, a spatial direction of a dominant sound component in a frame of an input audio signal. In some implementations, the method involves determining rotation parameters based on the determined spatial direction and a direction preference of a coding scheme to be used to encode the input audio signal. In some implementations, the method involves rotating sound components of the frame based on the rotation parameters such that, after being rotated, the dominant sound component has a spatial direction that aligns with the direction preference of the coding scheme. In some implementations, the method involves encoding the rotated sound components of the frame of the input audio signal using the coding scheme, in connection with an indication of the rotation parameters or an indication of the spatial direction of the dominant sound component.
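    The abstract does not disclose a specific scene-based audio format or direction-estimation method. The following is a minimal Python sketch of the rotation step, assuming first-order ambisonics in W/X/Y/Z channel order, a dominant direction estimated from the time-averaged active intensity vector, and Rodrigues' formula for the aligning rotation; all function names are illustrative and not taken from the patent.

        import numpy as np

        def dominant_direction(frame_wxyz):
            # Estimate the dominant sound direction of a first-order ambisonic
            # frame (rows: W, X, Y, Z; columns: samples) from the time-averaged
            # active intensity, which is proportional to W * [X, Y, Z].
            w, xyz = frame_wxyz[0], frame_wxyz[1:4]
            intensity = (w * xyz).mean(axis=1)
            norm = np.linalg.norm(intensity)
            # Fallback direction (front) is an arbitrary illustrative choice.
            return intensity / norm if norm > 0 else np.array([1.0, 0.0, 0.0])

        def rotation_aligning(src, dst):
            # Rotation matrix taking unit vector src onto unit vector dst
            # (Rodrigues' formula).
            v = np.cross(src, dst)
            s, c = np.linalg.norm(v), np.dot(src, dst)
            if s < 1e-9:
                if c > 0:
                    return np.eye(3)  # already aligned
                # Antiparallel: rotate 180 degrees about any axis orthogonal to src.
                axis = np.cross(src, [1.0, 0.0, 0.0])
                if np.linalg.norm(axis) < 1e-9:
                    axis = np.cross(src, [0.0, 1.0, 0.0])
                axis /= np.linalg.norm(axis)
                return 2.0 * np.outer(axis, axis) - np.eye(3)
            vx = np.array([[0.0, -v[2], v[1]],
                           [v[2], 0.0, -v[0]],
                           [-v[1], v[0], 0.0]])
            return np.eye(3) + vx + vx @ vx * ((1.0 - c) / s**2)

        def rotate_frame(frame_wxyz, preferred_dir):
            # Rotate the directional components so the dominant component aligns
            # with the coding scheme's preferred direction; the omnidirectional W
            # channel is rotation-invariant. Returns the rotated frame and the
            # rotation matrix, which would be signalled alongside the bitstream.
            d = dominant_direction(frame_wxyz)
            R = rotation_aligning(d, preferred_dir / np.linalg.norm(preferred_dir))
            out = frame_wxyz.copy()
            out[1:4] = R @ frame_wxyz[1:4]
            return out, R

    A decoder would apply the inverse rotation (the transpose of R, or a rotation reconstructed from the signalled dominant direction) before rendering, so the alignment is transparent to the listener.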

    FRAME-LEVEL PERMUTATION INVARIANT TRAINING FOR SOURCE SEPARATION

    Publication No.: US20240005942A1

    Publication Date: 2024-01-04

    Application No.: US18248801

    Application Date: 2021-10-13

    IPC Classification: G10L21/028

    CPC Classification: G10L21/028

    Abstract: Described is a method of training a deep-learning-based system for sound source separation. The system comprises a separation stage for frame-wise extraction of representations of sound sources from a representation of an audio signal, and a clustering stage for generating, for each frame, a vector indicative of an assignment permutation of the extracted sound-source representations to the respective sound sources. The representation of the audio signal is a waveform-based representation. The separation stage is trained using frame-level permutation invariant training. Further, the clustering stage is trained to generate embedding vectors for the frames of the audio signal from which estimates of the respective assignment permutations between the extracted sound signals and the sound-source labels used for the frames can be determined. Also described is a method of using the deep-learning-based system for sound source separation.
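    Frame-level permutation invariant training is a known objective; the abstract does not give the patent's exact formulation. Below is a minimal NumPy sketch of a frame-level PIT loss, assuming a mean-squared-error criterion over per-frame feature vectors and exhaustive search over source permutations (practical for the small source counts typical of speech separation); the function name and shapes are illustrative.

        import itertools
        import numpy as np

        def frame_level_pit_loss(estimates, references):
            # estimates, references: (n_sources, n_frames, frame_dim) arrays.
            # For every frame independently, evaluate the mean-squared error of
            # each possible assignment permutation of estimated sources to
            # reference sources, and keep the lowest-loss permutation.
            n_src = estimates.shape[0]
            perms = list(itertools.permutations(range(n_src)))
            # pair[i, j, t] = MSE between estimate i and reference j at frame t
            pair = ((estimates[:, None] - references[None, :]) ** 2).mean(axis=-1)
            # perm_loss[p, t] = total loss of permutation p at frame t
            perm_loss = np.stack(
                [pair[list(p), range(n_src)].sum(axis=0) for p in perms])
            best = perm_loss.argmin(axis=0)        # best permutation per frame
            loss = perm_loss.min(axis=0).mean()    # averaged over all frames
            return loss, np.asarray(perms)[best]   # scalar loss, (n_frames, n_src)

    The per-frame assignments returned here correspond to what the clustering stage's embedding vectors are trained to recover, so that at inference time the frame-level permutation ambiguity can be resolved without access to the reference labels.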