Projection-Based Audio Object Extraction from Audio Content

    Publication No.: US20170344852A1

    Publication date: 2017-11-30

    Application No.: US15538306

    Application date: 2015-12-18

    Abstract: A method is disclosed for extracting an audio object from audio content having a plurality of channels. The method includes identifying a first set of projection spaces including a first subset for a first channel and a second subset for a second channel of the plurality of channels. The method may further include determining a first set of correlations between the first and second channels, each of the first set of correlations corresponding to one of the first subset of projection spaces and one of the second subset of projection spaces. Still further, the method may include extracting an audio object from an audio signal of the first channel based at least in part on a first correlation among the first set of correlations and the projection space from the first subset corresponding to the first correlation, the first correlation being greater than a first predefined threshold. A corresponding system and computer program product are also disclosed.
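
The steps in this abstract can be illustrated with a minimal sketch. It is not the patented method: it assumes each projection space is a one-dimensional subspace spanned by a unit-norm basis vector over frequency bins, that each channel is a list of magnitude-spectrum frames, and that the "correlation" is the Pearson correlation of the per-frame projection coefficients. The names `project`, `pearson`, `extract_object`, and the default `threshold` are hypothetical.

```python
import math


def project(frames, basis):
    """Coefficient of each frame along one basis vector (assumed unit-norm)."""
    return [sum(x * b for x, b in zip(frame, basis)) for frame in frames]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def extract_object(ch1, ch2, bases1, bases2, threshold=0.8):
    """Return the channel-1 signal projected onto its best-correlated space.

    Every pair (space of channel 1, space of channel 2) gets one correlation.
    The pair with the largest correlation above `threshold` selects the
    channel-1 projection space used for extraction. Returns None when no
    pair exceeds the threshold. The selection rule is an assumption.
    """
    best = None  # (correlation, index into bases1, coefficients)
    for i, b1 in enumerate(bases1):
        coeffs1 = project(ch1, b1)
        for b2 in bases2:
            c = pearson(coeffs1, project(ch2, b2))
            if c > threshold and (best is None or c > best[0]):
                best = (c, i, coeffs1)
    if best is None:
        return None
    _, i, coeffs1 = best
    # Reconstruct the object as the component of channel 1 inside that space.
    return [[a * b for b in bases1[i]] for a in coeffs1]


# Toy data: bin 0 carries a shared object, bin 1 carries unrelated content.
ch1 = [[1, 1], [2, 0], [3, 1], [4, 0]]
ch2 = [[0.5, 0], [1, 1], [1.5, 0], [2, 0]]
bases = [[1, 0], [0, 1]]
extracted = extract_object(ch1, ch2, bases, bases)
```

In this toy case only the pair of bin-0 spaces is positively correlated across the two channels, so the extracted signal keeps bin 0 of the first channel and zeroes bin 1.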

    AUDIO OBJECT CLUSTERING BY UTILIZING TEMPORAL VARIATIONS OF AUDIO OBJECTS
    Invention application (legal status: in force)

    Publication No.: US20160358618A1

    Publication date: 2016-12-08

    Application No.: US15117647

    Application date: 2015-02-23

    Abstract: Embodiments of the present invention relate to audio object clustering by utilizing temporal variation of audio objects. There is provided a method of estimating temporal variation of an audio object for use in audio object clustering. The method comprises obtaining at least one segment of an audio track associated with the audio object, the at least one segment containing the audio object; estimating variation of the audio object over a time duration of the at least one segment based on at least one property of the audio object; and adjusting, at least partially based on the estimated variation of the audio object, a contribution of the audio object to the determination of a centroid in the audio object clustering. A corresponding system and computer program product are also disclosed.
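
As a rough illustration of the idea in this abstract, the sketch below estimates how much an object's position changes over a segment and uses that estimate to scale the object's weight in a centroid computation. The abstract does not say which property is used or in which direction the contribution is adjusted; the choice of position as the property, the rule `loudness / (1 + alpha * variation)`, and all function names are assumptions made for the example.

```python
import math


def temporal_variation(positions):
    """Mean distance between consecutive positions within one segment."""
    if len(positions) < 2:
        return 0.0
    steps = [math.dist(a, b) for a, b in zip(positions, positions[1:])]
    return sum(steps) / len(steps)


def contribution_weight(loudness, variation, alpha=1.0):
    """Hypothetical rule: a fast-moving object contributes less to the centroid."""
    return loudness / (1.0 + alpha * variation)


def weighted_centroid(objects):
    """Centroid of the objects' mean positions, weighted by adjusted contribution.

    Each object is a dict with 'positions' (list of (x, y, z) tuples sampled
    over the segment) and 'loudness' (a non-negative number).
    """
    acc = [0.0, 0.0, 0.0]
    total = 0.0
    for obj in objects:
        pos = obj["positions"]
        weight = contribution_weight(obj["loudness"], temporal_variation(pos))
        mean_pos = [sum(axis) / len(pos) for axis in zip(*pos)]
        for i in range(3):
            acc[i] += weight * mean_pos[i]
        total += weight
    if total == 0.0:
        raise ValueError("all objects have zero contribution")
    return tuple(a / total for a in acc)


static_obj = {"positions": [(0, 0, 0), (0, 0, 0)], "loudness": 1.0}
moving_obj = {"positions": [(1, 0, 0), (3, 0, 0)], "loudness": 1.0}
centroid = weighted_centroid([static_obj, moving_obj])
```

With equal loudness, an unweighted centroid would sit at x = 1.0. Under the assumed rule the moving object's weight drops to 1/3, which pulls the centroid toward the static object at x = 0.5.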

    DETERMINING DIALOG QUALITY METRICS OF A MIXED AUDIO SIGNAL

    Publication No.: US20240071411A1

    Publication date: 2024-02-29

    Application No.: US18259848

    Application date: 2022-01-04

    CPC classification number: G10L25/60 G10L21/0272

    Abstract: Disclosed is a method for determining one or more dialog quality metrics of a mixed audio signal comprising a dialog component and a noise component, the method comprising separating an estimated dialog component from the mixed audio signal by means of a dialog separator using a dialog separating model determined by training the dialog separator based on the one or more quality metrics; providing the estimated dialog component from the dialog separator to a quality metrics estimator; and determining the one or more quality metrics by means of the quality metrics estimator based on the mixed audio signal and the estimated dialog component. Further disclosed are a method for training a dialog separator, a system comprising circuitry configured to perform the method, and a non-transitory computer-readable storage medium.

    METHODS, APPARATUS, AND SYSTEMS FOR DETECTION AND EXTRACTION OF SPATIALLY-IDENTIFIABLE SUBBAND AUDIO SOURCES

    Publication No.: US20230245671A1

    Publication date: 2023-08-03

    Application No.: US18009501

    Application date: 2021-06-11

    CPC classification number: G10L21/0272

    Abstract: In an embodiment, a method comprises: transforming one or more frames of a two-channel time domain audio signal into a time-frequency domain representation including a plurality of time-frequency tiles, wherein the frequency domain of the time-frequency domain representation includes a plurality of frequency bins grouped into subbands. For each time-frequency tile, the method comprises: calculating spatial parameters and a level for the time-frequency tile; modifying the spatial parameters using shift and squeeze parameters; obtaining a softmask value for each frequency bin using the modified spatial parameters, the level and subband information; and applying the softmask values to the time-frequency tile to generate a modified time-frequency tile of an estimated audio source. In an embodiment, a plurality of frames of the time-frequency tiles are assembled into a plurality of chunks, wherein each chunk includes a plurality of subbands, and the method described above is performed on each subband of each chunk.
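
The per-tile processing described in this abstract can be sketched for a single frame that has already been transformed to the frequency domain. The abstract does not define the spatial parameters, the shift and squeeze operation, or the softmask function, so everything specific here is an assumption: panning angle and inter-channel phase difference as the spatial parameters, `(theta - shift) * squeeze` as the modification, a Gaussian of the modified parameters with a per-subband width as the mask shape, and a simple level gate. The names `spatial_params` and `softmask_frame` are hypothetical.

```python
import cmath
import math


def spatial_params(l, r):
    """Return (panning angle, inter-channel phase difference, level) for one bin."""
    theta = math.atan2(abs(r), abs(l))    # 0 = hard left, pi/2 = hard right
    phi = cmath.phase(l * r.conjugate())  # phase difference in [-pi, pi]
    level = abs(l) ** 2 + abs(r) ** 2
    return theta, phi, level


def softmask_frame(left, right, subbands, shift=math.pi / 4, squeeze=1.0,
                   widths=None, floor=1e-6):
    """Compute and apply a softmask to one two-channel frame.

    left, right: complex frequency bins of the frame.
    subbands:    list of (start, stop) bin ranges, stop exclusive.
    widths:      assumed per-subband Gaussian width (the "subband information").
    Returns (masked_left, masked_right, masks).
    """
    if widths is None:
        widths = [0.2] * len(subbands)
    masks = [0.0] * len(left)
    for (start, stop), width in zip(subbands, widths):
        for k in range(start, stop):
            theta, phi, level = spatial_params(complex(left[k]), complex(right[k]))
            # Assumed shift-and-squeeze: recentre on the target direction, then scale.
            theta_m = (theta - shift) * squeeze
            phi_m = phi * squeeze
            spatial = math.exp(-0.5 * ((theta_m / width) ** 2 + (phi_m / width) ** 2))
            gate = level / (level + floor)  # suppresses near-silent bins
            masks[k] = spatial * gate
    masked_left = [m * x for m, x in zip(masks, left)]
    masked_right = [m * x for m, x in zip(masks, right)]
    return masked_left, masked_right, masks


# Bin 0: centred and in phase; bin 1: hard left; bin 2: silent.
left = [1 + 0j, 1 + 0j, 0j]
right = [1 + 0j, 0j, 0j]
out_left, out_right, masks = softmask_frame(left, right, subbands=[(0, 2), (2, 3)])
```

With the default `shift` of pi/4 the mask targets a centre-panned source, so the centred bin passes almost unchanged while the hard-left and silent bins are attenuated. In a full pipeline the same function would run on every frame produced by a short-time transform, and the masked tiles would be inverse-transformed to obtain the estimated source.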
