Adjusting Spatial Congruency in a Video Conferencing System

    Publication number: US20170374317A1

    Publication date: 2017-12-28

    Application number: US15527272

    Filing date: 2015-11-18

    CPC classification number: H04N7/147 H04S7/30 H04S2420/01 H04S2420/11

    Abstract: Example embodiments disclosed herein relate to spatial congruency adjustment. A method for adjusting spatial congruency in a video conference is disclosed. The method includes unwarping a visual scene, captured in an omnidirectional manner by a video endpoint device, into at least one rectilinear scene, and detecting spatial congruency between the at least one rectilinear scene and an auditory scene captured by an audio endpoint device that is positioned in relation to the video endpoint device. The spatial congruency is a degree of alignment between the auditory scene and the at least one rectilinear scene. In response to the detected spatial congruency being below a threshold, the spatial congruency is adjusted. A corresponding system and computer program product are also disclosed.
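
A minimal sketch of the detect-then-adjust loop, under assumptions the abstract does not state: the unwarped rectilinear scene has already yielded one azimuth per talker (for example from face detection), the auditory scene yields one azimuth per sound source, and the two lists are already paired index by index. The congruency score, the 0.95 threshold and the rotation used as the adjustment are hypothetical choices for illustration, not the patented method.

```python
import math

def wrap_deg(angle):
    """Wrap an angle in degrees into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def congruency(visual_az, audio_az):
    """Hypothetical score in [0, 1]; 1.0 means every sound source sits exactly
    at the azimuth of its paired talker in the rectilinear scene."""
    errors = [abs(wrap_deg(a - v)) for v, a in zip(visual_az, audio_az)]
    return 1.0 - sum(errors) / (180.0 * len(errors))

def mean_offset(visual_az, audio_az):
    """Circular mean of the (visual - audio) azimuth differences, in degrees."""
    s = sum(math.sin(math.radians(v - a)) for v, a in zip(visual_az, audio_az))
    c = sum(math.cos(math.radians(v - a)) for v, a in zip(visual_az, audio_az))
    return math.degrees(math.atan2(s, c))

def adjust_if_needed(visual_az, audio_az, threshold=0.95):
    """Rotate the auditory scene only when congruency is below the threshold."""
    if congruency(visual_az, audio_az) >= threshold:
        return list(audio_az)
    offset = mean_offset(visual_az, audio_az)
    return [wrap_deg(a + offset) for a in audio_az]
```

For example, an auditory scene rotated 20 degrees away from talkers at -30, 0 and 30 degrees scores about 0.89 and is rotated back into line with the visual scene.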

    METHODS AND DEVICES FOR ENCODING AND/OR DECODING SPATIAL BACKGROUND NOISE WITHIN A MULTI-CHANNEL INPUT SIGNAL

    Publication number: US20230215445A1

    Publication date: 2023-07-06

    Application number: US18000862

    Filing date: 2021-06-10

    CPC classification number: G10L19/008 G10L25/78 G10L21/0224

    Abstract: The present document describes a method (600) for encoding a multi-channel input signal (101) which comprises N different channels. The method (600) comprises, for a current frame of a sequence of frames, determining (601) whether the current frame is an active frame or an inactive frame using a signal and/or voice activity detector, and determining (602) a downmix signal (103) based on the multi-channel input signal (101), wherein the downmix signal (103) comprises N or fewer channels. In addition, the method (600) comprises determining (603) upmixing metadata (105) comprising a set of parameters for generating, based on the downmix signal (103), a reconstructed multi-channel signal (111) comprising N channels, wherein the upmixing metadata (105) is determined depending on whether the current frame is an active frame or an inactive frame. The method (600) further comprises encoding (604) the upmixing metadata (105) into a bitstream.
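
The per-frame flow (activity decision, downmix, activity-dependent upmix metadata) can be sketched as follows. Everything concrete here is a hypothetical assumption: an energy threshold stands in for the signal/voice activity detector, the downmix is a single mono channel, the upmix metadata is one least-squares gain per channel, and inactive frames simply get coarser quantization. Encoding into a bitstream (604) is omitted.

```python
def is_active(frame, threshold=1e-3):
    """Hypothetical energy-based detector standing in for the signal/voice
    activity detector. frame: list of N channels, each a list of samples."""
    n = sum(len(ch) for ch in frame)
    energy = sum(x * x for ch in frame for x in ch) / n
    return energy > threshold

def downmix(frame):
    """Mono downmix: sample-wise average of the N channels."""
    return [sum(samples) / len(frame) for samples in zip(*frame)]

def upmix_gains(frame, dmx):
    """Least-squares gain g_c per channel so that channel_c ~= g_c * dmx."""
    denom = sum(d * d for d in dmx) or 1.0  # avoid dividing by zero on silence
    return [sum(x * d for x, d in zip(ch, dmx)) / denom for ch in frame]

def quantize(values, step):
    """Uniform scalar quantization to multiples of `step`."""
    return [round(v / step) * step for v in values]

def encode_frame_metadata(frame):
    """Return (active, quantized upmix gains) for one frame; the metadata
    depends on the activity decision through the quantization step."""
    active = is_active(frame)
    step = 0.05 if active else 0.25  # hypothetical: coarser for inactive frames
    return active, quantize(upmix_gains(frame, downmix(frame)), step)
```

Spending fewer bits on inactive (background-noise) frames is one plausible reason to make the metadata depend on the activity decision; the abstract itself only states that dependence.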

    SCALABLE VOICE SCENE MEDIA SERVER

    Publication number: US20220197592A1

    Publication date: 2022-06-23

    Application number: US17601199

    Filing date: 2020-04-03

    Abstract: A communication system, method, and computer-readable medium therefor comprise a media server configured to receive a plurality of audio streams from a corresponding plurality of client devices, the media server including circuitry configured to rank the plurality of audio streams based on a predetermined metric, group a first portion of the plurality of audio streams into a first set, the first portion of the plurality of audio streams being the N highest-ranked audio streams, group a second portion of the plurality of audio streams into a second set, the second portion of the plurality of audio streams being the M lowest-ranked audio streams, forward respective audio streams of the first set to a receiver device, and discard respective audio streams of the second set, wherein N and M are independent integers.
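
A minimal sketch of the rank, forward and discard logic. The abstract leaves the "predetermined metric" open, so this sketch assumes a hypothetical per-stream score (for example recent speech loudness). Streams ranked between the top N and the bottom M are left out of both sets here, because the abstract does not say how they are handled.

```python
from dataclasses import dataclass

@dataclass
class Stream:
    client_id: str
    score: float  # hypothetical ranking metric, e.g. recent speech loudness

def route_streams(streams, n, m):
    """Rank streams by score, forward the N highest and discard the M lowest."""
    if n < 0 or m < 0 or n + m > len(streams):
        raise ValueError("N and M must be non-negative with N + M <= len(streams)")
    ranked = sorted(streams, key=lambda s: s.score, reverse=True)
    forwarded = ranked[:n]                # first set: sent to the receiver device
    discarded = ranked[len(ranked) - m:]  # second set: dropped by the server
    return forwarded, discarded
```

With five streams scored -12, -20, -30, -45 and -60 and N = M = 2, the two loudest are forwarded, the two quietest are discarded, and the middle stream falls in neither set.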

    PLACEMENT OF TALKERS IN 2D OR 3D CONFERENCE SCENE

    Type: Invention application (legal status: in force)

    Publication number: US20150296086A1

    Publication date: 2015-10-15

    Application number: US14384780

    Filing date: 2013-03-21

    CPC classification number: H04M3/568 H04S5/00 H04S2400/11

    Abstract: The present document relates to setting up and managing two-dimensional or three-dimensional scenes for audio conferences. A conference controller (111, 175) configured to place an upstream audio signal (123, 173) associated with a conference participant within a 2D or 3D conference scene to be rendered to a listener (211) is described. An X-point conference scene with X different spatial talker locations (212) is set up within the conference scene, wherein the X talker locations (212) are positioned within a cone around a midline (215) in front of a head of the listener (211). A generatrix (216) of the cone and the midline (215) form an angle which is smaller than or equal to a pre-determined maximum cone angle. The upstream audio signal (123, 173) is assigned to one of the talker locations (212) and metadata identifying the assigned talker location (212) are generated, thus enabling a spatialized audio signal.
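
The cone constraint can be illustrated with a 2D sketch that spreads X talker locations evenly in azimuth around the listener's midline and assigns upstream signals to them. The even spacing, the in-order assignment and the hypothetical Placement record used as metadata are assumptions for illustration. For a 3D location, the hypothetical helper angle_from_midline checks the cone condition: a location inside the cone is at most the maximum cone angle away from the midline.

```python
import math
from dataclasses import dataclass

def talker_azimuths(x, max_cone_deg):
    """Spread X talker azimuths (degrees) evenly over [-max, +max]
    around the listener's midline at azimuth 0."""
    if x < 1:
        raise ValueError("need at least one talker location")
    if x == 1:
        return [0.0]
    step = 2.0 * max_cone_deg / (x - 1)
    return [-max_cone_deg + i * step for i in range(x)]

def angle_from_midline(azimuth_deg, elevation_deg):
    """Angle between a direction and the frontal midline (x axis).
    For the unit direction (cos e cos a, cos e sin a, sin e), the dot
    product with the midline (1, 0, 0) is cos(e) * cos(a)."""
    a, e = math.radians(azimuth_deg), math.radians(elevation_deg)
    return math.degrees(math.acos(math.cos(e) * math.cos(a)))

@dataclass
class Placement:
    participant: str     # identifies the upstream audio signal
    location_index: int  # metadata: which of the X talker locations is used
    azimuth_deg: float

def assign(participants, azimuths):
    """Assign each participant to the next free talker location, in order."""
    if len(participants) > len(azimuths):
        raise ValueError("more participants than talker locations")
    return [Placement(p, i, azimuths[i]) for i, p in enumerate(participants)]
```

For X = 5 and a maximum cone angle of 40 degrees, this places talkers at -40, -20, 0, 20 and 40 degrees.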

    METHODS AND DEVICES FOR ENCODING AND/OR DECODING IMMERSIVE AUDIO SIGNALS

    Publication number: US20240005933A1

    Publication date: 2024-01-04

    Application number: US18349427

    Filing date: 2023-07-10

    CPC classification number: G10L19/167 G10L19/008 G10L19/18

    Abstract: The present document describes a method (700) for encoding a multi-channel input signal (201). The method (700) comprises determining (701) a plurality of downmix channel signals (203) from the multi-channel input signal (201) and performing (702) energy compaction of the plurality of downmix channel signals (203) to provide a plurality of compacted channel signals (404). Furthermore, the method (700) comprises determining (703) joint coding metadata (205) based on the plurality of compacted channel signals (404) and based on the multi-channel input signal (201), wherein the joint coding metadata (205) is such that it allows upmixing of the plurality of compacted channel signals (404) to an approximation of the multi-channel input signal (201). In addition, the method (700) comprises encoding (704) the plurality of compacted channel signals (404) and the joint coding metadata (205).
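
The abstract does not specify the energy compaction step further. A common choice for this kind of step is a Karhunen-Loève transform (KLT, i.e. a PCA rotation of the channels), and the sketch below uses it as an assumed stand-in. It covers only the compaction (702); determining the joint coding metadata (703) and encoding (704) are not shown.

```python
import numpy as np

def energy_compaction(downmix):
    """KLT-style energy compaction of downmix channels.

    downmix: array of shape (channels, samples).
    Returns (compacted, rotation) where compacted = rotation.T @ downmix and
    compacted channel 0 carries the most energy, channel 1 the next most, etc.
    """
    corr = downmix @ downmix.T / downmix.shape[1]  # channel correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]              # reorder to descending energy
    rotation = eigvecs[:, order]                   # orthonormal columns
    return rotation.T @ downmix, rotation
```

Because the rotation is orthonormal, total energy is preserved and a decoder that knows it can undo the compaction exactly with `rotation @ compacted`.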

    SELECTIVE FORWARD ERROR CORRECTION FOR SPATIAL AUDIO CODECS

    Publication number: US20190237086A1

    Publication date: 2019-08-01

    Application number: US16228690

    Filing date: 2018-12-20

    Abstract: Systems and methods for providing forward error correction for a multi-channel audio signal are described. Blocks of an audio stream are buffered into a frame. A transformation can be applied that compacts the energy of each block into a plurality of transformed channels. The energy compaction transform may compact most of the energy of a block into the first transformed channel and decreasing amounts of energy into each subsequent transformed channel. The transformed frame may be encoded using any suitable codec and transmitted in a packet over a network. Improved forward error correction may be provided by attaching a low bit rate encoding of the first transformed channel to a subsequent packet. To reconstruct a lost packet, the low bit rate encoding of the first channel for the lost packet may be combined with a packet loss concealment version of the other channels, constructed from a previously received packet.
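
A minimal sketch of the redundancy scheme: each packet carries frame k plus a low-rate copy of the first transformed channel of frame k-1, and a lost frame is rebuilt from that copy plus concealed versions of the remaining channels. Assumptions: the energy compaction transform has already been applied, so channel 0 is the high-energy channel; coarse quantization stands in for the low bit rate codec; and concealment simply repeats the previous frame's channels (or silence if there is none). All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

Frame = List[List[float]]  # transformed channels; channel 0 has the most energy

def low_rate(channel, step=0.25):
    """Hypothetical low bit rate encoding: coarse uniform quantization."""
    return [round(x / step) * step for x in channel]

@dataclass
class Packet:
    seq: int
    channels: Frame                  # full encoding of frame `seq`
    fec_prev: Optional[List[float]]  # low-rate copy of channel 0 of frame seq - 1

def packetize(frames):
    """Attach the low-rate first channel of frame k-1 to the packet of frame k."""
    return [Packet(k, f, low_rate(frames[k - 1][0]) if k > 0 else None)
            for k, f in enumerate(frames)]

def reconstruct(received, num_frames):
    """received maps seq -> Packet. Returns one frame per seq, or None when a
    frame is lost and the following packet (holding its FEC copy) is lost too."""
    out = []
    for k in range(num_frames):
        if k in received:
            out.append(received[k].channels)
            continue
        nxt = received.get(k + 1)
        if nxt is None or nxt.fec_prev is None:
            out.append(None)
            continue
        prev = out[-1] if out else None
        if prev is not None:
            # Concealment: repeat the previous frame's remaining channels.
            others = [list(ch) for ch in prev[1:]]
        else:
            # No earlier frame available: conceal with silence.
            others = [[0.0] * len(nxt.fec_prev) for _ in nxt.channels[1:]]
        out.append([list(nxt.fec_prev)] + others)
    return out
```

Protecting only the first transformed channel keeps the redundancy cheap while still recovering the part of the frame that carries most of its energy.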
