-
Publication No.: US20170374317A1
Publication date: 2017-12-28
Application No.: US15527272
Filing date: 2015-11-18
Applicant: Dolby Laboratories Licensing Corporation
Inventor: Xuejing SUN, Michael ECKERT
CPC classification number: H04N7/147, H04S7/30, H04S2420/01, H04S2420/11
Abstract: Example embodiments disclosed herein relate to spatial congruency adjustment. A method for adjusting spatial congruency in a video conference is disclosed. The method includes unwarping a visual scene captured by a video endpoint device into at least one rectilinear scene, the video endpoint device being configured to capture the visual scene in an omnidirectional manner, and detecting spatial congruency between the at least one rectilinear scene and an auditory scene captured by an audio endpoint device that is positioned in relation to the video endpoint device, the spatial congruency being a degree of alignment between the auditory scene and the at least one rectilinear scene. The method further includes, in response to the detected spatial congruency being below a threshold, adjusting the spatial congruency. Corresponding systems and computer program products are also disclosed.
-
Publication No.: US20230215445A1
Publication date: 2023-07-06
Application No.: US18000862
Filing date: 2021-06-10
Applicant: Dolby Laboratories Licensing Corporation
Inventor: Michael ECKERT, Rishabh TYAGI
IPC: G10L19/008, G10L25/78, G10L21/0224
CPC classification number: G10L19/008, G10L25/78, G10L21/0224
Abstract: The present document describes a method (600) for encoding a multi-channel input signal (101) which comprises N different channels. The method (600) comprises, for a current frame of a sequence of frames, determining (601) whether the current frame is an active frame or an inactive frame using a signal and/or a voice activity detector, and determining (602) a downmix signal (103) based on the multi-channel input signal (101), wherein the downmix signal (103) comprises N or fewer channels. In addition, the method (600) comprises determining (603) upmixing metadata (105) comprising a set of parameters for generating, based on the downmix signal (103), a reconstructed multi-channel signal (111) comprising N channels, wherein the upmixing metadata (105) is determined in dependence on whether the current frame is an active frame or an inactive frame. The method (600) further comprises encoding (604) the upmixing metadata (105) into a bitstream.
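The per-frame flow in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the method claimed in the application: the energy threshold stands in for the signal/voice activity detector, the downmix is a plain mono average, and the metadata fields (`gains`, `noise_level`) and the function name `encode_frame` are hypothetical.

```python
from typing import Dict, List, Tuple


def encode_frame(
    channels: List[List[float]], vad_threshold: float = 1e-4
) -> Tuple[List[float], Dict]:
    """Return (downmix, upmixing metadata) for one frame of N channels."""
    n = len(channels)
    length = len(channels[0])

    # Assumed downmix: average of all N channels (one channel, so "N or fewer").
    downmix = [sum(ch[i] for ch in channels) / n for i in range(length)]

    # Assumed activity detector: mean sample energy compared with a threshold.
    mean_energy = sum(s * s for ch in channels for s in ch) / (n * length)
    active = mean_energy > vad_threshold

    if active:
        # Active frame: one prediction gain per channel relative to the downmix.
        dm_energy = sum(d * d for d in downmix)
        gains = [
            (sum(c * d for c, d in zip(ch, downmix)) / dm_energy)
            if dm_energy > 0
            else 0.0
            for ch in channels
        ]
        metadata = {"active": True, "gains": gains}
    else:
        # Inactive frame: a much smaller parameter set (here a single level).
        metadata = {"active": False, "noise_level": mean_energy}
    return downmix, metadata
```

The point of the sketch is only that the metadata layout depends on the activity decision; a real codec would quantize and entropy-code these parameters before writing them to the bitstream.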
-
Publication No.: US20220197592A1
Publication date: 2022-06-23
Application No.: US17601199
Filing date: 2020-04-03
Applicant: Dolby Laboratories Licensing Corporation
Inventor: Glenn N. DICKINS, Feng DENG, Michael ECKERT, Craig JOHNSTON, Paul HOLMBERG
IPC: G06F3/16, H04M3/56, H04L65/403
Abstract: A communication system, method, and computer-readable medium therefor comprise a media server configured to receive a plurality of audio streams from a corresponding plurality of client devices, the media server including circuitry configured to rank the plurality of audio streams based on a predetermined metric, group a first portion of the plurality of audio streams into a first set, the first portion of the plurality of audio streams being the N highest-ranked audio streams, group a second portion of the plurality of audio streams into a second set, the second portion of the plurality of audio streams being the M lowest-ranked audio streams, forward respective audio streams of the first set to a receiver device, and discard respective audio streams of the second set, wherein N and M are independent integers.
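The ranking and grouping described in this abstract can be sketched as below. The `AudioStream` type, its `metric` field, and the function name `partition_streams` are assumptions made for illustration; the abstract does not say what the predetermined metric is.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AudioStream:
    client_id: str
    metric: float  # hypothetical ranking metric (higher ranks first)


def partition_streams(
    streams: List[AudioStream], n: int, m: int
) -> Tuple[List[AudioStream], List[AudioStream], List[AudioStream]]:
    """Split streams into (forwarded, discarded, remaining).

    forwarded: the N highest-ranked streams
    discarded: the M lowest-ranked streams
    remaining: streams in neither set (their handling is not covered here)
    """
    if n < 0 or m < 0 or n + m > len(streams):
        raise ValueError("n and m must be non-negative and n + m <= len(streams)")
    ranked = sorted(streams, key=lambda s: s.metric, reverse=True)
    cut = len(ranked) - m
    return ranked[:n], ranked[cut:], ranked[n:cut]
```

Because N and M are independent, some streams may fall into neither set; the sketch returns them separately rather than guessing how the server treats them.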
-
Publication No.: US20150296086A1
Publication date: 2015-10-15
Application No.: US14384780
Filing date: 2013-03-21
Applicant: Dolby Laboratories Licensing Corporation
Inventor: Michael ECKERT, Gary SPITTLE, Michael P. HOLLIER
CPC classification number: H04M3/568, H04S5/00, H04S2400/11
Abstract: The present document relates to setting up and managing two-dimensional or three-dimensional scenes for audio conferences. A conference controller (111, 175) configured to place an upstream audio signal (123, 173) associated with a conference participant within a 2D or 3D conference scene to be rendered to a listener (211) is described. An X-point conference scene with X different spatial talker locations (212) is set up within the conference scene, wherein the X talker locations (212) are positioned within a cone around a midline (215) in front of a head of the listener (211). A generatrix (216) of the cone and the midline (215) form an angle which is smaller than or equal to a pre-determined maximum cone angle. The upstream audio signal (123, 173) is assigned to one of the talker locations (212) and metadata identifying the assigned talker location (212) are generated, thus enabling a spatialized audio signal.
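A minimal 2D sketch of the X-point scene from this abstract follows, assuming the midline lies at 0 degrees azimuth and that talker locations are spread evenly across the cone. The 30-degree default for the maximum cone angle, the even spacing, the round-robin assignment, and all function and field names are assumptions, not values taken from the application.

```python
from typing import Dict, List


def talker_azimuths(
    x: int, cone_angle_deg: float, max_cone_angle_deg: float = 30.0
) -> List[float]:
    """Return X azimuths (degrees) inside a cone around the midline at 0 degrees."""
    if x < 1:
        raise ValueError("x must be at least 1")
    # The cone half-angle may not exceed the pre-determined maximum.
    half_angle = min(abs(cone_angle_deg), max_cone_angle_deg)
    if x == 1:
        return [0.0]
    step = 2.0 * half_angle / (x - 1)
    return [-half_angle + i * step for i in range(x)]


def assign_talkers(participants: List[str], azimuths: List[float]) -> List[Dict]:
    """Assign each upstream signal to a talker location and emit metadata."""
    metadata = []
    for i, participant in enumerate(participants):
        index = i % len(azimuths)  # assumed policy: round-robin reuse of locations
        metadata.append(
            {
                "participant": participant,
                "location_index": index,
                "azimuth_deg": azimuths[index],
            }
        )
    return metadata
```

The returned metadata is what a downstream renderer would need in order to spatialize each participant's audio at the assigned location.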
-
Publication No.: US20240005933A1
Publication date: 2024-01-04
Application No.: US18349427
Filing date: 2023-07-10
Inventor: David S. MCGRATH, Michael ECKERT, Heiko PURNHAGEN, Stefan BRUHN
IPC: G10L19/16, G10L19/008, G10L19/18
CPC classification number: G10L19/167, G10L19/008, G10L19/18
Abstract: The present document describes a method (700) for encoding a multi-channel input signal (201). The method (700) comprises determining (701) a plurality of downmix channel signals (203) from the multi-channel input signal (201) and performing (702) energy compaction of the plurality of downmix channel signals (203) to provide a plurality of compacted channel signals (404). Furthermore, the method (700) comprises determining (703) joint coding metadata (205) based on the plurality of compacted channel signals (404) and based on the multi-channel input signal (201), wherein the joint coding metadata (205) is such that it allows upmixing of the plurality of compacted channel signals (404) to an approximation of the multi-channel input signal (201). In addition, the method (700) comprises encoding (704) the plurality of compacted channel signals (404) and the joint coding metadata (205).
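The abstract does not name a specific energy compaction transform. As one possible illustration, the sketch below applies a two-channel rotation (a 2x2 Karhunen-Loeve transform) so that the first output channel carries as much energy as possible. The choice of transform, the function names, and the use of the rotation angle as side information are all assumptions.

```python
import math
from typing import List, Tuple


def energy(x: List[float]) -> float:
    """Sum of squared samples."""
    return sum(s * s for s in x)


def compact_pair(
    x1: List[float], x2: List[float]
) -> Tuple[List[float], List[float], float]:
    """Rotate two channels so that the first output holds the maximum energy.

    Returns (y1, y2, theta). Rotating back by -theta recovers x1 and x2,
    so theta could travel as part of the side information.
    """
    a = energy(x1)
    c = energy(x2)
    b = sum(p * q for p, q in zip(x1, x2))
    # Principal-axis angle of the 2x2 (uncentered) covariance [[a, b], [b, c]].
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    y1 = [cos_t * p + sin_t * q for p, q in zip(x1, x2)]
    y2 = [-sin_t * p + cos_t * q for p, q in zip(x1, x2)]
    return y1, y2, theta
```

For two identical input channels the rotation angle is 45 degrees, all energy lands in the first output, and the second output is (numerically) silent, which is the case where compaction helps a downstream codec most.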
-
Publication No.: US20180167581A1
Publication date: 2018-06-14
Application No.: US15838728
Filing date: 2017-12-12
Applicant: Dolby Laboratories Licensing Corporation
Inventor: Erwin GOESNAR, Hannes MUESCH, David GUNAWAN, Michael ECKERT, Glenn N. DICKINS
CPC classification number: H04N7/15, G01S3/80, G06K9/00664, G06K9/3275, G06K2009/3225, G06K2209/03, G06T7/70, H04L12/1827, H04M3/56, H04M3/567, H04N7/142, H04N7/147, H04R3/12, H04R2499/11, H04S7/303, H04S2400/15
Abstract: Systems and methods are described for determining orientation of an external audio device in a video conference, which may be used to provide congruent multimodal representation for a video conference. A camera of a video conferencing system may be used to detect a potential location of an external audio device within a room in which the video conferencing system is providing a video conference. Within the detected potential location, a visual pattern associated with the external audio device may be identified. Using the identified visual pattern, the video conferencing system may estimate an orientation of the external audio device, the orientation being used by the video conferencing system to provide spatial audio video congruence to a far end audience.
-
Publication No.: US20220375482A1
Publication date: 2022-11-24
Application No.: US17882900
Filing date: 2022-08-08
Inventor: Stefan BRUHN, Michael ECKERT, Juan Felix TORRES, Stefanie BROWN, David S. MCGRATH
IPC: G10L19/008
Abstract: The disclosed embodiments enable converting audio signals captured in various formats by various capture devices into a limited number of formats that can be processed by an audio codec (e.g., an Immersive Voice and Audio Services (IVAS) codec). In an embodiment, a simplification unit of the audio device receives an audio signal captured by one or more audio capture devices coupled to the audio device. The simplification unit determines whether or not the audio signal is in a format that is supported by an encoding unit of the audio device. Based on the determining, the simplification unit converts the audio signal into a format that is supported by the encoding unit. In an embodiment, if the simplification unit determines that the audio signal is in a spatial format, the simplification unit can convert the audio signal into a spatial “mezzanine” format supported by the encoding unit.
-
Publication No.: US20210272574A1
Publication date: 2021-09-02
Application No.: US16973030
Filing date: 2019-10-07
Inventor: Stefan BRUHN, Michael ECKERT, Juan Felix TORRES, Stefanie BROWN, David S. MCGRATH
IPC: G10L19/008, H04S3/00
Abstract: The disclosed embodiments enable converting audio signals captured in various formats by various capture devices into a limited number of formats that can be processed by an audio codec (e.g., an Immersive Voice and Audio Services (IVAS) codec). In an embodiment, a simplification unit of the audio device receives an audio signal captured by one or more audio capture devices coupled to the audio device. The simplification unit determines whether or not the audio signal is in a format that is supported by an encoding unit of the audio device. Based on the determining, the simplification unit converts the audio signal into a format that is supported by the encoding unit. In an embodiment, if the simplification unit determines that the audio signal is in a spatial format, the simplification unit can convert the audio signal into a spatial “mezzanine” format supported by the encoding unit.
-
Publication No.: US20190342521A1
Publication date: 2019-11-07
Application No.: US16518887
Filing date: 2019-07-22
Applicant: Dolby Laboratories Licensing Corporation
Inventor: Erwin GOESNAR, Hannes MUESCH, David GUNAWAN, Michael ECKERT, Glenn N. DICKINS
Abstract: Systems and methods are described for determining orientation of an external audio device in a video conference, which may be used to provide congruent multimodal representation for a video conference. A camera of a video conferencing system may be used to detect a potential location of an external audio device within a room in which the video conferencing system is providing a video conference. Within the detected potential location, a visual pattern associated with the external audio device may be identified. Using the identified visual pattern, the video conferencing system may estimate an orientation of the external audio device, the orientation being used by the video conferencing system to provide spatial audio video congruence to a far end audience.
-
Publication No.: US20190237086A1
Publication date: 2019-08-01
Application No.: US16228690
Filing date: 2018-12-20
Applicant: Dolby Laboratories Licensing Corporation
Inventor: Shen HUANG, Michael ECKERT, Glenn N. DICKINS
IPC: G10L19/005, G10L19/008, H04S3/00, G10L19/02, H04L1/00
CPC classification number: G10L19/005, G10L19/008, G10L19/0212, H04L1/0011, H04L1/0041, H04S3/008, H04S2400/01
Abstract: Systems and methods for providing forward error correction for a multi-channel audio signal are described. Blocks of an audio stream are buffered into a frame. A transformation can be applied that compacts the energy of each block into a plurality of transformed channels. The energy compaction transform may compact the most energy of a block into the first transformed channel and compact decreasing amounts of energy into each subsequent transformed channel. The transformed frame may be encoded using any suitable codec and transmitted in a packet over a network. Improved forward error correction may be provided by attaching a low bit rate encoding of the first transformed channel to a subsequent packet. To reconstruct a lost packet, the low bit rate encoding of the first channel for the lost packet may be combined with a packet loss concealment version of the other channels, constructed from a previously-received packet.
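The packetization and recovery idea in this abstract can be sketched as below, assuming the frames have already been through the energy compaction transform. Everything here is a stand-in: `low_rate_encode` is a coarse quantizer rather than a real low bit rate codec, concealment simply repeats the previous frame's channels, and the packet layout (`seq`, `frame`, `fec`) is hypothetical.

```python
from typing import Dict, List, Optional

Frame = List[List[float]]  # one list of samples per transformed channel


def low_rate_encode(channel: List[float]) -> List[float]:
    """Hypothetical stand-in for a low bit rate codec: coarse quantization."""
    return [round(s, 1) for s in channel]


def build_packets(frames: List[Frame]) -> List[Dict]:
    """Packet k carries frame k plus a low-rate copy of frame k-1's first channel."""
    packets = []
    previous_fec: Optional[List[float]] = None
    for seq, frame in enumerate(frames):
        packets.append({"seq": seq, "frame": frame, "fec": previous_fec})
        previous_fec = low_rate_encode(frame[0])
    return packets


def receive(packets_by_seq: Dict[int, Dict], count: int) -> List[Optional[Frame]]:
    """Rebuild the frame sequence, repairing losses where the next packet arrived."""
    output: List[Optional[Frame]] = []
    for seq in range(count):
        if seq in packets_by_seq:
            output.append(packets_by_seq[seq]["frame"])
            continue
        following = packets_by_seq.get(seq + 1)
        last = output[-1] if output else None
        if following is None or following["fec"] is None or last is None:
            # No redundancy available: fall back to repeating the last frame.
            output.append(last)
            continue
        # First channel from the redundant copy; other channels concealed
        # by reusing the previously received frame.
        output.append([following["fec"]] + last[1:])
    return output
```

Protecting only the first transformed channel keeps the redundancy overhead small while still covering the channel that, after compaction, holds most of the signal energy.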