Abstract:
Some implementations involve analyzing audio packets received during a time interval that corresponds with a conversation analysis segment to determine network jitter dynamics data and conversational interactivity data. The network jitter dynamics data may provide an indication of jitter in a network that relays the audio data packets. The conversational interactivity data may provide an indication of interactivity between participants of a conversation represented by the audio data. A jitter buffer size may be controlled according to the network jitter dynamics data and the conversational interactivity data. The time interval may include a plurality of talkspurts.
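As a rough illustration of how the two measures could drive buffer sizing, the sketch below computes a hypothetical jitter statistic and interactivity score and maps them to a buffer size. The field names, both metrics, and the linear mapping are assumptions made for illustration, not the claimed implementation.

```python
import statistics

def jitter_dynamics(arrival_times, send_times):
    """Estimate jitter over a conversation analysis segment as the
    standard deviation of per-packet transit delay.
    (Hypothetical statistic; the abstract does not fix a formula.)"""
    delays = [a - s for a, s in zip(arrival_times, send_times)]
    return statistics.pstdev(delays) if delays else 0.0

def conversational_interactivity(talkspurts):
    """Score interactivity as the rate of speaker changes across the
    talkspurts in the segment: 0 = monologue, 1 = strict turn-taking."""
    if len(talkspurts) < 2:
        return 0.0
    switches = sum(1 for prev, cur in zip(talkspurts, talkspurts[1:])
                   if prev["speaker"] != cur["speaker"])
    return switches / (len(talkspurts) - 1)

def jitter_buffer_size(jitter_ms, interactivity, base_ms=20.0, max_ms=200.0):
    """Grow the buffer with jitter, but shrink it when the conversation
    is highly interactive so the added latency stays tolerable."""
    size = base_ms + 4.0 * jitter_ms * (1.0 - 0.5 * interactivity)
    return min(max(size, base_ms), max_ms)

# Example: moderate jitter during a back-and-forth conversation.
spurts = [{"speaker": "A"}, {"speaker": "B"}, {"speaker": "A"}]
print(jitter_buffer_size(15.0, conversational_interactivity(spurts)))
```

The trade-off encoded here is that a highly interactive conversation tolerates packet loss better than it tolerates delay, so the buffer is kept smaller; a monologue can absorb a deeper buffer unnoticed.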
Abstract:
In an audio conferencing environment, including multiple users participating by means of a series of associated audio input devices for the provision of audio input, and a series of audio output devices for the output of audio output streams to the multiple users, with the audio input and output devices being interconnected to a mixing control server for the control and mixing of the audio inputs from each audio input device to present a series of audio streams to the audio output devices, a method of reducing the effects of cross-talk pickup of at least a first audio conversation by multiple audio input devices, the method including the steps of: (a) monitoring the series of audio input devices for the presence of a duplicate audio conversation input from at least two audio input sources in an audio output stream; and (b) where a duplicate audio conversation input is detected, suppressing the presence of the duplicate audio conversation input in the audio output stream.
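One plausible way to realize steps (a) and (b) is to treat two capture streams as duplicates when they are strongly correlated, keep the first, and drop the rest before mixing. The correlation test, the threshold, and the first-wins policy below are all assumptions for the sketch, not the claimed method.

```python
import numpy as np

def is_duplicate(sig_a, sig_b, threshold=0.8):
    """Flag two capture streams as carrying the same conversation when
    their peak normalized cross-correlation exceeds a threshold."""
    a = sig_a - np.mean(sig_a)
    b = sig_b - np.mean(sig_b)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return False
    corr = np.correlate(a, b, mode="full") / denom
    return float(np.max(np.abs(corr))) >= threshold

def mix_without_duplicates(streams, threshold=0.8):
    """Keep the first capture of each conversation, suppress later
    duplicates, and mix the survivors (streams share one length here)."""
    kept = []
    for s in streams:
        if not any(is_duplicate(s, k, threshold) for k in kept):
            kept.append(s)
    return np.sum(kept, axis=0) if kept else np.zeros(0)
```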
Abstract:
Example embodiments disclosed herein relate to spatial congruency adjustment. A method for adjusting spatial congruency in a video conference is disclosed. The method includes detecting spatial congruency between a visual scene captured by a video endpoint device and an auditory scene captured by an audio endpoint device that is positioned in relation to the video endpoint device, the spatial congruency being a degree of alignment between the auditory scene and the visual scene; comparing the detected spatial congruency with a predefined threshold; and, in response to the detected spatial congruency being below the threshold, adjusting the spatial congruency. Corresponding systems and computer program products are also disclosed.
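The detect-compare-adjust loop can be sketched in a few lines. Scoring congruency from a single azimuth offset and realigning in one step are simplifying assumptions; the abstract does not specify how congruency is measured or corrected.

```python
def adjust_spatial_congruency(auditory_az, visual_az,
                              threshold=0.9, fov_deg=180.0):
    """Score alignment as 1 - |angular offset| / field of view, and
    re-pan the auditory scene onto the visual scene when the score
    drops below the threshold."""
    offset = abs(auditory_az - visual_az)
    congruency = 1.0 - min(offset / fov_deg, 1.0)
    if congruency < threshold:
        auditory_az = visual_az  # realign the auditory scene
    return auditory_az, congruency

# A 20-degree offset scores about 0.89 and triggers realignment here.
print(adjust_spatial_congruency(auditory_az=30.0, visual_az=10.0))
```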
Abstract:
A system according to the invention is highly scalable and includes at least one server cluster comprising a plurality of voice conferencing servers. The at least one cluster further includes one or more performance management systems for handling tasks such as licensing, real-time and historical performance monitoring, cascading, availability, failover, load balancing, and/or performance optimization.
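Of the tasks listed, load balancing with failover is the easiest to illustrate. The toy balancer below routes each new conference to the least-loaded healthy server and skips servers marked down; the least-connections policy, the class interface, and the server names are all hypothetical.

```python
import heapq

class ConferenceCluster:
    """Toy least-connections balancer over a pool of conferencing
    servers, with lazy failover past servers marked down."""

    def __init__(self, servers):
        self.heap = [(0, name) for name in servers]  # (load, server)
        heapq.heapify(self.heap)
        self.down = set()

    def mark_down(self, server):
        """Record a failure; the server's heap entry is skipped lazily."""
        self.down.add(server)

    def assign_conference(self):
        """Route a new conference to the least-loaded healthy server."""
        while self.heap:
            load, server = heapq.heappop(self.heap)
            if server in self.down:
                continue  # failover: discard entry, try next candidate
            heapq.heappush(self.heap, (load + 1, server))
            return server
        raise RuntimeError("no healthy conferencing servers available")

cluster = ConferenceCluster(["vcs-1", "vcs-2"])
cluster.mark_down("vcs-1")
print(cluster.assign_conference())  # -> "vcs-2"
```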
Abstract:
An audio apparatus is configured to switch, when there exists a first audio interface between the audio apparatus and a computer apparatus, to using a second audio interface between the audio apparatus and the computer apparatus, the second audio interface being different from the first audio interface. The switching comprises: receiving, via the first audio interface, combined audio data and non-audio data, the non-audio data comprising a request to switch to using the second audio interface; obtaining the request from the non-audio data; and, in response to obtaining the request, transmitting to the computer apparatus a confirmation of switching to using the second audio interface. The audio apparatus and the computer apparatus are described and claimed.
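The handshake on the audio-apparatus side can be sketched as follows. The frame layout, the JSON control encoding, and every field name here are assumptions for illustration; the abstract only requires that audio and non-audio data arrive combined on the first interface and that a confirmation goes back before the switch.

```python
import json

def play(samples):
    pass  # placeholder for the device's normal playback path

def handle_combined_frame(frame, send_to_computer, switch_interface):
    """Audio-apparatus side of the handshake: play the audio part of a
    combined frame, and if its non-audio part requests a different
    interface, confirm to the computer and then switch."""
    play(frame["audio"])
    control = frame.get("control")  # non-audio data, JSON-encoded here
    if control:
        msg = json.loads(control)
        if msg.get("type") == "switch_request":
            target = msg["target_interface"]
            send_to_computer({"type": "switch_confirm",
                              "target_interface": target})
            switch_interface(target)  # begin using the second interface

# Example: a frame requesting a switch to a second (hypothetical) interface.
frame = {"audio": [0, 0, 0],
         "control": json.dumps({"type": "switch_request",
                                "target_interface": "if-2"})}
handle_combined_frame(frame, send_to_computer=print,
                      switch_interface=lambda t: print("switched to", t))
```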
Abstract:
An audio spatial rendering apparatus and method are disclosed. In one embodiment, the audio spatial rendering apparatus includes a rendering unit for spatially rendering an audio stream so that the reproduced far-end sound is perceived by a listener as originating from at least one virtual spatial position; a real position obtaining unit for obtaining a real spatial position of a real sound source; a comparator for comparing the real spatial position with the at least one virtual spatial position; and an adjusting unit for, where the real spatial position is within a predetermined range around at least one virtual spatial position, or vice versa, adjusting the parameters of the rendering unit so that the at least one virtual spatial position is changed.
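The adjustment logic reduces to a proximity test followed by a repositioning, as in the sketch below. Representing positions as azimuths and the guard range and step size used are illustrative assumptions; the abstract leaves the geometry and the adjustment open.

```python
def reposition_virtual_source(virtual_az, real_az,
                              guard_deg=15.0, step_deg=30.0):
    """If the real source falls within a guard range of the virtual
    position (or vice versa, since the test is symmetric), push the
    virtual position out of that range, keeping azimuths in [-180, 180)."""
    diff = (virtual_az - real_az + 180.0) % 360.0 - 180.0  # signed offset
    if abs(diff) < guard_deg:
        direction = 1.0 if diff >= 0.0 else -1.0
        virtual_az = (real_az + direction * step_deg + 180.0) % 360.0 - 180.0
    return virtual_az

# A virtual source at 5 degrees collides with a real source at 0 degrees
# and is moved to 30 degrees; one at 90 degrees is left alone.
print(reposition_virtual_source(5.0, 0.0), reposition_virtual_source(90.0, 0.0))
```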
Abstract:
The present document relates to setting up and managing two-dimensional or three-dimensional scenes for audio conferences. A conference controller (111, 175) configured to place an upstream audio signal (123, 173) associated with a conference participant within a 2D or 3D conference scene to be rendered to a listener (211) is described. An X-point conference scene with X different spatial talker locations (212) is set up within the conference scene, wherein the X talker locations (212) are positioned within a cone around a midline (215) in front of a head of the listener (211). A generatrix (216) of the cone and the midline (215) form an angle which is smaller than or equal to a pre-determined maximum cone angle. The upstream audio signal (123, 173) is assigned to one of the talker locations (212) and metadata identifying the assigned talker location (212) are generated, thus enabling a spatialized audio signal.
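The cone constraint and the assignment step can be made concrete with a short sketch. Spacing the X locations evenly across the cone is an assumption (the claim only bounds the cone angle), and the metadata shape is hypothetical.

```python
def talker_locations(x, max_cone_angle_deg=30.0):
    """Spread X talker locations symmetrically around the midline,
    never exceeding the maximum cone angle on either side."""
    if x == 1:
        return [0.0]
    step = 2.0 * max_cone_angle_deg / (x - 1)
    return [-max_cone_angle_deg + i * step for i in range(x)]

def assign_upstream_signal(stream_id, locations, slot):
    """Assign an upstream signal to a talker location and emit the
    metadata a downstream renderer would need."""
    return {"stream": stream_id, "azimuth_deg": locations[slot % len(locations)]}

locs = talker_locations(5)                       # [-30, -15, 0, 15, 30]
print(assign_upstream_signal("user-42", locs, slot=2))  # azimuth 0 degrees
```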