Abstract:
A method in a soundfield-capturing endpoint, and the capturing endpoint, which comprises a microphone array capturing a soundfield, and an input processor pre-processing the captured signals and performing auditory scene analysis to detect local sound objects and their positions, de-clutter the sound objects, and integrate them with auxiliary audio signals to form a de-cluttered local auditory scene that has a measure of plausibility and perceptual continuity. The input processor also codes the resulting de-cluttered auditory scene to form coded scene data, comprising mono audio and additional scene data, to send to other endpoints. The endpoint includes an output processor generating signals for a display unit that displays a summary of the de-cluttered local auditory scene and/or a summary of activity in the communication system derived from received data, the display including a shaped ribbon display element that has an extent with locations on the extent representing the locations and other properties of different sound objects.
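As an illustrative sketch only (the SoundObject structure, the salience threshold, and every function name below are invented for illustration and are not taken from the abstract), the capture-analyze-de-clutter-code pipeline might be organized along these lines:

```python
from dataclasses import dataclass

@dataclass
class SoundObject:
    """A detected local sound object (hypothetical structure)."""
    audio: list       # mono samples attributed to this object
    position: tuple   # estimated (x, y, z) location
    salience: float   # perceptual weight in [0, 1]

def de_clutter(objects, min_salience=0.2):
    """Drop low-salience objects so the remaining scene stays plausible
    and perceptually continuous (the threshold is illustrative)."""
    return [o for o in objects if o.salience >= min_salience]

def code_scene(objects, aux_objects):
    """Form coded scene data: a mono downmix plus side information
    describing each remaining object. aux_objects stands for auxiliary
    audio signals already wrapped as SoundObject instances."""
    scene = de_clutter(objects) + list(aux_objects)
    length = max(len(o.audio) for o in scene)
    mono = [sum(o.audio[i] for o in scene if i < len(o.audio))
            for i in range(length)]
    side_info = [{"position": o.position, "salience": o.salience}
                 for o in scene]
    return mono, side_info
```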
Abstract:
A conference controller (111, 175) configured to place an upstream audio signal (123, 173) associated with a conference participant and a sound signal within a 2D or 3D conference scene to be rendered to a listener (211) is described. The conference controller (111, 175) is configured to set up an X-point conference scene with X different spatial talker locations (212) within the conference scene, X being an integer, X > 0; assign the upstream audio signal (123, 173) to one of the talker locations (212); place the sound signal at a spatial sound location (503) within the X-point conference scene; and generate metadata identifying the assigned talker location (212) and the spatial sound location, enabling an audio processing unit (121, 171) to generate a spatialized audio signal based on a set of downstream audio signals (124, 174) comprising the upstream audio signal (123, 173) and the sound signal.
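For intuition, here is a minimal sketch of setting up an X-point scene and placing an extra sound signal at its own location; the even angular spread, the default sound azimuth, and the function names are assumptions for illustration, not the controller's specified behavior:

```python
def setup_scene(x_points, width_deg=60.0):
    """Spread X talker locations evenly across an angular sector in
    front of the listener (the 60-degree width is illustrative)."""
    if x_points == 1:
        return [0.0]
    step = width_deg / (x_points - 1)
    return [-width_deg / 2 + i * step for i in range(x_points)]

def place_conference(upstream_ids, sound_azimuth_deg=90.0):
    """Assign each upstream signal to a talker location and place a
    sound signal (e.g. a notification) at a separate sound location,
    returning metadata for the audio processing unit."""
    locations = setup_scene(len(upstream_ids))
    metadata = {uid: {"azimuth_deg": az}
                for uid, az in zip(upstream_ids, locations)}
    metadata["sound"] = {"azimuth_deg": sound_azimuth_deg}
    return metadata
```

For example, place_conference(["alice", "bob"]) would put the two talkers at -30 and +30 degrees and the sound signal at 90 degrees, off to the listener's side.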
Abstract:
An audio apparatus is configured to switch, when there exists a first audio interface between the audio apparatus and a computer apparatus, to using a second audio interface between the audio apparatus and the computer apparatus, the second audio interface being different from the first audio interface. The switching comprises: receiving, via the first audio interface, combined audio data and non-audio data, the non-audio data comprising a request to switch to using the second audio interface; obtaining the request from the non-audio data; and, in response to obtaining the request, transmitting to the computer apparatus a confirmation of switching to using the second audio interface. The audio apparatus and the computer apparatus are described and claimed.
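A minimal sketch of that handshake, assuming an invented frame layout (dicts with 'audio' and 'non_audio' keys) and invented message names, since the abstract does not specify a wire format:

```python
class AudioApparatus:
    """Sketch of the interface-switch handshake on the audio side."""

    def __init__(self, link):
        self.link = link      # the first (currently active) interface
        self.active = "first"

    def on_frame(self, frame):
        # Combined audio and non-audio data arrive over the first interface.
        audio = frame["audio"]
        request = (frame.get("non_audio") or {}).get("switch_request")
        if request:
            # Confirm the switch to the computer apparatus, then treat
            # the second interface as the active one.
            self.link.send({"type": "SWITCH_CONFIRM", "interface": request})
            self.active = request
        return audio  # hand the audio part off for playback
```

Here link stands in for whatever transport the first interface provides; only its send method is used by the sketch.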
Abstract:
A method of outputting audio in a teleconferencing environment includes receiving audio streams, processing the audio streams according to information regarding effective spatial positions, and outputting the processed audio streams by at least three speakers arranged in more than one dimension. The information regarding the plurality of effective spatial positions corresponds to a perceived spatial scene that extends beyond the speakers in at least two dimensions. In this manner, participants in the teleconference perceive the audio from the remote participants as originating at different positions in the teleconference room.
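One simplified way to picture such processing is distance-based amplitude panning over a two-dimensional speaker layout; the three speaker positions, the gain law, and the function names below are assumptions for illustration, not the claimed processing:

```python
import math

SPEAKERS = [(-1.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # illustrative 2-D layout

def gains_for(position):
    """Distance-based panning: closer speakers get more gain. A virtual
    position outside the speaker triangle still yields a valid mix,
    which is one (simplified) way a perceived scene can extend beyond
    the physical speakers."""
    inverse = [1.0 / (math.dist(position, s) + 1e-6) for s in SPEAKERS]
    total = sum(inverse)
    return [g / total for g in inverse]

def render(streams):
    """streams: list of (samples, effective_position) pairs; returns
    one output channel per speaker."""
    length = max(len(samples) for samples, _ in streams)
    out = [[0.0] * length for _ in SPEAKERS]
    for samples, position in streams:
        for channel, gain in enumerate(gains_for(position)):
            for i, x in enumerate(samples):
                out[channel][i] += gain * x
    return out
```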
Abstract:
The present document relates to methods and systems for setting up and managing two-dimensional or three-dimensional scenes for audio conferences. A conference controller (111, 175) configured to place L upstream audio signals (123, 173) within a 2D or 3D conference scene to be rendered to a listener (211) is described. The conference controller (111, 175) is configured to set up an X-point conference scene; assign the L upstream audio signals (123, 173) to the X talker locations (212); determine a maximum number N of downstream audio signals (124, 174) to be transmitted to the listener (211); determine N downstream audio signals (124, 174) from the L assigned upstream audio signals (123, 173); determine N updated talker locations for the N downstream audio signals (124, 174); and generate metadata identifying the updated talker locations and enabling an audio processing unit (121, 171) to generate a spatialized audio signal.
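To make the L-to-N reduction concrete, here is a toy sketch that merges azimuth-adjacent upstream signals into N downstream streams and averages their locations; the contiguous grouping rule and all names are invented for illustration:

```python
def reduce_streams(upstream, n_max):
    """upstream: list of (signal_id, azimuth_deg) pairs. If more than
    n_max signals are assigned, merge azimuth-adjacent signals into
    shared downstream streams and report an updated (averaged)
    talker location for each."""
    upstream = sorted(upstream, key=lambda s: s[1])
    if len(upstream) <= n_max:
        return [([sid], az) for sid, az in upstream]
    groups, size = [], len(upstream) / n_max
    for i in range(n_max):
        chunk = upstream[round(i * size):round((i + 1) * size)]
        ids = [sid for sid, _ in chunk]
        azimuth = sum(a for _, a in chunk) / len(chunk)
        groups.append((ids, azimuth))
    return groups
```

With five talkers and n_max = 2, the sketch splits them into groups of two and three, each downstream stream placed at its group's mean azimuth.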
Abstract:
Some implementations involve analyzing audio data packets received during a time interval that corresponds with a conversation analysis segment to determine network jitter dynamics data and conversational interactivity data. The network jitter dynamics data may provide an indication of jitter in a network that relays the audio data packets. The conversational interactivity data may provide an indication of interactivity between participants of a conversation represented by the audio data. A jitter buffer size may be controlled according to the network jitter dynamics data and the conversational interactivity data. The time interval may include a plurality of talkspurts.
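As a sketch of the trade-off this enables (the weights and bounds below are invented, not from the disclosure): grow the buffer when jitter is high or bursty, but keep it short when the conversation is highly interactive so turn-taking latency stays low.

```python
def target_jitter_buffer_ms(jitter_stats, interactivity):
    """jitter_stats: dict with mean and standard deviation of observed
    network jitter in ms; interactivity: 0.0 (one-way speech) to 1.0
    (rapid turn-taking). Returns a target buffer size in ms."""
    base = jitter_stats["mean_ms"] + 2.0 * jitter_stats["std_ms"]
    scaled = base * (1.0 - 0.5 * interactivity)  # shrink for interactive talk
    return min(max(scaled, 20.0), 500.0)         # clamp to sane bounds
```

For example, mean jitter of 40 ms with a 15 ms deviation yields a 70 ms target for one-way speech but only 42 ms when interactivity is 0.8.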
Abstract:
In an audio conferencing environment including multiple users participating by means of a series of associated audio input devices for the provision of audio input, and a series of audio output devices for the output of audio output streams to the multiple users, with the audio input and output devices being interconnected to a mixing control server for the control and mixing of the audio inputs from each audio input device to present a series of audio streams to the audio output devices, a method of reducing the effects of crosstalk pickup of at least a first audio conversation by multiple audio input devices, the method including the steps of: (a) monitoring the series of audio input devices for the presence of a duplicate audio conversation input from at least two audio input sources in an audio output stream; and (b) where a duplicate audio conversation input is detected, suppressing the presence of the duplicate audio conversation input in the audio output stream.
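A naive sketch of step (a)'s detection and step (b)'s suppression, using zero-lag normalized cross-correlation between device frames and ignoring inter-device delay and level differences; the 0.7 threshold and all names are assumptions:

```python
def correlation(a, b):
    """Normalized zero-lag cross-correlation of two equal-length frames."""
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
    return num / den if den else 0.0

def suppress_duplicates(frames, threshold=0.7):
    """frames: {device_id: samples} captured over the same interval.
    Where two devices picked up the same conversation, keep the
    stronger copy and mute the quieter one so the conversation is
    not heard twice in the output stream."""
    keep = dict(frames)
    ids = list(frames)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if a in keep and b in keep and \
                    correlation(frames[a], frames[b]) > threshold:
                quieter = min(a, b, key=lambda d: sum(x * x for x in frames[d]))
                del keep[quieter]
    return keep
```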
Abstract:
The present document relates to setting up and managing two-dimensional or three-dimensional scenes for audio conferences. A conference controller (111, 175) configured to place an upstream audio signal (123, 173) associated with a conference participant within a 2D or 3D conference scene to be rendered to a listener (211) is described. An X-point conference scene with X different spatial talker locations (212) is set up, wherein the X talker locations (212) are positioned within a cone around a midline (215) in front of a head of the listener (211). A generatrix (216) of the cone and the midline (215) form an angle which is smaller than or equal to a pre-determined maximum cone angle. The upstream audio signal (123, 173) is assigned to one of the talker locations (212), and metadata identifying the assigned talker location (212) is generated, thus enabling generation of a spatialized audio signal.
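The cone constraint amounts to checking that each candidate talker direction's angular offset from the midline does not exceed the maximum cone angle; a small sketch, where the azimuth/elevation convention and the function name are assumptions:

```python
import math

def within_cone(azimuth_deg, elevation_deg, max_cone_angle_deg):
    """True if a direction given by azimuth and elevation (degrees,
    relative to the midline straight ahead of the listener) lies
    inside the cone of the given maximum angle."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    # The dot product of the direction vector with the midline unit
    # vector reduces to cos(azimuth) * cos(elevation).
    offset = math.degrees(math.acos(math.cos(az) * math.cos(el)))
    return offset <= max_cone_angle_deg
```

For instance, within_cone(15, 5, 30) is True, since the offset from the midline is roughly 15.8 degrees.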