Abstract:
Aspects of the invention provide methods, computer-readable media, and apparatuses for re-panning multiple audio signals by applying spatial cue coding. Sound sources in each of the signals may be re-panned before the signals are mixed into a combined signal. Processing may be applied in a conference bridge that receives two omni-directionally recorded audio signals. The conference bridge subsequently re-pans one of the signals to the listener's left side and the other signal to the right side. The source image mapping and panning may further be adaptive, based on the content and use case. Mapping may be done by manipulating the directional parameters prior to directional decoding or before directional mixing. Directional information that is associated with an audio input signal is remapped in order to compress input source positions into virtual source positions. The virtual sources may be placed with respect to actual loudspeakers using binaural cue panning.
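The remapping step described above can be illustrated with a minimal sketch. It assumes a linear compression of azimuths and swaps in plain constant-power stereo amplitude panning in place of the binaural cue panning named in the abstract. The function names, the sign convention (negative azimuth = left), and the 90-degree sector width are hypothetical choices for illustration, not taken from the source.

```python
import math


def remap_azimuth(azimuth_deg, center_deg, width_deg):
    """Linearly compress an azimuth in [-180, 180] degrees into a sector
    of width_deg degrees centred on center_deg (assumed mapping)."""
    if not -180.0 <= azimuth_deg <= 180.0:
        raise ValueError("azimuth must lie in [-180, 180]")
    return center_deg + (azimuth_deg / 360.0) * width_deg


def stereo_gains(azimuth_deg):
    """Constant-power (left, right) gains for an azimuth clamped to
    [-90 (full left), +90 (full right)]. This is amplitude panning,
    used here only as a stand-in for binaural cue panning."""
    clamped = max(-90.0, min(90.0, azimuth_deg))
    angle = (clamped + 90.0) / 180.0 * (math.pi / 2.0)
    return math.cos(angle), math.sin(angle)


# Hypothetical usage: a source at 0 degrees in the first omni-directional
# signal is squeezed into the listener's left sector, and the same source
# position in the second signal into the right sector.
left_virtual = remap_azimuth(0.0, center_deg=-45.0, width_deg=90.0)
right_virtual = remap_azimuth(0.0, center_deg=45.0, width_deg=90.0)
left_gains = stereo_gains(left_virtual)
right_gains = stereo_gains(right_virtual)
```

With these assumed sectors, every source in the first signal lands between -90 and 0 degrees and every source in the second between 0 and +90 degrees, so the two rooms stay separated after mixing.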
Abstract:
An apparatus for utilizing spatial information for audio signal enhancement in a multiple distributed network may include a processor. The processor may be configured to receive representations of a plurality of audio signals including at least one audio signal received at a first device and at least a second audio signal received at a second device. The first and second devices may be part of a common acoustic space network and may be arbitrarily positioned with respect to each other. The processor may be further configured to combine the first and second audio signals to form a composite audio signal, and provide for communication of the composite audio signal along with spatial information relating to a sound source of at least one of the plurality of audio signals to another device.
Abstract:
A method including: obtaining phase information dependent upon a time-varying phase difference between captured audio channels; obtaining sampling information relating to time-varying spatial sampling of the captured audio channels; and processing the phase information and the sampling information to determine audio control information for controlling spatial rendering of the captured audio channels.
Abstract:
A method and related apparatus comprising: buffering an encoded audio input signal comprising at least one combined signal of a plurality of audio channels and one or more corresponding sets of side information parameters describing a multi-channel sound image; changing the length of at least one audio frame of said combined signal by adding or removing a segment of said combined signal; modifying said one or more sets of side information parameters with a change corresponding to the change in the length of said at least one audio frame of said combined signal; and transferring said at least one audio frame of said combined signal with a changed length and said modified one or more sets of side information parameters to a further processing unit.
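As a rough illustration of keeping the side information aligned with a shortened frame, the sketch below removes a segment from a frame and shifts the parameter positions to match. The data layout is an assumption: the frame is a list of samples, and each side information set is an `(offset, value)` pair whose offset is the sample index where it applies. Dropping parameter sets that fall inside the removed segment is also an illustrative choice; the abstract does not specify how such sets are handled.

```python
def remove_segment(frame, params, start, length):
    """Remove frame[start:start + length] and adjust side information.

    frame  : list of samples of the combined (downmixed) signal
    params : list of (offset, value) pairs (hypothetical layout)
    Returns the shortened frame and the adjusted parameter list.
    """
    if start < 0 or length < 0 or start + length > len(frame):
        raise ValueError("segment must lie inside the frame")
    shortened = frame[:start] + frame[start + length:]
    adjusted = []
    for offset, value in params:
        if offset < start:
            # Before the cut: position is unchanged.
            adjusted.append((offset, value))
        elif offset >= start + length:
            # After the cut: shift back by the removed length.
            adjusted.append((offset - length, value))
        # Inside the cut: dropped in this sketch (assumption).
    return shortened, adjusted


# Hypothetical usage: a 10-sample frame with three parameter sets.
frame = list(range(10))
params = [(0, "a"), (4, "b"), (8, "c")]
new_frame, new_params = remove_segment(frame, params, start=3, length=3)
```

Lengthening a frame would follow the same pattern in reverse, shifting later offsets forward by the length of the inserted segment.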
Abstract:
A network entity, method and computer program product are provided for effectuating a conference session. The method may include receiving a plurality of signals representative of voice communication of the participants. In this regard, the signals may be received from a plurality of terminals of a respective plurality of participants at one of the locations, each of at least some of the terminals otherwise being configured for voice communication independent of at least some of the other terminals. The method of this aspect also includes classifying speech activity of the conference session according to a speech pause, or one or more actively-speaking participants, during the conference session. The signals of the respective participants may then be mixed into at least one mixed signal for output to one or more other participants at one or more other locations, the signals being mixed based upon classification of the speech activity.
Abstract:
Techniques for positioning participants of a conference call in a three dimensional (3D) audio space are described. Aspects of a system for positioning include a client component that extracts speech frames of a currently speaking participant of a conference call from a transmission signal. A speech analysis component determines a voice fingerprint of the currently speaking participant based upon any of a number of factors, such as a pitch value of the participant. A control component determines a category position of the currently speaking participant in a three dimensional audio space based upon the voice fingerprint. An audio engine outputs audio signals of the speech frame based upon the determined category position of the currently speaking participant. The category position of one or more participants may be changed as new participants are added to the conference call.
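One simple way to picture the mapping from a voice fingerprint to a category position is a pitch-threshold lookup, sketched below. The abstract only says that pitch may be one factor in the fingerprint; the boundary values, the three-category split, and the azimuth assignments here are hypothetical and chosen purely for illustration.

```python
def category_azimuth(pitch_hz,
                     boundaries=(140.0, 200.0),
                     azimuths=(-60.0, 0.0, 60.0)):
    """Return the azimuth (degrees) of the category a pitch falls into.

    boundaries : ascending pitch thresholds in Hz (assumed values)
    azimuths   : one position per category, len(boundaries) + 1 entries
    """
    if len(azimuths) != len(boundaries) + 1:
        raise ValueError("need exactly one more azimuth than boundaries")
    for boundary, azimuth in zip(boundaries, azimuths):
        if pitch_hz < boundary:
            return azimuth
    # Pitch is at or above the highest boundary: last category.
    return azimuths[-1]


# Hypothetical usage: three speakers with different average pitch.
low_voice = category_azimuth(110.0)
mid_voice = category_azimuth(170.0)
high_voice = category_azimuth(250.0)
```

Passing different `boundaries` and `azimuths` would model the repositioning the abstract mentions when new participants join the call.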
Abstract:
A method for distinguishing speakers in a conference call of a plurality of participants, in which method speech frames of the conference call are received in a receiving unit, which speech frames include encoded speech parameters. At least one parameter of the received speech frames is examined in an audio codec of the receiving unit, and the speech frames are classified to belong to one of the participants, the classification being carried out according to differences in the examined at least one speech parameter. These functions may be carried out in a speaker identification block, which is applicable in various positions of a teleconferencing processing chain. Finally, a spatialization effect is created in a terminal reproducing the audio signal according to notified differences by placing the participants at distinct positions in an acoustical space of the audio signal.
Abstract:
In accordance with an example embodiment of the present invention, an apparatus is disclosed. The apparatus includes a camera system and an optimization system. The optimization system is configured to communicate with the camera system. At least one microphone is connected to the optimization system. The optimization system is configured to adjust a beamform of the at least one microphone based, at least in part, on camera focus information of the camera system.
Abstract:
An apparatus comprising at least one processor and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform: processing at least one control parameter dependent on at least one sensor input parameter; processing at least one audio signal dependent on the processed at least one control parameter; and outputting the processed at least one audio signal.