Abstract:
A network entity, method and computer program product are provided for effectuating a conference session. The method may include receiving a plurality of signals representative of voice communication of the participants. In this regard, the signals may be received from a plurality of terminals of a respective plurality of participants at one of the locations, each of at least some of the terminals otherwise being configured for voice communication independent of at least some of the other terminals. The method of this aspect also includes classifying speech activity of the conference session according to a speech pause, or one or more actively-speaking participants, during the conference session. The signals of the respective participants may then be mixed into at least one mixed signal for output to one or more other participants at one or more other locations, the signals being mixed based upon classification of the speech activity.
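The classify-then-mix step can be illustrated with a minimal sketch. The abstract does not say how speech activity is detected or how the mix is weighted, so the energy threshold, the pause attenuation and all function names below are assumptions made for illustration only.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def classify_activity(frames, threshold=1e-3):
    """Return indices of terminals judged to be actively speaking.

    An empty list stands for a speech pause. The energy threshold is an
    illustrative assumption, not a value taken from the abstract.
    """
    return [i for i, f in enumerate(frames) if frame_energy(f) > threshold]


def mix_frames(frames, threshold=1e-3, pause_gain=0.1):
    """Mix one equal-length frame per terminal into a single output frame."""
    active = classify_activity(frames, threshold)
    n = len(frames[0])
    if not active:
        # Speech pause: forward an attenuated average of all terminals
        # (assumed behaviour; the abstract leaves this open).
        return [pause_gain * sum(f[k] for f in frames) / len(frames)
                for k in range(n)]
    # Active speech: average only the terminals classified as active.
    return [sum(frames[i][k] for i in active) / len(active) for k in range(n)]
```

For example, with three terminals of which only the first carries speech-level energy, `mix_frames` would return that terminal's frame unchanged, which keeps the noise picked up by silent terminals out of the mixed signal.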
Abstract:
Provided are multichannel architectures, systems, methods, and computer program products for distributed teleconferencing using one or more master devices and/or a centralized conferencing switch. Multichannels enhance functionality of a master device in distributed teleconferencing and allow for compatibility with 3D capable teleconferencing. Multichannel distributed teleconferencing involves multichannel, monophonic, and/or a fixed number of uplink and downlink channels. A multichannel distributed teleconferencing system may perform active talker detection of near-end participants and communicate an ID signal on an uplink channel identifying the active near-end participants. A multichannel distributed teleconferencing system may also receive an ID signal on a downlink channel identifying the active far-end participants. A multichannel distributed teleconferencing system may perform various uplink and downlink processing. Uplink processing may involve multimixing and spatialization. Multimixing may be used to separate speech signals of near-end participants. Spatialization, also used in downlink processing, introduces spatial separation of active participants.
Abstract:
Techniques for applying artificial bandwidth expansion to a multichannel signal are described. Aspects of a system for applying artificial bandwidth expansion to a multichannel signal include an estimation component for receiving a multichannel signal and estimating delay and energy level differences for each channel of the multichannel signal. An artificial bandwidth expansion component artificially expands the bandwidth of each of the channels of the multichannel signal separately. Each one of a plurality of adjustment components is configured to modify a different one of the artificial bandwidth expanded channels of the multichannel signal based upon the estimated delay and energy level differences. The multichannel signal may be a binaural speech signal.
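One way to picture the estimation and adjustment components is the sketch below. It assumes a brute-force cross-correlation search for the inter-channel delay and an energy ratio in decibels for the level difference; the abstract does not name an estimator, and the function names are hypothetical. The bandwidth expansion itself is not shown.

```python
import math


def estimate_delay(ref, other, max_lag):
    """Lag in samples of `other` relative to `ref` that maximizes
    their cross-correlation (positive means `other` arrives later)."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, x in enumerate(ref):
            m = n + lag
            if 0 <= m < len(other):
                score += x * other[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def level_difference_db(ref, other):
    """Energy of `other` relative to `ref`, in decibels."""
    e_ref = sum(x * x for x in ref)
    e_other = sum(x * x for x in other)
    if e_ref == 0 or e_other == 0:
        raise ValueError("both channels need non-zero energy")
    return 10.0 * math.log10(e_other / e_ref)


def apply_adjustment(channel, lag, gain_db):
    """Shift a channel by `lag` samples (zero padded) and scale it by
    `gain_db`, as a stand-in for the adjustment component."""
    gain = 10.0 ** (gain_db / 20.0)
    return [gain * channel[n - lag] if 0 <= n - lag < len(channel) else 0.0
            for n in range(len(channel))]
```

In this reading, the delay and level differences would be measured on the original channels and then re-imposed on the separately expanded channels, so that the binaural cues survive the expansion.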
Abstract:
A method including: obtaining phase information dependent upon a time-varying phase difference between captured audio channels; obtaining sampling information relating to time-varying spatial sampling of the captured audio channels; and processing the phase information and the sampling information to determine audio control information for controlling spatial rendering of the captured audio channels.
Abstract:
A method and related apparatus comprising: buffering an encoded audio input signal comprising at least one combined signal of a plurality of audio channels and one or more corresponding sets of side information parameters describing a multi-channel sound image; changing the length of at least one audio frame of said combined signal by adding or removing a segment of said combined signal; modifying said one or more sets of side information parameters with a change corresponding to the change in the length of said at least one audio frame of said combined signal; and transferring said at least one audio frame of said combined signal with a changed length and said modified one or more sets of side information parameters to a further processing unit.
Abstract:
Techniques for positioning participants of a conference call in a three dimensional (3D) audio space are described. Aspects of a system for positioning include a client component that extracts speech frames of a currently speaking participant of a conference call from a transmission signal. A speech analysis component determines a voice fingerprint of the currently speaking participant based upon any of a number of factors, such as a pitch value of the participant. A control component determines a category position of the currently speaking participant in a three dimensional audio space based upon the voice fingerprint. An audio engine outputs audio signals of the speech frame based upon the determined category position of the currently speaking participant. The category position of one or more participants may be changed as new participants are added to the conference call.
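As a rough illustration of how a control component might turn a voice fingerprint into a position, the sketch below orders participants by pitch and spreads them evenly across a horizontal arc. The abstract does not describe the actual mapping, so the pitch-only fingerprint, the 180-degree span and the function name `assign_azimuths` are all assumptions.

```python
def assign_azimuths(pitches_hz, span_deg=180.0):
    """Map participant name -> azimuth in degrees, ordered by pitch.

    `pitches_hz` maps each participant to an (assumed) pitch value.
    The lowest pitch is placed at the left edge of the arc and the
    highest at the right edge; a lone participant is placed at 0.
    """
    ordered = sorted(pitches_hz, key=pitches_hz.get)
    if len(ordered) <= 1:
        return {name: 0.0 for name in ordered}
    step = span_deg / (len(ordered) - 1)
    return {name: -span_deg / 2 + i * step for i, name in enumerate(ordered)}
```

Calling the function again after a participant joins recomputes every azimuth, which mirrors the statement that positions may change as new participants are added to the call.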
Abstract:
A method for distinguishing speakers in a conference call of a plurality of participants, in which method speech frames of the conference call are received in a receiving unit, which speech frames include encoded speech parameters. At least one parameter of the received speech frames is examined in an audio codec of the receiving unit, and the speech frames are classified to belong to one of the participants, the classification being carried out according to differences in the examined at least one speech parameter. These functions may be carried out in a speaker identification block, which is applicable in various positions of a teleconferencing processing chain. Finally, a spatialization effect is created in a terminal reproducing the audio signal according to notified differences by placing the participants at distinct positions in an acoustical space of the audio signal.
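The classification step can be sketched as a simple online clustering of one speech parameter per frame. The sketch assumes the examined parameter is a pitch value already extracted from the encoded frame and that a fixed tolerance separates speakers; neither detail comes from the abstract, and `classify_frames` is a hypothetical name.

```python
def classify_frames(pitches_hz, tolerance_hz=20.0):
    """Assign each frame a speaker label from its pitch parameter.

    A frame joins the nearest existing speaker whose running mean pitch
    lies within `tolerance_hz`; otherwise it starts a new speaker.
    """
    centroids = []  # one [mean_pitch, frame_count] pair per speaker
    labels = []
    for p in pitches_hz:
        best = None
        for idx, (mean, _) in enumerate(centroids):
            dist = abs(p - mean)
            if dist <= tolerance_hz and (
                best is None or dist < abs(p - centroids[best][0])
            ):
                best = idx
        if best is None:
            centroids.append([p, 1])
            best = len(centroids) - 1
        else:
            mean, count = centroids[best]
            centroids[best] = [(mean * count + p) / (count + 1), count + 1]
        labels.append(best)
    return labels
```

The resulting labels could then drive the spatialization stage, for instance by giving each label its own position in the acoustic space of the reproducing terminal.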
Abstract:
In accordance with an example embodiment of the present invention, an apparatus is disclosed. The apparatus includes a camera system and an optimization system. The optimization system is configured to communicate with the camera system. At least one microphone is connected to the optimization system. The optimization system is configured to adjust a beamform of the at least one microphone based, at least in part, on camera focus information of the camera system.
Abstract:
An apparatus for utilizing spatial information for audio signal enhancement in a multiple distributed network may include a processor. The processor may be configured to receive representations of a plurality of audio signals including at least one audio signal received at a first device and at least a second audio signal received at a second device. The first and second devices may be part of a common acoustic space network and may be arbitrarily positioned with respect to each other. The processor may be further configured to combine the first and second audio signals to form a composite audio signal, and provide for communication of the composite audio signal along with spatial information relating to a sound source of at least one of the plurality of audio signals to another device.