Abstract:
Use of a scalable audio codec to implement distributed mixing and/or sender bit rate regulation in a multipoint conference is disclosed. The scalable audio codec allows the audio signal from each endpoint to be split into one or more frequency bands and for the transform coefficients within such bands to be prioritized such that usable audio may be decoded from a subset of the entire signal. The subset may be created by omitting certain frequency bands and/or by omitting certain coefficients within the frequency bands. By providing various rules for each endpoint in a conference, the endpoint can determine the importance of its signal to the conference and can select an appropriate bit rate, thereby conserving bandwidth and/or processing power throughout the conference.
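As an illustration of how a usable subset might be formed, the following minimal sketch orders transform coefficients by priority and keeps only as many as a chosen budget allows. The priority rule (lower bands first, then larger magnitude within a band), the function names, and the talker-based budget rule are assumptions made for this example and are not taken from the abstract.

```python
def prioritize(bands):
    """Flatten per-band transform coefficients into one priority-ordered list.

    Assumed priority rule (not from the abstract): lower bands come first,
    and within a band larger-magnitude coefficients come first.
    """
    ranked = []
    for band_index, coeffs in enumerate(bands):
        order = sorted(range(len(coeffs)), key=lambda i: -abs(coeffs[i]))
        ranked.extend((band_index, i, coeffs[i]) for i in order)
    return ranked


def truncate(bands, budget):
    """Keep only the `budget` highest-priority coefficients."""
    return prioritize(bands)[:budget]


def choose_budget(is_primary_talker, full_budget, reduced_budget):
    """Hypothetical endpoint rule: the primary talker sends at full rate."""
    return full_budget if is_primary_talker else reduced_budget


# Two bands of coefficients; a non-primary endpoint keeps only three of them.
frame = [[0.9, -0.1, 0.5], [0.2, -0.7]]
subset = truncate(frame, choose_budget(False, full_budget=5, reduced_budget=3))
```

In this sketch an endpoint that is not the primary talker drops the entire upper band, which corresponds to the "omitting certain frequency bands" case; a budget of four would instead omit one coefficient within the upper band.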
Abstract:
Audio from a near-end that has been acoustically coupled at the far-end and returned to the near-end unit is detected and suppressed at the near-end of a conference. First and second energy outputs for separate bands are determined for the near-end audio being sent from the near-end unit and for the far-end audio being received at the near-end unit. The near-end unit compares the first and second energy outputs to one another for each of the bands over a time delay range and detects the return of the sent near-end audio in the received far-end audio based on the comparison. The comparison can use a cross-correlation to find an estimated time delay used for further analysis of the near and far-end energies. The near-end unit suppresses any detected return by muting or reducing what far-end audio is output at its loudspeaker.
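The band-wise comparison over a time delay range can be sketched as a normalized cross-correlation of per-frame band energies. This is a minimal illustration only: the majority vote across bands, the 0.9 threshold, and all function names are assumptions, not details given in the abstract.

```python
import math


def best_delay(sent, received, max_delay):
    """Return (delay, score) maximizing the normalized cross-correlation
    between sent and received per-frame energies of one band."""
    best = (0, 0.0)
    for delay in range(max_delay + 1):
        pairs = list(zip(sent, received[delay:]))
        if not pairs:
            break
        num = sum(s * r for s, r in pairs)
        den = math.sqrt(sum(s * s for s, _ in pairs) * sum(r * r for _, r in pairs))
        score = num / den if den > 0.0 else 0.0
        if score > best[1]:
            best = (delay, score)
    return best


def return_detected(sent_bands, received_bands, max_delay, threshold=0.9):
    """Assumed rule: declare a return when most bands correlate strongly."""
    scores = [best_delay(s, r, max_delay)[1]
              for s, r in zip(sent_bands, received_bands)]
    hits = sum(1 for score in scores if score >= threshold)
    return hits * 2 > len(scores)


def suppress(far_end_frame, detected, gain=0.0):
    """Mute (gain=0) or reduce the far-end frame when a return is detected."""
    return [gain * x for x in far_end_frame] if detected else list(far_end_frame)
```

A gain of zero corresponds to muting the loudspeaker output, while a gain between zero and one corresponds to reducing it.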
Abstract:
In accordance with the present invention, a system and method for computing a location of an acoustic source are disclosed. The method includes steps of processing a plurality of microphone signals in frequency space to search a plurality of candidate acoustic source locations for a maximum normalized signal energy. The method uses phase-delay look-up tables to efficiently determine phase delays for a given frequency bin number k based upon a candidate source location and a microphone location, thereby reducing system memory requirements. Furthermore, the method compares a maximum signal energy for each frequency bin number k with a threshold energy Et(k) to improve accuracy in locating the acoustic source.
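One way to picture the frequency-space search is the sketch below. It stores a single phase delay per (candidate, microphone) pair and scales it by the bin number k, steers the microphone spectra toward each candidate, and lets a bin vote only when its maximum normalized energy exceeds Et(k). The table layout, the voting scheme, the speed-of-sound constant, and every name here are assumptions for illustration; the abstract does not specify them.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def build_phase_table(candidates, mics, bin_hz):
    """Phase delay at bin 1 for each (candidate, microphone) pair.

    The delay for bin k is k times the stored value, so one small table
    serves all bins (an assumed way of keeping memory use low).
    """
    scale = 2.0 * math.pi * bin_hz / SPEED_OF_SOUND
    return [[scale * math.dist(c, m) for m in mics] for c in candidates]


def locate(spectra, table, bins, thresholds):
    """spectra[m][k] is the complex value of bin k at microphone m.

    Returns the index of the candidate with the most bin votes, or None.
    """
    votes = [0] * len(table)
    for k in bins:
        norm = sum(abs(mic[k]) ** 2 for mic in spectra)
        if norm == 0.0:
            continue
        energies = []
        for delays in table:
            # Undo the propagation phase assumed for this candidate.
            steered = sum(mic[k] * cmath.exp(1j * k * d)
                          for mic, d in zip(spectra, delays))
            energies.append(abs(steered) ** 2 / norm)
        winner = max(range(len(energies)), key=energies.__getitem__)
        if energies[winner] > thresholds[k]:  # compare with Et(k)
            votes[winner] += 1
    return max(range(len(votes)), key=votes.__getitem__) if any(votes) else None
```

Bins whose best candidate does not clear the threshold contribute nothing, which is the sense in which the threshold comparison can guard against noisy bins in this sketch.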
Abstract:
A system, such as a video conferencing system, is provided which includes an image pickup device, an audio pickup device, and an audio source locator. The image pickup device generates image signals representative of an image, while the audio pickup device generates audio signals representative of sound from an audio source, such as a speaking person. The audio source locator processes the image signals and audio signals to determine a direction of the audio source relative to a reference point. The system can further determine a location of the audio source relative to the reference point. The reference point can be a camera. The system can use the direction or location information to frame a proper camera shot which would include the audio source.
Abstract:
An end-fire microphone array having reduced analog-to-digital converter requirements is disclosed. Analog filters are used to band-limit at least two secondary microphone elements, each of which is spaced from a primary microphone element at a distance corresponding to its band-limited output. The band-limited secondary microphone outputs are combined by an analog summer, and the primary microphone and combined secondary microphone signals are digitized by an analog-to-digital converter. A signal processor performs a superdirective analysis of the primary microphone signal and the combined secondary microphone signals. A steerable superdirective microphone array is also disclosed. A plurality of microphones are arranged in a ring. The microphone outputs are digitized, split into frequency bands, and weighted sums are formed for each of a plurality of directions. A steering control circuit evaluates the relative energy of each directional signal in each band and selects a microphone direction for further processing and output.
Abstract:
An improved echo cancelling device for reducing the effects of acoustic feedback between a loudspeaker and microphone in a communication system. The device includes an adjustable filter for receiving a loudspeaker signal and generating in response thereto an echo estimation signal. The device subtracts the echo estimation signal from the microphone signal to produce an echo corrected microphone signal. During periods of time when the microphone signal is substantially derived from acoustic feedback between the loudspeaker and the microphone, the device adjusts transfer characteristics of the filter to reduce the echo corrected microphone signal. The improvement includes estimating from the adjusted transfer characteristics an energy transfer ratio representative of the ratio of the energy of the microphone signal to the energy of the loudspeaker signal. The device compares the microphone signal to the energy transfer ratio multiplied by the loudspeaker signal to identify periods of time when the microphone signal is substantially derived from acoustic feedback between the loudspeaker and the microphone.
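The comparison described above can be sketched as follows. The sketch estimates the energy transfer ratio as the sum of squared filter taps, which holds for a roughly white loudspeaker signal; that estimate, the safety margin of 2.0, and the function names are assumptions for illustration rather than details from the abstract.

```python
def energy(frame):
    """Sum of squared samples in a frame."""
    return sum(x * x for x in frame)


def energy_transfer_ratio(filter_taps):
    """Estimated ratio of echo energy to loudspeaker energy.

    Assumption: for a roughly white loudspeaker signal this is the sum of
    the squared taps of the adapted echo-path filter.
    """
    return sum(h * h for h in filter_taps)


def is_echo_only(mic_frame, speaker_frame, filter_taps, margin=2.0):
    """True when the microphone energy is explained by acoustic feedback.

    `margin` is a hypothetical safety factor; adaptation of the filter
    would be allowed only while this returns True.
    """
    predicted_echo = energy_transfer_ratio(filter_taps) * energy(speaker_frame)
    return energy(mic_frame) <= margin * predicted_echo
```

When a near-end talker is active, the microphone energy exceeds what the loudspeaker signal and the transfer ratio can explain, so the function returns False and filter adaptation can be held off.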
Abstract:
A videoconferencing system has a videoconferencing unit that uses portable devices as peripherals for the system. The portable devices obtain near-end audio and send the audio to the videoconferencing unit via a wireless connection. In turn, the videoconferencing unit sends the near-end audio from the loudest portable device along with near-end video to the far-end. The portable devices can control the videoconferencing unit and can initially establish the videoconference by connecting with the far-end and then transferring operations to the videoconferencing unit. To deal with acoustic coupling between the unit's loudspeaker and the portable device's microphone, the unit uses an echo canceller that is compensated for differences in the clocks used in the A/D and D/A converters of the loudspeaker and microphone.
Abstract:
Stereo to mono voice conferencing conversion is performed during a voice conference. Conferencing equipment receives audio for right and left channels and filters each of the channels into a plurality of bands. For each band of each channel, the equipment determines an energy level and compares each energy level for each band of the right channel to each energy level for each corresponding band of the left channel. Based on the comparison, the equipment determines which channel has more audio resulting from speech. Based on the determination, the equipment adjusts delivery of the audio from the right and left channels to a mono channel for transmission to endpoints only capable of mono audio in the voice conference.
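A minimal sketch of the band-by-band comparison follows. It counts how many bands are stronger in each channel and then favors the winning channel in the mono mix. The counting rule, the 0.8 weighting, and the names are assumptions for illustration; the abstract does not say how the comparison is turned into a decision or how delivery is adjusted.

```python
def band_energies(bands):
    """Energy of each band, where each band is a list of samples."""
    return [sum(x * x for x in band) for band in bands]


def dominant_channel(left_bands, right_bands):
    """Assumed rule: the channel that is stronger in more bands wins."""
    left = band_energies(left_bands)
    right = band_energies(right_bands)
    left_wins = sum(1 for l, r in zip(left, right) if l > r)
    right_wins = sum(1 for l, r in zip(left, right) if r > l)
    if left_wins > right_wins:
        return "left"
    if right_wins > left_wins:
        return "right"
    return "both"


def mix_to_mono(left, right, dominant, favored=0.8):
    """Weight the dominant channel more heavily in the mono output."""
    weight = {"left": favored, "right": 1.0 - favored, "both": 0.5}[dominant]
    return [weight * l + (1.0 - weight) * r for l, r in zip(left, right)]
```

Favoring one channel rather than summing both equally is one possible way to limit the noise or reverberation contributed by the channel that carries less speech.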
Abstract:
An arbitrarily positioned cluster of three microphones can be used for stereo input of a videoconferencing system. To produce stereo input, right and left weightings for signal inputs from each of the microphones are determined. The right and left weightings correspond to preferred directive patterns for stereo input of the system. The determined right weightings are applied to the signal inputs from each of the microphones, and the weighted inputs are summed to produce the right input. The same is done for the left input using the determined left weightings. The three microphones are preferably first-order, cardioid microphone capsules spaced close together in an audio unit, where each faces radially outward at 120 degrees from the others. The orientation of the arbitrarily positioned cluster relative to the system can be determined by directly detecting the orientation or by using stored arrangements.
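The weighting step can be illustrated with idealized, coincident cardioid capsules, each with response 0.5(1 + cos(angle)). Under that assumption, weights of 1/3 + (2/3)cos(target - capsule angle) sum the three capsules into a virtual cardioid aimed at any target angle. The choice of virtual cardioids at plus and minus 60 degrees as the left and right patterns, the neglect of capsule spacing, and all names are assumptions made for this sketch.

```python
import math

# Capsule facing angles, 120 degrees apart, before applying the orientation.
BASE_ANGLES = [0.0, 2.0 * math.pi / 3.0, 4.0 * math.pi / 3.0]


def capsule_gains(source_angle, orientation=0.0):
    """Ideal first-order cardioid gain of each capsule for a given source."""
    return [0.5 * (1.0 + math.cos(source_angle - (a + orientation)))
            for a in BASE_ANGLES]


def stereo_weights(target_angle, orientation=0.0):
    """Weights that sum the three capsules into a cardioid aimed at target."""
    return [1.0 / 3.0 + (2.0 / 3.0) * math.cos(target_angle - (a + orientation))
            for a in BASE_ANGLES]


def mix(weights, samples):
    """Weighted sum of one sample from each microphone."""
    return sum(w * s for w, s in zip(weights, samples))


# Assumed stereo patterns: virtual cardioids at +60 and -60 degrees.
left_weights = stereo_weights(math.radians(60.0))
right_weights = stereo_weights(math.radians(-60.0))
```

Because the weights depend on the orientation argument, the same left and right patterns can be kept relative to the system once the orientation of the cluster is known, whether detected directly or looked up from a stored arrangement.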