Abstract:
The disclosed architecture employs signal processing techniques to provide audio-only perception, or audio perception that matches the visual perception. The architecture also provides spatial audio reproduction for multiparty teleconferencing, so that the participants perceive themselves as if they were sitting in the same room. The solution rests on the premise that people perceive sound as a reconstructed wavefront; hence, wavefronts are used to provide the spatial perceptual cues. The differences between the spatial perceptual cues derived from the reconstructed wavefront and those derived from the ideal wavefront form an objective metric of spatial perceptual quality and provide a means of evaluating overall system performance. Additionally, compensation filters improve the spatial perceptual quality of stereophonic systems by optimizing these objective metrics.
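The abstract does not say which spatial perceptual cues are compared. As a minimal sketch, assume the cues are the interaural time difference (ITD, estimated by cross-correlation) and the interaural level difference (ILD) of a left/right signal pair; the absolute per-cue differences between the ideal and the reconstructed pair then act as a simple objective metric. The function names and the `max_lag` search range are hypothetical, not taken from the source.

```python
import math


def ild_db(left, right):
    """Interaural level difference: left-channel energy relative to right, in dB."""
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    return 10.0 * math.log10(e_left / e_right)


def itd_samples(left, right, max_lag):
    """Interaural time difference as the lag (in samples) that maximizes the
    cross-correlation of equal-length channels; a positive lag means the
    right channel arrives later than the left."""
    n = len(left)
    best_lag, best_val = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        val = sum(left[i] * right[i + lag] for i in range(n) if 0 <= i + lag < n)
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


def cue_error(ideal, reconstructed, max_lag=40):
    """Hypothetical metric: absolute ITD and ILD differences between an ideal
    and a reconstructed (left, right) pair. Smaller values mean closer cues."""
    (il, ir), (rl, rr) = ideal, reconstructed
    d_itd = abs(itd_samples(il, ir, max_lag) - itd_samples(rl, rr, max_lag))
    d_ild = abs(ild_db(il, ir) - ild_db(rl, rr))
    return d_itd, d_ild
```

Under this assumption, a compensation filter would be tuned to drive both differences toward zero; the abstract does not specify the optimization procedure.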
Abstract:
Speech quality estimation technique embodiments are described that generally involve estimating the human speech quality of an audio frame in a single-channel audio signal. A representation of the harmonic component of the frame is synthesized and used to compute the non-harmonic component of the frame. The synthesized harmonic component representation and the non-harmonic component are then used to compute a harmonic-to-non-harmonic ratio (HnHR). This HnHR is indicative of the quality of a user's speech and is designated as an estimate of the speech quality of the frame. In one implementation, the HnHR is used to establish a minimum speech quality threshold below which the quality of the user's speech is considered unacceptable. Feedback is then provided to the user based on whether the HnHR falls below the threshold.
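The abstract does not say how the harmonic component is synthesized. As a minimal sketch, assume the pitch `f0` of the frame is already known and model the harmonic component as a sum of sinusoids at integer multiples of `f0`, with each amplitude obtained by projecting the frame onto a cosine and a sine at that harmonic (exact only when the frame spans a whole number of pitch periods). The residual is taken as the non-harmonic component, HnHR is their energy ratio in dB, and the 10 dB threshold and all function names are hypothetical placeholders.

```python
import math


def synthesize_harmonic(frame, f0, fs, num_harmonics):
    """Hypothetical harmonic model: project the frame onto cos/sin at k*f0.
    The projections are exact when the frame holds whole pitch periods."""
    n = len(frame)
    harmonic = [0.0] * n
    for k in range(1, num_harmonics + 1):
        w = 2.0 * math.pi * k * f0 / fs  # radians per sample for harmonic k
        a = 2.0 / n * sum(x * math.cos(w * i) for i, x in enumerate(frame))
        b = 2.0 / n * sum(x * math.sin(w * i) for i, x in enumerate(frame))
        for i in range(n):
            harmonic[i] += a * math.cos(w * i) + b * math.sin(w * i)
    return harmonic


def hnhr_db(frame, f0, fs, num_harmonics=10):
    """Harmonic-to-non-harmonic energy ratio of one frame, in dB."""
    harmonic = synthesize_harmonic(frame, f0, fs, num_harmonics)
    non_harmonic = [x - h for x, h in zip(frame, harmonic)]  # residual
    e_h = sum(h * h for h in harmonic)
    e_n = sum(r * r for r in non_harmonic)
    return 10.0 * math.log10(e_h / max(e_n, 1e-12))  # guard against e_n == 0


def speech_quality_ok(frame, f0, fs, threshold_db=10.0):
    """True if the frame's HnHR meets the (placeholder) minimum threshold."""
    return hnhr_db(frame, f0, fs) >= threshold_db
```

A real system would also need a pitch estimator and would skip unvoiced frames, where a harmonic model does not apply.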
Abstract:
A spatial element is added to communications, including telephone conference calls heard through headphones or a stereo speaker setup. Functions are created that modify the signals from different callers to create the illusion that the callers are speaking from different parts of the room.
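The abstract does not define the modifying functions. As a minimal sketch for a stereo speaker setup, assume each caller's mono signal is placed at its own azimuth using a constant-power (sine/cosine) pan law and the results are summed into one left/right pair; a headphone system would more typically use HRTF filtering instead. The azimuth range and all function names are hypothetical.

```python
import math


def spread_azimuths(num_callers):
    """Spread callers evenly from -45 (hard left) to +45 (hard right) degrees."""
    if num_callers == 1:
        return [0.0]
    return [-45.0 + 90.0 * i / (num_callers - 1) for i in range(num_callers)]


def pan_gains(azimuth_deg):
    """Constant-power pan law: left^2 + right^2 == 1 for every azimuth."""
    theta = math.radians(azimuth_deg + 45.0)  # map [-45, 45] to [0, 90] degrees
    return math.cos(theta), math.sin(theta)


def spatialize(callers, azimuths):
    """Mix mono caller signals (lists of samples) into a stereo (left, right) pair."""
    n = max(len(s) for s in callers)
    left, right = [0.0] * n, [0.0] * n
    for signal, az in zip(callers, azimuths):
        gl, gr = pan_gains(az)
        for i, x in enumerate(signal):
            left[i] += gl * x
            right[i] += gr * x
    return left, right
```

For example, `spatialize(signals, spread_azimuths(len(signals)))` places three callers at the left, center, and right of the stereo image.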
Abstract:
Stereophonic teleconferencing system embodiments are described that employ a microphone array at a remote conference site with multiple conferencees to produce a separate output channel from each microphone in the array. Audio data streams, each representing one of the audio output channels from the microphone array, are then sent to a local conference site where a local conferencee is in attendance. The voices of the remote conferencees are spatialized within the sound field of the local site using multiple loudspeakers. Generally, this involves receiving the monophonic audio data streams from the remote site and processing them to generate an audio signal for each loudspeaker. Each generated audio signal is then played through its respective loudspeaker to produce a spatial audio sound field that the local conferencee perceives as having the voice of each remote conferencee coming from a different location.
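The abstract does not say how the per-loudspeaker signals are generated. As a minimal sketch, assume the loudspeakers stand in a line, each microphone channel is assigned an evenly spaced virtual position along that line, and each channel is split between its two nearest loudspeakers with constant-power gains (pairwise amplitude panning). This is a swapped-in technique for illustration, and all names are hypothetical; the streams are assumed to have equal length.

```python
import math


def channel_positions(num_channels, num_speakers):
    """Evenly spread channel positions over loudspeaker indices [0, num_speakers-1]."""
    if num_channels == 1:
        return [(num_speakers - 1) / 2.0]
    step = (num_speakers - 1) / (num_channels - 1)
    return [c * step for c in range(num_channels)]


def gain_matrix(num_channels, num_speakers):
    """gains[s][c] is the gain of microphone channel c in loudspeaker s."""
    if num_speakers < 2:
        raise ValueError("pairwise panning needs at least two loudspeakers")
    gains = [[0.0] * num_channels for _ in range(num_speakers)]
    for c, pos in enumerate(channel_positions(num_channels, num_speakers)):
        lo = min(int(math.floor(pos)), num_speakers - 2)  # left speaker of the pair
        frac = pos - lo                                    # 0 at lo, 1 at lo + 1
        gains[lo][c] = math.cos(frac * math.pi / 2)
        gains[lo + 1][c] = math.sin(frac * math.pi / 2)
    return gains


def render(streams, num_speakers):
    """Mix equal-length mono streams into one signal per loudspeaker."""
    gains = gain_matrix(len(streams), num_speakers)
    n = len(streams[0])
    return [[sum(gains[s][c] * streams[c][i] for c in range(len(streams)))
             for i in range(n)]
            for s in range(num_speakers)]
```

Because each channel's gains satisfy cos² + sin² = 1, every remote voice is reproduced at the same overall power regardless of its position.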
Abstract:
A communication end device of a two-way communication system is shown. The device includes an audio signal capture device for capturing local audio to be transmitted to another end device, an audio signal rendering device for playing remote audio received from the other end device, and buffers for buffering the captured and rendered audio signals. The device also includes an audio echo canceller that uses an adaptive filter to predict echo from the rendered audio signal at a calculated relative offset in the captured audio signal, and subtracts the predicted echo from the signal transmitted to the other end device. The calculated relative offset used by the audio echo canceller for a current signal sample is adjusted if the difference between it and the adjusted relative offset of a preceding sample exceeds a threshold value.
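The abstract names neither the adaptive filter nor the adjustment rule. As a minimal sketch, assume a normalized LMS (NLMS) filter predicts the echo from rendered samples aligned by the relative offset, and assume the adjustment simply limits a per-sample offset jump to `threshold` samples relative to the previous adjusted offset. Both choices, the parameter values, and the function names are hypothetical.

```python
def adjust_offset(calculated, prev_adjusted, threshold):
    """Hypothetical adjustment rule: if the newly calculated offset differs from
    the previous adjusted offset by more than `threshold` samples, limit the
    change to `threshold`; otherwise use the calculated offset unchanged."""
    diff = calculated - prev_adjusted
    if abs(diff) > threshold:
        return prev_adjusted + (threshold if diff > 0 else -threshold)
    return calculated


def nlms_echo_cancel(rendered, captured, offsets, taps=16, mu=0.5,
                     threshold=2, eps=1e-6):
    """Subtract an NLMS prediction of the echo of `rendered` from `captured`.

    offsets[n] is the calculated relative offset (in samples) between the
    rendered and captured streams for capture sample n.
    """
    w = [0.0] * taps
    out = []
    prev = offsets[0]
    for n, (y, calc) in enumerate(zip(captured, offsets)):
        off = adjust_offset(calc, prev, threshold)
        prev = off
        # Rendered samples aligned with capture sample n, newest first.
        x = [rendered[n - off - k] if 0 <= n - off - k < len(rendered) else 0.0
             for k in range(taps)]
        y_hat = sum(wk * xk for wk, xk in zip(w, x))  # predicted echo
        e = y - y_hat                                  # echo-cancelled sample
        norm = eps + sum(xk * xk for xk in x)
        w = [wk + mu * e * xk / norm for wk, xk in zip(w, x)]  # NLMS update
        out.append(e)
    return out
```

Limiting the offset change keeps a single bad offset estimate from misaligning the filter taps, which would otherwise force the adaptive filter to reconverge.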
Abstract:
An audio encoder performs adaptive entropy encoding of audio data. For example, an audio encoder switches between variable dimension vector Huffman coding of direct levels of quantized audio data and run-level coding of run lengths and levels of quantized audio data. The encoder can use, for example, context-based arithmetic coding for coding run lengths and levels. The encoder can determine when to switch between coding modes by counting consecutive coefficients having a predominant value (e.g., zero). An audio decoder performs corresponding adaptive entropy decoding.
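The mode-switching decision can be sketched as follows, assuming the predominant value is zero and the encoder switches to run-level coding at the start of the first run of at least `run_threshold` consecutive zeros. The threshold, the treatment of trailing zeros, and the function names are hypothetical, and the entropy-coding stages themselves (vector Huffman, context-based arithmetic coding) are omitted; only the symbols each mode would code are produced.

```python
def find_switch_point(coeffs, run_threshold=4, predominant=0):
    """Index where coding switches from direct levels to run-level pairs: the
    start of the first run of >= run_threshold predominant values."""
    run = 0
    for i, c in enumerate(coeffs):
        if c == predominant:
            run += 1
            if run >= run_threshold:
                return i - run_threshold + 1
        else:
            run = 0
    return len(coeffs)  # no long run found: code everything as direct levels


def run_level_pairs(coeffs):
    """(zero-run length, nonzero level) pairs, plus the count of trailing zeros
    (which a real coder might signal with an end-of-block symbol)."""
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    return pairs, run


def split_modes(coeffs, run_threshold=4):
    """Split quantized coefficients into direct-level and run-level portions."""
    k = find_switch_point(coeffs, run_threshold)
    pairs, trailing = run_level_pairs(coeffs[k:])
    return coeffs[:k], pairs, trailing
```

For `[7, -3, 2, 0, 1, 0, 0, 0, 0, 5, 0, 0, -1, 0, 0]`, the first four-zero run starts at index 5, so `[7, -3, 2, 0, 1]` is coded as direct levels and the remainder as the pairs `(4, 5)` and `(2, -1)` followed by two trailing zeros.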
Abstract:
A transformation method provides a multi-dimensional affine transformation for representing motion between corresponding image components of successive video image frames. The multi-dimensional affine transformations can represent complex motion that includes any or all of translation, rotation, magnification, and shear. The transformation method of this invention includes determining motion transformations between corresponding pixels in the image components of the first and second video image frames. From the motion transformations between corresponding pixels, multi-dimensional affine motion transformations between the corresponding image components are determined. This transformation method increases the accuracy with which complex motion is represented and results in fewer compression or encoding errors in comparison to conventional methods, thereby increasing compression efficiency.
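The step from per-pixel correspondences to an affine transformation of an image component can be sketched as a least-squares fit of the six-parameter 2D affine model x' = ax + by + c, y' = dx + ey + f, which covers translation, rotation, magnification, and shear. This assumes the pixel correspondences are already known (for example, from block matching) and solves the normal equations with Cramer's rule; it is a common technique offered for illustration, not necessarily the patented method, and the function names are hypothetical.

```python
def _det3(a):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def _solve3(m, v):
    """Solve the 3x3 system m @ p = v by Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("degenerate correspondences (are the points collinear?)")
    sol = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = v[r]  # replace one column with the right-hand side
        sol.append(_det3(mc) / d)
    return sol


def fit_affine(src, dst):
    """Least-squares affine fit from (x, y) points in one frame to their
    corresponding (x', y') points in the next. Returns ((a, b, c), (d, e, f))."""
    # Normal equations (A^T A) p = A^T t, where each row of A is [x, y, 1].
    ata = [[0.0] * 3 for _ in range(3)]
    atx, aty = [0.0] * 3, [0.0] * 3
    for (x, y), (xp, yp) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            atx[i] += row[i] * xp
            aty[i] += row[i] * yp
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    return tuple(_solve3(ata, atx)), tuple(_solve3(ata, aty))
```

At least three non-collinear correspondences are needed; with more, the least-squares fit averages out errors in the individual pixel motion estimates.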