Abstract:
Improved systems and methods for psychoacoustic adaptive notch filtering are provided. By accounting for psychoacoustic properties of an audio signal as well as finer characteristics of noise which may be present in the audio signal (e.g., the shape of the spectral density of the noise), more effective strategies for dealing with undesirable components of the audio signal may be realized.
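As a rough illustration of the filtering involved, the sketch below implements a standard second-order (biquad) notch filter; the `q` parameter, which sets the notch width, is exactly the kind of knob that could be driven by an estimate of the shape of the noise's spectral density. The function names and parameter values here are illustrative assumptions, not taken from the disclosure.

```python
import math

def notch_coeffs(f0, fs, q):
    """Normalized biquad notch centered at f0 Hz; larger q -> narrower notch."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a

def apply_biquad(b, a, x):
    """Direct-form I filtering of the sequence x."""
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        y.append(yn)
    return y

fs = 8000
b, a = notch_coeffs(1000.0, fs, q=5.0)
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs)]
out = apply_biquad(b, a, tone)
# The 1 kHz tone sits on the notch, so its steady-state level collapses,
# while content far from the notch (e.g., 200 Hz) passes nearly unchanged.
```

An adaptive variant would re-estimate `f0` and `q` over time from the observed noise spectrum rather than fixing them.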
Abstract:
A system for improving sound quality includes a loudspeaker, a microphone, an accelerometer, acoustic echo cancellers (AECs), and a double-talk detector (DTD). The loudspeaker outputs a loudspeaker signal that includes a downlink audio signal from a far-end speaker. The microphone generates a microphone uplink signal and receives at least one of: near-end speaker, ambient noise, and loudspeaker signals. The accelerometer generates an accelerometer uplink signal and receives at least one of: near-end speaker, ambient noise, and loudspeaker signals. A first AEC receives the downlink audio, microphone uplink, and double-talk control signals, and generates an AEC-microphone linear echo estimate and a corrected AEC-microphone uplink signal. A second AEC receives the downlink audio, accelerometer uplink, and double-talk control signals, and generates an AEC-accelerometer linear echo estimate and a corrected AEC-accelerometer uplink signal. The DTD receives the downlink audio signal, the uplink signals, the corrected uplink signals, and the linear echo estimates, and generates the double-talk control signal. An uplink audio signal including at least one of the corrected microphone uplink signal and the corrected accelerometer uplink signal is generated. Other embodiments are described.
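A minimal sketch of the echo-cancellation core described above: a normalized-LMS (NLMS) adaptive filter produces a linear echo estimate from the downlink signal and subtracts it from an uplink signal, while a double-talk control flag freezes adaptation (adapting during double talk would corrupt the echo-path estimate). The function names, step size, and the toy 3-tap echo path are illustrative assumptions, not the disclosed implementation.

```python
import random

def nlms_echo_canceller(downlink, uplink, taps=3, mu=0.5, eps=1e-8,
                        double_talk=None):
    """Return (echo_estimates, corrected_uplink). Adaptation is frozen
    on samples where the double-talk flag is set."""
    w = [0.0] * taps                      # adaptive echo-path estimate
    buf = [0.0] * taps                    # most recent downlink samples
    est, err = [], []
    for n, (x, d) in enumerate(zip(downlink, uplink)):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))   # linear echo estimate
        e = d - y                                    # corrected uplink
        if not (double_talk and double_talk[n]):     # freeze during double talk
            norm = sum(bi * bi for bi in buf) + eps
            w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        est.append(y)
        err.append(e)
    return est, err

random.seed(0)
h = [0.5, -0.3, 0.1]                      # toy echo path
x = [random.uniform(-1, 1) for _ in range(4000)]
d = [sum(h[k] * (x[n - k] if n >= k else 0.0) for k in range(3))
     for n in range(len(x))]
est, err = nlms_echo_canceller(x, d)
# With no near-end speech present, the residual err[n] decays toward zero.
```

In the claimed system, one such canceller runs on the microphone uplink and a second on the accelerometer uplink, both gated by the shared double-talk control signal.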
Abstract:
A method performed by a local device that is communicatively coupled with several remote devices. The method includes: receiving, from each remote device with which the local device is engaged in a communication session, an input audio stream; receiving, for each remote device, a set of parameters; determining, for each input audio stream, whether the input audio stream is to be 1) rendered individually or 2) rendered as a mix of input audio streams, based on the set of parameters; for each input audio stream that is determined to be rendered individually, spatially rendering the input audio stream as an individual virtual sound source that contains only that input audio stream; and for input audio streams that are determined to be rendered as the mix of input audio streams, spatially rendering the mix of input audio streams as a single virtual sound source that contains the mix of input audio streams.
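The per-stream decision described above can be sketched as a simple partition: each stream's parameter set yields a flag, flagged streams get their own virtual source, and the rest are pooled into one mixed source. The `render_individually` key and the function name are hypothetical, not taken from the claim.

```python
def partition_streams(streams):
    """streams: dict mapping remote-device id -> parameter dict.
    Returns (individual_ids, mixed_ids): streams rendered as their own
    virtual sound source vs. pooled into a single mixed source."""
    individual, mixed = [], []
    for device_id, params in streams.items():
        if params.get("render_individually", False):
            individual.append(device_id)
        else:
            mixed.append(device_id)
    return individual, mixed

streams = {
    "remote_a": {"render_individually": True},
    "remote_b": {"render_individually": False},
    "remote_c": {},                     # no preference -> pooled into the mix
}
solo, pooled = partition_streams(streams)
# solo == ["remote_a"], pooled == ["remote_b", "remote_c"]
```

Each id in `solo` would then be spatially rendered at its own virtual position, while `pooled` streams are summed first and rendered once.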
Abstract:
A method performed by a processor of an audio source device. The method drives an audio output device of the audio source device to output a sound with an audio output signal. The method obtains a microphone signal from a microphone of the audio source device, the microphone signal capturing the outputted sound. The method determines whether the audio output device is a headset or a loudspeaker based on the microphone signal and configures an acoustic dosimetry process based on the determination.
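One plausible way to make the headset-versus-loudspeaker determination (a heuristic sketch, not the disclosed algorithm) is to compare the energy the microphone captures against the energy of the output signal: a loudspeaker couples strongly into the device's own microphone, while a headset leaks very little. The threshold value below is an assumed tuning parameter.

```python
def rms(signal):
    return (sum(s * s for s in signal) / len(signal)) ** 0.5

def classify_output_device(output_signal, mic_signal, coupling_threshold=0.1):
    """Return 'loudspeaker' if the mic picks up a large fraction of the
    output energy, else 'headset'."""
    coupling = rms(mic_signal) / (rms(output_signal) + 1e-12)
    return "loudspeaker" if coupling >= coupling_threshold else "headset"

out = [0.8, -0.8] * 100
speaker_mic = [0.3 * s for s in out]      # strong acoustic coupling
headset_mic = [0.001 * s for s in out]    # almost nothing leaks out
# classify_output_device(out, speaker_mic) -> 'loudspeaker'
# classify_output_device(out, headset_mic) -> 'headset'
```

The dosimetry process could then be configured accordingly, e.g., estimating at-ear exposure directly for a headset versus applying a room/distance model for a loudspeaker.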
Abstract:
A first device obtains, from a microphone array, several audio signals and processes the audio signals to produce a speech signal and one or more ambient signals. The first device processes the ambient signals to produce a sound-object sonic descriptor that has metadata describing a sound object within an acoustic environment. The first device transmits, over a communication data link, the speech signal and the descriptor to a second electronic device that is configured to spatially reproduce the sound object using the descriptor mixed with the speech signal, to produce several mixed signals to drive several speakers.
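A sound-object sonic descriptor is essentially compact metadata the second device can render from. The sketch below shows a hypothetical descriptor (the field names are assumptions) and a constant-power pan that turns its direction into left/right gains for mixing the reproduced object with the speech signal.

```python
import json, math

def make_descriptor(label, azimuth_deg, level_db):
    """Hypothetical sound-object descriptor, serialized for the data link."""
    return json.dumps({"label": label,
                       "azimuth_deg": azimuth_deg,
                       "level_db": level_db})

def pan_gains(azimuth_deg):
    """Constant-power pan: map azimuth in [-90, 90] deg to (left, right) gains."""
    theta = (azimuth_deg + 90.0) / 180.0 * (math.pi / 2.0)
    return math.cos(theta), math.sin(theta)

desc = json.loads(make_descriptor("dog_bark", azimuth_deg=30.0, level_db=-20.0))
gl, gr = pan_gains(desc["azimuth_deg"])
# gl**2 + gr**2 == 1 (constant power); gr > gl for a source to the right
```

Sending a few such fields instead of the raw ambient audio is what makes the descriptor cheap to transmit over the data link.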
Abstract:
An audio system has a housing in which a number of microphones are integrated. A programmed processor accesses the microphone signals and produces a number of acoustic pickup beams based on groups of the microphones, along with an estimation of voice activity and an estimation of noise characteristics for each beam. Two or more beams, including a voice beam that is used to pick up a desired voice and a noise beam that is used to provide information for estimating ambient noise, are adaptively selected from among the plurality of beams based on thresholds for voice separation and thresholds for noise matching. Other embodiments are also described and claimed.
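The adaptive selection step can be sketched as thresholded scoring over per-beam estimates: the voice beam is the beam with the best voice-activity score above a separation threshold, and the noise beam is the non-voice beam whose noise estimate best matches the ambient reference. The scores, thresholds, and matching metric here are illustrative assumptions.

```python
def select_beams(voice_scores, noise_match_scores,
                 voice_threshold=0.5, noise_threshold=0.5):
    """voice_scores[i]: voice-activity estimate for beam i (higher = more voice).
    noise_match_scores[i]: how well beam i's noise matches the ambient
    estimate (higher = better match). Returns (voice_beam, noise_beam)
    indices, or None where no beam clears its threshold."""
    voice_beam = max(range(len(voice_scores)), key=voice_scores.__getitem__)
    if voice_scores[voice_beam] < voice_threshold:
        voice_beam = None
    candidates = [i for i in range(len(noise_match_scores))
                  if i != voice_beam and noise_match_scores[i] >= noise_threshold]
    noise_beam = (max(candidates, key=noise_match_scores.__getitem__)
                  if candidates else None)
    return voice_beam, noise_beam

# Beam 1 carries the talker; beam 2's noise best matches the ambient field.
vb, nb = select_beams([0.2, 0.9, 0.1], [0.4, 0.1, 0.8])
# vb == 1, nb == 2
```

Excluding the voice beam from the noise candidates keeps the ambient-noise estimate from being contaminated by the desired voice.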
Abstract:
A speech recognition system for resolving impaired utterances can have a speech recognition engine configured to receive a plurality of representations of an utterance and concurrently to determine a plurality of highest-likelihood transcription candidates corresponding to each respective representation of the utterance. The recognition system can also have a selector configured to determine a most-likely accurate transcription from among the transcription candidates. As but one example, the plurality of representations of the utterance can be acquired by a microphone array, and beamforming techniques can generate independent streams of the utterance across various look directions using output from the microphone array.
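The selector can be sketched as a fusion rule over the per-stream N-best lists: each beamformed stream contributes its top hypotheses with likelihood scores, and the transcription that accumulates the most support across streams wins. The scoring rule here (summing the likelihoods of agreeing hypotheses) is one simple choice among many, not necessarily the claimed one.

```python
def select_transcription(nbest_lists):
    """nbest_lists: one list per audio stream, each containing
    (transcription, likelihood) pairs. Sums the likelihood every stream
    assigns to each candidate and returns the best-supported one."""
    support = {}
    for nbest in nbest_lists:
        for text, likelihood in nbest:
            support[text] = support.get(text, 0.0) + likelihood
    return max(support, key=support.get)

streams = [
    [("recognize speech", 0.6), ("wreck a nice beach", 0.4)],
    [("wreck a nice beach", 0.5), ("recognize speech", 0.5)],
    [("recognize speech", 0.7), ("recognize peach", 0.3)],
]
best = select_transcription(streams)
# best == "recognize speech"  (support 0.6 + 0.5 + 0.7 = 1.8)
```

Because the streams come from different look directions, a hypothesis that survives across several of them is likely the one least corrupted by the impairment.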
Abstract:
Embodiments of the present disclosure can provide systems, methods, and computer-readable medium for adjusting audio and/or video information of a video clip based at least in part on facial feature and/or voice feature characteristics extracted from hardware components. For example, in response to detecting a request to generate an avatar video clip of a virtual avatar, a video signal associated with a face in a field of view of a camera and an audio signal may be captured. Voice feature characteristics and facial feature characteristics may be extracted from the audio signal and the video signal, respectively. In some examples, in response to detecting a request to preview the avatar video clip, an adjusted audio signal may be generated based at least in part on the facial feature characteristics and the voice feature characteristics, and a preview of the video clip of the virtual avatar using the adjusted audio signal may be displayed.
Abstract:
Embodiments of the present disclosure can provide systems, methods, and computer-readable media for providing audio and/or video effects based at least in part on facial features and/or voice feature characteristics of the user. For example, video and/or an audio signal of the user may be recorded by a device. Voice audio features and facial feature characteristics may be extracted from the voice audio signal and the video, respectively. The facial features of the user may be used to modify features of a virtual avatar to emulate the facial feature characteristics of the user. The extracted voice audio features may be modified to generate an adjusted audio signal, or an audio signal may be composed from the voice audio features. The adjusted/composed audio signal may simulate the voice of the virtual avatar. A preview of the modified video/audio may be provided at the user's device.
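As a toy stand-in for the voice adjustment, the sketch below shifts pitch by linear-interpolation resampling (which also changes duration; real avatar voice effects would more likely use pitch-synchronous or phase-vocoder methods). The mapping from an extracted voice/facial feature to the `factor` value is an assumption for illustration only.

```python
import math

def resample(signal, factor):
    """Naive pitch/speed change: read the signal at `factor` x speed with
    linear interpolation. factor > 1 raises pitch and shortens the clip."""
    out, pos = [], 0.0
    while pos < len(signal) - 1:
        i = int(pos)
        frac = pos - i
        out.append(signal[i] * (1.0 - frac) + signal[i + 1] * frac)
        pos += factor
    return out

fs = 8000
voice = [math.sin(2 * math.pi * 220 * n / fs) for n in range(fs)]  # 220 Hz tone
chipmunk = resample(voice, 1.5)   # hypothetical "raise pitch" avatar effect
# The clip is now about 1/1.5 as long, with the tone shifted up to ~330 Hz.
```

A production pipeline would apply such an effect per frame, driven by the extracted feature characteristics, before rendering the preview.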