Abstract:
A vehicle includes an interface device, an in-vehicle control unit, a functional unit, and processing circuitry. The interface device receives a spoken command to identify an in-cabin vehicle zone of two or more in-cabin vehicle zones of the vehicle, and receives background audio data concurrently with a portion of the spoken command. The in-vehicle control unit separates the background audio data from the spoken command, and selects which in-cabin vehicle zone of the two or more in-cabin vehicle zones is identified by the spoken command. The functional unit controls a function within the vehicle. The processing circuitry stores, to a command buffer, data processed from the received spoken command, and controls, based on the data processed from the received spoken command, the functional unit using audio input received from the selected in-cabin vehicle zone.
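Purely as an illustration of the control flow described above, here is a minimal Python sketch of zone selection and command buffering. The class name ZoneCommandRouter, the zone labels, and the keyword-matching rule are all invented for the example, not taken from the patent:

    # Illustrative sketch only; names and the matching rule are assumptions.
    from collections import deque

    class ZoneCommandRouter:
        def __init__(self, zones, buffer_size=16):
            self.zones = zones                          # in-cabin zone labels
            self.command_buffer = deque(maxlen=buffer_size)
            self.selected_zone = None

        def handle_spoken_command(self, text):
            # Select whichever in-cabin zone the spoken command names.
            for zone in self.zones:
                if zone.replace("_", " ") in text.lower():
                    self.selected_zone = zone
                    self.command_buffer.append(text)    # store processed command data
                    return zone
            return None

        def control_function(self, zone_audio):
            # Act on audio input only if it comes from the selected zone.
            if self.selected_zone and zone_audio["zone"] == self.selected_zone:
                return f"adjusting function using audio from {self.selected_zone}"
            return "ignored"

    router = ZoneCommandRouter(["driver", "front_passenger", "rear_left", "rear_right"])
    router.handle_spoken_command("Turn up the volume for the rear left seat")
    print(router.control_function({"zone": "rear_left", "samples": []}))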
Abstract:
A method for speech modeling by an electronic device is described. The method includes obtaining a real-time noise reference based on a noisy speech signal. The method also includes obtaining a real-time noise dictionary based on the real-time noise reference. The method further includes obtaining a first speech dictionary and a second speech dictionary. The method additionally includes reducing residual noise based on the real-time noise dictionary and the first speech dictionary to produce a residual noise-suppressed speech signal at a first modeling stage. The method also includes generating a reconstructed speech signal based on the residual noise-suppressed speech signal and the second speech dictionary at a second modeling stage.
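The two stages lend themselves to a dictionary-based (e.g., NMF-style) formulation. The following Python sketch assumes fixed nonnegative dictionaries and multiplicative activation updates, which is one plausible realization rather than the patented method; all sizes and data are toy values:

    # Illustrative two-stage dictionary suppression/reconstruction (NMF-style).
    import numpy as np

    def fit_activations(V, W, n_iter=50, eps=1e-9):
        # Fixed-dictionary NMF: find H >= 0 minimizing ||V - W H|| (Frobenius).
        H = np.random.rand(W.shape[1], V.shape[1])
        for _ in range(n_iter):
            H *= (W.T @ V) / (W.T @ (W @ H) + eps)
        return H

    rng = np.random.default_rng(0)
    V = np.abs(rng.standard_normal((64, 100)))      # |STFT| of noisy speech (toy)
    W_noise = np.abs(rng.standard_normal((64, 8)))  # real-time noise dictionary
    W_sp1 = np.abs(rng.standard_normal((64, 16)))   # first speech dictionary
    W_sp2 = np.abs(rng.standard_normal((64, 16)))   # second speech dictionary

    # Stage 1: explain the mixture with noise + speech atoms, then keep the
    # speech part via a Wiener-style mask (residual noise suppression).
    W1 = np.hstack([W_noise, W_sp1])
    H1 = fit_activations(V, W1)
    noise_hat = W_noise @ H1[:W_noise.shape[1]]
    speech_hat = W_sp1 @ H1[W_noise.shape[1]:]
    V_stage1 = V * speech_hat / (speech_hat + noise_hat + 1e-9)

    # Stage 2: re-express the suppressed signal on the second speech
    # dictionary to produce the reconstructed speech magnitude.
    H2 = fit_activations(V_stage1, W_sp2)
    V_recon = W_sp2 @ H2
    print(V_recon.shape)  # (64, 100)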
Abstract:
A multichannel acoustic system (MAS) comprises an arrangement of microphones and loudspeakers and a multichannel acoustic processor (MAP) that together enhance conversational speech between two or more persons in a shared acoustic space such as an automobile. The enhancements are achieved by receiving sound signals substantially originating from relatively near sound sources; filtering the sound signals to cancel at least one echo signal detected for at least one microphone from among the plurality of microphones; filtering the sound signals received by the plurality of microphones to cancel at least one feedback signal detected for at least one microphone from among the plurality of microphones; and reproducing the filtered sound signals for each microphone from among the plurality of microphones on a corresponding subset of loudspeakers that are relatively far from the source microphone.
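As a toy illustration of the final routing step (reproducing each microphone's filtered signal only on loudspeakers far from it), the following Python sketch uses an invented one-dimensional cabin geometry and distance threshold:

    # Sketch of far-loudspeaker routing; geometry and threshold are invented.
    import numpy as np

    # positions (meters along the cabin) of microphones and loudspeakers
    mic_pos = np.array([0.5, 1.5, 2.5, 3.5])
    spk_pos = np.array([0.5, 1.5, 2.5, 3.5])
    FAR_THRESHOLD = 1.5  # reproduce a mic only on loudspeakers at least this far away

    # routing[i, j] = 1 when loudspeaker j is "far" from microphone i
    routing = (np.abs(spk_pos[None, :] - mic_pos[:, None]) > FAR_THRESHOLD).astype(float)

    def mix_to_loudspeakers(mic_frames):
        # mic_frames: (n_mics, n_samples) of echo/feedback-filtered signals.
        # Each loudspeaker plays the sum of the far microphones' signals.
        return routing.T @ mic_frames

    frames = np.random.default_rng(1).standard_normal((4, 256))
    print(mix_to_loudspeakers(frames).shape)  # (4, 256)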
Abstract:
A method for multi-channel echo cancellation and noise suppression is described. One of multiple echo estimates is selected for non-linear echo cancellation. Echo notch masking is performed on a noise-suppressed signal based on an echo direction of arrival (DOA) to produce an echo-suppressed signal. Non-linear echo cancellation is performed on the echo-suppressed signal based, at least in part, on the selected echo estimate.
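One way to picture the echo notch masking step is as a per-bin attenuation around the echo DOA. The Python sketch below assumes per-time-frequency-bin DOA estimates are already available; the notch width and floor values are illustrative choices, not from the patent:

    # Sketch of echo notch masking over a noise-suppressed STFT: bins whose
    # estimated DOA falls inside a notch around the echo DOA are attenuated.
    import numpy as np

    def echo_notch_mask(doa_per_bin_deg, echo_doa_deg, width_deg=15.0, floor=0.1):
        # circular angular distance of each bin's DOA from the echo DOA
        dist = np.abs(((doa_per_bin_deg - echo_doa_deg + 180) % 360) - 180)
        mask = np.ones_like(doa_per_bin_deg, dtype=float)
        mask[dist < width_deg] = floor      # notch out bins arriving from the echo DOA
        return mask

    rng = np.random.default_rng(2)
    S = rng.standard_normal((257, 50)) + 1j * rng.standard_normal((257, 50))  # noise-suppressed STFT
    doas = rng.uniform(0, 360, S.shape)     # per-bin DOA estimates (degrees)
    S_echo_suppressed = S * echo_notch_mask(doas, echo_doa_deg=120.0)
    # non-linear cancellation using the selected echo estimate would follow here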
Abstract:
An apparatus includes multiple microphones to generate audio signals based on sound of a far-field acoustic environment. The apparatus also includes a signal processing system to process the audio signals to generate at least one processed audio signal. The signal processing system is configured to update one or more processing parameters while operating in a first operational mode and is configured to use a static version of the one or more processing parameters while operating in a second operational mode. The apparatus further includes a keyword detection system to perform keyword detection based on the at least one processed audio signal to determine whether the sound includes an utterance corresponding to a keyword and, based on a result of the keyword detection, to send a control signal to the signal processing system to change an operational mode of the signal processing system.
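The mode behavior can be illustrated with a processor whose only "parameter" is a recursive noise estimate that adapts in the first mode and is frozen in the second. The Python sketch below, including the set_adaptive control hook, is an assumption about one plausible realization:

    # Sketch of adaptive-vs-static parameters driven by a keyword detector.
    import numpy as np

    class SignalProcessor:
        def __init__(self, n_bins=257, alpha=0.9):
            self.adaptive = True
            self.noise_psd = np.zeros(n_bins)   # the "processing parameter"
            self.alpha = alpha

        def set_adaptive(self, flag):           # control signal from the keyword detector
            self.adaptive = flag

        def process(self, frame_psd):
            if self.adaptive:
                # first mode: keep updating the noise estimate
                self.noise_psd = self.alpha * self.noise_psd + (1 - self.alpha) * frame_psd
            # second mode: the frozen (static) estimate keeps processing stable
            gain = np.maximum(1.0 - self.noise_psd / (frame_psd + 1e-9), 0.1)
            return gain * frame_psd

    proc = SignalProcessor()
    out = proc.process(np.abs(np.random.default_rng(6).standard_normal(257)) ** 2)
    proc.set_adaptive(False)  # e.g., keyword detected: freeze parameters for the command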
Abstract:
A headset device includes a first earpiece configured to receive a reference sound and to generate a first reference audio signal based on the reference sound. The headset device further includes a second earpiece configured to receive the reference sound and to generate a second reference audio signal based on the reference sound. The headset device further includes a controller coupled to the first earpiece and to the second earpiece. The controller is configured to generate a first signal and a second signal based on a phase relationship between the first reference audio signal and the second reference audio signal. The controller is further configured to output the first signal to the first earpiece and output the second signal to the second earpiece.
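As a rough illustration of deriving the two output signals from the measured phase relationship, the Python sketch below estimates an inter-earpiece phase difference from the cross-spectrum of the two reference signals; this specific policy is an assumption, not the patent's construction:

    # Sketch only: cross-spectrum phase as the "phase relationship".
    import numpy as np

    def outputs_from_phase(ref_left, ref_right, n_fft=512):
        L = np.fft.rfft(ref_left, n_fft)
        R = np.fft.rfft(ref_right, n_fft)
        phase_diff = np.angle(L * np.conj(R))    # inter-earpiece phase relationship
        # Example policy: keep the left reference on the left ear, and give the
        # right ear a copy re-aligned to the measured phase relationship
        # (left magnitude, right phase).
        out_left = ref_left[:n_fft]
        out_right = np.fft.irfft(L * np.exp(-1j * phase_diff), n_fft)
        return out_left, out_right

    rng = np.random.default_rng(3)
    noise = rng.standard_normal(512)             # shared reference sound
    left, right = outputs_from_phase(noise, np.roll(noise, 3))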
Abstract:
Disclosed is an application interface that takes into account the user's gaze direction relative to who is speaking in an interactive multi-participant environment where audio-based contextual information and/or visual-based semantic information is being presented. Among the various implementations, two different types of microphone array devices (MADs) may be used. The first type of MAD is a steerable microphone array (a.k.a. a steerable array) which is worn by a user in a known orientation with regard to the user's eyes, and multiple users may each wear a steerable array. The second type of MAD is a fixed-location microphone array (a.k.a. a fixed array) which is placed in the same acoustic space as the users (one or more of whom are using steerable arrays).
Abstract:
A system that tracks a social interaction between a plurality of participants includes a fixed beamformer adapted to output a first spatially filtered output and configured to receive a plurality of second spatially filtered outputs from a plurality of steerable beamformers. Each steerable beamformer outputs a respective one of the second spatially filtered outputs associated with a different one of the participants. The system also includes a processor capable of determining a similarity between the first spatially filtered output and each of the second spatially filtered outputs. The processor determines the social interaction between the participants based on the similarity between the first spatially filtered output and each of the second spatially filtered outputs.
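A simple stand-in for the similarity measure is normalized cross-correlation between the fixed array's output and each steerable array's output, as in this Python sketch (the correlation choice and the participant names are illustrative, not from the patent):

    # Sketch: similarity between fixed and steerable beamformer outputs.
    import numpy as np

    def similarity(fixed_out, steerable_out):
        a = fixed_out - fixed_out.mean()
        b = steerable_out - steerable_out.mean()
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    rng = np.random.default_rng(4)
    fixed = rng.standard_normal(4000)                     # first spatially filtered output
    steerables = {                                        # one output per participant
        "alice": fixed + 0.1 * rng.standard_normal(4000), # highly similar: likely active talker
        "bob": rng.standard_normal(4000),
    }
    scores = {who: similarity(fixed, out) for who, out in steerables.items()}
    active = max(scores, key=scores.get)
    print(scores, "->", active)  # participant whose beam best matches the fixed array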
Abstract:
A crosstalk cancelation technique reduces feedback in a shared acoustic space by canceling out some or all parts of sound signals that would otherwise be produced by a loudspeaker only to be captured by a microphone that, recursively, would cause these sound signals to be reproduced again on the loudspeaker as feedback. Crosstalk cancelation can be used in a multichannel acoustic system (MAS) comprising an arrangement of microphones, loudspeakers, and a processor that together enhance conversational speech between persons in a shared acoustic space. To achieve crosstalk cancelation, the processor analyzes the input of each microphone, compares it to the output of the far loudspeaker(s) relative to each such microphone, cancels out any portion of a sound signal received by the microphone that matches signals just produced by the far loudspeaker(s), and sends only the remaining sound signal (if any) to such far loudspeakers.
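The compare-and-cancel step described above is essentially adaptive echo cancellation. The following Python sketch uses an NLMS filter for one microphone/far-loudspeaker pair; the filter length, step size, and toy coupling path are invented for the example:

    # Sketch: subtract the part of the mic signal matching what the far
    # loudspeaker just played, and forward only the residual.
    import numpy as np

    def nlms_cancel(mic, far_spk, n_taps=32, mu=0.5, eps=1e-6):
        w = np.zeros(n_taps)                       # estimate of loudspeaker-to-mic path
        out = np.zeros_like(mic)
        for n in range(n_taps, len(mic)):
            x = far_spk[n - n_taps:n][::-1]        # recent loudspeaker samples
            e = mic[n] - w @ x                     # residual after cancelling the match
            w += mu * e * x / (x @ x + eps)        # adapt toward the true coupling
            out[n] = e
        return out                                 # only this residual is re-sent onward

    rng = np.random.default_rng(5)
    spk = rng.standard_normal(8000)                # far loudspeaker output
    coupling = np.array([0.0, 0.6, 0.3, 0.1])      # toy acoustic path to the microphone
    mic = np.convolve(spk, coupling)[:8000] + 0.05 * rng.standard_normal(8000)
    residual = nlms_cancel(mic, spk)
    print(np.std(mic[1000:]), np.std(residual[1000:]))  # residual energy drops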
Abstract:
Techniques for processing directionally-encoded audio to account for spatial characteristics of a listener playback environment are disclosed. The directionally-encoded audio data includes spatial information indicative of one or more directions of sound sources in an audio scene. The audio data is modified based on input data identifying the spatial characteristics of the playback environment. The spatial characteristics may correspond to actual loudspeaker locations in the playback environment. The directionally-encoded audio may also be processed to permit focusing/defocusing on sound sources or particular directions in an audio scene. The disclosed techniques may allow a recorded audio scene to be more accurately reproduced at playback time, regardless of the output loudspeaker setup. Another advantage is that a user may dynamically configure audio data so that it better conforms to the user's particular loudspeaker layouts and/or the user's desired focus on particular subjects or areas in an audio scene.
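To make the playback-environment adaptation concrete, the Python sketch below re-pans a directionally-encoded source onto the listener's actual loudspeaker angles and applies a simple focus gain on a chosen direction; the pairwise constant-power panning and the gain values are assumed choices, not the disclosed techniques:

    # Sketch: render a directional source to an arbitrary loudspeaker layout.
    import numpy as np

    def pan_to_layout(source_deg, spk_deg):
        # Amplitude-pan between the two loudspeakers bracketing the source direction.
        spk = np.asarray(sorted(spk_deg), dtype=float)
        gains = np.zeros(len(spk))
        diffs = (spk - source_deg) % 360
        right = int(np.argmin(diffs))              # nearest speaker "after" the source
        left = (right - 1) % len(spk)
        span = (spk[right] - spk[left]) % 360 or 360
        frac = ((source_deg - spk[left]) % 360) / span
        gains[left] = np.cos(frac * np.pi / 2)     # constant-power pair panning
        gains[right] = np.sin(frac * np.pi / 2)
        return gains

    def focus_gain(source_deg, focus_deg, width_deg=30.0):
        dist = abs(((source_deg - focus_deg + 180) % 360) - 180)
        return 1.0 if dist < width_deg else 0.25   # de-emphasize off-focus directions

    layout = [-110, -30, 0, 30, 110]               # listener's actual speaker angles
    print(pan_to_layout(20.0, layout))             # gains for a source at 20 degrees
    print(focus_gain(20.0, focus_deg=15.0))        # focused near the source -> 1.0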