Abstract:
A method, a system, and a computer program product for preventing initiation of a voice recognition session. The method includes monitoring at least one audio output channel for at least one audio trigger phrase that initiates a voice recognition session. The method further includes, in response to detecting the at least one audio trigger phrase on the at least one audio output channel, setting a logic state of at least one output trigger detector of the at least one audio output channel to a first state. The method further includes gating a logic state of at least one input trigger detector of at least one audio input channel to the first state for a time period and preventing initiation of a voice recognition session by the at least one audio trigger phrase on the at least one audio input channel while the logic state is the first state.
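The gating behavior described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `TriggerGate` class, the `gate_seconds` hold time, and the caller-supplied timestamps are hypothetical and do not appear in the abstract, which says only that the input detector is gated "for a time period".

```python
class TriggerGate:
    """Suppresses input-side trigger detections for a short period after the
    trigger phrase was detected on an audio output channel."""

    def __init__(self, gate_seconds=2.0):
        # Assumed hold time; the abstract does not give a value.
        self.gate_seconds = gate_seconds
        # Time until which the input trigger detector is held in the "first state".
        self._gated_until = None

    def on_output_trigger(self, now):
        """Call when the trigger phrase is detected on an output channel."""
        self._gated_until = now + self.gate_seconds

    def is_gated(self, now):
        """True while the input trigger detector is held in the first state."""
        return self._gated_until is not None and now < self._gated_until

    def on_input_trigger(self, now):
        """Call when the trigger phrase is detected on an input channel.
        Returns True if a voice recognition session may be initiated."""
        return not self.is_gated(now)


gate = TriggerGate(gate_seconds=2.0)
gate.on_output_trigger(now=10.0)               # device played the trigger phrase itself
blocked = not gate.on_input_trigger(now=10.5)  # inside the gate window
allowed = gate.on_input_trigger(now=12.5)      # gate window has elapsed
```

Passing the time in explicitly, rather than reading a clock inside the class, keeps the sketch deterministic and easy to test.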
Abstract:
A method, a system, and a computer program product for detecting an audio trigger phrase at a particular audio input channel and initiating a voice recognition session. The method includes capturing audio content by a plurality of microphone pairs of an audio capturing device, wherein each microphone pair of the plurality of microphone pairs is associated with an audio input channel of a plurality of audio input channels of the audio capturing device. The method further includes simultaneously monitoring, by a processor of the audio capturing device, audio content on each of the audio input channels. The method further includes: independently detecting, by the processor, an audio trigger phrase on at least one audio input channel of the plurality of audio input channels; and in response to detecting the audio trigger phrase, commencing a voice recognition session using the at least one audio input channel as an audio source.
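A minimal sketch of per-channel trigger detection follows. It makes several assumptions that are not in the abstract: text transcripts stand in for the audio content of each input channel, the function names are hypothetical, and the channels are checked in a simple loop rather than truly simultaneously.

```python
def detect_trigger_channels(channel_transcripts, trigger_phrase):
    """Check each input channel independently and return the indices of the
    channels on which the trigger phrase was detected."""
    phrase = trigger_phrase.lower()
    return [
        index
        for index, text in enumerate(channel_transcripts)
        if phrase in text.lower()
    ]


def start_session(channel_transcripts, trigger_phrase):
    """Return a session description that uses the detecting channels as the
    audio source, or None if no channel detected the trigger phrase."""
    hits = detect_trigger_channels(channel_transcripts, trigger_phrase)
    if not hits:
        return None
    return {"audio_source_channels": hits}


# One transcript per microphone-pair channel (placeholder data).
channels = ["music playing", "Hey Device what time is it", "hey device"]
session = start_session(channels, "hey device")
```

In this example the second and third channels detect the phrase, so only those channels are used as the audio source for the session.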
Abstract:
Three-dimensional (3D) audio content creation and rendering systems and methodologies are presented here. A disclosed method of processing 3D audio assigns audio source objects to 3D video objects, links audio tracks to assigned audio source objects, and performs wave field synthesis on the linked audio tracks to generate 3D audio data representing a 3D spatial sound field. A disclosed method of processing 3D audio during playback of 3D video content obtains 3D audio data and 3D video data for a frame of 3D video content, applies device-specific parameters to the 3D audio data to obtain transformed 3D audio data scaled to a presentation device, and processes the transformed 3D audio data to render audio information for an array of speakers associated with the presentation device.
Abstract:
An electronic apparatus is provided that has a rear-side and a front-side, a first microphone that generates a first signal, and a second microphone that generates a second signal. An automated balance controller generates a balancing signal based on a proximity sensor signal. A processor processes the first and second signals to generate at least one beamformed audio signal, where an audio level difference between a front-side gain and a rear-side gain of the beamformed audio signal is controlled during processing based on the balancing signal.
Abstract:
A method and apparatus for determining a motion environment profile to adapt voice recognition processing include a device receiving an acoustic signal including a speech signal, which is provided to a voice recognition module. The method also includes determining a motion profile for the device, determining a temperature profile for the device, and determining a noise profile for the acoustic signal. The method further includes determining, from the motion, temperature, and noise profiles, a motion environment profile for the device and adapting voice recognition processing for the speech signal based on the motion environment profile.
Abstract:
An electronic device digitally combines a single voice input with each of a series of noise samples. Each noise sample is taken from a different audio environment (e.g., street noise, babble, interior car noise). The voice input/noise sample combinations are used to train a voice recognition model database without the user having to repeat the voice input in each of the different environments. In one variation, the electronic device transmits the user's voice input to a server that maintains and trains the voice recognition model database.
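The combination step might look like the sketch below. It assumes audio is represented as plain lists of float samples, that mixing is simple sample-wise addition, and that a shorter noise sample is looped to cover the voice input. The function names and the `noise_gain` parameter are hypothetical, and the abstract does not specify how the mixing is done.

```python
def mix(voice, noise, noise_gain=0.5):
    """Add a scaled noise sample to a voice input, looping the noise if it is
    shorter than the voice input."""
    if not noise:
        raise ValueError("noise sample must not be empty")
    return [v + noise_gain * noise[i % len(noise)] for i, v in enumerate(voice)]


def build_training_set(voice, noise_by_environment, noise_gain=0.5):
    """Produce one voice-plus-noise combination per audio environment from a
    single voice input."""
    return {
        environment: mix(voice, noise, noise_gain)
        for environment, noise in noise_by_environment.items()
    }


# Placeholder samples; real inputs would be recorded audio.
voice_input = [1.0, 2.0, 3.0]
noise_samples = {"street": [0.5], "car": [0.0, 1.0]}
training_set = build_training_set(voice_input, noise_samples)
```

Each entry of `training_set` could then be used to train the model, so the user speaks the phrase once rather than once per environment.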
Abstract:
A method and apparatus for voice recognition performed in a voice recognition block comprising a plurality of voice recognition stages. The method includes receiving a first plurality of voice inputs, corresponding to a first phrase, into a first voice recognition stage of the plurality of voice recognition stages, wherein multiple ones of the voice recognition stages include a plurality of voice recognition modules and multiple ones of the voice recognition stages perform a different type of voice recognition processing, wherein the first voice recognition stage processes the first plurality of voice inputs to generate a first plurality of outputs for receipt by a subsequent voice recognition stage. The method further includes receiving, by each subsequent voice recognition stage, a plurality of outputs from a preceding voice recognition stage, wherein a plurality of final outputs is generated by a final voice recognition stage from which to approximate the first phrase.
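The staged structure can be illustrated with a toy pipeline. Everything here is an assumption made for illustration: strings stand in for voice inputs, simple text normalizers stand in for voice recognition modules, and a majority vote over the final outputs stands in for "approximating" the phrase. The abstract does not prescribe any of these choices.

```python
from collections import Counter


def run_stage(modules, inputs):
    """Apply every module of a stage to every input and collect all outputs."""
    return [module(item) for module in modules for item in inputs]


def run_block(stages, inputs):
    """Feed the outputs of each stage into the next and return the final outputs."""
    outputs = list(inputs)
    for modules in stages:
        outputs = run_stage(modules, outputs)
    return outputs


def approximate_phrase(final_outputs):
    """Pick the most common final output as the approximated phrase."""
    return Counter(final_outputs).most_common(1)[0][0]


# Stage 1 has two modules doing different processing; stage 2 has one module.
stage_one = [str.lower, lambda s: " ".join(s.split())]
stage_two = [lambda s: s.strip("!?.").lower()]

voice_inputs = ["Call Mom!", "call mom", "call  mom"]
final_outputs = run_block([stage_one, stage_two], voice_inputs)
phrase = approximate_phrase(final_outputs)
```

Here the three inputs fan out to six outputs after the first stage, and five of the six final outputs agree on the same phrase.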
Abstract:
A method and apparatus for adapting acoustic processing in a communication device. The method includes capturing at least one acoustic signal using acoustic hardware of the communication device, characterizing an acoustic environment external to the communication device using the at least one captured acoustic signal, and adapting acoustic processing within the communication device based on the characterized acoustic environment.
Abstract:
A method for controlling the orientation of a virtual microphone, which is carried out on an electronic device, includes combining and processing signals from a microphone array to create a virtual microphone; receiving data from a sensor of the electronic device; determining, based on the received data, a mode in which the electronic device is being used; and based on the determined mode, directionally orienting the virtual microphone. Possible use modes include a) a stowed use mode, in which the criterion is the electronic device being substantially enclosed by surrounding material; b) a handset (alternately, private) use mode, in which the criterion is the electronic device being held proximate to a user; and c) a handheld (alternately, speakerphone) use mode, in which the criterion is the electronic device being held away from a user.
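The mode decision can be sketched as a small lookup. The sketch assumes the sensor data has already been reduced to two booleans, `enclosed` and `near_user`. The orientation labels in `ORIENTATION` are hypothetical placeholders, since the abstract does not say where the virtual microphone is pointed in each mode.

```python
from enum import Enum


class UseMode(Enum):
    STOWED = "stowed"      # device substantially enclosed by surrounding material
    HANDSET = "handset"    # device held proximate to the user (private mode)
    HANDHELD = "handheld"  # device held away from the user (speakerphone mode)


# Hypothetical orientation per mode; not taken from the abstract.
ORIENTATION = {
    UseMode.STOWED: "omnidirectional",
    UseMode.HANDSET: "toward_mouth",
    UseMode.HANDHELD: "toward_front",
}


def determine_mode(enclosed, near_user):
    """Map sensor-derived flags to a use mode, checking enclosure first."""
    if enclosed:
        return UseMode.STOWED
    if near_user:
        return UseMode.HANDSET
    return UseMode.HANDHELD


def orient_virtual_microphone(enclosed, near_user):
    """Return the determined mode and the orientation chosen for it."""
    mode = determine_mode(enclosed, near_user)
    return mode, ORIENTATION[mode]
```

Checking `enclosed` before `near_user` is a design choice of this sketch: a device in a pocket is also close to the user, so the stowed criterion takes precedence.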
Abstract:
A method and apparatus provide two-dimensional to three-dimensional image conversion. The apparatus can include an input configured to receive a first image. The apparatus can include a controller configured to segment the first image into a plurality of regions, configured to perform a Fast Fourier Transform on at least one of the regions, and configured to determine a relative horizontal displacement distance between a first frame and a second frame of at least one region based on performing the Fast Fourier Transform.
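One common way to estimate a horizontal displacement from Fourier transforms is phase correlation. The abstract does not name the technique it uses, so this is an assumed approach. The sketch works on a single row of pixel values and, to stay dependency-free, uses a naive O(n²) discrete Fourier transform in place of a true Fast Fourier Transform. The function names are hypothetical.

```python
import cmath


def dft(values, inverse=False):
    """Naive discrete Fourier transform (stand-in for an FFT)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [
        sum(values[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def horizontal_shift(row_a, row_b):
    """Estimate the circular shift d such that row_b[i] == row_a[(i - d) % n].
    Positive d means the content moved to the right."""
    spectrum_a, spectrum_b = dft(row_a), dft(row_b)
    cross = []
    for a, b in zip(spectrum_a, spectrum_b):
        c = b * a.conjugate()
        magnitude = abs(c)
        # Keep only the phase; drop bins that carry no energy.
        cross.append(c / magnitude if magnitude > 1e-12 else 0)
    correlation = dft(cross, inverse=True)
    n = len(correlation)
    peak = max(range(n), key=lambda i: correlation[i].real)
    # Map peaks in the upper half of the range to negative shifts.
    return peak if peak <= n // 2 else peak - n


first_frame_row = [0, 1, 3, 2, 0, 0, 0, 0]
second_frame_row = [0, 0, 0, 1, 3, 2, 0, 0]  # same content moved 2 pixels right
shift = horizontal_shift(first_frame_row, second_frame_row)
```

For a real image region, the same idea would be applied with a two-dimensional FFT from a numerical library, reading the horizontal component from the correlation peak.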