Abstract:
Systems and methods are described for storing and reusing previously generated or calculated acoustic environment data. By reusing acoustic environment data, the systems and methods described herein may avoid the overhead of generating or calculating acoustic environment data for a location when this data has already been generated and is likely still accurate. In particular, the time and complexity involved in determining reverberation/echo levels, noise levels, and noise types may be avoided when this information is available in storage. The previously stored acoustic environment data need not be limited to data generated or calculated by the same audio device. Instead, in some embodiments an audio device may access a centralized repository to leverage acoustic environment data generated or calculated by other audio devices.
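The lookup-before-recompute flow this abstract describes can be sketched in a few lines of Python. Everything below is illustrative: the profile fields, the cache and repository objects, and `measure_environment` are invented stand-ins, not the patent's implementation.

```python
# Minimal sketch of reuse-before-recompute for acoustic environment data.
from dataclasses import dataclass

@dataclass
class AcousticProfile:
    reverb_level: float   # e.g. an RT60 estimate in seconds
    noise_level: float    # e.g. dBA
    noise_type: str       # e.g. "HVAC", "traffic"

local_cache: dict[str, AcousticProfile] = {}
shared_repository: dict[str, AcousticProfile] = {}  # stands in for a server

def measure_environment(location_id: str) -> AcousticProfile:
    # Placeholder for the expensive estimation pass the abstract avoids.
    return AcousticProfile(reverb_level=0.4, noise_level=38.0, noise_type="HVAC")

def get_profile(location_id: str) -> AcousticProfile:
    # 1) this device's own prior measurements
    if location_id in local_cache:
        return local_cache[location_id]
    # 2) measurements contributed by other devices
    if location_id in shared_repository:
        profile = shared_repository[location_id]
        local_cache[location_id] = profile
        return profile
    # 3) fall back to measuring, then publish for reuse
    profile = measure_environment(location_id)
    local_cache[location_id] = profile
    shared_repository[location_id] = profile
    return profile
```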
Abstract:
An electronic device for buzz reduction is described. The electronic device is to be used with a speaker driver that is built into an enclosure and is to be driven by an audio signal that could cause the enclosure to produce buzz. The electronic device includes a filter that is to attenuate a frequency component of the audio signal before the audio signal drives the speaker driver. The electronic device also includes a controller that is to configure the filter to attenuate the frequency component in response to determining that the strength of the audio signal at that frequency component exceeds a threshold. Other embodiments are also described and claimed.
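A minimal sketch of one way this could work, assuming the filter is a notch at a known buzz-prone frequency and "strength" is the spectral magnitude in that bin; the sample rate, frequency, Q, and threshold are invented values, not taken from the patent:

```python
import numpy as np
from scipy.signal import iirnotch, lfilter

FS = 48_000          # sample rate, Hz
BUZZ_HZ = 220.0      # enclosure resonance assumed to cause buzz
THRESHOLD = 0.05     # linear magnitude above which the notch engages

def process_block(block: np.ndarray) -> np.ndarray:
    # Estimate the strength of the audio signal at the buzz frequency.
    spectrum = np.fft.rfft(block * np.hanning(len(block)))
    freqs = np.fft.rfftfreq(len(block), d=1.0 / FS)
    bin_idx = np.argmin(np.abs(freqs - BUZZ_HZ))
    strength = np.abs(spectrum[bin_idx]) / len(block)

    # Controller: attenuate the component only when it exceeds the threshold.
    if strength > THRESHOLD:
        b, a = iirnotch(BUZZ_HZ, Q=30.0, fs=FS)
        return lfilter(b, a, block)
    return block
```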
Abstract:
A dictation computer that includes a fan speed regulator is described. The fan speed regulator monitors a speech recognition unit to determine when the speech recognition unit is activated. Upon detecting that the speech recognition unit is activated, the fan speed regulator ducks the speed of a cooling fan embedded within the dictation computer to an optimized rotational speed over a delay time interval. The fan speed regulator may include components that adapt the optimized speed and the delay time to the characteristics of the dictation computer and of the user. Other embodiments are also described.
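A rough sketch of the ducking behavior is below. The RPM values, ramp step count, and interval are illustrative, and `set_fan_rpm` is a stand-in for a platform fan-control call; a real regulator would adapt these values to the machine and user, as the abstract notes.

```python
import time

NORMAL_RPM = 3000
DUCKED_RPM = 1200          # "optimized" quiet speed during dictation
RAMP_SECONDS = 2.0         # delay time interval for the speed reduction
STEPS = 20

def set_fan_rpm(rpm: float) -> None:
    print(f"fan -> {rpm:.0f} rpm")  # stand-in for a fan-control interface

def duck_fan_for_dictation(speech_recognition_active: bool) -> None:
    if not speech_recognition_active:
        set_fan_rpm(NORMAL_RPM)
        return
    # Ramp down gradually so the speed change itself is not abrupt or audible.
    for i in range(1, STEPS + 1):
        rpm = NORMAL_RPM + (DUCKED_RPM - NORMAL_RPM) * (i / STEPS)
        set_fan_rpm(rpm)
        time.sleep(RAMP_SECONDS / STEPS)
```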
Abstract:
An audio system, and a method of using the audio system to augment spatial audio rendition, are described. The audio system can include a device to receive user inputs designating positions on an augmented reality view of a listening environment. Sound source icons can be presented in the augmented reality view at the designated positions. The sound source icons can visually represent sound sources at locations in the listening environment that correspond to, but are different than, the positions. One or more processors of the audio system can apply head-related transfer functions, which correspond to the locations in the listening environment, to audio input signals to generate binaural audio signals. The audio system can include a headset that uses the binaural audio signals to render spatialized audio localizing sounds to the locations in the listening environment. Other aspects are also described and claimed.
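The core rendering step, applying a location's head-related transfer function to produce a binaural signal, can be sketched as convolution with a head-related impulse response (HRIR) pair. The HRIRs below are fabricated placeholders; a real system would look them up by azimuth and elevation in a measured HRTF set.

```python
import numpy as np

def render_binaural(mono: np.ndarray,
                    hrir_left: np.ndarray,
                    hrir_right: np.ndarray) -> np.ndarray:
    # Convolve the mono source with each ear's impulse response.
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=0)  # (2, n) binaural signal

# Usage: a delay/level difference stands in for a measured HRIR pair.
source = np.random.randn(48_000).astype(np.float32)
hrir_l = np.zeros(64); hrir_l[0] = 1.0    # near ear: earlier, louder
hrir_r = np.zeros(64); hrir_r[20] = 0.6   # far ear: delayed, attenuated
binaural = render_binaural(source, hrir_l, hrir_r)
```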
Abstract:
A method for controlling a speech enhancement process in a far-end device, while engaged in a voice or video telephony communication session over a communication link with a near-end device. A near-end user speech signal is produced, using a microphone to pick up speech of a near-end user, and is analyzed by an automatic speech recognizer (ASR) without being triggered by an ASR trigger phrase or button. The recognized words are compared to a library of phrases to select a matching phrase, where each phrase is associated with a message that represents an audio signal processing operation. The message associated with the matching phrase is sent to the far-end device, where it is used to configure the far-end device to adjust the speech enhancement process that produces the far-end speech signal. Other embodiments are also described.
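The phrase-library matching step might be sketched as a simple lookup from recognized text to a message code. The phrases, message codes, and the transport stub below are invented for illustration only.

```python
# Each library phrase maps to a message naming a signal processing operation.
PHRASE_LIBRARY = {
    "i can't hear you": "INCREASE_SPEECH_GAIN",
    "it is too noisy on your end": "INCREASE_NOISE_SUPPRESSION",
    "you sound muffled": "REDUCE_LOW_PASS_FILTERING",
}

def send_to_far_end(message: str) -> None:
    print(f"sending to far end: {message}")  # stand-in for the comm link

def handle_recognized_text(recognized: str) -> None:
    # No trigger phrase or button: every recognized utterance is checked.
    text = recognized.lower().strip()
    for phrase, message in PHRASE_LIBRARY.items():
        if phrase in text:
            send_to_far_end(message)
            return

handle_recognized_text("Sorry, I can't hear you very well")
```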
Abstract:
An automatic speech recognition (ASR) triggering system, and a method of providing an ASR trigger signal, are described. The ASR triggering system can include a microphone to generate an acoustic signal representing an acoustic vibration and an accelerometer worn in an ear canal of a user to generate a non-acoustic signal representing a bone conduction vibration. A processor of the ASR triggering system can receive an acoustic trigger signal based on the acoustic signal and a non-acoustic trigger signal based on the non-acoustic signal, and combine the trigger signals to gate an ASR trigger signal. For example, the ASR trigger signal may be provided to an ASR server only when the trigger signals are simultaneously asserted. Other embodiments are also described and claimed.
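The gating logic itself reduces to a simultaneity check between the two trigger signals; a minimal sketch, with the detector internals omitted:

```python
def gate_asr_trigger(acoustic_trigger: bool, accel_trigger: bool) -> bool:
    # Simultaneous assertion suggests the wearer's own voice, which arrives
    # via both the air path (microphone) and the bone path (accelerometer).
    return acoustic_trigger and accel_trigger

assert gate_asr_trigger(True, True) is True    # user spoke: trigger ASR
assert gate_asr_trigger(True, False) is False  # nearby talker / loudspeaker
assert gate_asr_trigger(False, True) is False  # wearer motion without speech
```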
Abstract:
A device and a corresponding method are provided to tune parameters of an echo control process without re-initializing the echo control process and without interrupting a playback process. A state of the device and of the environment around the device is computed during use of the device, based on information from sensors. Such sensors can provide information on the position of the device, the orientation of the device, the presence of a proximate object, or handling of the device that occludes microphones or loudspeakers, among other things. The computed state of the device is mapped to an associated device state code from among a plurality of device state codes. The parameters of the echo control process are tuned either according to the associated device state code, or according to a change in that code, during use of the device.
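One way to picture the mapping from sensed state to a device state code, and from the code to tuned parameters, is sketched below. The bit-field encoding and the parameter values are invented for illustration, not taken from the patent.

```python
def compute_state_code(on_table: bool, mic_occluded: bool,
                       object_near: bool) -> int:
    # Encode the sensed device/environment state as a small bit field.
    return (on_table << 0) | (mic_occluded << 1) | (object_near << 2)

# Pre-tuned parameter sets, indexed by device state code.
ECHO_PARAMS = {
    0b000: {"aec_step_size": 0.5, "suppressor_db": 12},  # handheld, clear
    0b001: {"aec_step_size": 0.3, "suppressor_db": 18},  # flat on a table
    0b010: {"aec_step_size": 0.2, "suppressor_db": 24},  # mic occluded
}

def tune_echo_control(code: int, current: dict) -> dict:
    # Update parameters in place; the echo canceller's filter state and the
    # playback stream are untouched, so nothing is re-initialized.
    current.update(ECHO_PARAMS.get(code, current))
    return current
```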
Abstract:
A method for automatically producing a video and audio mix at a first portable electronic device is described. The method receives a request to capture video and audio, performs a network discovery process to find a second portable electronic device, and sends a message to the second device indicating when to start recording audio for a double system recording session. The method then initiates the recording session, such that both devices record concurrently. In response to the first device stopping its recording of video and audio, the method signals the second device to stop recording for the identified recording session. In response to the first device receiving, from the second device, an audio track containing an audio signal recorded during the recording session, the method automatically generates a mix of video and audio in which one of the two audio signals (the first device's own recording and the second device's track) is ducked relative to the other.
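The final ducking mix can be sketched as a gain applied to one track before summation. Session signaling and time alignment between the two devices are assumed already handled, and all names and the gain value are illustrative.

```python
import numpy as np

def mix_with_ducking(primary: np.ndarray, secondary: np.ndarray,
                     duck_gain: float = 0.3) -> np.ndarray:
    # Trim to common length (clock/offset alignment assumed done earlier),
    # then attenuate the secondary track relative to the primary one.
    n = min(len(primary), len(secondary))
    return primary[:n] + duck_gain * secondary[:n]

cam_audio = np.random.randn(48_000).astype(np.float32)  # first device
mic_audio = np.random.randn(48_000).astype(np.float32)  # second device
mixed = mix_with_ducking(mic_audio, cam_audio)  # duck camera audio under mic
```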
Abstract:
A speech recognition system for resolving impaired utterances can have a speech recognition engine configured to receive a plurality of representations of an utterance and to concurrently determine a plurality of highest-likelihood transcription candidates corresponding to each respective representation of the utterance. The recognition system can also have a selector configured to determine the transcription most likely to be accurate from among the transcription candidates. As one example, the plurality of representations of the utterance can be acquired by a microphone array, with beamforming techniques generating independent streams of the utterance across various look directions using the output of the microphone array.
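The selector's job, picking the highest-likelihood candidate across all beamformed streams, can be sketched as follows. The recognizer is stubbed out, with hard-coded scores standing in for the engine's per-candidate likelihoods.

```python
def recognize(stream_id: int) -> list[tuple[str, float]]:
    # Stand-in: each beam's N-best list of (transcript, log-likelihood).
    fake = {
        0: [("turn on the lights", -4.1), ("turn off the lights", -6.3)],
        1: [("turn on the flights", -7.8), ("turn on the lights", -5.0)],
        2: [("burn on the lights", -9.2)],
    }
    return fake[stream_id]

def best_transcription(num_beams: int) -> str:
    candidates = []
    for beam in range(num_beams):        # one stream per look direction
        candidates.extend(recognize(beam))
    # Selector: the candidate with the highest likelihood overall wins.
    return max(candidates, key=lambda c: c[1])[0]

print(best_transcription(3))  # -> "turn on the lights"
```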
Abstract:
An electronic device may capture a voice command from a user. The electronic device may store contextual information about the state of the electronic device when the voice command is received. The electronic device may transmit the voice command and the contextual information to computing equipment such as a desktop computer or a remote server. The computing equipment may perform a speech recognition operation on the voice command and may process the contextual information. The computing equipment may respond to the voice command. The computing equipment may also transmit information to the electronic device that allows the electronic device to respond to the voice command.
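The bundling of a voice command with contextual device state might look roughly like the following; the field names and the JSON transport are assumptions made for illustration, not the patent's format.

```python
import json
import time

def capture_context() -> dict:
    # Snapshot of device state at the moment the voice command is received.
    return {
        "timestamp": time.time(),
        "foreground_app": "music_player",   # illustrative field
        "now_playing": "track_42",
        "location_known": False,
    }

def build_request(command_audio: bytes) -> bytes:
    # Package the contextual information for transmission to the computing
    # equipment; the audio itself would be sent alongside this payload.
    payload = {
        "context": capture_context(),
        "audio_len": len(command_audio),
    }
    return json.dumps(payload).encode()

req = build_request(b"\x00" * 3200)  # 100 ms of 16 kHz, 16-bit silence
```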