Abstract:
The present invention discloses a voice recognition method, a voice recognition device, and an electronic device. In the method, a determination is first made using a sample environment corresponding to a detection voice and a previous environment type, and a corresponding voice correction instruction is output to a voice engine. A to-be-recognized voice is then input to the voice engine and to a noise type detection engine at the same time. The voice engine corrects the to-be-recognized voice according to the voice correction instruction, so that the quality of the original voice is not impaired by noise processing, and outputs a corresponding initial recognition result. The noise type detection engine determines a current environment type using the to-be-recognized voice and voice training samples from different environments. Finally, the confidence of the initial recognition result is adjusted according to the current environment type, so that the finally output voice recognition result provides a good user experience in the current environment.
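The final step, adjusting the confidence of the initial result by environment type, could look roughly like the sketch below. The environment names, the offset values, and the function name `adjust_confidence` are illustrative assumptions; the abstract does not say how the adjustment is computed.

```python
# Hypothetical per-environment offsets; the abstract gives no concrete values.
CONFIDENCE_OFFSET = {"quiet": 0.0, "office": -0.05, "street": -0.15}


def adjust_confidence(initial_confidence, environment_type):
    """Shift a confidence score by an environment-dependent offset.

    Unknown environment types leave the score unchanged. The result is
    clamped to the range [0.0, 1.0].
    """
    offset = CONFIDENCE_OFFSET.get(environment_type, 0.0)
    return min(1.0, max(0.0, initial_confidence + offset))
```

An additive offset is only one possible choice; a multiplicative scale or a per-environment acceptance threshold would fit the description equally well.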
Abstract:
The present specification relates to a speech recognition apparatus and method capable of accurately recognizing a user's speech in an easy and convenient manner, without the user having to operate a speech recognition start button or the like. The speech recognition apparatus according to embodiments of the present specification comprises: a camera for capturing a user image; a microphone; a control unit for detecting a preset user gesture from the user image and, if a nonlexical word is detected in the speech signal input through the microphone from the point in time at which the user gesture was detected, determining the speech signal detected after the nonlexical word to be an effective speech signal; and a speech recognition unit for recognizing the effective speech signal.
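The gating logic of the control unit can be sketched as follows, assuming the speech signal has already been segmented into timestamped words. The word list `NONLEXICAL_WORDS`, the `(time, word)` token format, and the function name are hypothetical and not taken from the specification.

```python
# Assumed set of nonlexical words (fillers); the abstract does not list any.
NONLEXICAL_WORDS = {"um", "uh", "er", "hmm"}


def effective_speech(tokens, gesture_time):
    """Return the words that follow the first nonlexical word heard
    at or after the moment the user gesture was detected.

    tokens: list of (time_in_seconds, word) pairs in chronological order.
    Returns an empty list if no nonlexical word follows the gesture.
    """
    for i, (time, word) in enumerate(tokens):
        if time >= gesture_time and word.lower() in NONLEXICAL_WORDS:
            return [w for _, w in tokens[i + 1:]]
    return []


tokens = [(0.0, "hello"), (1.0, "um"), (1.5, "call"), (2.0, "mom")]
print(effective_speech(tokens, gesture_time=0.5))  # ['call', 'mom']
```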
Abstract:
A computer-implemented method is described for front-end speech processing for automatic speech recognition. A sequence of speech features that characterize an unknown speech input provided on an audio input channel, and associated meta-data that characterize the audio input channel, are received. The speech features are transformed by a computer process that uses a trained mapping function controlled by the meta-data, and automatic speech recognition is performed on the transformed speech features.
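One simple way to picture a mapping function controlled by meta-data is a per-channel affine transform selected by a channel label. The sketch below is only an illustration under that assumption: the channel names, the parameter values (which would normally be learned in training), and the `metadata["channel"]` key are all hypothetical.

```python
# Hypothetical (scale, bias) vectors per channel type. In a real system
# these would come from training; here they are set by hand.
TRANSFORMS = {
    "headset": ([1.0, 1.0], [0.0, 0.0]),
    "far_field": ([2.0, 0.5], [-1.0, 0.25]),
}


def transform_features(frames, metadata):
    """Apply the affine map selected by the channel meta-data to each frame.

    frames: list of feature vectors (lists of floats).
    metadata: dict with a "channel" key naming the audio input channel.
    """
    scale, bias = TRANSFORMS[metadata["channel"]]
    return [
        [s * x + b for x, s, b in zip(frame, scale, bias)]
        for frame in frames
    ]


print(transform_features([[1.0, 2.0]], {"channel": "far_field"}))  # [[1.0, 1.25]]
```

The transformed frames would then be passed to the recognizer in place of the raw features.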
Abstract:
In a mobile device, a bone conduction or vibration sensor is used to detect the user's speech, and the resulting output is used as the source for a low-power Voice Trigger (VT) circuit that can activate the Automatic Speech Recognition (ASR) of the host device. This invention is applicable to mobile devices that use speech recognition for entering input commands and for control, such as wearable computers with head-mounted displays, mobile phones, and wireless headsets and headphones. The speech sensor can be a bone conduction microphone, used to detect sound vibrations in the skull, or a vibration sensor, used to detect sound pressure vibrations from the user's speech. The VT circuit can be independent of any audio components of the host device and can therefore be designed to consume ultra-low power. Hence, the VT circuit can remain active while the host device is in a sleeping state and can wake the host device on detecting speech from the user. The VT circuit is resistant to outside noise and reacts solely to the user's voice.
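The abstract describes a hardware circuit, but the wake decision can be illustrated in software with a simple frame-energy detector applied to the sensor output. This is an assumed technique, not the circuit's stated design; the frame length, threshold, and run length are placeholder values.

```python
def voice_trigger(samples, frame_len=160, threshold=0.01, min_frames=3):
    """Return True once `min_frames` consecutive frames exceed the
    mean-square energy threshold, otherwise False.

    samples: sequence of floats from the bone conduction or vibration sensor.
    All parameter defaults are illustrative assumptions.
    """
    run = 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        run = run + 1 if energy > threshold else 0
        if run >= min_frames:
            return True
    return False
```

Requiring several consecutive high-energy frames keeps a single short bump from waking the host device.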
Abstract:
A voice processing apparatus includes: a voice receptor configured to collect a user voice, convert the user voice into a first voice signal, and output the first voice signal; an audio processor configured to process a sound output through a speaker to output an audio signal; a memory unit configured to store the first voice signal output from the voice receptor and the audio signal output from the audio processor; an echo canceller configured to remove an echo from the first voice signal to generate a second voice signal; and a first controller configured to control the echo canceller to generate the second voice signal based on the first voice signal and the audio signal stored in the memory unit.
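The abstract does not name an echo cancellation algorithm. As one common option, the sketch below uses a normalized least-mean-squares (NLMS) adaptive filter, with the stored audio signal as the reference and the stored first voice signal as the microphone input. The function name, tap count, and step size are assumptions.

```python
def nlms_echo_cancel(mic, ref, taps=4, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `ref` from `mic`.

    mic: first voice signal (user voice plus speaker echo), list of floats.
    ref: audio signal sent to the speaker, same length as `mic`.
    Returns the echo-reduced signal (the "second voice signal").
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    out = []
    for d, x in zip(mic, ref):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate
        norm = sum(h * h for h in history) + eps
        # NLMS update: step along the reference history, normalized by its power.
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        out.append(error)
    return out
```

Because both signals are read from memory, they can be time-aligned before filtering, which keeps the required number of filter taps small.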
Abstract:
Methods employ sensors in portable devices (e.g., smartphones) to sense both content information (e.g., audio and imagery) and context information. Device processing desirably depends on both. For example, some embodiments activate certain processor-intensive operations (e.g., content recognition) based on classification of the sensed content and context. The context can control the location where information produced by such operations is stored, or control an alert signal indicating, e.g., that sensed speech is being transcribed. Some arrangements post sensor data collected by one device to a cloud repository, for access and processing by other devices. Multiple devices can collaborate in collecting and processing data, to exploit advantages each may have (e.g., in location, processing ability, social network resources, etc.). A great many other features and arrangements are also detailed.