Abstract:
A signal bias removal (SBR) method based on the maximum likelihood estimation of the bias for minimizing undesirable effects in speech recognition systems is described. The technique is readily applicable in various architectures including discrete (vector-quantization based), semicontinuous and continuous-density Hidden Markov Model (HMM) systems. For example, the SBR method can be integrated into a discrete density HMM and applied to telephone speech recognition where the contamination due to extraneous signal components is unknown. To enable real-time implementation, a sequential method for the estimation of the bias (SSBR) is disclosed.
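The bias estimate for the discrete (vector-quantization) case can be sketched as follows. This is a minimal illustration, not the patented procedure: it assumes a fixed codebook of centroids, a Euclidean nearest-centroid match, and an additive bias, under which the maximum likelihood estimate reduces to the mean residual between each frame and its matched centroid. The names `nearest_centroid` and `estimate_bias` are hypothetical.

```python
def nearest_centroid(vec, codebook):
    """Return the codebook centroid closest to vec (squared Euclidean distance)."""
    return min(codebook, key=lambda c: sum((v - ci) ** 2 for v, ci in zip(vec, c)))


def estimate_bias(frames, codebook, iterations=3):
    """Iteratively estimate an additive bias shared by all frames.

    Assumption: each iteration removes the current bias, matches every
    frame to its nearest centroid, and re-estimates the bias as the mean
    of (frame - matched centroid).
    """
    dim = len(frames[0])
    bias = [0.0] * dim
    for _ in range(iterations):
        residual_sum = [0.0] * dim
        for frame in frames:
            cleaned = [f - b for f, b in zip(frame, bias)]
            centroid = nearest_centroid(cleaned, codebook)
            for d in range(dim):
                residual_sum[d] += frame[d] - centroid[d]
        bias = [s / len(frames) for s in residual_sum]
    return bias


def remove_bias(frames, bias):
    """Subtract the estimated bias from every frame."""
    return [[f - b for f, b in zip(frame, bias)] for frame in frames]
```

A sequential variant, in the spirit of the SSBR mentioned above, would update the running residual sum frame by frame instead of looping over the whole utterance; that variant is not shown here.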
Abstract:
In a speech recognition system, a recognition processor receives an unknown utterance signal as input. In response to this input, the recognition processor accesses a recognition database and scores the utterance signal against recognition models in the database to classify the unknown utterance and generate a hypothesis speech signal. A verification processor receives the hypothesis speech signal as input to be verified. The verification processor accesses a verification database to test the hypothesis speech signal against verification models, stored in the verification database, that reflect a preselected type of training. Based on the verification test, the verification processor generates a confidence measure signal. The confidence measure signal can be compared against a verification threshold to determine the accuracy of the recognition decision made by the recognition processor.
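The final thresholding step might look like the sketch below. The abstract does not say how the confidence measure is computed, so the log-likelihood-ratio form used here is an assumption, and the function names are hypothetical.

```python
def confidence_measure(target_loglik, alternative_loglik):
    """Assumed confidence: log-likelihood of the hypothesis under its
    verification model minus that under a competing (alternative) model."""
    return target_loglik - alternative_loglik


def verify_hypothesis(target_loglik, alternative_loglik, threshold):
    """Return (confidence, accepted) for one recognition hypothesis."""
    confidence = confidence_measure(target_loglik, alternative_loglik)
    return confidence, confidence >= threshold
```

For example, `verify_hypothesis(-100.0, -112.0, 5.0)` accepts the hypothesis, while `verify_hypothesis(-100.0, -102.0, 5.0)` rejects it.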
Abstract:
The present invention provides a speech recognizer that creates and updates an equalization vector as input speech is provided to the recognizer. The present invention includes a speech analyzer which transforms an input speech signal into a series of feature vectors, or observation sequence. Each feature vector is then provided to a speech recognizer, which modifies the feature vector by subtracting a previously determined equalization vector from it. The recognizer then performs segmentation and matches the modified feature vector to a stored model vector, which is defined as the segmentation vector. From time to time, the recognizer determines a new equalization vector based on the difference between one or more input feature vectors and their respective segmentation vectors. The new equalization vector may then be used either for performing another segmentation iteration on the same observation sequence or for performing segmentation on subsequent feature vectors.
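The loop described above can be sketched as follows, for the case where the new equalization vector is applied to subsequent feature vectors. Several details are assumptions made for illustration: segmentation is reduced to a nearest-vector match, the update is the mean difference over a fixed block of `update_every` frames, and all names are hypothetical.

```python
def closest_index(vec, model_vectors):
    """Index of the stored model vector nearest to vec (squared Euclidean)."""
    return min(
        range(len(model_vectors)),
        key=lambda i: sum((v - m) ** 2 for v, m in zip(vec, model_vectors[i])),
    )


def recognize_with_equalization(features, model_vectors, update_every=2):
    """Match each feature vector after subtracting the current equalization
    vector, and refresh that vector every `update_every` frames."""
    dim = len(features[0])
    equalization = [0.0] * dim
    pending = []   # differences (input - segmentation vector) since last update
    matches = []   # index of the segmentation vector chosen for each frame
    for x in features:
        modified = [xi - ei for xi, ei in zip(x, equalization)]
        idx = closest_index(modified, model_vectors)
        matches.append(idx)
        pending.append([xi - si for xi, si in zip(x, model_vectors[idx])])
        if len(pending) == update_every:
            # New equalization vector: mean difference over the block.
            equalization = [sum(col) / len(pending) for col in zip(*pending)]
            pending = []
    return matches, equalization
```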
Abstract:
A facility is provided for allowing a caller to place a telephone call by merely uttering a label identifying a desired called destination and to charge the telephone call to a particular billing account by merely uttering a label identifying that account. Alternatively, the caller may place the call by dialing or uttering the telephone number of the called destination or by entering a speed dial code associated with that telephone number. The facility includes a speaker verification system which employs cohort normalized scoring. Cohort normalized scoring provides a dynamic threshold for the verification process, making the process more robust to variation in training and verification utterances. Such variation may be caused by, e.g., changes in communication channel characteristics or speaker loudness level.
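A minimal sketch of cohort normalized scoring follows. It assumes that log-likelihoods of the utterance under the claimed speaker's model and under each cohort model are already available, and it normalizes by the mean cohort score. The abstract does not specify the statistic, so the mean is an illustrative choice, and the function names are hypothetical.

```python
def cohort_normalized_score(claimed_loglik, cohort_logliks):
    """Claimed-speaker log-likelihood minus the mean cohort log-likelihood."""
    return claimed_loglik - sum(cohort_logliks) / len(cohort_logliks)


def accept_speaker(claimed_loglik, cohort_logliks, threshold=0.0):
    """Accept the identity claim if the normalized score reaches the threshold."""
    return cohort_normalized_score(claimed_loglik, cohort_logliks) >= threshold
```

Because a channel change or loudness shift tends to move the claimed and cohort scores together, the normalized score is largely unaffected. In this sketch, lowering every log-likelihood by the same amount leaves the result unchanged, which is how the cohort acts as a dynamic threshold.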
Abstract:
Speech dereverberation is achieved by accepting an observed signal for initialization (1000) and performing likelihood maximization (2000) which includes Fourier Transforms (4000).
Abstract:
The present invention is a desktop speakerphone having a base-station and a detachable microphone pod. The base-station includes standard telephone components, as well as a wireless receiver and a housing for a detachable microphone pod. The detachable pod contains at least one microphone and a wireless transmitter. When the pod is attached to the base-station, and the conference mode of operation is activated, the pod microphone's audio signal goes directly to base-station audio circuitry via a wired connection. When the pod is detached and the conference mode activated, the pod microphone's audio signal now goes via the pod's wireless transmitter to the base-station's wireless receiver. This detached, wireless mode allows the microphone to be positioned anywhere in the room, thereby improving the quality of transmitted speech by increasing the speech-signal-to-room-noise ratio, and lessening the potential for room echo by reducing the acoustic coupling between base-station loudspeaker and pod microphone.
Abstract:
A speech recognition method comprises the steps of: using given speech data and the N-best algorithm to generate alternative pronunciations and then merging the obtained pronunciations into a pronunciation network structure; using additional parameters to characterize the pronunciation network for a particular word; optimizing the parameters of the pronunciation networks using a minimum classification error criterion that maximizes the discrimination between different pronunciation networks; and adapting the parameters of the pronunciation networks by, first, adjusting the probabilities of the possible pronunciations that may be generated by the pronunciation network for a word claimed to be the true one and, second, correcting the weights of all of the pronunciation networks using the adjusted probabilities.
Abstract:
A method for revising at least a portion of a sequence of speech data segments recognized by an automated speech recognition system. A user is prompted to vocalize the speech data segments sequentially, one speech data segment at a time. When each speech data segment is recognized it is stored as a data element and a confirmation of recognition is issued to the user. The user may then issue a verbal command to delete the last recognized data element if the confirmation indicates that a recognition error has occurred, and then repeat the last speech data element for a second recognition attempt. The user may also issue another verbal command to delete all thus-far recognized data elements in the sequence and to restart the recognition process from the beginning. If no such verbal commands are issued by the user, then the user may continue to vocalize the next sequential speech data segment.
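The correction flow can be sketched as a small loop over the recognizer's output. The command words `"delete"` and `"restart"` and the function name are hypothetical placeholders, and prompting and confirmation are omitted. The sketch only shows how the stored data elements change in response to each recognized token.

```python
def collect_segments(recognized_tokens, delete_cmd="delete", restart_cmd="restart"):
    """Build the stored sequence of data elements from recognized tokens.

    A delete command drops the last stored element; a restart command
    clears everything recognized so far; any other token is stored.
    """
    elements = []
    for token in recognized_tokens:
        if token == delete_cmd:
            if elements:
                elements.pop()
        elif token == restart_cmd:
            elements.clear()
        else:
            elements.append(token)
    return elements
```

For instance, if the third digit is misrecognized as `"9"`, the token stream `["5", "5", "9", "delete", "5", "1"]` yields `["5", "5", "5", "1"]`.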
Abstract:
Disclosed are apparatus and methods that employ a modified version of a computational model of the human peripheral and central auditory system, and that provide for automatic pattern recognition using category dependent feature selection. The validity of the output of the model is examined by deriving feature vectors from the dimension expanded cortical response of the central auditory system for use in a conventional phoneme recognition task. In addition, the cortical response may be a place-coded data set where sounds are categorized according to the regions containing their most distinguishing features. This provides for a novel category-dependent feature selection apparatus and methods in which this mechanism may be utilized to better simulate robust human pattern (speech) recognition.