Abstract:
Disclosed are apparatus and methods that employ a modified version of a computational model of the human peripheral and central auditory system, and that provide for automatic pattern recognition using category dependent feature selection. The validity of the output of the model is examined by deriving feature vectors from the dimension expanded cortical response of the central auditory system for use in a conventional phoneme recognition task. In addition, the cortical response may be a place-coded data set where sounds are categorized according to the regions containing their most distinguishing features. This provides for a novel category-dependent feature selection apparatus and methods in which this mechanism may be utilized to better simulate robust human pattern (speech) recognition.
Abstract:
A key-phrase detection and verification method that can be advantageously used to realize understanding of flexible (i.e., unconstrained) speech. A "multiple pass" procedure is applied to a spoken utterance comprising a sequence of words (i.e., a "sentence"). First, a plurality of key-phrases are detected (i.e., recognized) based on a set of phrase sub-grammars which may, for example, be specific to the state of the dialogue. These key-phrases are then verified by assigning confidence measures thereto and comparing these confidence measures to a threshold, resulting in a set of verified key-phrase candidates. Next, the verified key-phrase candidates are connected into sentence hypotheses based upon the confidence measures and predetermined (e.g., task-specific) semantic information. And, finally, one or more of these sentence hypotheses are verified to produce a verified sentence hypothesis and, from that, a resultant understanding of the spoken utterance.
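The verification and connection passes described above can be illustrated with a minimal sketch. It assumes each detected key-phrase already carries a confidence measure and a time span; the `KeyPhrase` type, the function names, and the "sum of confidences" ranking are hypothetical choices made for illustration, and the semantic constraints and final sentence verification mentioned in the abstract are left out.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class KeyPhrase:
    """Hypothetical container for one detected key-phrase candidate."""
    text: str
    start: int         # start frame of the phrase
    end: int           # end frame of the phrase (exclusive)
    confidence: float  # confidence measure assigned during verification


def verify_phrases(candidates, threshold):
    """Keep only candidates whose confidence reaches the threshold."""
    return [p for p in candidates if p.confidence >= threshold]


def sentence_hypotheses(verified):
    """Connect verified key-phrases into non-overlapping sequences.

    Hypotheses are ranked by summed confidence (an assumed scoring rule).
    Enumeration is exhaustive, so this only suits small candidate sets.
    """
    ordered = sorted(verified, key=lambda p: p.start)
    hypotheses = []
    for size in range(1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            # Adjacent phrases must not overlap in time.
            if all(a.end <= b.start for a, b in zip(combo, combo[1:])):
                hypotheses.append(combo)
    return sorted(hypotheses,
                  key=lambda h: sum(p.confidence for p in h),
                  reverse=True)
```

A real system would prune the search with the task-specific semantic information rather than enumerate every combination.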
Abstract:
A signal bias removal (SBR) method based on the maximum likelihood estimation of the bias for minimizing undesirable effects in speech recognition systems is described. The technique is readily applicable in various architectures including discrete (vector-quantization based), semicontinuous and continuous-density Hidden Markov Model (HMM) systems. For example, the SBR method can be integrated into a discrete density HMM and applied to telephone speech recognition where the contamination due to extraneous signal components is unknown. To enable real-time implementation, a sequential method for the estimation of the bias (SSBR) is disclosed.
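For the discrete (vector-quantization based) case, the bias estimate can be sketched as follows. This is an illustration under stated assumptions, not the disclosed method: the bias is taken to be additive in the feature domain, each codeword is treated as the mean of a unit-variance Gaussian (so the maximum likelihood bias reduces to the mean residual between each frame and its nearest codeword), and the function names and iteration count are hypothetical.

```python
def nearest(codebook, x):
    """Return the codeword closest to x in squared Euclidean distance."""
    return min(codebook,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, c)))


def estimate_bias(frames, codebook, iterations=5):
    """Iteratively estimate an additive bias shared by all frames.

    Each pass removes the current bias estimate, re-assigns every frame
    to its nearest codeword, and resets the bias to the mean residual.
    """
    dim = len(frames[0])
    bias = [0.0] * dim
    for _ in range(iterations):
        residual_sum = [0.0] * dim
        for y in frames:
            cleaned = [a - b for a, b in zip(y, bias)]
            codeword = nearest(codebook, cleaned)
            for d in range(dim):
                residual_sum[d] += y[d] - codeword[d]
        bias = [s / len(frames) for s in residual_sum]
    return bias


def remove_bias(frames, bias):
    """Subtract the estimated bias from every frame."""
    return [[a - b for a, b in zip(y, bias)] for y in frames]
```

The batch loop above needs the whole utterance; the sequential variant (SSBR) named in the abstract would instead update the estimate as frames arrive.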
Abstract:
A system for pattern-based speech recognition, e.g., a hidden Markov model (HMM) based speech recognizer using Viterbi scoring. The principle of minimum recognition error rate is applied by the present invention using discriminative training. Various issues related to the special structure of HMMs are presented. Parameter update expressions for HMMs are provided.
Abstract:
Voice signals are transmitted over a voiceband telephone channel with a high degree of security and good voice quality by applying to the transmission channel a first signal which includes digital information derived from the vocal tract response of the voice signal and a second signal which includes continuous information derived from the excitation component of the voice signal.
Abstract:
The present invention provides a speech recognizer that creates and updates an equalization vector as input speech is provided to the recognizer. The present invention includes a speech analyzer which transforms an input speech signal into a series of feature vectors, i.e., an observation sequence. Each feature vector is then provided to a speech recognizer which modifies the feature vector by subtracting a previously determined equalization vector therefrom. The recognizer then performs segmentation and matches the modified feature vector to a stored model vector, which is defined as the segmentation vector. The recognizer then, from time to time, determines a new equalization vector, the new equalization vector being defined based on the difference between one or more input feature vectors and their respective segmentation vectors. The new equalization vector may then be used either for performing another segmentation iteration on the same observation sequence or for performing segmentation on subsequent feature vectors.
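The update loop described above can be sketched in an online form. The sketch makes several assumptions that the abstract does not state: segmentation is reduced to a nearest-neighbor match against a flat list of model vectors, the new equalization vector is the running mean of the differences seen so far, and "from time to time" is modeled by a hypothetical `update_every` parameter.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def recognize_with_equalization(frames, model_vectors, update_every=2):
    """Segment frames while periodically refreshing the equalization vector.

    Returns the index of the matched model vector for each frame and the
    final equalization vector.
    """
    dim = len(frames[0])
    equalization = [0.0] * dim
    diff_sum = [0.0] * dim
    segmentation = []
    for t, frame in enumerate(frames, start=1):
        # Subtract the previously determined equalization vector.
        modified = [a - b for a, b in zip(frame, equalization)]
        # Match the modified frame to its segmentation vector.
        index = min(range(len(model_vectors)),
                    key=lambda i: sq_dist(modified, model_vectors[i]))
        segmentation.append(index)
        # Accumulate the input-minus-segmentation-vector difference.
        for d in range(dim):
            diff_sum[d] += frame[d] - model_vectors[index][d]
        if t % update_every == 0:
            equalization = [s / t for s in diff_sum]
    return segmentation, equalization
```

Calling the function again on the same frames, seeded with the returned vector, would correspond to the abstract's option of another segmentation iteration on the same observation sequence; the sketch omits that seeding for brevity.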
Abstract:
A facility is provided for allowing a caller to place a telephone call by merely uttering a label identifying a desired called destination and to charge the telephone call to a particular billing account by merely uttering a label identifying that account. Alternatively, the caller may place the call by dialing or uttering the telephone number of the called destination or by entering a speed dial code associated with that telephone number. The facility includes a speaker verification system which employs cohort normalized scoring. Cohort normalized scoring provides a dynamic threshold for the verification process, making the process more robust to variation in training and verification utterances. Such variation may be caused by, e.g., changes in communication channel characteristics or speaker loudness level.
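One common way to realize cohort normalized scoring is to compare the claimed speaker's log-likelihood with a statistic of the log-likelihoods of a set of cohort speakers. The sketch below assumes the statistic is the log of the average cohort likelihood and that the decision threshold defaults to zero; both choices, and the function names, are illustrative assumptions rather than details given in the abstract.

```python
import math


def cohort_normalized_score(claimed_loglik, cohort_logliks):
    """Log-likelihood ratio of the claimed speaker against the cohort.

    The cohort term is the log of the mean cohort likelihood, computed
    with the log-sum-exp trick for numerical stability.
    """
    peak = max(cohort_logliks)
    mean_lik = sum(math.exp(l - peak) for l in cohort_logliks) / len(cohort_logliks)
    return claimed_loglik - (peak + math.log(mean_lik))


def verify(claimed_loglik, cohort_logliks, threshold=0.0):
    """Accept the identity claim when the normalized score exceeds the threshold."""
    return cohort_normalized_score(claimed_loglik, cohort_logliks) > threshold
```

Because a channel or loudness change tends to shift the claimed and cohort log-likelihoods together, the difference is largely unaffected, which is the sense in which the cohort acts as a dynamic threshold.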
Abstract:
A content interpolating web proxy server is configured in a computer network for processing retrieved web content so as to place it in a format suitable for presentation on a particular client device such as, e.g., a computer, personal digital assistant (PDA), wireless telephone or voice browser-equipped device. The server processes a client request generated by a client device to determine a particular client type associated with the client device, retrieves web content identified in the client request, retrieves one or more augmentation files associated with the web content and the particular client type, and alters the retrieved web content in accordance with the one or more augmentation files. The altered web content is then delivered to the client device. The one or more augmentation files may be co-located with the web content at a site remote from the proxy server, such that the content owner need not own, maintain or otherwise control the proxy server.
Abstract:
A repetitive transmission technique with time diversity which provides improved signal-to-noise ratio (SNR) in the presence of packet loss. Time shifts are introduced between N versions of a particular block of information to be transmitted, and the time-shifted versions are encoded in a set of N encoders and transmitted as N packets. The time shift introduced between a given pair of the N versions corresponds to approximately 1/N of the time duration of a particular one of the versions. The SNR of a composite reconstructed signal generated from the N packets with the introduced time shift in a receiver of the system is approximately the same as would be obtained using a set of N independent encoders to generate the plurality of packets without the introduced time shifts. The gain in the SNR of the composite reconstructed signal attributable to the introduction of the time shifts is 10 log10(N′) dB, where N′ = 1, ..., N is the number of the N packets actually received at the system receiver. A further improvement in SNR performance may be obtained by introducing quantization error compensation, in which quantization error from the encoding of a given one of the versions is successively combined with subsequent versions prior to encoding of those versions.
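The gain expression and the shift spacing can be worked through numerically. In the sketch below the function names are hypothetical, and the shifts are assumed to be evenly spaced at multiples of 1/N of the block duration, which is one reading of the "approximately 1/N" statement above.

```python
import math


def snr_gain_db(packets_received):
    """SNR gain in dB when N' of the N packets reach the receiver."""
    if packets_received < 1:
        raise ValueError("at least one packet must be received")
    return 10 * math.log10(packets_received)


def time_shifts(block_duration, n_versions):
    """Evenly spaced time shifts for the N versions of one block."""
    return [k * block_duration / n_versions for k in range(n_versions)]
```

For example, with N = 4 and a 20 ms block the assumed shifts are 0, 5, 10 and 15 ms; receiving only one packet gives no gain (0 dB), and receiving two gives roughly 3 dB.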
Abstract:
Disclosed are systems, methods and articles of manufacture for performing high resolution N-best string hypothesization during speech recognition. A received input signal, representing a speech utterance, is processed utilizing a plurality of recognition models to generate one or more string hypotheses of the received input signal. The plurality of recognition models preferably includes one or more inter-word context dependent models and one or more language models. A forward partial path map is produced according to the allophonic specifications of at least one of the inter-word context dependent models and the language models. The forward partial path map is traversed in the backward direction as a function of the allophonic specifications to generate the one or more string hypotheses. One or more of the recognition models may represent one-phone words.