摘要:
A speech coding apparatus and method for use in a speech recognition apparatus and method. The value of at least one feature of an utterance is measured during each of a series of successive time intervals to produce a series of feature vector signals representing the feature values. A plurality of prototype vector signals, each having at least one parameter value and a unique identification value are stored. The closeness of the feature vector signal is compared to the parameter values of the prototype vector signals to obtain prototype match scores for the feature value signal and each prototype vector signal. The identification value of the prototype vector signal having the best prototype match score is output as a coded representation signal of the feature vector signal. Speaker-dependent prototype vector signals are generated from both synthesized training vector signals and measured training vector signals. The synthesized training vector signals are transformed reference feature vector signals representing the values of features of one or more utterances of one or more speakers in a reference set of speakers. The measured training feature vector signals represent the values of features of one or more utterances of a new speaker/user not in the reference set.
摘要:
Modeling a word is done by concatenating a series of elemental models to form a word model. At least one elemental model in the series is a composite elemental model formed by combining the starting states of at least first and second primitive elemental models. Each primitive elemental model represents a speech component. The primitive elemental models are combined by a weighted combination of their parameters in proportion to the values of the weighting factors. To tailor the word model to closely represent variations in the pronunciation of the word, the word is uttered a plurality of times by a plurality of different speakers. Constructing word models from composite elemental models, and constructing composite elemental models from primitive elemental models enables word models to represent many variations in the pronunciation of a word. Providing a relatively small set of primitive elemental models for a relatively large vocabulary of words enables models to be trained to the voice of a new speaker by having the new speaker utter only a small subset of the words in the vocabulary.
摘要:
In speech recognition and speech coding, the values of at least two features of an utterance are measured during a series of time intervals to produce a series of feature vector signals. A plurality of single-dimension prototype vector signals having only one parameter value are stored. At least two single-dimension prototype vector signals having parameter values representing first feature values, and at least two other single-dimension prototype vector signals have parameter values representing second feature values. A plurality of compound-dimension prototype vector signals have unique identification values and comprise one first-dimension and one second-dimension prototype vector signal. At least two compound-dimension prototype vector signals comprise the same first-dimension prototype vector signal. The feature values of each feature vector signal are compared to the parameter values of the compound-dimension prototype vector signals to obtain prototype match scores. The identification values of the compound-dimension prototype vector signals having the best prototype match scores for the feature vectors signals are output as a sequence of coded representations of an utterance to be recognized. A match score, comprising an estimate of the closeness of a match between a speech unit and the sequence of coded representations of the utterance, is generated for each of a plurality of speech units. At least one speech subunit, of one or more best candidate speech units having the best match scores, is displayed.
摘要:
An apparatus for generating a set of acoustic prototype signals for encoding speech includes a memory for storing a training script model comprising a series of word-segment models. Each word-segment model comprises a series of elementary models. An acoustic measure is provided for measuring the value of at least one feature of an utterance of the training script during each of a series of time intervals to produce a series of feature vector signals representing the feature values of the utterance. An acoustic matcher is provided for estimating at least one path through the training script model which would produce the entire series of measured feature vector signals. From the estimated path, the elementary model in the training script model which would produce each feature vector signal is estimated. The apparatus further comprises a cluster processor for clustering the feature vector signals into a plurality of clusters. Each feature vector signal in a cluster corresponds to a single elementary model in a single location in a single word-segment model. Each cluster signal has a cluster value equal to an average of the feature values of all feature vectors in the signal. Finally, the apparatus includes a memory for storing a plurality of prototype vector signals. Each prototype vector signal corresponds to an elementary model, has an identifier, and comprises at least two partition values. The partition values are equal to combinations of the cluster values of one or more cluster signals corresponding to the elementary model.
摘要:
Methods and apparatus are disclosed for recognizing handwritten characters in response to an input signal from a handwriting transducer. A feature extraction and reduction procedure is disclosed that relies on static or shape information, wherein the temporal order in which points are captured by an electronic tablet may be disregarded. A method of the invention generates and processes the tablet data with three independent sets of feature vectors which encode the shape information of the input character information. These feature vectors include horizontal (x-axis) and vertical (y-axis) slices of a bit-mapped image of the input character data, and an additional feature vector to encode an absolute y-axis displacement from a baseline of the bit-mapped image. It is shown that the recognition errors that result from the spatial or static processing are quite different from those resulting from temporal or dynamic processing. Furthermore, it is shown that these differences complement one another. As a result, a combination of these two sources of feature vector information provides a substantial reduction in an overall recognition error rate. Methods to combine probability scores from dynamic and the static character models are also disclosed.
摘要:
Methods and apparatus are disclosed for recognizing handwritten characters in response to an input signal from a handwriting transducer. A feature extraction and reduction procedure is disclosed that relies on static or shape information, wherein the temporal order in which points are captured by an electronic tablet may be disregarded. A method of the invention generates and processes the tablet data with three independent sets of feature vectors which encode the shape information of the input character information. These feature vectors include horizontal (x-axis) and vertical (y-axis) slices of a bit-mapped image of the input character data, and an additional feature vector to encode an absolute y-axis displacement from a baseline of the bit-mapped image. It is shown that the recognition errors that result from the spatial or static processing are quite different from those resulting from temporal or dynamic processing. Furthermore, it is shown that these differences complement one another. As a result, a combination of these two sources of feature vector information provides a substantial reduction in an overall recognition error rate. Methods to combine probability scores from dynamic and the static character models are also disclosed.
摘要:
Methods and apparatus are disclosed for recognizing handwritten characters in response to an input signal from a handwriting transducer. A feature extraction and reduction procedure is disclosed that relies on static or shape information, wherein the temporal order in which points are captured by an electronic tablet may be disregarded. A method of the invention generates and processes the tablet data with three independent sets of feature vectors which encode the shape information of the input character information. These feature vectors include horizontal (x-axis) and vertical (y-axis) slices of a bit-mapped image of the input character data, and an additional feature vector to encode an absolute y-axis displacement from a baseline of the bit-mapped image. It is shown that the recognition errors that result from the spatial or static processing are quite different from those resulting from temporal or dynamic processing. Furthermore, it is shown that these differences complement one another. As a result, a combination of these two sources of feature vector information provides a substantial reduction in an overall recognition error rate. Methods to combine probability scores from dynamic and the static character models are also disclosed.
摘要:
Methods and apparatus are disclosed for recognizing handwritten characters in response to an input signal from a handwriting transducer. A feature extraction and reduction procedure is disclosed that relies on static or shape information, wherein the temporal order in which points are captured by an electronic tablet may be disregarded. A method of the invention generates and processes the tablet data with three independent sets of feature vectors which encode the shape information of the input character information. These feature vectors include horizontal (x-axis) and vertical (y-axis) slices of a bit-mapped image of the input character data, and an additional feature vector to encode an absolute y-axis displacement from a baseline of the bit-mapped image. It is shown that the recognition errors that result from the spatial or static processing are quite different from those resulting from temporal or dynamic processing. Furthermore, it is shown that these differences complement one another. As a result, a combination of these two sources of feature vector information provides a substantial reduction in an overall recognition error rate. Methods to combine probability scores from dynamic and the static character models are also disclosed.
摘要:
Methods and apparatus are disclosed for recognizing handwritten characters in response to an input signal from a handwriting transducer. A feature extraction and reduction procedure is disclosed that relies on static or shape information, wherein the temporal order in which points are captured by an electronic tablet may be disregarded. A method of the invention generates and processes the tablet data with three independent sets of feature vectors which encode the shape information of the input character information. These feature vectors include horizontal (x-axis) and vertical (y-axis) slices of a bit-mapped image of the input character data, and an additional feature vector to encode an absolute y-axis displacement from a baseline of the bit-mapped image. It is shown that the recognition errors that result from the spatial or static processing are quite different from those resulting from temporal or dynamic processing. Furthermore, it is shown that these differences complement one another. As a result, a combination of these two sources of feature vector information provides a substantial reduction in an overall recognition error rate. Methods to combine probability scores from dynamic and the static character models are also disclosed.
摘要:
A computer-based system and method for recognizing handwriting. The present invention includes a pre-processor, a front end, and a modeling component. The present invention operates as follows. First, the present invention identifies the lexemes for all characters of interest. Second, the present invention performs a training phase in order to generate a hidden Markov model for each of the lexemes. Third, the present invention performs a decoding phase to recognize handwritten text. Hidden Markov models for lexemes are produced during the training phase. The present invention performs the decoding phase as follows. The present invention receives test characters to be decoded (that is, to be recognized). The present invention generates sequences of feature vectors for the test characters by mapping in chirographic space. For each of the test characters, the present invention computes probabilities that the test character can be generated by the hidden Markov models. The present invention decodes the test character as the recognized character associated with the hidden Markov model having the greatest probability.